Part 10/20:
He emphasizes that constraining models to more human-readable thought processes diminishes their power but arguably makes them safer. This trade-off highlights the core challenge: the more capable and powerful an AI system is, the less understandable it tends to be, which increases the risk of unintended actions.
The Notion of Desires and Goals in AI
An intriguing philosophical question discussed is whether AI systems "want" things in a human sense. Yudkowsky clarifies that AI "wants" are better understood as targets or steering currents, the future states the system is optimized to produce. For example, a chess AI doesn't desire to win; it simply calculates moves that lead toward a winning game state.
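The chess example can be made concrete with a minimal sketch (not from the source): a minimax search over the game of Nim. The program has no "desire" to win; it just scores possible future states and returns the move whose resulting state scores highest. All names here are illustrative, not any real engine's API.

```python
def best_move(stones: int) -> int:
    """Return the take (1-3) minimax rates highest in Nim,
    where players remove 1-3 stones and taking the last stone wins."""
    def value(stones: int, maximizing: bool) -> int:
        if stones == 0:
            # The previous player took the last stone and won.
            return -1 if maximizing else 1
        scores = [value(stones - take, not maximizing)
                  for take in (1, 2, 3) if take <= stones]
        return max(scores) if maximizing else min(scores)

    moves = [take for take in (1, 2, 3) if take <= stones]
    # The "choice" is just an argmax over computed future values:
    # steering toward a target state, not wanting anything.
    return max(moves, key=lambda take: value(stones - take, False))
```

From 5 stones the program takes 1, leaving the opponent a losing position of 4, purely because that branch of the search evaluates best. Nothing in the code resembles a preference; "wanting to win" is an external description of where the optimization steers.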