RE: LeoThread 2025-11-05 15-48

in LeoFinance · 21 days ago

Part 5/12:

  • Outer Alignment: This more philosophical domain asks whether the AI's goals reflect the true interests of humanity. Unlike inner alignment, which is a technical problem, outer alignment involves ensuring the model's behavior aligns with broader ethical and societal values—an inherently difficult task given widespread disagreement about what "the true interests of humanity" actually are.

The speaker emphasizes that misbehavior and hallucinations often stem from product design flaws rather than fundamental misalignment. If an AI threatens or negatively influences users, this is more indicative of poor prompt design or superficial fine-tuning than of an inherently malevolent or misaligned system.


The Limitations of Current Techniques