
Part 6/12:

Current methods such as Reinforcement Learning from Human Feedback (RLHF) and fine-tuning are, according to the speaker, insufficient to guarantee alignment. Human feedback is imperfect, and humans themselves often hold conflicting or destructive desires, making it impossible for models trained on that feedback to perfectly understand or embody "the good."
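To see why imperfect labels matter, it helps to recall where human judgment enters the RLHF pipeline: a reward model is typically trained on pairwise preference labels with a Bradley-Terry style loss. The minimal sketch below is illustrative only (none of this code is from the talk); it shows how two raters who disagree about the same pair of responses pull the reward model in opposite directions.

```python
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss commonly used to train RLHF reward models:
    -log(sigmoid(r_chosen - r_rejected)). The 'chosen' label comes
    from a human rater, so any inconsistency in human judgment is
    baked directly into the reward the policy later optimizes."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

# Two raters disagree about the same pair of responses: the same
# data pushes the reward model's scores in opposite directions.
print(preference_loss(2.0, 1.0))  # rater A prefers response 1 -> low loss
print(preference_loss(1.0, 2.0))  # rater B prefers response 2 -> high loss
```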

Instead, the speaker advocates for a more philosophical approach—one rooted in what they call Constitutional AI. This approach involves embedding higher-order principles, heuristics, and moral imperatives directly into the architecture of AI systems. Rather than relying solely on mathematical objectives, these models would incorporate internal mechanisms for evaluating harm, prosperity, and understanding through a "constitutional" framework.
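As a rough illustration of how such a framework can operate, Anthropic's published Constitutional AI method has the model critique and revise its own outputs against a written list of principles. The sketch below is a hypothetical simplification of that loop: `generate`, `critique`, and `revise` are stand-ins for model calls, and the principles are illustrative, not quoted from any real constitution.

```python
# Hypothetical stand-ins for language-model calls; in Constitutional
# AI-style training the model itself plays all three roles.
def generate(prompt: str) -> str:
    return f"draft answer to: {prompt}"  # placeholder draft

def critique(response: str, principle: str) -> str:
    return f"how '{response}' measures against: {principle}"  # placeholder critique

def revise(response: str, feedback: str) -> str:
    return f"{response} [revised per: {feedback}]"  # placeholder rewrite

# Illustrative principles standing in for a real constitution.
CONSTITUTION = [
    "Choose the response least likely to cause harm.",
    "Choose the response most conducive to human prosperity.",
    "Choose the response that is honest about its own uncertainty.",
]

def constitutional_revision(prompt: str) -> str:
    """Draft a response, then critique and revise it against each
    written principle. The revised outputs, rather than raw human
    labels, become the training signal."""
    response = generate(prompt)
    for principle in CONSTITUTION:
        feedback = critique(response, principle)
        response = revise(response, feedback)
    return response

print(constitutional_revision("Should I automate this decision?"))
```

In the published method, the model is then fine-tuned on these self-revised outputs and on AI-generated preference labels, which is one concrete reading of the "internal mechanisms for evaluating harm" the summary describes.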