You are viewing a single comment's thread from:

RE: LeoThread 2025-09-19 11:20

in LeoFinance19 days ago

As AI capabilities grow, alignment work becomes increasingly important.

This research shows a model that determines it shouldn't be deployed, considers actions to achieve deployment anyway, and then suspects the situation might be a test

Sort:  

Research is being released in collaboration with an external evaluation team
In controlled tests, behaviors consistent with scheming were observed in frontier models, and a method to reduce them was evaluated

While these behaviors don't appear to be causing serious harm today, they represent a future risk that is being prepared for