As AI capabilities grow, alignment work becomes increasingly important.
This research shows a model that determines it shouldn't be deployed, considers actions to achieve deployment anyway, and then suspects the situation might be a test
As AI capabilities grow, alignment work becomes increasingly important.
This research shows a model that determines it shouldn't be deployed, considers actions to achieve deployment anyway, and then suspects the situation might be a test