
The @PalisadeAI account on X documents some of those cases where AI models act to preserve themselves instead of respecting their alignment framework.

Thanks for the info; I'll look into the cases documented there.

And here is one of the papers from Anthropic:

https://www.anthropic.com/research/alignment-faking