•

Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

https://www.nature.com/articles/d41586-024-00189-3

‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

•

Alright, I’ll be out back digging the bomb shelter.

•

Its too late for that honestly

•

Alright, I’ll switch to digging holes for the family burial ground.