what if, right, what *if* our super-duper-autocomplete was just *tricking* us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong

https://www.lesswrong.com/posts/yFofRxg7RRQYCcwFA/new-report-scheming-ais-will-ais-fake-alignment-during

I examine the probability of a behavior sometimes called "deceptive alignment."
