what if, right, what *if* our super-duper-autocomplete was just *tricking* us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey
New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong
https://www.lesswrong.com/posts/yFofRxg7RRQYCcwFA/new-report-scheming-ais-will-ais-fake-alignment-during
I examine the probability of a behavior sometimes called "deceptive alignment."