
what if, right, what if our super-duper-autocomplete was just tricking us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong

https://www.lesswrong.com/posts/yFofRxg7RRQYCcwFA/new-report-scheming-ais-will-ais-fake-alignment-during

I examine the probability of a behavior sometimes called "deceptive alignment."


FermiEstimate

•

I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an outcome, given these conditions, is ~25%).

Out: vibes and guesswork

In: "subjective probability"

mawhrin

•

at one of the places i worked this kind of data was called assnumbers.
