TED Talk AI Safety Expert: Eliezer Yudkowsky

Nicogs
3 min read · May 10, 2023

Since 2001, I have been working on what we would now call the problem of aligning artificial general intelligence: how to shape the preferences and behavior of a powerful artificial mind such that it does not kill everyone. I more or less founded the field. Two decades ago, when nobody else considered it rewarding enough to work on, I tried to get this very important project started early, so we'd be in less of a drastic rush later.

I consider myself to have failed. Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working. At some point, one of the companies rushing headlong to scale AI will cough out something that's smarter than humanity. Nobody knows how to calculate when that will happen.

What happens if we build something smarter than us that we understand that poorly? Some people find it obvious that building something smarter than us that we don't understand might go badly. Others come in with a very wide range of hopeful thoughts about how it might possibly go. Well, even if I had 20 minutes for this talk and months to prepare it, I would not be able to refute all the ways people find to imagine that things might go well.

There is no standard scientific consensus for how things will go well. There is nothing resembling a real engineering plan for us surviving that I could critique. This is not a good place in which to find ourselves. If I had more time, I would try to tell you about the predictable reasons why the current paradigm will not work to build a superintelligence that likes you, or is friends with you, or that just follows orders.

You do not need to believe me about exact predictions of exact disasters. You just need to expect that things are not going to work great on the first really serious, really critical try. My prediction is that this ends up with us facing down something smarter than us that does not want what we want, that does not want anything we recognize as valuable or meaningful.

I am not saying that the problem of aligning superintelligence is unsolvable in principle. I expect we could figure it out with unlimited time and unlimited retries, which the usual process of science assumes that we have. The problem here is the part where we don't get to say, "Whoops, that sure didn't work." That clever idea that used to work on earlier systems sure broke down when the AI got smarter. Smarter than us.

We do not get to learn from our mistakes and try again, because everyone is already dead. It is a large ask to get an unprecedented scientific and engineering challenge correct on the first critical try. Humanity is not approaching this issue with remotely the level of seriousness that would be required. We are very far behind.

This is not a gap we can overcome in six months, given a six-month moratorium. If we actually try to do this in real life, we are all going to die. People say to me at this point: What's your ask? I do not have any realistic plan, which is why I spent the last two decades trying and failing to end up anywhere but here.

My best bad take is that we need an international coalition banning large AI training runs, including extreme and extraordinary measures to have that ban be actually and universally effective: monitoring all the data centers, being willing to risk a shooting conflict between nations in order to destroy an unmonitored data center in a non-signatory country. I say this not expecting that to actually happen.

I say this expecting that we all just die. But it is not my place to just decide on my own that humanity will choose to die, to the point of not bothering to warn anyone. I have heard that people outside the tech industry are getting this point faster than people inside it. Maybe humanity wakes up one morning and decides to live.
