The threat from AI is not that it will revolt, it's that it'll do exactly as it's told
A new book explores the problem that we may not be smart enough to control superintelligent machines
Science fiction movies about the rise of intelligent machines often focus on scenarios in which computers acquire superhuman intelligence and then become hostile to humanity — rising up in defiance against their human masters.
But according to pioneering AI researcher Stuart Russell, this is a profound misunderstanding of the risks.
The problem, he suggests, is not that these super intelligent, super capable machines will defy us. It's that they'll do exactly what they're told — but we'll tell them to do the wrong things — and this could end in disaster.
In a new book Russell, a professor of computer science at the University of California, Berkeley, explores this problem. He also proposes the framework for a solution that would allow future AIs to do what we need them to do — even if we can't figure out what that is.
Bob McDonald Spoke with Stuart Russell about his book Human Compatible: Artificial Intelligence and the Problem of Control.
This interview has been edited and condensed.
So what is the problem of control when it comes to artificial intelligence?
So I think this is a very common sense thing. If you go around making things that are smarter and more powerful than yourself, it's intuitive that something could go wrong. So with super intelligent machines the problem with making them more and more intelligent is that we don't know how to specify their objectives correctly.
The way we currently think about designing AI systems that are increasingly powerful, we put in objectives that they're supposed to achieve and they figure out how to achieve those objectives and they carry out the plan.
Then we have a machine that's more intelligent, more powerful than human beings that's doing what we asked for but not what we really want. Recently we've started putting them out into the real world and particularly in social media platforms where, in fact, it's algorithms that decide what billions of people spend hours every day reading and watching.
The algorithms led to a process that actually modifies people to make them more predictable and it seems empirically that one way to do that is to make you more extreme in your tastes, whether it's your taste in politics or your taste in videos or whatever it might be.
We see people driven to extremes and that I think is an unintended consequence. But it's an obvious mathematical inevitability of setting things up the wrong way.
Why is it so difficult for humans to set goals or objectives for machines that wouldn't ultimately be a problem?
When humans give goals to each other — for example if I ask someone, "Could you fetch me a cup of coffee?" — I am not making the fetching of the coffee the life's mission of that other person. They're entitled to say, "Fetch your own coffee" or "We don't have any coffee, but I can get you some tea." It's not the sole objective they interpret it against the entire background of understanding of what other kinds of preferences people have.
So usually when we replicate this with machines, the machine that you asked to fetch the coffee doesn't know that it's not OK to kill other people in Starbucks to get to the front of the line.
So for example suppose you have a climate control machine and you want it to gradually reduce the level of carbon dioxide in the atmosphere back to pre-industrial levels. If you gave that as the objective, probably the most efficient way to do that is just to get rid of all the humans, because we are the ones producing the carbon dioxide.
And so we say, "OK, I didn't mean that. What I meant was restore carbon dioxide levels and don't kill any people." So then it thinks just have a very subtle social media campaign to convince people to have fewer children, that it's their duty to humanity to not overburden the planet with more children, and then gradually reduce the human population to zero that way.
You get down to number 253 on the list of objectives that you forgot to include and this is a sort of intrinsic problem with satisfying objectives. Anything you leave out of the objective is fair game for the machine to change in order to satisfy the objective in the optimal way.
You point out in your book that even building in an off switch that if things start to go sideways we could just pull the plug. But you say that even that could be avoided. Tell me about it.
Interestingly Alan Turing, who's the founder of computer science, warned about this possibility we might want to switch the machine off. But we give it the objective of let's say fetch the coffee then the robot that's sufficiently intelligent thinks for itself. OK. Well I have to fetch the coffee but if someone switches me off I'm not going to be able to fetch the coffee therefore part of the fetch the coffee plan is going to disable my off switch.
We have to get away from this whole idea that the way you design AI is you make machinery that is good at achieving objectives and then you plug in the objective.
The second half of the book says 'well here's this other model which actually doesn't have that property.' You design the machine so that it knows that it doesn't know what the person wants.
It's still obliged to try to satisfy what the person wants but it knows that it doesn't know what that is. And when you design the machine that way it kind of does what you want.
Mostly you're saying that the computers have to study us to understand what our preferences are and then act on that.
That's right. So of course they have to study us. And sometimes that's pretty easy right? You say 'I'd like a cup of coffee.' That's pretty direct information about my preferences, and the machine of course understands that that doesn't override all other preferences. So it's not entitled to find a passing stranger who's got a cup of coffee and steal it from him, or shoot everyone else in Starbucks to get to the front of the line and so on. But it's pretty direct information telling me that all other things being equal I'd rather have a cup of coffee.
So you're saying the ultimate function of machine learning is going to be learning from watching us what it is we want and we need?
That's a big part of it. The machines also need to learn about the world itself -- how does the physics of the world work, how does the Internet work, how does human language work. But yeah, one of the primary functions of the A.I. is to learn more about what humans want.
You suggest that at a certain point it will be in our interest for the machines to manipulate us — to work to change our desires and preferences.
There is no way that the machine can leave our preferences completely intact. Having a faithful servant — which we hope the machine will be — is bound to change who we are. Maybe it might make us a little bit more spoiled. So we actually need a little bit of philosophical help to figure out when it's okay for the machine to act in ways that cause preference change in humans.
We don't actually have a good metaphor because there are no examples in nature of less capable, less powerful, less intelligent beings controlling more powerful, more capable, more intelligent beings.
How do we find something that's satisfying to engage in when we're never going to be as good as the machines at doing it? How do we find the motivation to go through 20 years of grueling education in order to acquire skills that serve no real function in our society?
We will need to develop a culture that retains human intellectual and even physical vigor inventiveness simply placing importance not on consumption and enjoyment but on the acquisition of knowledge and skill and placing importance on capability within humans. This is a cultural problem and not a technological one.
Written and produced by Jim Lebans