Teaching software to predict handshakes, hugs and kisses

MIT researchers use TV clips to train algorithm to guess what interacting humans might do next

Image | The Office handshake vs high five

Caption: MIT researchers used clips from TV shows to train an algorithm to predict what would happen next in human interactions - a hug, a kiss, a handshake or a high-five. This clip from The Office confused the algorithm a bit. (MITCSAIL/YouTube)

It sounds like the premise of a science fiction movie, but researchers at MIT's Computer Science and Artificial Intelligence Laboratory are teaching computers to see into the future.
They've created an algorithm that can forecast hugs, kisses, and high-fives — before they even happen.
CBC Radio technology columnist Dan Misener explains how computers are getting better and better at understanding and anticipating human behaviour.

What have these MIT researchers created?

They've developed an algorithm that can look at a photo of two people and predict what's going to happen next.
For instance, if I showed this software a photo of you and me meeting on the street, it could anticipate whether we're likely to hug, kiss, shake hands or high-five.

Image | MIT handshake algorithm

Caption: Using YouTube videos, MIT researchers trained an algorithm to correctly guess whether a handshake, hug, kiss or high-five would follow in a situation with 43 per cent accuracy - not perfect, but far better than chance. (MITCSAIL/YouTube)

This is the sort of thing humans are pretty good at. Based on a lifetime of watching people interact, we can read social situations and body language and make educated guesses about how two people are likely to interact.
But of course, computers don't have a lifetime of experience, and they don't have common sense. That's the challenge the researchers are trying to tackle.
They're tackling it using artificial intelligence, deep learning, and computer vision — all significant trends in technology right now.

How did they teach the algorithm to predict hugs?

If you want to train a computer to recognize human behaviour, you need to feed it lots of examples.
So according to MIT researcher Carl Vondrick, they turned to a massive collection of recorded human behaviour — YouTube.
"We had downloaded 600 hours of video from YouTube," he said.
The algorithm learned about human behaviour by watching those videos. Some were clips from TV shows like Desperate Housewives and The Office; others were user-generated. With each clip it watched, it learned a little more about how people interact.

"So it's learning, for example, that when someone's hand is outstretched, that means a handshake is going to come," Vondrick said. "Or maybe that [if] two heads are very close together, that might suggest a kiss."
Once the researchers had trained the algorithm, it could make predictions a few seconds into the future. So they could pause a video and ask the algorithm: what do you think is going to happen next? Will they shake hands? Will they kiss? Will they hug? Or will they high-five?
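
For readers who want to picture that pause-and-ask step, here is a minimal sketch in Python. It is not the researchers' code. It assumes a hypothetical model has already looked at the paused frame and produced one raw score per action; the scores below are made up for illustration. The sketch turns those scores into probabilities with a standard softmax and picks the most likely of the four actions.

```python
import math

# The four actions the article says the algorithm chooses between.
ACTIONS = ["handshake", "hug", "kiss", "high-five"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_action(scores):
    """Return the most likely action and its probability."""
    probs = softmax(scores)
    best = max(range(len(ACTIONS)), key=lambda i: probs[i])
    return ACTIONS[best], probs[best]


# Hypothetical scores for one paused frame (assumed, not real model output).
example_scores = [2.0, 0.5, 0.1, 1.0]
action, confidence = predict_next_action(example_scores)
print(action, round(confidence, 2))
```

In a real system the scores would come from a trained deep learning model rather than a hand-written list; only the final "pick the most likely action" step is shown here.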

How accurate is it?

Vondrick said right now, the system is about 43 per cent accurate at predicting one of those four actions. That's a lot better than the one-in-four odds you'd have if you simply guessed randomly.
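
The comparison with random guessing is simple arithmetic, sketched below in Python. The 43 per cent figure and the four actions come from the article; the function name and the short list of labelled clips are made up purely to show how an accuracy figure is computed.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match what actually happened."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


NUM_ACTIONS = 4                    # handshake, hug, kiss, high-five
chance = 1 / NUM_ACTIONS           # random guessing: one in four
reported = 0.43                    # accuracy quoted in the article
times_better = reported / chance   # how far above chance the system is

# Made-up example: four clips, three guessed correctly.
toy_predicted = ["hug", "kiss", "hug", "handshake"]
toy_actual = ["hug", "kiss", "high-five", "handshake"]
toy_accuracy = accuracy(toy_predicted, toy_actual)

print(f"chance: {chance:.0%}, reported: {reported:.0%}")
print(f"reported accuracy is {times_better:.2f} times chance")
```

Put another way, random guessing would be right about 25 times in 100, while the system is right about 43 times in 100.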
But it's not nearly as good as human beings. Again, humans have intuition, and we're pretty good at picking up on context, body language and all sorts of things that aren't captured in video.
That said, human intuition isn't always right. For instance, there are those moments where one person offers a handshake, but the other person leans in for a hug. And both people feel slightly awkward. So even humans aren't perfect at predicting the future.
Vondrick thinks his system can get much more accurate, and the key will be exposing it to more video. They're using machine learning techniques, so the system should get better the more training data it has.

What could this algorithm be used for?

Predicting high-fives is interesting, but practically speaking, it's not all that useful.
But Vondrick said with enough training data, the algorithm could be used to predict any kind of human behaviour.
He said he's particularly excited about potential health care applications: "So if we can predict that someone's going to fall, or maybe hurt themselves, or maybe start a fire, maybe we can alert emergency responders, or even try to prevent the accident from happening."
This is possible because sometimes there are indicators that show up long before you actually fall.
And if technology can help monitor for those signs, that could be a real win.