Teaching software to predict handshakes, hugs and kisses
MIT researchers use TV clips to train algorithm to guess what interacting humans might do next
It sounds like the premise of a science fiction movie, but researchers at MIT's Computer Science and Artificial Intelligence Laboratory are teaching computers to see into the future.
They've created an algorithm that can forecast hugs, kisses, and high-fives — before they even happen.
CBC Radio technology columnist Dan Misener explains how computers are getting better and better at understanding and anticipating human behaviour.
What have these MIT researchers created?
They've developed an algorithm that can look at a photo of two people and predict what's going to happen next.
For instance, if I showed this software a photo of you and me meeting on the street, it can anticipate whether we're likely to hug, kiss, shake hands or high-five.
Humans make these predictions intuitively, drawing on a lifetime of experience. Computers, of course, have neither that experience nor common sense — and that's the challenge the researchers are trying to tackle.
They're tackling it using artificial intelligence, deep learning, and computer vision — all significant trends in technology right now.
- Deep Learning Godfather says machines learn like toddlers
- Facial recognition tech is allowing stores to reward customers
How did they teach the algorithm to predict hugs?
If you want to train a computer to recognize human behaviour, you need to feed it lots of examples.
So according to MIT researcher Carl Vondrick, they turned to a massive collection of recorded human behaviour — YouTube.
"We had downloaded 600 hours of video from YouTube," he said.
The algorithm learned about human behaviour by watching those videos. Some were clips from TV shows like Desperate Housewives and The Office; others were user-generated. With each clip it watches, the algorithm learns something more about how people interact.
"So it's learning, for example, that when someone's hand is outstretched, that means a handshake is going to come," Vondrick said. "Or maybe that [if] two heads are very close together, that might suggest a kiss."
Once the researchers trained the algorithm, it could make predictions or forecasts a few seconds in the future. So they could pause a video, and ask the algorithm — what do you think is going to happen next? Will they shake hands? Will they kiss? Will they hug? Or will they high-five?
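The idea of mapping visual cues to a predicted action can be illustrated with a toy sketch. This is not the MIT system (which used deep neural networks trained on raw video); it is a minimal nearest-prototype classifier over two hypothetical features — hand proximity and head proximity — invented here purely to show how cues like "outstretched hand" or "heads close together" could point to one of the four actions.

```python
# Toy sketch (not the actual MIT algorithm): map two hypothetical
# features of a paused frame to one of four predicted actions.
# Features: (hand_closeness, head_closeness), each scaled 0..1.

ACTIONS = ["handshake", "hug", "kiss", "high-five"]

# Hypothetical "learned" prototypes: typical feature values per action.
PROTOTYPES = {
    "handshake": (0.8, 0.2),  # hands close, heads apart
    "hug":       (0.5, 0.6),
    "kiss":      (0.2, 0.9),  # heads very close together
    "high-five": (0.9, 0.1),  # hands raised and close, heads apart
}

def predict_action(features):
    """Return the action whose prototype is nearest (squared distance)
    to the observed features of the paused frame."""
    def dist(action):
        return sum((f - p) ** 2 for f, p in zip(features, PROTOTYPES[action]))
    return min(ACTIONS, key=dist)

print(predict_action((0.2, 0.88)))  # heads very close -> "kiss"
```

The real system learns its cues automatically from hundreds of hours of video rather than from hand-picked features, but the final step — scoring the candidate actions and picking the most likely one — is the same in spirit.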
How accurate is it?
Vondrick said right now, the system is about 43 per cent accurate at predicting which of those four actions will happen. That's substantially better than the one-in-four (25 per cent) odds you'd have if you simply guessed randomly.
But it's not nearly as good as human beings. Again, humans have intuition, and we're pretty good at picking up on context, body language and all sorts of things that aren't captured in video.
That said, human intuition isn't always right. For instance, there are those moments where one person offers a handshake, but the other person leans in for a hug. And both people feel slightly awkward. So even humans aren't perfect at predicting the future.
Vondrick thinks his system can get much more accurate, and the key will be exposing it to more video. They're using machine learning techniques, so the system should get better the more training data it has.
What could this algorithm be used for?
Predicting high-fives is interesting, but practically speaking, it's not all that useful.
But Vondrick said with enough training data, the algorithm could be used to predict any kind of human behaviour.
He said he's particularly excited about potential health care applications: "So if we can predict that someone's going to fall, or maybe hurt themselves, or maybe start a fire, maybe we can alert emergency responders, or even try to prevent the accident from happening."
This is possible because sometimes there are indicators that show up long before you actually fall.
And if technology can help monitor for those signs, that could be a real win.