A Blog by Jonathan Low


Aug 24, 2017

How Microsoft Is Trying To Make Voice Assistants Respond Like Humans

More data from more sources, including data providing an understanding of context. JL

Dave Gershgorn reports in Quartz:

Voice-activated software is currently useful for answering simple fact-based questions. Microsoft is focus(ing) on information retrieval: How does a human know a good answer? What does human exchange sound like? How important is context? It includes data about levels of stress, emotion, engagement, and satisfaction. It contains transcripts, audio, video, and related data. Algorithms can be given a better idea of how humans would accomplish an information-driven goal since they have data to understand what makes a human satisfied with an answer.
As more and more of us ask the virtual personal assistants that live on our phones what the weather will be tomorrow or the height of the Chrysler Building (1,046 feet), it’s clear that the voice-activated software is currently only really useful for answering simple fact-based questions.
Microsoft, whose virtual assistant Cortana operates on Windows and as a cross-platform mobile app, is trying to smarten up our dumb bots by making a new dataset available to the public, letting future AI analyze how humans would do the same tasks that virtual assistants handle every day. The dataset (pdf) consists of 22 pairs of humans talking to each other: one person, without access to the internet, asking for information, and another taking those questions and trying to come up with a good response.
Data has repeatedly been shown to be the factor that unlocks new capabilities in artificial intelligence research. When Fei-Fei Li built ImageNet, the largest image dataset for machine learning at the time, it gave deep learning the platform it needed to become the dominant form of AI we see today. In a move similar to Microsoft's, when Facebook wanted chatbots to learn how to negotiate with humans in English, it trained the bots on transcripts of human negotiations.
The Microsoft dataset focuses specifically on information retrieval: How does a human know when they've been given a good answer? What does a natural human exchange sound like? How important is context? It also differs from other conversational datasets (like one from the makers of Siri, which catalogued transcripts between travel agents and prospective vacationers) because it includes data about the participants' levels of stress, emotion, engagement, and satisfaction, alongside transcripts, audio, and video. Microsoft did not say whether it plans to expand the dataset over time.
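As a rough sketch of what such a dataset might look like in code, a single annotated turn in one of these conversations could be represented along the following lines. The field names below are hypothetical illustrations, not Microsoft's published schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of one annotated turn in an information-seeking
# conversation; field names are illustrative, not Microsoft's actual schema.
@dataclass
class Turn:
    speaker: str                          # "seeker" (asks) or "intermediary" (answers)
    transcript: str                       # what was said
    audio_path: Optional[str] = None      # pointer to the recorded audio
    video_path: Optional[str] = None      # pointer to the recorded video
    stress: Optional[float] = None        # annotated stress level
    emotion: Optional[str] = None         # annotated emotion label
    engagement: Optional[float] = None    # annotated engagement level
    satisfaction: Optional[float] = None  # seeker's satisfaction so far

@dataclass
class Conversation:
    task_description: str                 # the scenario the seeker is given
    turns: List[Turn] = field(default_factory=list)
```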
Here’s an example of one of the questions:
Imagine that you recently began suffering from migraines. You heard about two possible treatments for migraine headaches, beta-blockers and/or calcium channel blockers, and you decided to do some research about them. At the same time, you want to explore whether there are other options for treating migraines without taking medicines, such as diet and exercise.
The researchers classified this as a low-difficulty, high-complexity problem: one where the information might be easily available, but there are many factors to consider. The two participants then talk through the problem, gathering more information until the seeker is satisfied with the result.
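That two-axis labeling can be pictured with a trivial sketch like the one below; the difficulty and complexity labels are the researchers' terms, but the code itself is purely illustrative:

```python
from enum import Enum

class Difficulty(Enum):   # how hard the information is to find
    LOW = "low"
    HIGH = "high"

class Complexity(Enum):   # how many factors the answer must weigh
    LOW = "low"
    HIGH = "high"

# The migraine scenario above: the facts are easy to look up,
# but medications vs. lifestyle options add many factors to weigh.
migraine_task = (Difficulty.LOW, Complexity.HIGH)
```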
The trick now for AI researchers is to design systems that can make use of this complex data. Today, virtual assistants like Alexa analyze what a person is saying, but not how they say it or whether the command has multiple parts. It’s a glorified Google search, as a result of the simple datasets mentioned before. Simple data teaches machines simplistic ideas.
But now, future algorithms can be given a much better idea of how two humans would accomplish an information-driven goal together, since they have data to understand what kind of information would make a human feel satisfied with an answer, and which wouldn’t. So next time you get frustrated with Siri or Alexa, know that soon they might be able to tell.
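To make that concrete, here is a minimal sketch of the kind of model researchers might train on this sort of data: predicting a seeker's satisfaction from simple features of the exchange. The features, values, and model choice are assumptions for illustration, not anything Microsoft has published:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per exchange: [num_turns, avg_stress, avg_engagement,
# answer_length_in_words]; labels: 1 = seeker reported satisfaction, 0 = not.
X = np.array([
    [4,  0.2, 0.8, 120],
    [9,  0.7, 0.3,  40],
    [6,  0.3, 0.6,  95],
    [12, 0.8, 0.2,  30],
])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Score a new exchange: a short, low-engagement answer is unlikely to satisfy.
print(model.predict_proba([[10, 0.6, 0.3, 35]])[0, 1])
```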
