Matt Simon reports in Wired:
Robots' creepiness is not only in the face and in the gaze. It's also in the voice, the way it speaks. Tonality is very important, and linking semantics with tonality is where things go wrong: the pieces are poorly integrated in a way that never happens with humans. The chatbot that also had an avatar was annoying to people. It gave the same responses as the text-only one, but participants found the text chatbot competent, while their reaction to the one with a face and gaze was negative. Conversations with the text-based chatbot were twice as long.
Call it the Great Convergence of Creepiness. The first bit, the uncanny valley, we’re all familiar with by now: If a humanoid robot looks realistic but not quite realistic enough, it freaks us out. So far that idea has been applied almost entirely to robot faces and bodies, but it’s less known as a phenomenon in robot voices.
Except, that is, to Kozminski University roboticist Aleksandra Przegalinska, also a research fellow at MIT. Przegalinska is bringing a scientific ear to the booming economy of chatbots and voice assistants like Alexa. WIRED sat down with her at SXSW this week to talk about the monumental challenge of replicating human intonation, why the future of humanoid robots may not be particularly bright, and what happens when you let students teach a chatbot how to talk.
This conversation has been edited for length and clarity.

WIRED: So why study robot voices, of all things?

Przegalinska: When you think about robots, the creepiness is not only in the face and in the gaze, although that's very powerful. It's also very often in the voice, the way it speaks. The tonality itself is a very important thing here. That's why we got interested in chatbots, and so we built our own.

The chatbot was talking to my students for a whole year, mainly learning from them, so you can gather what kind of knowledge it got in the end! (How many curse words!) They were humiliating it constantly, which is perhaps part of the uncanny valley, because when you think about it, why are they being so nasty to chatbots? Maybe they're nasty because the chatbot is just a chatbot, or maybe they're nasty because they're insecure—is there a human inside that thing? What's going on with that?

WIRED: That happens with physical robots too. There was a study in Japan where they put a robot in a mall to see what kids would do with it, and they ended up kicking it and calling it names.

Przegalinska: With kids—I have a 6-year-old—it's a jungle. They are at that level where nature is still strong and culture is not so strong. When you create a very open system that is going to learn from you, what do you want it to learn? My students always talk to that chatbot, and they're so hateful.

WIRED: Maybe it's cathartic for them. Maybe it's like therapy.

Przegalinska: Maybe it's therapy related to the fact that you're processing these uncanny-valley feelings. So you're angry, and you're not sure what it is you're interacting with. I feel that these weird relationships with chatbot assistants—where they're super polite, and people are just throwing garbage at them—are a strange situation, as if they were some lower-level humans.

WIRED: Chatbots can take different forms, right? They can be just text-based or come with a digital avatar.

Przegalinska: We found that the chatbot that also had an avatar was very annoying to people. In most cases it gave the same response as the text one, but the differences in reaction were huge. In the case of the text chatbot, the participants found it very competent to talk to about various topics. But another group had to interact with one that had a face and gaze, and in terms of the affective response, it was very negative. People were constantly stressed out. Conversations with the text-based chatbot were usually twice as long.

WIRED: What about how your chatbot behaved? How was it as a conversationalist?

Przegalinska: Whenever you had a conversation, the chatbot would try to mirror what the other person was saying. For instance, if you said you hated sports, and the conversation was long enough, the chatbot would say, "I hate sports too."

WIRED: So it could be lying to you.

Przegalinska: Of course. Constantly. It was also flipping a lot. For instance, you had one interaction where it presented itself as a Republican, and another where it presented itself as a Democrat and a very progressive person. Hating sports and loving sports. Hating certain nationalities. It was interesting to see, but it was signaling certain potential dangers related to these interactions. When you think of yourself as a company, let's say you build yourself a chatbot. You're Nike, and then the chatbot says it hates sports. What would you do about that?

WIRED: Or worse, it gets racist.

Przegalinska: Which happens, actually. I think our chatbot was still very controllable in many ways, and it was surprising to us to see how frequently it was flipping. We did curate some of the content it was presenting, but then it diverged from that so easily through interactions with other people.

WIRED: Beyond the semantics, when it comes to current robot voices, what is throwing people off specifically?

Przegalinska: Even if it's a short sentence, bots finish it in such a way as if it were a long one. It's so conclusive, in a way; it sounds like you expect a long statement and then the sentence ends. So there's a problem with understanding the tonality and context of what you're saying. Linking the semantics with the tonality, that's the part that goes wrong.

WIRED: What about the extra level of complexity when that intelligence is embodied in a physical robot like Sophia, which most people know from her talk show appearances?

Przegalinska: Maybe the problem is integrating it all together. We know that systems like that are very modular, in the sense that there's one system responsible for moving the head and another one for the smiling. All these modules are sometimes poorly integrated, in a way that never happens with humans, or at least very rarely. I think that's the uncanny valley: the delays in the responses. It requires really big computational power. But I have no doubt that that's the future. Maybe not for this company, maybe not with this particular case. Unless humanoid robots get abandoned altogether. That's also an option. I think it's possible.

WIRED: Really? Why would you say that?

Przegalinska: Because I think that if you have some sort of system that's easily classifiable as a machine but is still super smart and responsive, perhaps that's enough. Why would you care? It could even be a box that leans forward or backward, making those little gestures that indicate it knows what kind of emotion that is. Perhaps people want something that looks like a vacuum cleaner and speaks to them, rather than a Sophia, which is already so disturbing.