Why sticking a head on a robot doesn’t make it any more social

Mass adoption of social robots in our everyday spaces requires the development of human-robot dialogue toolkits for non-experts

In case you’re unfamiliar with the Gartner hype cycle, it’s a framework used by industry analysts to situate technologies along the axes of maturity and visibility. Hype is at its highest when visibility peaks at a fairly immature stage, one characterized by speculation about how the technology will evolve in the years to come. In the example below, Intelligent Agents sit at the peak of inflated expectations. What would be your guess as to the year it was published?

When I came across it in my research, my guess was somewhere between 2004 and 2006.

I was therefore pretty surprised to learn that it came out in 1995, a decade earlier than I had imagined. That’s two years before Microsoft deployed its Office Assistant ‘Clippy’ (the paper clip) which, for those of you old enough to remember, most probably evokes disgust and/or annoyance: it clumsily followed you around the screen, forever getting in the way and never really of much assistance at all.

Now, if you’ve been following the hype around Conversational Agents (also often called Intelligent Assistants), and the sequence of investments in that field over the last year, you might be inclined to say that the peak of inflated expectations for Intelligent Assistants is closer to 2016-2017 than to 1995.
So why the renewed popularity of something that was all the rage over 20 years ago? I see two main reasons behind this revival. The first is that, back in the ’90s, the large majority of the population was quite simply not used to communicating via text-based chat. Since then, chat has gradually spread from a relatively niche, private channel with friends to one used with businesses and family as well: convenient for ordering food, troubleshooting, organizing school activities and seeing how grandma and grandpa are doing.

The second reason for the interest is the number of tools that now allow non-experts to very quickly create their own dialogue scenarios. Although the current state of the technology doesn’t yet let you create fully open-domain dialogues (there are ongoing competitions trying to get there), people have realized that most dialogues where there’s money to be made can be viewed as simple transactional dialogues. This basically means that, after finding out what your intent is (book a hotel, order a pizza), the agent only has to fill a list of pre-defined slots (name, topping, card number, address, etc.) in order to execute a fulfillment (place an order, call a taxi). The machine learning and natural-language processing (NLP) techniques required to do this can be trained quickly, assuming your dialogue fits this constrained type. The list of tools for this is constantly growing and includes Dialogflow (Google), Wit.ai (Facebook), the Alexa Skills Kit (Amazon), a similar offering from Baidu, Watson Conversation (IBM), Line (Naver) and Snips[1], among others.
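To make the intent/slot/fulfillment pattern concrete, here is a minimal sketch of a transactional, slot-filling dialogue loop. All names (`REQUIRED_SLOTS`, `next_action`, the prompts) are illustrative inventions, not the API of any of the toolkits listed above.

```python
# Toy slot-filling dialogue manager for a single "order a pizza" intent.
# The agent keeps asking for the next missing slot until all are filled,
# then triggers the fulfillment.

REQUIRED_SLOTS = ["topping", "size", "address"]

PROMPTS = {
    "topping": "What topping would you like?",
    "size": "What size pizza?",
    "address": "Where should we deliver it?",
}

def next_action(filled_slots):
    """Return the next prompt, or the fulfillment once every slot is filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return ("ask", PROMPTS[slot])
    return ("fulfill", filled_slots)

# Simulated conversation: each user turn fills one slot.
state = {}
for slot, value in [("topping", "margherita"), ("size", "large"),
                    ("address", "12 Main Street")]:
    action, payload = next_action(state)
    assert action == "ask"
    state[slot] = value

action, payload = next_action(state)
print(action)  # -> fulfill
```

In a real toolkit the slot values are extracted from free-text utterances by a trained entity recognizer rather than handed over one per turn, but the control flow is essentially this loop.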

Now, just hold on to this picture of the state of the art in conversational agents for a second while we briefly consider social robots. Unlike industrial robots in manufacturing, which typically operate out of the reach of workers for safety reasons, social robots are autonomous robots that interact with people and share the same physical spaces, such as the home.


Examples of these robots (such as Knightscope or Savioke) share a common trait: they discourage any natural-language interaction by removing any semblance of a head. Current attempts to deploy humanoid robots in social environments (of which there are many) have had very limited success, mostly because designers have to explicitly encode every possible utterance a human could come up with. We’re still pretty far from meeting that requirement: most human-robot dialogue systems today remain at the stage of Interactive Voice Response (IVR), where you “Say 'book' if you want to book a flight”. Such systems have been around for about 40 years.

Now let’s tie the first thread, conversational agents, together with the second, social robots. To allow humanoid robots to interact more easily with humans, I’m convinced we need to learn from the success of chatbots and develop toolkits that allow non-NLP experts to easily create, compile and deploy new dialogue scenarios. For this, we need to abstract away the technical complexity so that experts in each domain (rather than experts in NLP) can quickly design their new use-cases, much as is already being done with business processes.
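What might such an abstraction look like? Below is a hypothetical sketch: a domain expert declares intents with example utterances and slots, and the toolkit handles the NLP underneath (here faked with a trivial word-overlap matcher; real systems train a classifier from the examples). The scenario format and all names are assumptions for illustration.

```python
# A declarative dialogue scenario a domain expert (not an NLP expert) might write.
SCENARIO = {
    "book_flight": {
        "examples": ["I want to book a flight", "get me a plane ticket"],
        "slots": ["destination", "date"],
    },
    "order_pizza": {
        "examples": ["I'd like a pizza", "order me a pizza"],
        "slots": ["topping", "size"],
    },
}

def classify_intent(utterance):
    """Toy nearest-example matcher by word overlap.

    Stands in for the trained intent classifier a real toolkit would
    compile from the scenario's example utterances.
    """
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for intent, spec in SCENARIO.items():
        for example in spec["examples"]:
            score = len(words & set(example.lower().split()))
            if score > best_score:
                best, best_score = intent, score
    return best

print(classify_intent("can you book a flight to Seoul"))  # -> book_flight
```

The point of the design is that the scenario dictionary is the only part the domain expert touches; everything below it can be regenerated automatically when the scenario changes.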

It’s also close to what conversational agent toolkits have done for text-based interaction, but with all the added complexity of prosody and non-verbal behaviour. We’ve started contributing in this direction at NAVER LABS by proposing a framework that selects natural, contextualized conversational fillers to bridge inter-turn silences and make the interaction more natural [1]. We’ve also started to investigate the possibility of automatically generating non-verbal behaviour from an utterance [2]. These are just baby steps, but the widespread adoption of dialogue with humanoid robots will take many such baby steps, each appropriately packaged and made available to the community for use and experimentation. Only then will we be able to create the breadth of topics for human-robot dialogue that will permit the long-term use of social robots in all sorts of everyday scenarios, with or without a talking head.
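To give a flavour of the filler-selection idea, here is a deliberately simplified toy version: emit a filler only when the system’s response is slow, and pick it from a context-dependent pool. This is not the framework of [1], which selects fillers with a learned, context-aware model; the rules, thresholds and filler lists here are invented for illustration.

```python
import random

# Context-dependent pools of conversational fillers (illustrative only).
FILLERS = {
    "lookup": ["let me check...", "one moment..."],
    "thinking": ["hmm...", "well..."],
}

def select_filler(last_user_act, latency_s, threshold_s=1.0):
    """Return a filler to bridge an inter-turn silence, or None if the
    response is fast enough that no filler is needed."""
    if latency_s < threshold_s:
        return None
    # Crude context rule: questions suggest the robot is looking something up.
    context = "lookup" if last_user_act == "question" else "thinking"
    return random.choice(FILLERS[context])

print(select_filler("question", latency_s=2.3))
```

Even this caricature shows the two decisions any such component has to make: *whether* to fill the silence, and *which* filler fits the current dialogue context.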



[1] Context-aware selection of multi-modal conversational fillers in human-robot dialogue
Matthias Gallé, Ekaterina Kynev, Nicolas Monet, Christophe Legras
26th IEEE International Symposium on Robot and Human Interactive Communication

[2] BEAT-o-matic: a baseline for learning behavior expressions from utterances
Matthias Gallé, Ankuj Arora
ARMADA workshop at 26th IEEE International Symposium on Robot and Human Interactive Communication

Matthias Gallé recently gave a meetup presentation on Conversational Robots at Station F, the world’s biggest start-up campus, in Paris, France.

Learn more about NAVER/LINE presence at Station F.

[1]     Naver is an investor of Snips via K-Fund 1 of Korelya Capital.