Why chatbots need a big push from deep learning

Revolutionary machine learning techniques, such as deep learning neural networks, have enjoyed tremendous progress within the span of just a few years. AI advances are turning problems previously thought to lie beyond the realm of what machines could tackle into commodities that are percolating into our everyday life.

Following the remarkable growth in popularity enjoyed by AI, a new generation of chatbots has recently flooded the market, and with them the promise of a world where many of our online interactions won’t happen on a website or in an app, but in a conversation. Helping turn this promise into reality is a combination of better user interfaces, the omnipresence of smartphones, and new, state-of-the-art machine learning techniques.

Perhaps one of the main drivers behind this wave of novel AI applications is deep learning, an area of machine learning that has existed for roughly 50 years but has only recently revolutionized fields such as computer vision and natural language processing (NLP). Nonetheless, for all its incredible performance, deep learning alone is not sufficient to solve the challenges faced by chatbots. Understanding context, disambiguating between subtle differences in language that can lead to wildly different meanings, employing logical reasoning, and, most crucially, understanding the preferences and intent of the consumer are just a few of the many challenging tasks a system must be able to perform in order to sustain a conversation with a human.

The ability to answer complex questions using not only context, but also information beyond the confines of the dialog, is indispensable for building truly powerful chatbots. To answer questions effectively, the bot needs to rely on information that was shared previously in the conversation, or even within other conversations between the bot and the consumer. Moreover, business goals and the intent of the consumer can influence the kind of response the bot will give.

If a modern conversation engine hopes to go beyond answering simple, one-level questions, it must blend the most prominent techniques emerging from the field of deep learning with solid statistics, linguistics, other machine learning techniques, and more structured classical techniques, such as semantic parsing and program induction.

The first stop in building an intelligent conversational system is data. In particular, deep learning is notorious for needing vast amounts of high-quality data before it can unleash its true potential. But while we live in an era where endless streams of data are constantly being generated, most of it is too raw to be of immediate use for machine learning algorithms.

Unsupervised learning, the subfield of machine learning devoted to extracting information from raw data without human assistance, is a promising alternative. Among its many uses, it can be used to build an embedding model. In plain English, such a model represents data in a simpler form, making patterns easier to discover.

While unsupervised learning is already ubiquitous in machine learning, deep learning offers additional innovative ways to build such embedding models, providing state-of-the-art performance. Refining these techniques can reduce the need for large amounts of expensive, high-quality labeled data, which chatbots otherwise depend on to perform well.
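To make the idea of an embedding learned from raw, unlabeled text concrete, here is a minimal sketch. It is not a deep learning model: it swaps in simple co-occurrence counts as a stand-in, and the corpus, function names, and window size are all assumptions made up for illustration. Each word is represented by the words that appear near it, so words used in similar contexts end up with similar vectors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy unlabeled corpus (hypothetical chatbot utterances, invented for illustration).
CORPUS = [
    "i want to order a pizza",
    "i want to order a pasta",
    "please order a pizza for me",
    "please order a pasta for me",
    "i need a refund for my ticket",
    "i need a refund for my booking",
]

def build_embeddings(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

embeddings = build_embeddings(CORPUS)
sim_food = cosine(embeddings["pizza"], embeddings["pasta"])
sim_other = cosine(embeddings["pizza"], embeddings["refund"])
print(f"pizza~pasta: {sim_food:.2f}, pizza~refund: {sim_other:.2f}")
```

No labels were supplied, yet "pizza" and "pasta" should score as more similar to each other than "pizza" and "refund", purely because of how they are used. Neural embedding models pursue the same goal with denser, lower-dimensional vectors.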

However, the standard approach in deep learning involves collecting a large, highly specific dataset, which is subsequently used to train a network with a mostly static architecture. Once trained, the network maps directly from input to a fixed set of outputs that are known in advance. Despite being the foundation of remarkably powerful systems, this approach isn’t flexible enough to handle the kind of information needed to carry on a realistic conversation. This brings us to the next big obstacle in the way of truly human-like chatbots: the ability to maintain and reason with an internal model of the world.

We humans are constantly (and usually subconsciously) checking every new piece of information we receive from our surroundings against an internal model of the world: a model of what is normal and what is not, of how entities are related, how we can make logical inferences involving said entities, and so on. If, when driving, we see a ball rolling down the street, we immediately know we should slow down and remain in a state of alert, looking out for the possibility that a distracted child will soon pop out of nowhere while chasing their ball. This kind of intuition is built on top of an understanding of how entities relate to each other, combined with the ability to make logical connections along a knowledge graph and come up with a conclusion that requires multiple reasoning steps.
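The rolling-ball example can be sketched as a search over a tiny knowledge graph. This is only an illustration under stated assumptions: the facts, relation names, and the `find_chain` helper are hypothetical and hand-written, and a plain breadth-first search stands in for whatever reasoning mechanism a real system would use.

```python
from collections import deque

# Hypothetical hand-written facts as (subject, relation, object) triples.
TRIPLES = [
    ("rolling ball", "suggests", "child playing nearby"),
    ("child playing nearby", "may lead to", "child running into street"),
    ("child running into street", "requires", "slow down"),
    ("green light", "permits", "keep driving"),
]

def find_chain(triples, start, goal):
    """Breadth-first search; returns the list of triples linking start to goal, or None."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, obj in graph.get(node, []):
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, path + [(node, rel, obj)]))
    return None

chain = find_chain(TRIPLES, "rolling ball", "slow down")
for subj, rel, obj in chain:
    print(f"{subj} --{rel}--> {obj}")
```

The conclusion "slow down" is never stated directly for a rolling ball; it only emerges by chaining three separate facts. The hard part, which this sketch sidesteps entirely, is acquiring and maintaining such facts at the breadth humans manage effortlessly.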

This level of automatic and extremely broad reasoning still eludes AI researchers and is perhaps one of the last frontiers in the way of truly intelligent and autonomous AI agents, conversational bots included. To accomplish this goal, the ability to reason is central.

Finally, the ability to put it all together is yet another frontier waiting for a solution. Unlike a search engine, where the user is content with being presented a list of matches ordered by relevance, a conversation engine must be more specific. Simply using NLP to identify a set of relevant information is insufficient. The engine should be able to parse the input, break it down, and present a response to the user that is not only clear and concise, but also highly relevant to their taste, and then do the same again on every turn.
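The difference between returning a ranked list and returning one tailored answer can be sketched as follows. Everything here is an assumption for illustration: the catalog, the tag-matching "parser," the `respond` function, and the `pref_weight` parameter are hypothetical, and keyword overlap is a deliberately crude stand-in for real language understanding.

```python
# Hypothetical catalog, invented for illustration.
CATALOG = [
    {"name": "Luigi's", "tags": {"italian", "pizza", "cheap"}},
    {"name": "Trattoria Roma", "tags": {"italian", "pasta", "upscale"}},
    {"name": "Sushi Go", "tags": {"japanese", "sushi", "cheap"}},
]

def parse(utterance):
    """Very crude parsing: lowercase, strip punctuation, split into a set of words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in utterance.lower())
    return set(cleaned.split())

def respond(utterance, preferences, catalog=CATALOG, pref_weight=0.5):
    """Return one recommendation instead of a ranked list of matches."""
    query = parse(utterance)
    best, best_score = None, 0.0
    for item in catalog:
        relevance = len(query & item["tags"])
        if relevance == 0:
            continue  # ignore items unrelated to the request
        # Relevance to the request comes first; known preferences break ties.
        score = relevance + pref_weight * len(preferences & item["tags"])
        if score > best_score:
            best, best_score = item, score
    if best is None:
        return "Sorry, I could not find a match."
    return f"How about {best['name']}?"

reply = respond("Any good Italian place?", preferences={"upscale"})
print(reply)
```

The same question yields a different answer for a user whose stored preferences include "cheap" rather than "upscale", which is the behavior a relevance-ordered list of matches cannot provide on its own.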

We are still in the early stages of the AI-powered conversational revolution, and it is fair to assume some problems that seem insurmountable today will likely be solved in the coming years. We are quickly moving toward a world in which you will be able to have long and complex interactions with your AI assistants, which will not only understand what you want to say but will know your preferences and tailor your experience accordingly.

To do so, we must merge multiple disciplines, including deep learning, statistics, and others, building technology that blends consumer preferences, environment, and language into one piece of intelligent, flexible software.