Multilingual Chatbots: Making Conversational AI Vernacular
2019 began with a report from Dentsu Aegis that highlighted three V’s that will drive digital media growth for the foreseeable future – Voice, Vernacular and Video.
Having already made considerable headway in the area of Voice bots, our team at Haptik turned their attention towards the second V – Vernacular.
A multilingual chatbot, i.e. a bot that can converse with users in multiple languages, can be a tremendous asset to any organization. This holds particularly true in a linguistically diverse country like India. The digital revolution in India has greatly broadened the country’s Internet user base to include large numbers of non-English speakers, who vastly outnumber English language speakers in the country.
Source: Indian Languages – Defining India’s Internet, a report by KPMG in India and Google, 2017
In fact, a 2017 report by KPMG and Google estimated that the number of Indian language Internet users would reach 536 million by 2021, far outstripping the projected 199 million English language Internet users in India.
So multilingual chatbots are shaping up to be an essential investment for any company operating in India, or indeed in any market with linguistically diverse customers.
Of course, there were a number of other factors that drove us to dedicate considerable resources towards the development of multilingual chatbots…
The need for multilingual chatbots
Automation is a word we swear by at Haptik. It is our endeavor to ensure that our bots carry out as many conversations as possible in a completely automated manner – with no need for human intervention.
That raised a question: what if a user entered a query in a language other than English? This was not idle speculation on our part. Our team had noted several instances of bot breaks caused by users typing queries in Hindi and other vernacular languages.
This observation alone gave us two very powerful reasons for equipping our bots with multilingual capabilities. Firstly, it would ensure a far superior customer experience by eliminating bot breaks caused by users inputting queries in non-English languages. Secondly, the ability to converse in vernacular languages would make our bots accessible to the vast number of Indians for whom English is not the language of choice, but who are increasingly turning to digital platforms and interfaces for a wide range of needs.
Our enterprise partners too were keen on multilingual chatbots, as these would enable them to respond effectively to substantially higher volumes of queries through the bot, and to reach a far greater number of people. A vernacular language chatbot would be particularly useful to a business seeking to make inroads in a specific regional market and/or build brand awareness among potential customers who speak a particular language.
Developing multilingual chatbots thus clearly became an imperative for us.
How do you build a multilingual chatbot?
Chatbot localization is not as easy as taking an English-language chatbot and translating all its content to a vernacular language.
A fully-functional multilingual chatbot needs to be able to decipher the language, understand exactly what the user wants, and respond naturally.
The first step towards building that capability in a multilingual chatbot is ensuring that it is able to recognize entities in Indian languages.
Named Entity Recognition
A bot’s ability to detect and understand words, and extract the relevant information from them, is crucial to its functioning.
Named Entity Recognition (NER) is a key component of our natural-language processing (NLP) application. It is what allows the bot to accurately identify entities in the input text, such as dates, times, locations, quantities, names and product specifications.
At Haptik, we had already developed a robust NER system for our English language chatbots. We upgraded our NER system to enable our bots to detect entities in five Indian languages – Hindi, Marathi, Gujarati, Tamil, and Bengali.
Essentially, this meant that our bots would be able to identify and extract relevant information that users entered in these languages.
You can read more about how our NER system for Indian languages works here.
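As a small illustration of what detecting an entity in an Indian language involves, here is a minimal sketch for a single entity type, whole-number quantities. It is not Haptik's NER system (that is described in the post linked above); the function name `extract_quantities` is hypothetical. It relies only on two standard Python behaviours: in a string pattern `\d` matches any Unicode decimal digit, and `int()` accepts those digits.

```python
import re
from typing import List


def extract_quantities(text: str) -> List[int]:
    """Return whole numbers found in the text, whatever digit script is used.

    Hypothetical helper for illustration only, not part of any Haptik API.
    """
    # \d matches Devanagari digits (०-९) as well as ASCII digits (0-9),
    # and int() converts either form to an ordinary integer.
    return [int(match) for match in re.findall(r"\d+", text)]


print(extract_quantities("मुझे 2 टेस्ट बुक करने हैं"))   # [2]
print(extract_quantities("मुझे १२ रिपोर्ट चाहिए"))      # [12]
```

Entities such as dates, names or locations need far more than this, including language-specific vocabularies and patterns, which is why a dedicated NER system is required.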
One Bot or Multiple Bots
Another early decision that needed to be taken was whether we would build one chatbot that had support for multiple languages, or whether we would build a separate chatbot for each language.
Ultimately, our team decided it would not be feasible to develop a new chatbot for every new language to be added, particularly since any change made to the chatbot in one language would need to be implemented across all languages. Maintaining and updating the bot would be far more efficient if changes could be made to a single bot and applied across all the languages it supported.
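One way to picture the single-bot approach is a conversation flow that is defined once, with only the text keyed by language. The sketch below is a hypothetical data layout, not Haptik's actual bot format; the names `BOT_FLOW`, `bot_response` and `next_step`, the step names and the sample sentences are all assumptions made for illustration.

```python
# Hypothetical bot definition: the flow (steps and transitions) is shared,
# and only the user-facing text is keyed by language code.
BOT_FLOW = {
    "ask_test": {
        "next_step": "ask_city",
        "bot_response": {
            "en": "Which test would you like to book?",
            "hi": "आप कौन सा टेस्ट बुक करना चाहते हैं?",
        },
    },
    "ask_city": {
        "next_step": None,
        "bot_response": {
            "en": "Which city are you in?",
            "hi": "आप किस शहर में हैं?",
        },
    },
}


def bot_response(step, language, fallback="en"):
    """Return the response for a step, falling back if the language is missing."""
    responses = BOT_FLOW[step]["bot_response"]
    return responses.get(language, responses[fallback])


def next_step(step):
    """The transition does not depend on the language at all."""
    return BOT_FLOW[step]["next_step"]


print(bot_response("ask_test", "hi"))
print(next_step("ask_test"))  # ask_city
```

With a layout like this, a change to the flow is made in one place and every language picks it up, while adding a language means adding text rather than building another bot.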
Essentially, this means that when our multilingual chatbot switches from one language to another, the only things that change are the User Responses, Bot Responses,
Haptik has one of the most robust Conversational AI platforms in the world. But such was the challenge of building multilingual chatbots that our team needed to make changes to our systems at multiple levels, in order to effectively build our multilingual capabilities.
Our team needed to carefully reassess our entire platform, from our system backend to our Machine Learning, our Chat SDKs and our bot-builder tool, and ensure that all components across the stack were language adaptable.
Our enterprise partners who opted for multilingual capabilities would need to make changes to their systems as well. Their APIs would need to be converted to the target language. Moreover, they would need to provide us with a list of relevant entity names in the target language, so that these could be added to our system.
Early challenges and improvements
The first multilingual chatbot we worked on was a Hindi-language version of a Customer Support Bot we had previously developed for one of our enterprise partners – Dr. Lal PathLabs, one of India’s leading diagnostics chains. You can read more about our previous work for Dr. Lal PathLabs here.
The choice of Hindi was prompted by the fact that we had access to a reasonably large amount of Hindi language data, as well as fluent Hindi language speakers.
You can find the Dr. Lal PathLabs multilingual chatbot here.
Note: For the purposes of the following discussion we have used ‘Hindi’, but the same challenges and processes apply to the use of any other language.
Switch Between English and Hindi
Initially, there was a concern that users might not use the bot’s Hindi language option because they would not be able to find it. This was based on the past experience of visitors to the Dr. Lal PathLabs website not being aware that they could easily switch to the Hindi language version of the website.
To ensure that the multilingual capabilities of our bot had high visibility and were easily discoverable, the language options were prominently displayed in the chat interface. Thus, users would always be reminded that they have the option to switch to their preferred language. Moreover, the default language option for the chatbot on the English and Hindi versions of the Dr. Lal PathLabs website was the corresponding language.
The bot understands that the phrase “Report do” means the same thing as “Give me the report” and responds accordingly.
Language detection can be a challenge with multilingual chatbots. This is because sometimes, a mixture of languages is used while inputting text. For instance, based on past experience, we knew that users from metro cities sometimes type in ‘Hinglish’ (a blend of English and Hindi). They would type in
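To make the difficulty concrete, here is a rough sketch of a first-pass classifier for incoming text. It is an assumption-laden illustration, not the detection logic Haptik uses: it checks which script the characters belong to and uses a tiny, hypothetical list of romanized Hindi words (`ROMANIZED_HINDI_HINTS`) to flag Hindi typed in the Latin alphabet.

```python
import re

# Hypothetical hint list for illustration; a real system would need a much
# larger vocabulary or a trained model.
ROMANIZED_HINDI_HINTS = {"chahiye", "kya", "hai", "mera", "kab", "nahi"}


def classify_input(text):
    """Rough first-pass label based on script and a few hint words."""
    # The Devanagari Unicode block spans U+0900 to U+097F. The script is
    # shared by Hindi and Marathi, so this identifies a script, not a language.
    has_devanagari = any("\u0900" <= ch <= "\u097F" for ch in text)
    latin_words = re.findall(r"[a-z]+", text.lower())

    if has_devanagari and latin_words:
        return "mixed_script"
    if has_devanagari:
        return "devanagari"
    if any(word in ROMANIZED_HINDI_HINTS for word in latin_words):
        return "romanized_hindi_or_hinglish"
    if latin_words:
        return "english"
    return "unknown"


print(classify_input("मेरी रिपोर्ट कब आएगी"))        # devanagari
print(classify_input("mera report kab aayega"))    # romanized_hindi_or_hinglish
print(classify_input("Where is my report"))        # english
```

Script checks are cheap and reliable, but they say nothing about text typed entirely in the Latin alphabet, and short hint lists break down on ambiguous words, which is what makes mixed-language input hard.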
Entities such as the names of cities or medical tests can be detected in English even in a Hindi language bot
Dr. Lal PathLabs also wanted certain entities (such as test names) to be primarily detected in English, and if possible understood in Hindi. To this end, we added robust language support to our entity detection framework. With this capability, our bot was now able to understand these entities irrespective of the language in which they were typed. This capability is now a fundamental part of our multilingual bot framework – enabling it to carry out multi-language entity detection for a variety of information such as names, cities or phone numbers. In practice, this means that you can enter a city name in English even while using the Hindi-language version of our bot, without causing a bot break.
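A simple way to picture language-independent entity detection is a lookup in which every canonical entity value carries its spellings in each supported language. The sketch below assumes a hypothetical `CITY_VARIANTS` table and `detect_city` function; it illustrates the idea and does not reproduce Haptik's entity detection framework.

```python
import unicodedata

# Hypothetical lookup table: each canonical city maps to its known spellings.
CITY_VARIANTS = {
    "Mumbai": ["mumbai", "मुंबई"],
    "Delhi": ["delhi", "दिल्ली"],
}


def detect_city(text):
    """Return the canonical city mentioned in the text, or None."""
    # Normalize so that visually identical strings compare equal, and
    # lowercase so that Latin-script matches ignore case.
    normalized = unicodedata.normalize("NFC", text).lower()
    for canonical, variants in CITY_VARIANTS.items():
        for variant in variants:
            if unicodedata.normalize("NFC", variant) in normalized:
                return canonical
    return None


# The same entity is found whichever language it is typed in.
print(detect_city("मुझे Delhi में टेस्ट बुक करना है"))    # Delhi
print(detect_city("मुझे दिल्ली में टेस्ट बुक करना है"))   # Delhi
```

Substring matching like this is deliberately naive: it ignores spelling variations and can match inside longer words, so a production system would add tokenization and fuzzy matching on top.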
Making the Language More Conversational
The Hindi bot uses transliterations of English words in place of their less commonly used Hindi counterparts, to make the language simpler and more conversational
Using the translation capabilities of our bot builder platform, we were able to quickly create a version of our existing English language bot with all of its content, both expected user queries and bot responses, translated into Hindi. The problem was that the translations provided by the system could be too formal, and not representative of the way most Hindi-language speakers using the bot would speak or write.
From a technical standpoint, this was not really a difficult issue to resolve. But it did require a lot of research and internal discussion to determine how to make the language more conversational and colloquial. In fact, we even conducted a focus group of Hindi-language speakers to gain insights on how to go about this!
Making the language more casual also involved using transliterations of English words that are commonly used in place of their Hindi counterparts during casual conversation. For instance, using ‘lab’ in the Hindi script instead of the less commonly used Hindi word ‘Prayogshala’.
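As a rough sketch of this kind of post-editing, a translated response can be passed through a table of preferred colloquial terms. The table and function names below are hypothetical, and the single entry is the ‘lab’ example mentioned above; in practice the list would come from fluent speakers rather than from code.

```python
# Hypothetical replacement table: formal Hindi term -> transliterated English
# word that speakers are more likely to use in casual conversation.
COLLOQUIAL_REPLACEMENTS = {
    "प्रयोगशाला": "लैब",  # 'Prayogshala' -> 'lab' written in the Hindi script
}


def make_conversational(text):
    """Swap formal terms for their colloquial counterparts."""
    for formal, casual in COLLOQUIAL_REPLACEMENTS.items():
        text = text.replace(formal, casual)
    return text


print(make_conversational("पास की प्रयोगशाला"))  # पास की लैब
```

Plain string replacement does not account for inflected forms or surrounding grammar, so the output of a step like this would still need review by someone fluent in the language.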
The Haptik advantage
Developing a multilingual chatbot has been a steep learning curve for our team. But the end result of our efforts and our platform-wide changes and upgrades has been a robust, efficient and scalable system for building vernacular language chatbots.
At Haptik, we pride ourselves on our “Build Once, Deploy Anywhere” philosophy – our bots can be deployed across multiple platforms, allowing our partners to have an omnichannel presence. And now, owing to our multilingual capabilities, Haptik’s bots can be used to proactively engage customers not just across platforms, but also across languages!
The Haptik advantage lies in the fact that we provide translation capabilities across our various bot attributes. On most other platforms, when you add a new language, you typically start with a blank slate and need to re-enter all the content in the desired language. Our platform, on the other hand, does all the heavy lifting. It swiftly delivers a basic bot with content already available in the target language. Thus, you essentially have a bot that is ready for testing on Day 1, which gives any team a great starting point and substantially speeds up the development process.
Moreover, we have now incorporated language support across our entire product line. Our Chat SDKs offer full language support as well. So our multilingual chatbots can now be easily integrated with any partner’s systems.
Moving forward, another area of interest is multilingual voice bots. As per Google, 30% of queries in India are voice-based, with voice queries in Hindi growing annually by 400%. We believe that while users would prefer text and tapping-enabled chats for longer and more complex conversations, voice bots could potentially be a better option when it comes to quicker conversations with shorter sentences. Needless to say, our team is currently hard at work building multilingual capabilities for our voice bots!
As Conversational AI becomes increasingly prevalent across the length and breadth of India, the need for companies to adopt multilingual chatbot solutions will only grow. At Haptik, it is our continuous endeavor to ensure that the majority of customer service conversations in the country take place on an interface developed by us. And our multilingual capabilities will enable us to take a huge step forward towards that goal!