How We Built Our First Voice Bot


2017 marked the dawn of the voice assistant, a.k.a. the voice bot.

Alexa, Google Assistant and Siri all won a place for themselves in customers’ hearts and homes, and on retail shelves everywhere.

Their popularity keeps growing by the day, and the reason behind the hype is obvious: they make life so convenient! All you need to do is talk, which is naturally easier than typing or tapping buttons on your phone to get things done.

But how can a company use a voice bot? What difference is this going to make for businesses?

Here’s our story from the forefront of this new vocal revolution, which should help answer what a voice bot can do for you.

What’s a Voice Bot?

A voice bot is a type of chatbot. As a Conversational AI company, we had over four years of experience building bots for text-based interfaces.
But building for Voice was fairly different and a challenge that we were raring to work on. That’s when we received our first client request to build a voice bot for a fast-food chain.

Types of Voice Bots

There is no well-defined manual to follow when you’re developing new technology. Based on the interface, voice bots fall into the following two types:

1) Voice + Text Bots – Hybrid Voice + Chat Support Model



If you’ve used Siri on your iPhone or Google Assistant on your Android phone, you know about Voice + Text bots. These are text-based bots with a layer of voice on top, and they accept speech as well as text as input.

A lot of companies have lately started including voice support on their text-based bots, like Axis Bank’s customer support app: ‘Axis AHA’.

2) Voice Only Bots – Voice Controlled Devices



Alexa on Amazon’s Echo speakers and Google Assistant on Google Home devices are the most popular voice assistants in the market. These beautiful products can handle everything from simple tasks like setting alarms and playing music to more complex tasks like controlling your gadgets and turning your house into a smart home.

Our client required both types of bot: a voice-only bot deployed as an Alexa skill, and a voice-supported text bot for their website and mobile apps.

Building The Bots

1) Voice + Text Bot – Hybrid Voice + Chat Support Model

We realised in the early stages of our first ‘voice project’ that a text-based bot can’t be directly converted into a voice + text bot. Thinking voice-first was the key to laying the groundwork.


The biggest challenge was taking a ground-up approach to designing chat flows. Adding a voice layer on top of a text-only bot doesn’t mean simply adding the tech support for it, i.e., it’s not just a one-line code change to your bot.

❌ Bad Experience Design

Using the same script for voice and chat. In some cases it’s easier for the user to read the text than to wait for the bot to stop speaking, so the voice script needs to be less verbose and to the point.

✅ Good Experience Design

Keep the voice script different from the chat script. A voice script that is shorter and to the point enhances the experience.


Voice + Text Bot: Good vs Bad Experience Design
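To make the idea concrete, here is a minimal Python sketch of a bot reply that carries two scripts, one per channel. It is an illustration only, not the production code: the `BotReply` structure, the `render` helper and the sample order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotReply:
    """One bot turn with separate wording per channel (hypothetical structure)."""
    chat_text: str   # shown on screen; can be longer and list details
    voice_text: str  # spoken aloud; kept short and to the point


def render(reply: BotReply, channel: str) -> str:
    """Pick the script that matches the channel the user is on."""
    if channel == "voice":
        return reply.voice_text
    if channel == "chat":
        return reply.chat_text
    raise ValueError(f"unknown channel: {channel!r}")


# Sample content invented for illustration.
order_confirmation = BotReply(
    chat_text=(
        "Your order is confirmed:\n"
        "- 1 x Chicken Burger\n"
        "- 1 x Fries (large)\n"
        "Estimated delivery: 30 minutes."
    ),
    voice_text="Your order is confirmed and will arrive in about 30 minutes.",
)

print(render(order_confirmation, "voice"))
```

Writing both scripts side by side makes it hard to forget the voice version, and it keeps the spoken reply short even when the on-screen reply grows.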

Final Results

On the technical side, we already had a text-based chatbot platform, so the engineering work was a matter of adding a layer of voice on top of it.

Here are our favourite Speech to Text and Text to Speech APIs:

1. Bing API (speech to text) – It offers a punctuation feature which isn’t there in Google’s speech-to-text API, and that makes a distinct difference.

2. Amazon Polly (text to speech) – With Alexa dominating the voice bot world, we didn’t have to think twice about which API to use. It also helped that we’ve been using Amazon Polly for quite a while on our app!

2) Voice Only Bot – Voice Controlled Device

In layman’s terms – an Alexa Skill is to an Alexa device what an iOS app is to an iPhone. As of March 2018, there are more than 30,000 Alexa Skills in the U.S. alone.

When building an Alexa skill or a Google Home command, the user experience has to be structured very differently from that of a text-based bot, and from that of a voice + text bot too, because there is no visual interface. Echo devices with a screen, such as the Echo Spot, have a low market share as of now.

  • Using only voice as the mode of communication can be tricky for the product designers in some ways. Taking information like addresses can be very easy for text-based bots but doing this on a voice controlled device just doesn’t work.

  • The industry-wide practice is to take this kind of data using the companion app of the voice device.

  • For example, the Zomato Alexa skill directly pulls a user’s address from their Zomato account. This can be managed through Alexa’s companion app on their phone.

  • Since this is not great UX, this process should not be used too frequently. The point of having a voice-controlled product is lost if users need to reach for their phone too often.
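The points above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `account` dictionary and its `saved_address` key are hypothetical stand-ins for whatever a linked user account exposes, and the wording of the prompts is made up.

```python
from typing import Optional, Tuple


def resolve_delivery_address(account: dict) -> Tuple[Optional[str], str]:
    """Return (address, speech) without ever asking the user to dictate an address.

    `account` is a hypothetical linked-account record; `saved_address`
    is an assumed field name used only for this sketch.
    """
    address = account.get("saved_address")
    if address:
        # Happy path: reuse what the user already stored in their account.
        return address, "I'll deliver to your saved address. Shall I place the order?"
    # Fallback: send the user to the companion app once, instead of
    # trying to capture a full address by voice.
    return None, (
        "I couldn't find a saved address. "
        "Please add one in the companion app, then ask me again."
    )


address, speech = resolve_delivery_address({"saved_address": "221B Baker Street"})
print(speech)
```

The fallback is deliberately a one-off: once the address is saved, later orders stay entirely in voice.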

❌ Bad Experience Design

No guidance for the next steps or re-prompt in case of no reply.

✅ Good Experience Design

It’s very important to guide the user in the right direction and help them fulfil their final goal. Following up with the user if there is no reply is essential. This mirrors ‘Quick Replies’ in most messaging apps, where you can simply tap a suggested option to send a message.


Voice Bot: Good vs Bad Experience Design
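As a rough sketch of the good pattern, the Python below builds a reply that ends with spoken hints and carries a re-prompt for when the user stays silent. The dictionary follows the JSON response layout used by custom Alexa skills (`outputSpeech`, `reprompt`, `shouldEndSession`); the helper names and the sample wording are hypothetical, and this is not the code from the project.

```python
import json
from typing import List


def with_hints(message: str, options: List[str]) -> str:
    """Append spoken 'quick replies' so the user knows what to say next."""
    if not options:
        return message
    if len(options) == 1:
        listed = options[0]
    else:
        listed = ", ".join(options[:-1]) + " or " + options[-1]
    return f"{message} You can say {listed}."


def build_response(speech: str, reprompt: str) -> dict:
    """Build a reply that keeps the session open and re-prompts on silence."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            # Spoken if the user does not answer the first prompt.
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            # False keeps the microphone open for the user's answer.
            "shouldEndSession": False,
        },
    }


options = ["order", "menu", "joke"]  # hypothetical skill capabilities
response = build_response(
    speech=with_hints("Welcome back.", options),
    reprompt=with_hints("Are you still there?", options),
)
print(json.dumps(response, indent=2))
```

Repeating the hints in the re-prompt gives a user who missed them the first time a second chance to hear what the bot can do.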

Final Results

Our team designed an Alexa Skill from the ground up for our client that can collect orders, describe the menu and even tell you funny chicken jokes!

We built the Alexa bot on our own backend, which consumed all the relevant APIs and converted their responses into information that can be presented to the end user. For instance, we used the Menu API and communicated an information-heavy food menu in a way that feels conversational and minimalistic to the end user.
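A simplified Python sketch of that kind of conversion is shown below. It is illustrative only: the shape of the menu data, the `summarise_menu_for_voice` helper and the three-category limit are assumptions, not details of the client’s Menu API.

```python
def summarise_menu_for_voice(menu: dict, max_categories: int = 3) -> str:
    """Turn a large menu into one short spoken sentence.

    `menu` maps a category name to its list of items (assumed shape).
    Only category names are spoken; items are read out on request.
    """
    categories = list(menu)
    if not categories:
        return "Sorry, the menu isn't available right now."
    spoken = categories[:max_categories]
    if len(categories) > max_categories:
        listed = ", ".join(spoken) + " and more"
    elif len(spoken) == 1:
        listed = spoken[0]
    else:
        listed = ", ".join(spoken[:-1]) + " and " + spoken[-1]
    return f"We have {listed}. Which would you like to hear about?"


# Sample data invented for illustration.
sample_menu = {
    "burgers": ["Classic Chicken Burger", "Spicy Chicken Burger"],
    "wraps": ["Grilled Chicken Wrap"],
    "sides": ["Fries", "Coleslaw"],
    "drinks": ["Cola", "Lemonade"],
}

print(summarise_menu_for_voice(sample_menu))
# We have burgers, wraps, sides and more. Which would you like to hear about?
```

Reading out a handful of categories and ending with a question keeps each turn short and hands control back to the user.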

Key Takeaways

  •  Voice support for a text-based bot is not an ON/OFF switch

    Even if you’re converting a chatbot into a voice bot, think voice-first from the beginning of the project.

  •  The communication that you use should be different

    You don’t speak to a person and write to a person in the exact same way. Don’t use the same chat script and voice script.

  •  The user should be able to understand the capability of the bot

    Making your product easy to use is essential. I didn’t start using Siri until I saw someone else use it correctly in an ad.

  •  Give hints on the next steps in the user’s journey

    It is very important to communicate the usability of the platform to your end user throughout the conversation.

  • Alexa only gives you 8 seconds

    If there is no interaction between the user and Alexa, the device automatically exits your skill. A re-prompt is sent after 8 seconds of silence, and if the user fails to respond again, the device exits the Alexa skill.
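The timing rule can be modelled with a few lines of Python. This is a toy simulation for reasoning about conversation design, not Alexa code: the 8-second figure comes from the takeaway above, the function name is hypothetical, and the sketch assumes the same window applies to the re-prompt.

```python
from typing import Optional

REPLY_WINDOW_SECONDS = 8  # figure quoted in this article


def session_outcome(
    first_reply_delay: Optional[float],
    second_reply_delay: Optional[float] = None,
) -> str:
    """Describe what happens to the session.

    Each argument is the number of seconds the user takes to answer
    (the first prompt, then the re-prompt); None means silence.
    """
    if first_reply_delay is not None and first_reply_delay <= REPLY_WINDOW_SECONDS:
        return "continues after first prompt"
    # The first prompt went unanswered, so the re-prompt is spoken.
    if second_reply_delay is not None and second_reply_delay <= REPLY_WINDOW_SECONDS:
        return "continues after re-prompt"
    return "skill exits"


print(session_outcome(3))        # user answers in time
print(session_outcome(None, 5))  # silence, then answers the re-prompt
print(session_outcome(None))     # silence twice
```

Walking a script through this rule is a quick way to check that every question the bot asks can realistically be answered within the window.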


Which type of voice bot should your business have?

Voice bots suit the modern consumer’s lifestyle extremely well. Making your services available on these platforms is a sure-fire way to reach a bracket of customers who wish to get things done as quickly and conveniently as possible. So if your business use case is something which can be conversational, like ordering food/groceries or even customer support, then certainly consider getting a voice bot.

Voice + Text Bot – Hybrid Voice + Chat Support Model

Your users won’t have access to their Alexa device while travelling and will need to use their phone to complete a task. Also, if your bot is expected to communicate a lot of information to a user, like menus from multiple restaurants or a long, information-heavy list, then you should leverage the visual interface of voice + text bots. The Echo Spot is evidence that a screen can be leveraged to enhance user experience.

Voice Only Bot – Voice Controlled Device

If you’re looking to add another channel through which you can do business, then a Google Home command or an Alexa skill is the right step for you. It most definitely helps to have a large user base. Uber, Zomato and Domino’s are some of the companies that have seen great success with their Alexa Skills and Google Home commands because discoverability wasn’t a problem for them.

It helps to maintain a presence on both these platforms if your service is likely to be used both on-the-move and at home.

Conversational interfaces have been around for a while now, and voice bots are a great way for businesses to use automation and connect with the world at a human level.

We’ll soon be talking about the tech side of building our first voice bot. So, stay tuned.


