Voice or Text: Deciding and Designing a User-First Chatbot

"While building a bot, picking the right medium to convey your company’s voice is the first decision you’ll take."

Building a bot is a fairly new addition to most companies’ to-do lists. That’s why things tend to get more confusing when you add voice chatbots to the mix.

Instead of simply reacting to market hype, understanding the strengths and weaknesses of each medium (voice, chat or both) will give you the insight needed to build the best experience for your user.

So we’re sharing our advice from the front lines of designing conversational experiences for more than 50 clients. Here are some points to consider when choosing whether to build a voice or text-based bot:


Some Basic Definitions

For the sake of clarity, here’s how we define a few key terms:

Chatbot: A conversational interface, voice or text, that enables people to interact with computer programs (possibly human- or AI-aided) to get something done.

1. Text-based Bots: A type of chatbot where the primary mode of communication is texting (i.e. chatting). Yes, this may also involve media like images and videos, and UI elements like Quick Replies, Carousels, and more.

2. Voicebot: A type of chatbot where the primary mode of communication is voice.
Example: Amazon Echo’s interface. For a detailed primer on the difference between various voice chatbots, read the story of our first voice chatbot.

Basically, “chatbot” is an umbrella term, whereas voicebots, Messenger bots, and virtual assistant bots are all variants based on the platform and medium used.


Define the User Experience Before Deciding the Medium

If you’re considering a bot, it’s important to consider which user experience works best for your use case – voice, chat or both (hybrid).

Understanding your user experience is key to deciding the medium. To see how the user experience shapes the product, let’s take a look at an example.

The Example – Food Ordering Chatbot:

Using your voice to order food seems very convenient and natural. However, if you dig deeper, you realize that it’s only convenient as long as you’re repeating a past order. You simply ask Alexa to open the app (for example, Domino’s) and say “repeat past order”.

The Problem:

If you need to look at (or hear in this case) the menu and build an order, a voice interface is quite ineffective.

First, it’s very cumbersome to listen to a menu instead of glancing through it. To test it out, ask a friend to read out a restaurant’s whole menu to you. Then try to remember which dish you’d like to order. You’ll find that by the time your friend gets past 10 dishes, you’ve already forgotten the first few. This makes a large menu impractical on voice.

Second, voice recognition (as of 2018) does not understand a lot of food names, especially ethnic food names. Imagine being in a loud bar where the waiter can’t understand what you’d like to order. Or the waiter believes they have understood the order correctly, but then brings a dish whose name rhymes with the one you were trying to order.




On the other hand – if you send the images of a restaurant’s menu over chat, the user can take their time to weigh their options, and pick a dish more comfortably.

And looking again at the repeat-past-order scenario, it’s easier to ask Alexa to open the Domino’s app and repeat a past order from a corner of your room than to unlock your phone, open the chatbot app, and type in “repeat past order”.

"Picking a medium that fits your bot’s use case is a lot about understanding its strengths and weaknesses."

But before you get to that, here are a few basic questions to answer:

1. Who are your users?

2. What’s their goal?

3. What’s your business’ goal?

4. Where do these goals overlap?


3 Guiding Principles to Consider When Picking Between Voicebots and Text-Based Bots

1. Volume of Information Transmission – Glancing>Listening>Reading

In terms of a medium’s ability to transmit information to a user: Looking/Glancing is the fastest, Listening is slightly slower, whereas Reading is the slowest way of communicating information.


That is, glancing (a visual interface, video, or image) can transmit the most information in the shortest amount of time. Voice comes next, and reading comes last.

When comparing voice and text, voice can convey more information within the user’s limited attention span. This means that when writing copy for voice, you can use longer sentences than you can in chat without losing the user’s attention.

2. Information to Be Conveyed – Text, Images or Other Media

Voice can only convey sonic information. Chat, on the other hand, can convey images, videos, text, and user interface elements like quick replies and carousels.

A use case where a chatbot would work better than a voicebot would be shopping. Chat would allow the user to look at multiple images and compare products.

A use case where voice would work better than chat would be a music experience while driving. A voicebot would allow the user to choose and skip songs without having to take their eyes off the road.

3. User Journey – Linear or Non-Linear

Voicebots are a great choice if your user journey is fairly linear, and text-based bots work great if there are many potential paths a user can take.

When there are a lot of options, such as choosing from a menu or comparing products, chat is a much better UX than listening, because the Quick Replies and carousels stay on the screen and you don’t have to remember which options are available.

On the other hand, if the user journey is linear, it’s easier to pick an option and keep moving in a voice chatbot. For example, in a bot that helps users fix an appliance, a voice interface lets the user stay hands-free, and the bot can give out more information per step than it could in chat. The user can also simply say “next” to move on, versus picking up their phone and typing it at every step.
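
To make the three principles a little more concrete, here is a minimal sketch in Python of how they could be turned into a rough first-pass check. Everything in it is an assumption made for illustration: the UseCase fields, the suggest_medium function and the MAX_SPOKEN_OPTIONS cut-off (borrowed loosely from the "past 10 dishes" menu test) are hypothetical names, not part of any real tool or of our own process. Treat the result as a conversation starter, not a verdict.

```python
from dataclasses import dataclass

# Hypothetical cut-off, borrowed loosely from the "past 10 dishes" menu test.
MAX_SPOKEN_OPTIONS = 10


@dataclass
class UseCase:
    needs_visuals: bool       # principle 2: images, video, or UI elements required
    option_count: int         # principles 1 and 3: options the user weighs at once
    linear_journey: bool      # principle 3: one obvious next step at each point
    hands_or_eyes_busy: bool  # user context, e.g. driving or fixing an appliance


def suggest_medium(use_case: UseCase) -> str:
    """Return "voice", "text", "hybrid", or "either" as a rough starting point."""
    favors_text = (
        use_case.needs_visuals
        or use_case.option_count > MAX_SPOKEN_OPTIONS
        or not use_case.linear_journey
    )
    favors_voice = use_case.linear_journey and use_case.hands_or_eyes_busy

    if favors_text and favors_voice:
        return "hybrid"
    if favors_voice:
        return "voice"
    if favors_text:
        return "text"
    return "either"


# The two food-ordering scenarios from the example earlier in this post.
repeat_order = UseCase(needs_visuals=False, option_count=1,
                       linear_journey=True, hands_or_eyes_busy=True)
browse_menu = UseCase(needs_visuals=True, option_count=40,
                      linear_journey=False, hands_or_eyes_busy=False)

print(suggest_medium(repeat_order))  # voice
print(suggest_medium(browse_menu))   # text
```

The rules are deliberately simple: anything that needs a screen, a long list of options, or a branching journey pulls towards text, while a linear journey with busy hands or eyes pulls towards voice. When both pulls apply, the sketch suggests a hybrid.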




How Writing Differs For Voicebots and Text-Based Bots

We use different vocabulary when we speak to each other vs when we write to each other. Similarly, the copy for Voicebots will differ from Text-based bots. Sentences in Voicebots can also be longer because listening is easier and faster than reading.

Most of the tone in a text-based bot is conveyed by the choice of words. In a voice chatbot, a good chunk of the tone is added by modulation. As a bot builder, you need to keep prototyping on the platform you’re building for, experimenting with commas and periods, to find the right tone for each sentence.


Finally, Which Bot Should You Pick: Voice or Text?

As we mentioned at the beginning, user experience is king.

1. Figure out where your user goals and business goals overlap. That’s the sweet spot that tells you what user experience to build.

2. Once you know what user experience to build, learn about the user’s context when they’re using your product. In simple words, see how, why, and where your user is using your product. Then, finally, pick the interface that works best for them.

This post has been penned by Jagrat, UX Designer at Haptik.


