Voicebot’s laughter

When we think about a bot, we might imagine a metallic device with a humanoid shape that does the work for us. However, a bot is simply a computer program, or more often a set of programs, sometimes called modules. These programs are installed on a server, in the cloud or in a server room, and work closely together to create the illusion of a single, coherent and intelligent entity, such as a person on the other end of a phone line. Voicebots can behave like human beings, although they are built to help people rather than to replace them entirely.


The word “bot” is short for “robot”, which comes from the Slavic word “robota”, meaning hard work and effort. This is exactly what we expect from a bot: that it will do the hard work for us. So a bot (or robot) is most often understood as an automated system that performs repetitive, tedious work. In customer service automation there are two types of bots: text bots, called chatbots, which talk on our behalf in chat channels (Messenger, WhatsApp, a chat window on a website, etc.), and voicebots, which talk on our behalf in an audio channel, usually over the phone.

In a voicebot, the most important of these programs, or modules, are: the telephone interface, speech recognition, intention interpretation, dialogue, data exchange and speech synthesis. I will briefly describe what each of them does.

The telephone interface module carries the audio stream between the caller’s phone and our server, where it is processed further. In the simplest case, it connects to our own telephone exchange or to our provider’s. For this purpose we use the standard SIP protocol, although a number of other, less popular protocols can also be used. Strictly speaking, SIP sets up and ends the call, while the audio itself usually travels in separate RTP packets.
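To make the hand-over from the telephone interface more concrete, here is a minimal Python sketch of what actually arrives at the server. It assumes the common setup in which SIP only negotiates the call and the audio travels in RTP packets (RFC 3550); the function unpacks the 12-byte fixed RTP header and returns the audio payload. The names `RtpPacket` and `parse_rtp` are illustrative and not taken from any particular product, and a real deployment would normally rely on an existing SIP/RTP stack rather than hand-written parsing.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    version: int        # RTP version, 2 for all current traffic
    payload_type: int   # identifies the codec, e.g. 0 = G.711 mu-law (PCMU)
    sequence: int       # increments by one per packet, used to detect loss/reordering
    timestamp: int      # sampling clock of the first sample in the payload
    ssrc: int           # identifies the audio source within the session
    payload: bytes      # the encoded audio itself


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the fixed RTP header (RFC 3550) and return the header fields plus payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count  # skip optional contributing-source identifiers
    if has_extension:
        # extension header: 16-bit profile id, then length in 32-bit words
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        end -= packet[-1]  # the last byte holds the number of padding bytes
    return RtpPacket(b0 >> 6, b1 & 0x7F, seq, ts, ssrc, packet[offset:end])
```

In this sketch, a packet carrying 20 ms of 8 kHz G.711 μ-law audio would report payload type 0 and a 160-byte payload.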

Once we have this sound stream in hand (more precisely, a stream of bytes representing the sound travelling over the telephone line), we can process it. We send it to the speech recognition module, which converts the sound into plain text: a transcript of what the caller said.
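The “stream of bytes” mentioned above usually has to be decoded into audio samples before a speech recognizer can work with it. The sketch below assumes the audio is G.711 μ-law (PCMU), one of the most common telephone codecs, sampled at 8 kHz with one byte per sample; it expands each byte into a 16-bit linear sample. The function names are illustrative, and other codecs (G.711 A-law, G.722, Opus) would need different decoders.

```python
from typing import List


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte (0-255) into a 16-bit linear PCM sample."""
    code = ~code & 0xFF              # mu-law bytes are transmitted bit-inverted
    sign = code & 0x80               # top bit: sign of the sample
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    # 0x84 (132) is the mu-law bias added by the encoder; remove it after expanding
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(payload: bytes) -> List[int]:
    """Turn a mu-law byte stream (e.g. an RTP payload) into linear samples."""
    return [ulaw_to_linear(b) for b in payload]
```

The decoded samples (8,000 per second in this assumed setup) are the kind of input a speech recognition engine typically expects as 16-bit linear PCM.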

Once we have the text of the speech, we can analyze it and extract the intentions it contains. Despite appearances, the word “intention” (often shortened to “intent”) is a technical term widely used in the industry. Broadly speaking, it is a meaningful unit of what a person says. A single statement may contain several intentions, and the task of the intention interpretation module is to extract them correctly from what the caller says.

For example, a person may answer a question with “of course, great, no problem”; instead of all these words, we simply want to know that the person agrees. Or, when asked to rate a conversation with a consultant on a scale of 1 to 5, the caller tells us “all in all, I’d say I can give it a four at most”. What interests us is that a) we have an answer to the question and b) we can record the number four. These are the intentions.
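The two examples above can be illustrated with a deliberately simple, keyword-based interpreter. This is a minimal sketch under stated assumptions: the intent names `agree` and `rating`, the phrase list and the regular expression are hypothetical, and production systems typically use trained language-understanding models rather than hand-written rules. It shows only the core idea: many different wordings map to a small set of structured intents, and one statement can yield more than one intent.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intent:
    name: str                     # hypothetical intent label, e.g. "agree" or "rating"
    value: Optional[int] = None   # extracted value, if the intent carries one


AGREE_PHRASES = ("of course", "great", "no problem", "yes", "sure")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
# a rating on the 1-5 scale, given either as a digit or as a word
RATING_PATTERN = re.compile(r"\b([1-5]|one|two|three|four|five)\b")


def extract_intents(utterance: str) -> List[Intent]:
    """Map a transcribed utterance to structured intents using keyword rules only."""
    text = utterance.lower()
    intents: List[Intent] = []
    if any(phrase in text for phrase in AGREE_PHRASES):
        intents.append(Intent("agree"))
    match = RATING_PATTERN.search(text)
    if match:
        token = match.group(1)
        rating = int(token) if token.isdigit() else NUMBER_WORDS[token]
        intents.append(Intent("rating", rating))
    return intents
```

The keyword approach is easy to read but brittle: a phrase such as “I’m not sure” would wrongly count as agreement, which is one reason real intention interpretation modules rely on statistical models.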

Once we have the intentions, we have to decide what the bot’s next statement will be. It may be the next question for the caller or an answer to the caller’s question. Very often it depends on the intentions the caller has already expressed. The dialogue module steers the conversation accordingly.

This module often leads the conversation in a non-linear way, i.e. not every conversation follows the same path. The bot’s statements depend on the information given by the caller, data from the client’s system and a number of other, minor factors. The dialogue module selects the bot’s next statement, or rather its template, which still has to be filled in with data specific to the conversation. For example, when confirming an appointment, the bot reads out the caller’s details (date, time, place, name, etc.).
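A non-linear dialogue with statement templates can be sketched as a small state graph: each state holds a template, and the caller’s intent decides which state comes next. The sketch below uses Python’s standard `string.Template` to fill in the data; the state names, intents and placeholder fields are hypothetical and only illustrate the appointment-confirmation example.

```python
from string import Template
from typing import Dict

# Hypothetical dialogue graph: each state has a statement template and a
# mapping from the caller's intent to the next state.
DIALOGUE: Dict[str, dict] = {
    "confirm_visit": {
        "template": Template("Your visit with $doctor is on $date at $time in $place. Is that correct?"),
        "next": {"agree": "goodbye", "disagree": "reschedule"},
    },
    "reschedule": {
        "template": Template("Let's find another date. Which day suits you, $name?"),
        "next": {},
    },
    "goodbye": {
        "template": Template("Thank you, $name. See you then!"),
        "next": {},
    },
}


def next_state(state: str, intent: str) -> str:
    """Follow the transition for this intent; stay in place (e.g. repeat the question) if it is unexpected."""
    return DIALOGUE[state]["next"].get(intent, state)


def render(state: str, data: Dict[str, str]) -> str:
    """Fill the state's template with conversation-specific data; raises KeyError if a field is missing."""
    return DIALOGUE[state]["template"].substitute(data)
```

Keeping the wording in templates separate from the transition logic means the same dialogue graph can serve every caller, with only the data changing.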

Data such as the date of a visit, the available appointments with a given doctor, the status of a shipment or any other information is retrieved from the client’s system through the data exchange module. A good data exchange module can work with a variety of software interfaces on the client’s side. It can retrieve data from a CRM system through its API, read it directly from a database, or even use simple spreadsheet files sent to us. Such a module should be flexible enough to adapt to any client system and to cope with communication problems with it (which, unfortunately, do happen).
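The flexibility and fault tolerance described above can be sketched as a thin adapter plus a retry wrapper. In this minimal example, assumed for illustration only, `CsvSource` stands in for a client spreadsheet exported as CSV, and `fetch_with_retry` retries a data-source call on connection errors with exponential backoff before falling back to a default value. Adapters for a CRM API or a database would expose the same kind of call; none of these names come from a real product.

```python
import csv
import io
import time
from typing import Callable, Dict, Optional


class CsvSource:
    """Hypothetical adapter: serves client records from a spreadsheet exported as CSV."""

    def __init__(self, csv_text: str, key: str):
        reader = csv.DictReader(io.StringIO(csv_text))
        # index every row by the chosen key column, e.g. a shipment or customer id
        self.rows = {row[key]: row for row in reader}

    def get(self, record_id: str) -> Dict[str, str]:
        return self.rows[record_id]


def fetch_with_retry(
    fetch: Callable[[], dict],
    attempts: int = 3,
    delay_s: float = 0.5,
    fallback: Optional[dict] = None,
) -> Optional[dict]:
    """Call `fetch`, retrying on connection problems; return `fallback` if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt < attempts:
                # exponential backoff: delay_s, 2 * delay_s, 4 * delay_s, ...
                time.sleep(delay_s * 2 ** (attempt - 1))
    return fallback
```

For example, `fetch_with_retry(lambda: shipments.get("A1"), fallback={"status": "unknown"})` lets the dialogue continue gracefully (say, by offering a callback) even when the client’s system is temporarily unreachable.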

Data is usually exchanged in both directions. On the one hand, we collect from the client the information we want to pass on to the caller (e.g. shipment status, amount of debt); on the other hand, we save the data collected from the caller (e.g. a numerical rating of a consultant’s work, a declaration that a debt will be repaid within a certain period).

Once the bot’s statement is formulated, we can turn it into sound using the speech synthesis module. Such a module usually offers a choice of male and female voices and lets us adjust the tempo and pitch of the bot’s speech. Some speech synthesis engines even allow “human” sounds such as breathing or laughter.
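Many speech synthesis engines accept input in SSML (Speech Synthesis Markup Language, a W3C standard), where the choice of voice, tempo and pitch is expressed as markup. The sketch below wraps a bot statement in the standard `<voice>` and `<prosody>` elements; the voice name is a placeholder, since available voices differ between engines. Sounds such as breathing or laughter are not part of core SSML and, where supported, are exposed through vendor-specific extensions, so they are not shown here.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def to_ssml(text: str, voice: str, rate: str = "medium", pitch: str = "medium", lang: str = "en-US") -> str:
    """Wrap a bot statement in SSML 1.1 <voice> and <prosody> elements."""
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"  # voice names are engine-specific
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, <, > so the statement cannot break the markup
        "</prosody></voice></speak>"
    )
```

Values such as `slow` or `high` are predefined SSML labels; the standard also allows numeric and relative values, but engines differ in how faithfully they apply them.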

The bot’s statement, now in audio form, is sent to the caller through the telephone interface module, and the whole cycle starts again, unless this is the end of the conversation.
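The whole cycle can be summarized as one conversational turn that passes data from module to module. The sketch below is a hypothetical skeleton, not a description of any particular product: each module is a plain function plugged into the class, the dialogue step returns the reply together with a flag saying whether the conversation is over, and the stub lambdas in the usage part stand in for real speech recognition, intention interpretation, dialogue and synthesis components.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VoicebotPipeline:
    """Hypothetical skeleton of one voicebot turn; each field is a pluggable module."""
    recognize: Callable[[bytes], str]                 # speech recognition: caller audio -> text
    interpret: Callable[[str], List[str]]             # intention interpretation: text -> intents
    decide: Callable[[List[str]], Tuple[str, bool]]   # dialogue (+ data exchange): intents -> (reply, is_final)
    synthesize: Callable[[str], bytes]                # speech synthesis: reply text -> audio
    transcript: List[str] = field(default_factory=list)
    finished: bool = False

    def turn(self, caller_audio: bytes) -> bytes:
        """Process one caller utterance and return the reply audio for the telephone module."""
        text = self.recognize(caller_audio)
        intents = self.interpret(text)
        reply, self.finished = self.decide(intents)
        self.transcript.extend([f"caller: {text}", f"bot: {reply}"])
        return self.synthesize(reply)


# Usage with stub modules (bytes stand in for audio here):
bot = VoicebotPipeline(
    recognize=lambda audio: audio.decode(),
    interpret=lambda text: ["agree"] if "yes" in text else [],
    decide=lambda intents: ("See you then.", True) if "agree" in intents else ("Could you repeat that?", False),
    synthesize=lambda reply: reply.encode(),
)
for utterance in (b"hmm", b"yes, that works"):
    reply_audio = bot.turn(utterance)
    if bot.finished:
        break  # end of the conversation: the telephone module can hang up
```

Passing the modules in as functions keeps each one replaceable, for example swapping one speech recognition engine for another without touching the dialogue logic.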

To sum up, customer service voicebots help us carry out various repetitive processes, taking work off our team. In addition, voicebots scale in real time: during sudden traffic peaks, we can use them to serve every customer who needs help. As a result, we can increase retention and shorten both the time to first contact and the handling time, which lowers call center costs.

Piotr Kempa

Head of AI Division