Building a voicebot

Piotr Kempa, creator of Primebot, is interviewed by Karolina Kania, project manager at Voice Contact Center.

Karolina Kania: What exactly is a voicebot? What does it consist of?

Piotr Kempa: A bot is a computer program that replaces a human being in selected activities. Technically, a bot is first of all an electronic “brain” which, once properly taught by a human, is able to conduct a conversation, correctly understand and interpret what the other party says, and steer the dialogue accordingly. This brain is also sometimes referred to as the engine. To communicate with the outside world, it must be equipped with a number of interfaces. In the case of telephone voicebots, these are a telephony interface, a speech recognition module and a speech synthesis module. We should also not forget the very important interface to the client’s system, which allows the bot to retrieve and save data, provide information and, more generally, handle business processes.
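As a rough illustration of the structure described above, one turn of a telephone conversation can be sketched as a chain of three components. This is only a minimal sketch and not Primebot's actual design: the names `VoiceBot`, `recognize`, `decide`, `synthesize` and `handle_turn` are hypothetical. The telephony interface is assumed to deliver the caller's audio and play back the returned audio, and the interface to the client's system is assumed to be called from inside the engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceBot:
    """Hypothetical wiring of the modules around the engine."""

    recognize: Callable[[bytes], str]   # speech recognition: audio -> text
    decide: Callable[[str], str]        # the engine ("brain"): text -> reply text
    synthesize: Callable[[str], bytes]  # speech synthesis: reply text -> audio

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Process one caller utterance and return the audio to play back."""
        text = self.recognize(caller_audio)
        reply = self.decide(text)
        return self.synthesize(reply)


# Stand-in components so the sketch runs without any real speech services.
bot = VoiceBot(
    recognize=lambda audio: audio.decode("utf-8"),
    decide=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Keeping each module behind a plain callable means any one of them can be replaced without touching the others.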

KK: So, depending on the process it handles, the bot will take a slightly different form each time, built on the same “engine”. In Primebot, this “engine” is your own program. Are there different types of engines on the market dedicated to specific processes?

PK: Engines can be divided into two types according to how they are taught. The first, classic and slightly older type is called “rule-based”. We create a set of rules for the bot, which are then matched against what the person says. Based on which rules match, the bot draws conclusions about what it heard. The second type is neural, based, as the name suggests, on neural networks and machine learning techniques.
In our experience, hybrid models, which combine the strengths of rule-based and neural technologies, work well. Such engines are currently used in the bots provided by VCC.
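To make the distinction concrete, here is a minimal sketch of a rule-based matcher with a fallback, which is one simple way a hybrid could be arranged. It is an assumption for illustration, not a description of how Primebot's engine works. The intents, patterns and the names `Rule`, `match_rules` and `hybrid_intent` are hypothetical, and the `fallback` argument stands in for a trained neural classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Pattern


@dataclass
class Rule:
    intent: str
    pattern: Pattern[str]


# Hypothetical hand-written rules; a real bot would have many more.
RULES = [
    Rule("check_balance", re.compile(r"\bbalance\b", re.IGNORECASE)),
    Rule("talk_to_agent", re.compile(r"\b(agent|consultant|human)\b", re.IGNORECASE)),
]


def match_rules(utterance: str) -> Optional[str]:
    """Return the intent of the first rule that matches, or None."""
    for rule in RULES:
        if rule.pattern.search(utterance):
            return rule.intent
    return None


def hybrid_intent(utterance: str, fallback: Callable[[str], str]) -> str:
    """Try the rules first; if none match, ask the fallback classifier."""
    return match_rules(utterance) or fallback(utterance)
```

The rules give predictable behaviour for phrases the designers anticipated, while the fallback handles everything else.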

KK: What speech synthesisers are available on the Polish market and which ones would you recommend for use with voicebots?

PK: In Poland we mainly use three providers. Two of them are leading providers of cloud computing services. The third provides on-premise, that is locally installed, solutions. It is worth having such a solution in your offer for cases where cloud computing is not an option for the customer. All three services are perfectly suited for use in bots, and Primebot is already integrated with them. Our bot can also, with certain “precautions”, be switched to a different voice or a different speech synthesis service at practically any time. In addition, for some time now, speech synthesis services have distinguished between classical and neural models. The latter are based on the latest advances in neural network technology. In Poland, currently only one provider, probably the largest, offers both classical and neural voices (which it calls WaveNet). As far as we know, other providers are also working on new neural voice models for Polish. Neural voices usually sound more natural and simply provide better synthesis quality. This usually also makes them a bit more expensive.
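Switching voices or services at any time is easiest when the bot talks to every synthesis service through one common interface. The sketch below shows that idea under stated assumptions: `SpeechSynthesizer`, `FakeCloudTts`, `FakeOnPremTts`, `VoiceConfig` and `speak` are hypothetical names, and the two fake classes only return labelled bytes instead of calling a real provider.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class SpeechSynthesizer(Protocol):
    """Common interface that every synthesis backend is assumed to implement."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeCloudTts:
    """Stand-in for a cloud service; returns labelled bytes, not real audio."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"cloud:{voice}:{text}".encode("utf-8")


class FakeOnPremTts:
    """Stand-in for a locally installed service."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"on_prem:{voice}:{text}".encode("utf-8")


PROVIDERS: Dict[str, SpeechSynthesizer] = {
    "cloud": FakeCloudTts(),
    "on_prem": FakeOnPremTts(),
}


@dataclass
class VoiceConfig:
    provider: str  # key into PROVIDERS
    voice: str     # voice name understood by that provider


def speak(text: str, config: VoiceConfig) -> bytes:
    """Synthesize text with whichever provider and voice the config selects."""
    return PROVIDERS[config.provider].synthesize(text, config.voice)
```

With this arrangement, changing the provider or the voice is a configuration change rather than a code change, for example `speak("Dzień dobry", VoiceConfig("on_prem", "pl-classic"))`.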

KK: As you mentioned, you can also use voiceover recordings, but is it worthwhile and effective?

PK: Of course we can use voiceover, and in some cases it is a very good solution. Whether it is profitable depends on many factors, one of the most important being how much you will pay for the voiceover. With voiceover recordings, bear in mind that variable utterances may pose a problem. If we have to read out a number, or a sequence of numbers and letters (e.g. a registration number), synthesis will often simply sound smoother. Additionally, some information cannot be pre-recorded at all: data that is unique to a given case or client, such as an email address or a name.
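One way to combine the two approaches is to play recordings for the fixed parts of a prompt and synthesize only the variable parts. The following is a minimal sketch of that idea, not a description of any particular product: `Recorded`, `Synthesized` and `build_prompt` are hypothetical names, and the file name is made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Recorded:
    file: str  # pre-recorded voiceover clip for a fixed phrase


@dataclass(frozen=True)
class Synthesized:
    text: str  # text to send to speech synthesis; may contain {placeholders}


Segment = Union[Recorded, Synthesized]


def build_prompt(template: List[Segment], values: Dict[str, str]) -> List[Segment]:
    """Fill placeholders in the synthesized segments; leave recordings untouched."""
    plan: List[Segment] = []
    for segment in template:
        if isinstance(segment, Synthesized):
            plan.append(Synthesized(segment.text.format(**values)))
        else:
            plan.append(segment)
    return plan


# Fixed phrase from a recording, case-specific value from synthesis.
template = [Recorded("your_registration_number_is.wav"), Synthesized("{plate}")]
plan = build_prompt(template, {"plate": "WX 12345"})
```

Mixing a recorded voice with a synthetic one in the same sentence may sound uneven, which is part of why full synthesis often sounds smoother for such utterances.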

KK: Piotr, what other elements build a bot?

PK: Apart from the bot basics mentioned above, the bot platform should provide a number of other elements that it needs in order to work. To list them briefly: 1. A reporting interface showing call records, billing, statistical analysis and so on. 2. A dialer that handles outbound campaigns. 3. An interface to the customer’s system that supports writing and reading data in various formats and standards. 4. In some cases, a graphical bot interface that allows the client to create bots themselves is also useful.