The popularity of voice bot frameworks built on Natural Language Processing (NLP) and Artificial Intelligence (AI) is on the rise. Even blockchain can no longer compete with Alexa and Google Home for attention: they are in the press, on radio and TV shows, basically everywhere. Techies and corporate executives quickly and rightly recognized the potential of voice-based user interaction some time ago, and many now have innovation teams actively exploring and experimenting with the underlying technologies.
For example, TD Bank was one of the very early adopters, releasing its Alexa skill back in November 2017 and offering a set of mainly informational voice capabilities, like
Beyond these informational, brochure-ware type skills, the next innovation frontier could be sophisticated call center automation and voice-enabled banking services, delivered through voice assistants from the comfort of customers’ homes.
High Level Architecture
Amazon’s Alexa, Google’s Google Home, Microsoft’s Cortana and Apple’s HomePod are the best-known NLP voice assistants today. They all follow fairly similar basic architectures, as shown in the figure below, with:
Figure 1 - Generic Smart Speaker Framework Architecture
The Challenges Slowing Down Serious Innovation
Although all the mainstream voice bot frameworks look similar at the architectural level (described above), they differ considerably in their basic features, like:
These platform differences significantly limit the ability of corporate innovators to offer voice bot skills that could run on any device. Today, if a corporation wants to enable voice interaction for as many of its customers as possible, who may be using a variety of smart speakers, ‘skill’ developers have to plan for separate development projects (or teams) to develop, port and test identical skills on each device that needs to be supported. The alternative is to bet on one device and ignore the others. Either way, the result is potentially risky, limiting, inefficient and error prone.
There are third-party attempts, like the Jovo Framework, to provide a development environment for Alexa and Google Home skills that is as platform independent as possible. Jovo looks very interesting as a framework and is worth experimenting with (yes, we are currently evaluating it). It offers a decent abstraction layer for consolidation of:
Jovo is not ideal, though, and may not be the answer. It does not support all devices: currently only Alexa and Google Home are supported (although these are probably the ones that matter most at the moment). Questions also arise about Jovo’s ability to keep up with the latest developments in the underlying platforms it supports, and about its roadmap for adding support for Microsoft and Apple devices. But with all of the existing fragmentation, something like Jovo could be your best shot at the moment, especially if you want as much platform independence as you can get.
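To make the idea of a platform abstraction layer more concrete, here is a minimal sketch of the adapter pattern that such a layer relies on. It is not Jovo’s actual API: the names `VoiceRequest`, `normalize_alexa`, `normalize_dialogflow` and `handle` are hypothetical, the `BranchHoursIntent` skill is invented for illustration, and the two payloads are heavily simplified versions of an Alexa intent request and a Dialogflow-style webhook request.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceRequest:
    """Platform-neutral view of an incoming voice request (hypothetical)."""
    platform: str
    intent: str
    slots: dict = field(default_factory=dict)


def normalize_alexa(payload: dict) -> VoiceRequest:
    # Assumed, simplified Alexa shape: request.intent.name / request.intent.slots
    intent = payload["request"]["intent"]
    slots = {name: slot.get("value") for name, slot in intent.get("slots", {}).items()}
    return VoiceRequest("alexa", intent["name"], slots)


def normalize_dialogflow(payload: dict) -> VoiceRequest:
    # Assumed, simplified Dialogflow-style shape: queryResult.intent.displayName
    result = payload["queryResult"]
    return VoiceRequest("google", result["intent"]["displayName"],
                        dict(result.get("parameters", {})))


def handle(request: VoiceRequest) -> str:
    """One skill implementation shared by every supported platform."""
    if request.intent == "BranchHoursIntent":
        city = request.slots.get("city") or "your area"
        return f"Branches in {city} are open from nine to five."
    return "Sorry, I did not understand that."


# Illustrative payloads, not captured from real devices.
alexa_payload = {
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "BranchHoursIntent",
                   "slots": {"city": {"name": "city", "value": "Toronto"}}},
    }
}
google_payload = {
    "queryResult": {
        "intent": {"displayName": "BranchHoursIntent"},
        "parameters": {"city": "Toronto"},
    }
}

alexa_reply = handle(normalize_alexa(alexa_payload))
google_reply = handle(normalize_dialogflow(google_payload))
print(alexa_reply)
print(google_reply)
```

The point of the sketch is that only the small `normalize_*` adapters know about vendor formats, while the skill logic in `handle` is written once. Every new platform, or every change to an existing platform’s payload, still requires adapter work, which is exactly the maintenance burden raised above.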
Time For Standard Voice Browsers Maybe?
In my opinion, instead of trying to address the current lack of standardization in the voice assistant development space through third-party abstraction layers like Jovo, the better approach would be for device vendors to work together, potentially under the W3C umbrella, on a standard ‘voice conversation markup language’. This could be the next generation of the existing VoiceXML standard, with new additions, upgrades and contributions from Google, Amazon, Microsoft and Apple. Such a voice conversation markup language would in turn be supported by a standardized ‘Voice Browser’ execution environment, with a built-in markup parser, content interpreter and standard conversation navigation manager, all implemented on top of each vendor’s proprietary NLP services.
In a nutshell, it would be really great if developers building voice skills could describe:
in a platform independent manner, through a set of standard XML (HTML-like) tags, a set of associated parameters (attributes) and the ability to embed or reference custom JavaScript code segments, all of which would be interpreted by a standardized Voice Browser layer. This is not a new pattern; it closely follows the approach that already exists for modern web browsing.
A compliant Voice Browser (like web browsers today) would provide, out of the box, a standard set of voice conversation navigation commands like START, BACK, REPEAT, STOP, CALL, etc. Developers would then provide extensions and value-added voice conversations, described in a highly portable voice conversation markup language, with JavaScript code segments for custom intent handling.
Such a standard voice browser environment and voice conversation markup language would likely enable significantly higher adoption and penetration of voice-enabled services among customers, and far more scalable and reusable code development by corporate developers. Everybody would benefit.
Let’s hope Amazon, Google, Microsoft and Apple can come together, start working on the next generation of the VoiceXML standard and support it in their next generation devices. Voice assistant development would be significantly simpler and in much better shape than it is today, with the existing fragmentation and ‘voice assistant battles’.
I feel that even if only one of the major voice assistants takes this route, the developer community would love it and embrace it. Others would then likely have to follow, as happened in the world of web browser standardization and HTML.
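As a rough illustration of the Voice Browser idea, the sketch below interprets a tiny conversation written in markup and answers a few built-in navigation commands. Everything in it is an assumption made for illustration: the `<conversation>`, `<step>` and `<say>` tags are invented and are not VoiceXML, the `VoiceBrowser` class is hypothetical, `NEXT` is an extra command added so the demo has something to navigate, and a Python callable stands in for the custom JavaScript intent handlers described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tags are invented for this sketch, not a real standard.
MARKUP = """
<conversation>
  <step id="welcome"><say>Welcome to the bank.</say></step>
  <step id="balance"><say>Your balance is available in the app.</say></step>
  <step id="goodbye"><say>Thank you for calling.</say></step>
</conversation>
"""


class VoiceBrowser:
    """Toy interpreter: parses <step> prompts and handles built-in commands."""

    def __init__(self, markup, handlers=None):
        root = ET.fromstring(markup)
        # Collect the prompt text of every step, in document order.
        self.prompts = [step.findtext("say", default="").strip()
                        for step in root.iter("step")]
        self.handlers = handlers or {}   # developer-supplied extensions
        self.position = 0
        self.active = False

    def command(self, word):
        word = word.upper()
        if word == "START":
            self.active, self.position = True, 0
        elif not self.active:
            return "Say START to begin."
        elif word == "STOP":
            self.active = False
            return "Goodbye."
        elif word == "BACK":
            self.position = max(self.position - 1, 0)
        elif word == "NEXT":
            self.position = min(self.position + 1, len(self.prompts) - 1)
        elif word == "REPEAT":
            pass                         # fall through and re-read the current prompt
        elif word in self.handlers:
            return self.handlers[word](self)   # custom intent handling
        else:
            return "Sorry, I did not understand that."
        return self.prompts[self.position]


browser = VoiceBrowser(
    MARKUP,
    handlers={"HOURS": lambda b: "We are open from nine to five."},
)
transcript = [browser.command(w)
              for w in ["start", "next", "repeat", "back", "hours", "stop"]]
for line in transcript:
    print(line)
```

In this sketch the navigation commands live in the browser, not in the skill, so a skill author only writes the markup and any custom handlers. A real standard would also need to cover speech recognition grammars, slots and error handling, none of which are modeled here.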
This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.