Preparing for the Next Round of Speech

Imagine you are flying down the highway and need to check your e-mail. The Blackberry isn’t practical right now and your West Coast office doesn’t open for another two hours.

Instead, you pick up your headset, voice dial your office on your cell phone, speak a few commands and have your e-mail read back to you.

While you are there, you go ahead and forward those order confirmation messages to shipping. You also schedule a meeting for next week with engineering. If the engineer is not available at your preferred time, then you automatically find a time slot when you are both free.

The time when IT departments can provide this speech-based functionality is near. A convergence of technological advances, coupled with the elimination of major industry stumbling blocks, is paving the way for this vision to become a reality.

The Framework First

But to deploy speech applications, enterprises first need to put in place a speech platform that can function as an application server, a speech software resource, and a network connectivity hub.

For example, speech applications can be developed and configured to accept data feeds from a Microsoft CRM server, a hosted service like Salesforce.com or applications like Siebel or SAP.

The speech application should be deployed on a voice platform with easy-to-use development tools, advanced speech recognition and text-to-speech capabilities, and, last but not least, robust management and business intelligence functions.

Media and control connectivity needs should also be considered in order to connect the speech platform to a VoIP network or the public switched telephone network (PSTN). It is also crucial to invest in open, standards-based technologies to ensure successful post-installation management and support.

Technology giants like Microsoft are helping enterprises that want to deploy speech applications. In 2004, Microsoft unveiled Microsoft Speech Server 2004, a single platform that combines Web technologies, speech-processing services and telephony capabilities, allowing enterprises to develop and deploy customized voice applications.

Platforms like Microsoft’s use open standards such as Speech Application Language Tags (SALT), which allow developers to easily build and deploy custom applications on the platform.

Additionally, the rise of off-the-shelf voice applications is eliminating programming time and providing organizations with solutions that expose speech technologies to a wider audience, namely small- and medium-sized businesses (SMBs).

Elimination of Roadblocks

One of the major roadblocks for the “virtual office” has been the lack of an open, non-proprietary standard upon which to build, operate and maintain speech-enabled applications.

But this impediment is history thanks to the development of open standards that facilitate the deployment of speech applications in the enterprise. IT departments now have access to the tools to design speech systems tailored to their enterprise’s own business objectives.

These open standards include the Microsoft-supported SALT, which is a lightweight set of extensions to existing markup languages, in particular HTML and XHTML, that enable multimodal and telephony access to information, applications and Web services from PCs, telephones, tablet PCs and wireless personal digital assistants (PDAs).

Another widely used standard, Voice Extensible Markup Language (VXML), is a language that allows conversational access to Web-based information and services, and enables distributed applications by building on open Internet standards.
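To make the idea of conversational access concrete, here is a minimal sketch of what a VXML dialog for the e-mail scenario might look like, generated from Python's standard library. The element names (vxml, form, field, prompt, grammar, filled, value) come from the VoiceXML 2.0 specification; the form id, the prompt wording and the grammar file name commands.grxml are hypothetical and stand in for whatever a real deployment would define.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_email_menu() -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    The dialog asks the caller for one spoken command and reads it back.
    The form id, prompt text and grammar file name are illustrative only.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "email_menu"})

    # A field collects one piece of input from the caller.
    field = ET.SubElement(form, "field", {"name": "command"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Say read, forward, or schedule."

    # Hypothetical external grammar listing the allowed commands.
    ET.SubElement(
        field,
        "grammar",
        {"src": "commands.grxml", "type": "application/srgs+xml"},
    )

    # The filled block runs once the field has been recognized.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You said "
    ET.SubElement(confirm, "value", {"expr": "command"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_email_menu())
```

In practice, a Web application would serve a document like this to the voice platform, which is what ties the speech front end to existing Web infrastructure.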

The choice of standard for your speech system depends largely on which existing systems the IT manager wants to build on, since those systems in turn dictate which standards fit best.

Most IT managers will want to leverage as many existing applications and desktop investments as possible in order to keep cost and complexity levels low. If the enterprise has made significant .NET infrastructure investments, such as a Microsoft CRM system, Microsoft Outlook and an Exchange server, then it’s easiest to connect using SALT.

Enterprises that have chosen a J2EE-based approach to Web services delivery may find that a VXML-based speech framework is the more suitable investment.

But the work in deploying a speech system does not lie solely in the back-end.

Making Voice Stick

Enterprises need to think about their voice user interface (VUI) strategy, which will affect whether users have a “sticky” voice experience. IT managers need to work with the business managers and the mobile workforce to identify business process and design objectives, and then apply the VUI to meet these needs.

IT managers also need to consider how often data is going to be updated via the voice system, and ensure that the updates are synchronized in a timely manner to the single database or, in some cases, the several disparate database systems behind it.
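The synchronization concern can be sketched in a few lines. The example below is only an illustration under stated assumptions: the Update and Store classes, the synchronize function and the "newest timestamp wins" rule are all hypothetical, and in-memory dictionaries stand in for the real back-end databases.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Update:
    """One change captured by the voice application (hypothetical shape)."""
    record_id: str
    fields: dict
    captured_at: datetime


@dataclass
class Store:
    """In-memory stand-in for one back-end database."""
    name: str
    records: dict = field(default_factory=dict)
    last_applied: dict = field(default_factory=dict)

    def apply(self, update: Update) -> bool:
        """Apply the update unless an equally new or newer one was already applied."""
        seen = self.last_applied.get(update.record_id)
        if seen is not None and seen >= update.captured_at:
            return False  # stale update, ignore it
        self.records.setdefault(update.record_id, {}).update(update.fields)
        self.last_applied[update.record_id] = update.captured_at
        return True


def synchronize(updates, stores):
    """Apply updates in timestamp order to every store.

    Returns the number of updates each store accepted, keyed by store name.
    """
    applied = {store.name: 0 for store in stores}
    for update in sorted(updates, key=lambda u: u.captured_at):
        for store in stores:
            if store.apply(update):
                applied[store.name] += 1
    return applied
```

Sorting by capture time and rejecting stale updates means every store ends up with the same latest value even when updates arrive out of order. A production system would also need to handle failures and retries, which this sketch leaves out.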

This is especially true at the contact center level, where computer telephony integration (CTI) software is necessary to ensure a consistent view of the customer experience at the agent desktop, the voice application and the mobile workforce device.

Although we’ve come a long way, enterprises that want to streamline the adoption of speech applications need a more Web-centric architecture, one that ties the delivery of data from the Web to the speech architecture.

The Market

As open platforms and packaged applications drive down the cost of speech systems, further inroads can be made toward making speech truly affordable for all enterprises.

To improve user experience, speech technology providers must continue to put resources towards improving user interface design. Gathering and analyzing business intelligence data is also important.

Once an IT manager has implemented a speech system, he or she must constantly reevaluate the end-user experience, whether business goals are being met and whether the system’s capabilities are being used to their fullest.

IT managers also need to consider customer service and support as part of their overall budget for the speech system to help them overcome future roadblocks. Based on the progress made so far, the speech technology industry’s next steps can only lead us closer to the vision of a truly cost-effective, robust virtual enterprise.

Mike Segura is director of Microsoft Products and Strategy for Intervoice, a provider of enterprise voice automation solutions. He has managed business units within the telecommunications, energy and consulting industries for 15 years.