Apple's Speech Technology is a Watershed, AVIOS Claims


The Applied Voice Input Output Society (AVIOS), which holds the annual Mobile Voice Conference, commented on Apple's announcement of its Siri personal assistant as a key feature of the latest Apple iPhone, the 4S. Siri, which is both software on the phone and a service within the cloud, uses speech recognition and understanding tightly coupled with other phone services, such as scheduling reminders.

Speech recognition services on mobile phones aren't new: speech recognition for tasks such as speaking search terms is part of Google's Android and Microsoft's Windows Mobile phone operating systems, and is featured in independent apps from companies such as Nuance and Vlingo. With mobile phones doing more and more, and their small form factor making typing and navigation inconvenient, it has become clear that speech recognition will be a key user interface option for mobile devices. These services seem on the surface to have many of the voice-enabled features that Apple featured, so is Apple just playing catch-up? What's new?

A lot, according to Bill Meisel, editor of Speech Strategy News and co-organizer of the Mobile Voice Conference in March. First, he notes, Apple is featuring more than speech recognition (which converts speech to a text representation); it is highlighting speech understanding, that is, knowing what to do with the speech content. Apple has always attempted to deliver solutions that don't require a user manual, and allowing a user to say what they want in a natural way is part of that philosophy. Siri's focus as a company before the Apple acquisition was on natural language understanding; the company's credentials as a spinout of SRI International (guess where "Siri" comes from) suggest solid core technology. The speech recognition used by Siri was from Nuance Communications, and almost certainly still is. (Nuance is widely rumored to have some licensing deal with Apple that apparently will not be announced.)

Second, the speech understanding is tightly coupled with the Apple OS and applications and services on the phone and in the network. By providing the phone and applications delivered with the phone, Apple has an advantage in making the speech assistant capable of doing what the user asks, e.g., reminding the user to do a specific task when they leave home in the morning, combining GPS and a reminder program. The speech understanding must find the most appropriate application or service to respond to a user request, so it is integrated with Web-based services as well, e.g., Wolfram Alpha.

Third, as a result of this tight integration, the app can be cognizant of user preferences and user-specific information such as contact lists. Presumably, Siri will use information about what a user tends to request, and about the corrections users make, to improve its performance. It thus becomes, over time, a true "personal" assistant. Apple makes it difficult for apps from outside developers to have full access to some built-in apps and OS functions, making external apps less able to achieve this tight integration.

Fourth, Siri uses "conversational" speech. Many speech recognition applications today, e.g., voice search, allow the user to say one phrase, and, after a slight delay, drop them into a non-speech application, e.g., a list of web sites matching spoken search terms. There are latency issues as the speech is transmitted over the network, processed at servers, and the result returned. The examples given by Apple seem to suggest that they have reduced latency sufficiently to allow more back-and-forth interaction, although there is a big difference between a demo and in-the-field experience.

Fifth, although it wasn't specifically announced, the iPhone may have an advantage over some phones in handling speech input. Previous versions have included a chip from Audience that has advanced noise-cancelling features; this capability, if present, could allow voice interaction in environments where phones without noise-cancelling features can't support it.

Other companies offering speech recognition solutions for mobile devices (e.g., Microsoft, Google, Nuance, and Vlingo) have capabilities in natural language understanding, and have featured some of these capabilities in their apps. Nuance's deal with Apple, for example, while it will generate some licensing revenues, is probably also motivated by the prospect of getting mobile phone manufacturers to preload the company's Dragon Go! app to offer some of the same speech understanding features. Nuance also recently launched a mobile developer's program that will let other app developers incorporate Nuance's network-based speech recognition in their apps, a further expansion of Nuance's business. And Microsoft and Google have already built some speech recognition and understanding features into their operating systems. Expect some comparisons of which are "smartest"!

One bottom line take-away, Meisel noted, is that, given the market share of Apple's iPhone and the likely competitive response from other vendors, speech understanding is now a key and growing part of the user interface for mobile devices. This trend is the motivation for the Mobile Voice Conference, the third year of which will take place in San Francisco March 19-21, 2012.

About the Mobile Voice Conference

The Mobile Voice Conference is organized by the Applied Voice Input Output Society and Bill Meisel. It provides attendees with information to help them take advantage of the rapidly developing opportunities created by the explosion of mobile phone use and, in particular, by the increasing role of voice interaction on mobile devices, including its implications for app development, enterprise use, and customer service. The preliminary program and sponsorship opportunities are available at the conference website.

The first day of the conference is the Vendor Showcase, which is included in the full conference registration but is free for those attending only that day. Information on conference sponsorship opportunities and participation in the Vendor Showcase is available at the conference website.

About the Applied Voice Input Output Society

AVIOS is a non-profit organization that has promoted the speech technology industry for over a quarter-century. For more info, see the AVIOS website.

AVIOS: Peggie Johnson, 408-323-1783, Peggie(at)avios(dot)org
TMA Associates: Bill Meisel, 818-708-0962, b.meisel(at)tmaa(dot)com

