Apple iPhone app developers are all waiting with bated breath for Apple to allow open access to Siri, Apple’s engine for understanding spoken communication. As of the last time I checked (about 5 minutes ago), Apple does not allow apps to start Siri: users can choose to, say, fill a text field using Siri, but the app cannot start Siri on behalf of the user.
Android has a voice recognition window which an app can open – but it is not really under the control of the app. The Android app can request that the Android phone or tablet present a window which accepts voice, then when the user indicates they have finished speaking, Android passes control back to the app, which can analyse the result of the voice recognition effort.
Neither of these options is “natural”. Both the iPhone and Android options are, in my opinion, clunky: they require the user to take positive action to return control to the app.
If your iPhone app or Android app needs hands-free voice control, the app needs to be able to initiate voice recognition, detect when someone is speaking, and process the voice to determine what was said, all without depending on the user pressing a button.
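To make the "detect when someone is speaking" step concrete, here is a minimal sketch of energy-based voice activity detection, written in Python for readability rather than as mobile code. It assumes the audio arrives as a list of 16-bit PCM sample values. The function names, the frame size of 160 samples, the threshold of 500 and the minimum run of 3 frames are all illustrative assumptions, not values taken from any SDK, and a production detector would need to cope with background noise far better than this.

```python
import math


def frame_energies(samples, frame_size):
    """Return the RMS energy of each complete frame of PCM samples."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies


def detect_speech(samples, frame_size=160, threshold=500.0, min_frames=3):
    """Return (start_frame, end_frame) spans that look like speech.

    A span is a run of at least `min_frames` consecutive frames whose
    RMS energy is at or above `threshold`. `end_frame` is exclusive.
    All default values are assumptions chosen for illustration.
    """
    energies = frame_energies(samples, frame_size)
    spans = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if run_start is None:
                run_start = i  # a loud run begins here
        else:
            if run_start is not None and i - run_start >= min_frames:
                spans.append((run_start, i))
            run_start = None  # quiet frame ends any run
    # Close a run that is still open when the audio ends.
    if run_start is not None and len(energies) - run_start >= min_frames:
        spans.append((run_start, len(energies)))
    return spans


if __name__ == "__main__":
    silence = [0] * 800              # five quiet frames
    loud = [3000, -3000] * 400       # five loud frames
    print(detect_speech(silence + loud + silence))
```

An app built this way would keep the microphone open, run something like detect_speech over the incoming audio, and hand only the detected spans to the recogniser, so the user never has to press a button to start or stop.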
Thankfully, third-party service providers have filled this gap.
My favourite is Dragon Mobile. The Dragon Mobile SDK is provided by Nuance, the same company which publishes Dragon Speech, the legendary desktop PC speech recognition application. Nuance has been in the game for over a decade. From somewhat humble beginnings, their product has developed into a sophisticated and reliable speech recognition system, a remarkable achievement in artificial intelligence.
The only downside of Dragon is that it relies on an Internet connection. The processing power required to recognise and interpret normal spoken sentences is far greater than an Android app or iPhone app can deploy, so the Dragon SDK ships the compressed sound files via the Internet to Nuance’s servers for processing.
What if Internet bandwidth is an issue? There are still options. An Android app or iPhone app does not have the processing power to interpret any arbitrary spoken sentence, but it does have the processing power to recognise individual words. So if your iPhone app or Android app only has to recognise a handful of words, such as “yes”, “no”, and “maybe”, then the processing power for this simplified task can be comfortably accommodated on the device, without an Internet connection to an external server.
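One classic way to recognise a handful of words cheaply is template matching with dynamic time warping (DTW): each spoken word is reduced to a short sequence of feature values and compared against one stored template per word. The sketch below is in Python and illustrates that general technique only; it is not how Dragon or any other particular SDK works. The templates are made-up numbers standing in for real per-frame audio features, and the names dtw_distance, recognise and TEMPLATES are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def recognise(features, templates, reject_above=None):
    """Return the template word closest to `features`, or None.

    If `reject_above` is given and even the best match is further away
    than that, the input is treated as not being a known word.
    """
    best_word, best_distance = None, float("inf")
    for word, template in templates.items():
        distance = dtw_distance(features, template)
        if distance < best_distance:
            best_word, best_distance = word, distance
    if reject_above is not None and best_distance > reject_above:
        return None
    return best_word


# Toy templates: invented numbers, not real audio features.
TEMPLATES = {
    "yes": [1, 3, 5, 3, 1],
    "no": [5, 5, 1, 1],
    "maybe": [1, 5, 1, 5, 1, 5],
}

if __name__ == "__main__":
    # A slower, stretched-out "yes" still matches the "yes" template.
    print(recognise([1, 1, 3, 5, 5, 3, 1], TEMPLATES))
```

The work grows with the number of templates multiplied by the length of the sequences, which is why a vocabulary of three words fits comfortably on a phone while arbitrary sentences do not.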
If you are interested in Android apps or iPhone apps which can recognise speech, or which can read text from images (optical character recognition), please email firstname.lastname@example.org to discuss your requirements.