When I first saw the new iPhone ads that featured voice interaction all I thought was WOW, Apple have done it, they have mastered voice interaction. What appeared to be natural voice interaction is the nirvana many people have been waiting for to replace point and click interfaces.
Speech recognition is not new and certainly both Microsoft and Android had speech driven interfaces before Apple. However it was the ability to talk naturally without breaks and without having to use specific key words that seemed to set Apple apart from the competition.
Instead we were all fooled by another Jobs skill, his “reality distortion field”. Indeed one person went as far as to sue Apple for selling something that did not perform as advertised.
What exactly is the difference? Well both Windows and Android recognise specific commands and actions rather like just “talking the menu” e.g. Saying “File, Open, text.doc”. The Apple promise was that you could simply talk as you would normally speak e.g. “File, Open, text.doc” would be the same from Apple if you said “get me text.doc”.
So why aren’t we there? The challenge is creating a dictionary that can understand the synonyms and colloquialisms that people may use in conversational speech as opposed to the very specific commands used in menu’s and buttons in graphical user interfaces.
Whilst this may seem like a daunting task, I believe the first step to solve this puzzle is to reduce the problem. So rather than creating this super dictionary for absolutely any application, dictionaries should be created for specific types of applications e.g. word processing or banking. This way the focus on creating synonyms and finding colloquialisms and linking them is more manageable.
The next step would be to garner the help of the user community to build the dictionary, so that as words are identified, the user is asked what alternatives could be used so that synonyms/ colloquialisms are captured.
I know this is not a detailed specification but this is an approach that Apple might use to give us what they promised – and what the world is waiting for … speech driven user interfaces.