Can your computer hear you?
For much of computing history, computers have lagged far behind humans in natural language processing. That is because processing natural language is far from an easy task! The first step is to isolate what you hear: which sounds are speech, and which are background noise? Next, your brain must sort these sounds into phonemes, the individual units of sound that make up a language. Then you have to assemble them into words and phrases and, finally, interpret their meaning, which is often ambiguous or heavily dependent on context.
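To make that first step concrete, here is a minimal sketch of one classic way a program can separate "probably speech" from "probably background": measure how loud each short slice of audio is and compare it to a cut-off. The function names, the frame size, the threshold and the sample values are all made up for illustration. Real speech recognisers use far more sophisticated, learned models, and this is not a description of how the brain does it.

```python
def frame_energies(samples, frame_size):
    """Split samples into fixed-size frames and return each frame's mean energy."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies


def detect_speech(samples, frame_size=4, threshold=0.1):
    """Flag each frame as speech (True) or background (False) by loudness.

    The threshold is an arbitrary illustrative value, not a tuned setting.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_size)]


# Hypothetical audio samples: quiet, then loud, then quiet again.
signal = [0.01, -0.02, 0.01, 0.0,
          0.8, -0.7, 0.9, -0.6,
          0.02, 0.0, -0.01, 0.01]

print(detect_speech(signal))
```

A loudness cut-off like this is easily fooled by a slammed door or a passing lorry, which hints at why even this "easy" first step is hard to get right.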
So, can you speak and understand speech? If so, give yourself a huge pat on the back. When you dig into it, processing speech is more difficult than calculus!
For a long time – decades, even – this was a huge stumbling block for any AI system that was supposed to interact with humans, because, as humans, we speak. Speech is the way that we most naturally convey and understand information. For computers, the most natural way is through clear, unambiguous binary signals: the proverbial zeroes and ones.
But times are changing. Advances in machine learning are making it possible for computers to recognise and comprehend speech – much as you learned to do when you were a child. Already we have dictation software that can not only map sounds to words, but also choose the right word based on context. (This matters for homophones: “sew” and “so” sound exactly the same, and only context tells us to use one or the other.)
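The "sew" versus "so" problem can be sketched in a few lines. The example below is a toy, not how real dictation software works: production systems rely on statistical language models trained on huge amounts of text, whereas the clue-word table and the `pick_spelling` function here are invented purely for illustration.

```python
# Hypothetical clue words that tend to appear near each spelling.
# A real system would learn such associations from data.
CONTEXT_CLUES = {
    "sew": {"needle", "thread", "button", "fabric", "dress"},
    "so": {"tired", "much", "happy", "that", "very"},
}


def pick_spelling(candidates, context_words):
    """Return the candidate whose clue words overlap most with the context.

    On a tie, max() keeps the first candidate, so list the more common
    spelling first.
    """
    context = {word.lower() for word in context_words}
    return max(candidates, key=lambda c: len(CONTEXT_CLUES.get(c, set()) & context))


# "___" marks the word the recogniser heard but cannot spell from sound alone.
sentence_one = "I need a needle and thread to ___ this button".split()
sentence_two = "I am ___ tired today".split()

print(pick_spelling(["so", "sew"], sentence_one))
print(pick_spelling(["so", "sew"], sentence_two))
```

The sound is identical in both sentences; only the neighbouring words tip the balance towards one spelling or the other.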
In addition, nearly every smartphone now contains an engine for speech recognition. Have you ever walked down the street and come across someone dictating into their phone? Years ago – even a few years ago – this would not have been possible.
Lastly, take common services like Alexa and Siri. These don’t just turn speech into words; they act on it, too. Tell Siri which song to play, and it plays that song. The same job would have taken a human, unaided by computers, far longer. A human would have to know where to find the song, locate the album, insert the disc, and navigate to the right track – which could take ages if it was on a cassette tape!
Maybe the next time your smartphone’s dictation service writes “toilet” instead of “to let,” don’t throw the device on the ground. Marvel, instead, at the 99% of the words that it correctly understood. The day of the Jetsons and their robo-maids may still be in the future, but it’s not nearly as far away as you might think!