This is an archive article published on March 10, 2018

How voice technology has moved beyond recognition to understanding meaning and context

TiVo’s Experience 4 software, which the company is pitching to partners in India, can understand the meaning and context of the command.

Voice technologies have finally crossed into the realm where they can start really impacting our lives by making some work easier and more efficient. (File photo of Amazon Echo speaker. Image source: AP)

I have always wanted technology to help me transcribe all my long interviews. Though I have been using Sound Note for a few years to record and annotate audio clips, it only pushed me into the habit of jotting down just keywords during interviews. This meant there was more transcription staring at me.

I tried a lot of ways to find the right technology to solve this issue, like playing back recordings with the dictate option switched on in Google Docs or Apple Notes to see if it could write them down for me. These worked in bits and pieces, but there was never a permanent solution. Even as voice recognition technologies got better and better, this seemed like one area where I could still use some help.

So I was pleasantly surprised over the past couple of weeks when I discovered two apps that use the latest technologies to help with voice recognition and transcription. The first was an app called Tetra, which lets you transcribe telephone calls. Yes, you need to make the call via the app, but it will give you a text of what transpired within a few minutes of the call. The other app, Otter Notes, was suggested by a journalist friend. This app records meetings or other conversations and transcribes what was said, also recognising and tagging different voices in the process.

Of course, neither of the apps is perfect. But they can do around 80 per cent of your work, which is good enough. At the moment, the stability of your internet connection during the process and the amount of ambient noise during the recording both seem to play a role in the accuracy of the transcription. Anyway, for me, this is a clear sign that artificial intelligence can actually help do stuff more efficiently.

This week I also happened to see a demo of US entertainment technology company TiVo’s latest offerings. TiVo’s senior director for international marketing Charles Dawes showed me how their box could now understand voice. But it is no longer about just recognising voice and executing the command. TiVo’s Experience 4 software, which the company is pitching to partners in India, can understand the meaning and context of the command. For instance, using the right metadata along with machine learning powered by artificial intelligence, the box understands that when I say Tom Cruise, I am looking for content related to the star across all sources.

Then when I say Nicole Kidman, it starts looking for content where both stars are featured instead of switching to just Kidman movies. A lot of similar context setting is now visible even in how Alexa works in India, where we have a different way of expressing things in English. As luck would have it, minutes after the TiVo demo, I got to meet Carrie Lazorchak and Jason Stirling of Nuance Communications Inc. Nuance has been at the forefront of voice and natural language understanding technologies for many years now.

Stirling stressed the impact of voice technologies in a country like India where, because of illiteracy and the limited reach of English, this medium gives more access to millions of new adopters of technologies like smartphones. “Language modelling is a daily game. The more we get, the better we get at it,” explains Stirling, adding that applying meaning extraction with natural language processing takes overall system performance up many notches. Lazorchak chips in that the challenges of diversity posed by a country like India are why local partnerships are crucial.

“Our technology performs best when augmented by local businesses that really understand the culture.” She is convinced that regional languages will be a huge usage space, especially in the Indian market, with higher adoption. “It will bring a lot of services into this space, especially in the rural areas.” I think voice technologies have finally crossed into the realm where they can start really impacting our lives by making some work easier and more efficient. We haven’t heard the last of this for sure.

Nandagopal Rajan writes on technology, gadgets and everything related. He has worked with the India Today Group and Hindustan Times. He is an alumnus of Calicut University and Indian Institute of Mass Communication, Dhenkanal.
