Google develops human-like text-to-speech AI system, Tacotron 2

By: IANS | San Francisco | January 1, 2018 6:44:04 pm
The tech giant’s text-to-speech system, ‘Tacotron 2’, delivers AI-generated speech that almost matches the human voice. (Image Source: Google)

In a major step towards its ‘AI first’ dream, Google has developed a text-to-speech artificial intelligence (AI) system whose human-like articulation can be hard to tell apart from a real voice. The system, called ‘Tacotron 2’, delivers AI-generated speech that almost matches the human voice, technology news website Inc.com reported.

At the Google I/O 2017 developers conference, the company’s Indian-origin CEO Sundar Pichai announced that the internet giant was shifting its focus from mobile-first to ‘AI first’ and launched several products and features, including Google Lens, Smart Reply for Gmail and Google Assistant for iPhone.

According to a paper published on arXiv.org, the system first creates a spectrogram from the text, a visual representation of how the speech should sound. That image is then put through Google’s existing WaveNet algorithm, which turns it into audio and brings AI closer than ever to indiscernibly mimicking human speech. The algorithm can easily learn different voices and even generates artificial breaths.
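To make the idea of a spectrogram concrete, here is a minimal Python sketch that computes one from a synthetic tone. It is not Tacotron 2: that system predicts a spectrogram from text, while this sketch goes the other way and analyses audio that already exists. The function name `magnitude_spectrogram`, the frame size, the hop length and the 1,000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of time frames, each a list of magnitudes per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window softens the frame edges to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Naive discrete Fourier transform, keeping non-negative frequencies only.
        bins = []
        for k in range(frame_size // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            bins.append(abs(total))
        frames.append(bins)
    return frames


# Hypothetical input: a pure 1,000 Hz tone sampled at 8,000 Hz.
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(256)]

spec = magnitude_spectrogram(samples)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "bins, peak near", peak_hz, "Hz")
```

Each row of `spec` is one slice of time and each column one frequency band, which is why a spectrogram can be drawn as an image.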

“Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech,” the researchers were quoted as saying. On the basis of its audio samples, Google claimed that ‘Tacotron 2’ can detect from context the difference between the noun ‘desert’ and the verb ‘desert,’ as well as the noun ‘present’ and the verb ‘present,’ and alter its pronunciation accordingly.
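A mean opinion score is simply the average of listener ratings, conventionally given on a scale from 1 to 5. The Python sketch below shows the arithmetic under that assumption; the ratings are made up for illustration and are not data from the paper, and `mean_opinion_score` is a hypothetical helper name.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings, assuming a 1-to-5 scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from five listeners for one audio sample.
example_ratings = [5, 4, 5, 4, 5]
example_mos = mean_opinion_score(example_ratings)
print(f"MOS: {example_mos:.2f}")
```

On this scale, the reported gap between 4.53 and 4.58 is 0.05 points.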

It can place emphasis on capitalised words and apply the proper inflection when asking a question rather than making a statement, the company said in the paper. Google’s engineers did not reveal much more, but they left a big clue that lets developers gauge how far the system has come. According to the report, each of the ‘.wav’ sample files has a filename containing either the term ‘gen’ or ‘gt’.

Based on the paper, it’s highly probable that ‘gen’ indicates speech generated by Tacotron 2 and ‘gt’ is real human speech. (“GT” likely stands for “ground truth,” a machine learning term that basically means “the real deal”.)
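For readers who want to sort such samples themselves, the following Python sketch labels a file by the marker in its name. It follows the report's reading that ‘gen’ means generated and ‘gt’ means ground truth, which Google has not confirmed. The example filenames and the `label_sample` helper are hypothetical, since the report does not give the exact naming pattern.

```python
import re
from pathlib import Path


def label_sample(filename):
    """Guess whether a sample is generated or human speech from its filename."""
    # Split the name (without extension) into tokens on '_', '-' and '.'.
    tokens = re.split(r"[_\-.]", Path(filename).stem.lower())
    if "gen" in tokens:
        return "generated"
    if "gt" in tokens:
        return "ground truth"
    return "unknown"


# Hypothetical filenames, for illustration only.
for name in ["sample1_gen.wav", "sample1_gt.wav", "intro.wav"]:
    print(name, "->", label_sample(name))
```

Matching whole tokens rather than substrings avoids mislabelling a name such as ‘general.wav’ as generated speech.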
