This is an archive article published on March 31, 2024

OpenAI previews ‘Voice Engine’ AI model that can clone a voice from a 15-second audio sample

OpenAI's new text-to-speech model, Voice Engine, can mimic a voice from just a 15-second audio clip.

The Voice Engine model is currently in preview with limited access (Image credit: OpenAI)

OpenAI has announced Voice Engine, a new text-to-audio generative AI model that can closely replicate a voice from a single 15-second audio sample. The model is currently in limited preview, accessible only to select international partners across segments such as government, media, entertainment, and education.

OpenAI’s text-to-voice generative AI model is said to have various real-world applications, including providing reading assistance, content translation, audio generation, reaching global communities, supporting people who are non-verbal, helping patients recover their voices, and more.

In an official blog post, OpenAI wrote, “Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.”

OpenAI highlights that Voice Engine is a small model, first developed in late 2022, and that it already powers the preset voices in the company's text-to-speech API, ChatGPT Voice, and Read Aloud. To prevent misuse, the company says it is taking a “cautious and informed approach to a broader release.” OpenAI has also shared some samples generated using the model.

Before any rollout to the general public, OpenAI is examining several issues, including policies to protect individual voices in AI, educating the public about the capabilities and limitations of AI, and adopting technologies that could help users distinguish between real and AI-generated voices.

 
