Voice Engine is currently under preview with limited access (Image credit: OpenAI)

OpenAI has announced a new voice-generation model called Voice Engine, a text-to-audio generative AI model that can closely replicate a speaker's voice from a single 15-second audio sample. The model is in a limited preview, accessible only to select international partners in segments such as government, media, entertainment, and education.
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. https://t.co/yLsfGaVtrZ
— OpenAI (@OpenAI) March 29, 2024
OpenAI’s text-to-voice generative AI model is said to have various real-world applications, including providing reading assistance, content translation, audio generation, reaching global communities, supporting people who are non-verbal, helping patients recover their voices, and more.
In an official blog post, OpenAI wrote, “Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.”
OpenAI notes that Voice Engine is a small model, first developed in 2022, and that it already powers the preset voices in the company's text-to-speech API, ChatGPT Voice, and Read Aloud. To prevent misuse, the company says it is taking a “cautious and informed approach to a broader release.” OpenAI has also shared some samples generated using the model.
Before any rollout to the general public, OpenAI is examining several issues, including policies to protect individual voices in AI, educating the public about the capabilities and limitations of AI, and adopting technologies that could help users distinguish between real and AI-generated voices.