One Year Later: OpenAI’s Voice Cloning Tool Still Not Released – What’s Holding It Back?
In late March 2024, OpenAI introduced Voice Engine, an AI service designed to clone a human voice from just 15 seconds of speech. Nearly a year later, the tool is still in preview, and OpenAI has given no indication of when it might launch or whether it will see a broader release.
Concerns Over Regulation and Misuse
OpenAI’s hesitance to fully roll out the Voice Engine may stem from concerns about its misuse and the desire to avoid regulatory scrutiny. Historically, the company has faced criticism for prioritizing flashy products over safety and for hastily releasing tools to outpace competitors.
Testing with Trusted Partners
An OpenAI spokesperson told TechCrunch that the company is testing the Voice Engine with a select group of “trusted partners.” The spokesperson said:
“We’re learning from how our partners are using the technology so we can improve the model’s usefulness and safety.”
The Voice Engine is already being used for a range of applications, including:
- Speech therapy
- Language learning
- Customer support
- Video game characters
- AI avatars
Understanding Voice Engine’s Capabilities
The Voice Engine, which also powers the voices in OpenAI’s text-to-speech API and ChatGPT’s Voice Mode, converts written text into lifelike speech that closely mirrors the original speaker’s voice, subject to certain content restrictions. Its release, however, has been marked by repeated delays and shifting timelines.
Technical Insights and Release Plans
According to a June 2024 blog post from OpenAI, the Voice Engine model learns to predict the most likely sounds a speaker will make for a given text, taking into account different voices, accents, and speaking styles. This lets the model generate not only spoken versions of text but also “spoken utterances” that reflect how different types of speakers would read that text aloud.
Initially, OpenAI planned to introduce Voice Engine, originally branded as Custom Voices, through its API on March 7, 2024. The plan was to grant access to a select group of 100 developers, prioritizing those building socially beneficial or innovative applications. Pricing had even been set:
- $15 per million characters for standard voices
- $30 per million characters for HD quality voices
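To put those planned rates in perspective, here is a minimal sketch of how a developer might have estimated costs under them. The function name `estimate_cost` and the tier labels `"standard"` and `"hd"` are hypothetical, chosen for illustration only; they are not part of any OpenAI API, and the prices are simply the planned figures listed above.

```python
from decimal import Decimal

# Planned prices reported above, in US dollars per one million characters.
# The tier labels are illustrative assumptions, not official identifiers.
PRICE_PER_MILLION_CHARS = {
    "standard": Decimal("15"),
    "hd": Decimal("30"),
}


def estimate_cost(num_chars: int, tier: str = "standard") -> Decimal:
    """Estimate the USD cost of synthesizing num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    if tier not in PRICE_PER_MILLION_CHARS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Cost scales linearly with character count.
    return PRICE_PER_MILLION_CHARS[tier] * num_chars / Decimal(1_000_000)
```

Under these assumptions, a 250,000-character script would cost $3.75 with a standard voice (`estimate_cost(250_000)`) and $7.50 with an HD voice (`estimate_cost(250_000, "hd")`).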
Delays and Future Prospects
Despite these plans, OpenAI postponed the announcement at the last minute, ultimately unveiling Voice Engine weeks later without a sign-up option. Access remains limited to about ten developers who began collaborating with OpenAI in late 2023.
OpenAI expressed a commitment to discussing the responsible use of synthetic voices and how society can adapt to these advancements. The company stated:
“Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
Real-World Applications and Challenges
The Voice Engine has been in development since 2022, and OpenAI has demoed the tool to high-level global policymakers to showcase both its potential and its risks. Among the partners currently using Voice Engine is Livox, a startup focused on enabling better communication for individuals with disabilities. CEO Carlos Pereira praised the tool’s quality and multilingual capabilities, although he noted that its reliance on an internet connection poses challenges.
Safety Measures and Regulatory Considerations
OpenAI has indicated that the Voice Engine incorporates various safety measures to prevent misuse. These include:
- Watermarking to trace the origin of generated audio
- Requirements for developers to obtain explicit consent from original speakers
- Mandates for clear disclosures indicating that voices are AI-generated
However, OpenAI has not explained how it will enforce these policies, a task that could prove difficult at scale.
Addressing Growing Concerns
With AI voice cloning becoming a significant concern in security and privacy, effective filtering and identity verification are crucial for responsible technology releases. Reports indicate that AI voice cloning was the third fastest-growing scam in 2024, raising alarms about its potential for fraud and misuse.
As for the future of Voice Engine, OpenAI’s release timeline remains uncertain. The company is still weighing the implications of a broader launch, and the preview phase has become one of the longest in its history, with safety and optics both key factors in the decision.