OpenAI is advancing its voice cloning technology amid a rise in deepfakes, while emphasizing responsible usage.
OpenAI today previewed Voice Engine, a new extension of its text-to-speech API. The technology lets users create a synthetic voice from a 15-second voice sample. Voice Engine is not yet available to the public; OpenAI says it is taking time to ensure responsible deployment and to address potential misuse.
Jeff Harris, a member of OpenAI’s product staff, states, “We are focused on understanding the risks associated with this technology and implementing safeguards to mitigate those risks.”
Model Development
The AI model underlying Voice Engine has been in use for some time. It powers features in ChatGPT and Spotify, among others. OpenAI has not disclosed the sources of its training data, saying only that the model was trained on a combination of licensed and publicly available data.
OpenAI faces legal challenges over alleged copyright infringement stemming from its use of copyrighted content to train AI models. The company has licensing agreements with certain providers and allows artists to remove their work from training datasets. It maintains that fair use protects its practices.
Voice Synthesis
Surprisingly, Voice Engine is not fine-tuned on individual user data. Instead, it uses a combination of a diffusion process and a transformer to generate speech from a small audio sample and a text input. OpenAI claims this approach produces higher-quality speech than competing products.
OpenAI has not released pricing details, though Voice Engine is expected to be priced competitively for synthetic voice generation. No customization controls are available at the moment.
Voice Talent Perspective
OpenAI’s technology could disrupt the voice acting industry by offering a cost-effective synthetic alternative. While some platforms attempt to balance voice actors' rights with AI integration, OpenAI says its focus is on responsible usage, and it requires explicit consent for voice cloning.
Ethical Considerations
Concerns about the misuse of voice cloning technology, including the creation of deepfakes for malicious purposes, are real. OpenAI is taking steps to prevent misuse by watermarking cloned voices and initially limiting access to a select group of developers. The company says it remains committed to releasing the technology safely.
Future Plans
Depending on the outcome of the pilot phase, OpenAI may expand the availability of Voice Engine. The company is exploring security measures such as verifying users by having them read a piece of text aloud. Continued learning about safety risks, and mitigating them, remains a top priority for OpenAI.