OpenAI has begun the rollout of ChatGPT’s Advanced Voice Mode, introducing users to GPT-4o’s hyper-realistic audio responses. The alpha version is now available to a select group of ChatGPT Plus users, with OpenAI planning to gradually extend the feature to all Plus users by the fall of 2024.
When OpenAI first demoed GPT-4o's voice in May, it amazed audiences with its fast responses and striking similarity to a real human voice, one that sounded uncannily like actress Scarlett Johansson, who played an artificial assistant in the movie "Her." After the demo, Johansson said she had declined requests from CEO Sam Altman to use her voice, and she retained legal counsel after seeing GPT-4o's capabilities. Even after OpenAI denied using Johansson's voice and pulled that voice from its demo, the company postponed the release of Advanced Voice Mode in June to improve its safety measures.
Now, a month later, the alpha version has arrived, though the video and screen-sharing capabilities shown during the Spring Update are not part of this initial release. For the moment, the hyper-realistic voice feature remains limited to a small group of testers, but paying users can finally try out ChatGPT's voice component.
ChatGPT can now talk and listen
Advanced Voice Mode sets itself apart from ChatGPT's existing Voice Mode, which chains together three separate models: one transcribes the user's speech to text, GPT-4 processes the prompt, and a third converts the text reply back into audio. GPT-4o is multimodal and handles all of these tasks itself, resulting in lower-latency conversations and the ability to detect emotional nuances in a user's voice, such as sadness, excitement, or even singing.
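For a sense of what that consolidation replaces, here is a minimal sketch of the older three-model pipeline using the openai Python SDK. The file names and model choices ("whisper-1", "gpt-4", "tts-1") are illustrative assumptions, not a description of ChatGPT's internals; GPT-4o collapses all three steps into a single audio-in, audio-out model.

```python
# Sketch of a legacy three-model voice pipeline (illustrative;
# assumes the openai Python SDK v1.x and a local file "question.wav").
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe the user's speech to text with an audio model.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: generate a text reply with a text-only model. Tone, pauses,
# and emotion in the user's voice are lost at this hop.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: synthesize speech from the text reply with a TTS model.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the API's preset TTS voices
    input=reply.choices[0].message.content,
)
with open("answer.mp3", "wb") as f:
    f.write(speech.content)
```

Each hop in this chain adds latency and strips away paralinguistic detail, which is why folding the whole exchange into one model both speeds up responses and preserves vocal nuance.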
During this pilot phase, ChatGPT Plus users will get a first sense of how true to life Advanced Voice Mode really sounds. TechCrunch has not yet tested the feature, but will review it once access is granted.
OpenAI is introducing the new voice feature gradually to monitor its usage closely. Participants in the alpha group will receive notifications within the ChatGPT app, followed by detailed instructions via email.
As part of stringent safety measures, OpenAI engaged over 100 external red teamers, proficient in 45 different languages, to assess GPT-4o’s voice capabilities. A comprehensive report on these safety tests is expected in early August.
Furthermore, Advanced Voice Mode will be restricted to ChatGPT's four preset voices – Juniper, Breeze, Cove, and Ember – developed in collaboration with professional voice actors. The Sky voice, featured in the May demo, has been discontinued. OpenAI emphasizes that ChatGPT cannot impersonate the voices of individuals or public figures, and it will block outputs that deviate from the preset voices.
OpenAI has also implemented new filters to block requests to generate music or other copyrighted audio. AI companies have already faced legal repercussions for copyright infringement, and audio models like GPT-4o open up a whole new category of potential infringement complaints, which prompted the inclusion of these safeguards.