In 2019, OpenAI initially chose not to release the full GPT-2 model over concerns that it could be misused. Microsoft, OpenAI’s main financial backer, has now made a similar decision about its new VALL-E 2 voice synthesizer.
VALL-E 2 is a zero-shot text-to-speech (TTS) model that can produce highly realistic speech in a speaker’s voice from only a few seconds of sample audio. According to the research team, VALL-E 2 outperforms previous systems in speech quality, naturalness, and speaker similarity, and reaches human parity on the benchmarks they tested.
The model can even handle sentences that are traditionally hard for TTS systems because of complex structure or repetitive phrasing, such as tongue twisters.
VALL-E 2 has many potential applications, including helping people with conditions such as aphasia or ALS communicate through a computerized voice. It could also be used in education, entertainment, journalism, chatbots, translation, accessibility features, interactive voice response systems, and voice assistants like Siri. However, the team acknowledges the risk of misuse, such as spoofing voice identification systems or impersonating specific people.
As a precaution, VALL-E 2 will be available only for research purposes. Microsoft has no plans to integrate it into products or release it to the public. Anyone who encounters abusive or illegal use of VALL-E 2 can report it through Microsoft’s Report Abuse Portal.
Microsoft is not the only company working on human-like speech synthesis. Google’s Chirp, ElevenLabs’ Iconic Voices, and Meta’s Voicebox all aim to achieve similar capabilities.
However, such systems raise ethical concerns: voice-cloning tools have already been used in scams that imitate the voices of victims’ friends and family. There is also not yet a widely adopted, reliable way to watermark AI-generated audio.