How to defend yourself against AI cloning your own voice
A new anti-deepfake technology hampers voice cloning and thus protects against deceptive use: "AntiFake" makes it harder for AI tools to extract usable voice features from recordings.
Synthetic voices are not just a sensitive topic for celebrities and politicians. Recent advances in generative artificial intelligence (AI) make speech synthesis sound so authentic that people can no longer tell remotely, over the phone for instance, whether they are talking to a familiar person or to a deepfake. If a person's voice is "cloned" by a third party without their consent, malicious actors can make it deliver any message they want. This is the downside of a technology that has useful potential, for personalised assistance or deliberately created avatars, but that also creates competition for professional speakers and actors and provides material for fraud gangs.
However, the potential for misuse when real voices are cloned with deep-voice software is obvious: synthetic voices can easily be used to mislead others. And just a few seconds of recorded speech are enough to clone a person's voice convincingly. Anyone who sends even occasional voice messages or speaks onto answering machines has already supplied the world with more than enough material to be cloned.
Computer scientist and engineer Ning Zhang has developed a new method to thwart unauthorised speech synthesis before it takes place. Zhang, who teaches computer science and engineering at the McKelvey School of Engineering at Washington University in St. Louis, has created a tool called "AntiFake". He presented it at the ACM Conference on Computer and Communications Security in Copenhagen, Denmark, at the end of November 2023; the accompanying paper is available in the conference proceedings.
Preventing AI clones of your own voice
Conventional methods for detecting deepfakes only take effect once the damage has been done. AntiFake, by contrast, takes a preventive approach: it stops voice data from being synthesised into a deep-voice fake in the first place. The tool is designed to beat digital counterfeiters at their own game: it uses techniques similar to those employed by cybercriminals for voice cloning in order to protect voices from piracy and counterfeiting. The source code of the AntiFake project is freely available to everyone.
The software is designed to make it harder for cybercriminals to analyse voice data and extract the features that matter for voice synthesis. "The tool uses an adversarial AI technique that was originally part of the cybercriminals' toolbox, but now we use it to defend ourselves against them," explains Zhang. "We mess up the recorded audio signal a little, distorting or disrupting it just enough so that it still sounds right to human listeners," while at the same time rendering it useless for training a voice clone. Similar approaches already exist for copy protection of works on the internet, for example images that still look natural to the human eye but which, because of perturbations embedded in the image file that are invisible to humans, machines can no longer read reliably.
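To illustrate the general idea of such an adversarial perturbation, here is a minimal sketch; it is not AntiFake's actual implementation, which is described in the paper and source code. The sketch nudges a waveform within a tiny, barely audible budget so that a speaker-embedding model no longer matches the original voice. The ToySpeakerEncoder stand-in, the protect function and all parameter values are illustrative assumptions.

```python
# Minimal sketch of adversarial audio protection (illustrative, not AntiFake itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySpeakerEncoder(nn.Module):
    """Placeholder for a real speaker-embedding model used by voice-cloning systems."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=400, stride=160), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 64),
        )

    def forward(self, wav):                 # wav: (batch, samples), values in [-1, 1]
        return F.normalize(self.net(wav.unsqueeze(1)), dim=-1)

def protect(wav, encoder, eps=0.002, steps=50, lr=5e-4):
    """Return a copy of `wav` whose speaker embedding is pushed away from the
    original, while the perturbation stays within +/- eps and remains barely audible."""
    target = encoder(wav).detach()          # embedding of the clean, original voice
    delta = torch.zeros_like(wav, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        similarity = F.cosine_similarity(encoder(wav + delta), target, dim=-1).mean()
        opt.zero_grad()
        similarity.backward()               # minimise similarity to the real voice
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)         # keep the distortion imperceptible
    return (wav + delta).detach()

# Example usage with a stand-in recording (one second of audio at 16 kHz):
encoder = ToySpeakerEncoder()
wav = 0.1 * torch.randn(1, 16000)
protected = protect(wav, encoder)
```

A real defence would have to work against the speaker encoders that actual cloning systems use and would weight the perturbation psychoacoustically so it stays inaudible; the sketch only shows the optimisation loop at the core of the idea.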
The Glaze software, for example, renders images unusable as training material for large AI models, and similar tricks protect photos against facial recognition. "AntiFake ensures that it is difficult for criminals to synthesise our voices and imitate us when we publish our voice data," says Zhang about the purpose of the tool he developed.
Attack methods are constantly improving and attackers are becoming more professional, as the current rise in automated cyberattacks on companies, infrastructure and public administration worldwide shows. To ensure that AntiFake keeps up with this dynamic threat landscape and withstands strong synthesis models for as long as possible, Zhang and his PhD student Zhiyuan Yu have designed their tool to generalise as broadly as possible.
Zhang's lab tested the tool against five modern speech synthesisers. It is reported to have achieved a protection rate of 95 per cent, even against unknown commercial synthesisers for which AntiFake was not specifically designed. Zhang and Yu also tested their tool with 24 human participants from different population groups; a representative comparative study would require further tests and a larger group.
So far, AntiFake can protect shorter voice recordings against impersonation, statistically the most common format used for cybercriminal forgery. Its creators believe, however, that there is nothing to stop the tool from being extended to longer recordings or to music in order to protect larger documents against misuse. For now, interested users would have to do this themselves, which requires programming skills; the source code is available on the internet.
"Sooner or later, we will be able to fully protect voice recordings," Zhang tells the American Association for the Advancement of Science with confidence - because AI systems remain susceptible to disruption. What is considered a major shortcoming in the safety-critical use of AI can be exploited in the fight against deepfake. However, the methods and tools must also be continuously adapted to the possibilities of cyber criminals, learn and grow with them.
Spektrum der Wissenschaft
We are partners of Spektrum der Wissenschaft and want to make well-founded information more accessible to you.
Cover image: Shutterstock / fizkes
Experts from science and research report on the latest findings in their fields – competent, authentic and comprehensible.