Deepfakes strike a wrong chord
Technology | Matt Pearce | 17 Aug 2021
The most important thing about a documentary deepfaking Anthony Bourdain's voice isn't that it happened - but that it happened and almost nobody noticed.
Director Morgan Neville faced skepticism and outright revulsion on social media when it was revealed he used artificial intelligence to create a model of Bourdain's voice for 45 seconds of narration in Roadrunner, about the life and 2018 death by suicide of the chef and journalist.
Bourdain's voice was one of his trademarks, known to fans the world over from his TV travelogues Parts Unknown and No Reservations.
Fans also loved how authentic he seemed, always able to level with the viewer. Faking his voice, to some, was a step too far. "In the end I understood this technique was boundary-pushing," Neville said. "But isn't that Bourdain?"
Yet the boundaries have already been pushed.
The voice imitation revolution is already here, and artists, technologists and companies in several industries who use the new tech are grappling with the big question of what happens when you separate speech from the speaker.
Need a synthetic voice that can read text for the visually impaired? A human voice actor can't pre-read every possible sentence in the world, but an AI-built voice could cope. Have a video game that's been in interminable production for years and want to avoid hauling in voice actors to rerecord every time there's a script change? Tweak their dialogue in production.
"We believe this is the CGI of audio," says Zeena Qureshi, co-founder and CEO of Sonantic, a start-up formed in 2018. "We made the first AI that can cry last year. We made the first AI that can shout early this year."
Another synthetic-voice company, VocaliD, pitches to potential corporate clients that "the volume and speed with which written content must be transformed into brand-consistent sound bytes cannot be met by traditional voice talent or generic text-to-speech."
A third company, Resemble AI, offers services like voice "cloning" and has short clips of synthetic speech from former US President Barack Obama and actors Morgan Freeman and Jon Hamm. The company says a voice clone can start to be built if it has 50 sentences from a real speaker to synthesize.
Like any form of automation, the promises are simple: robots can do more work faster, and the money and time saved can be used on something else. And like any form of automation, there can be big downsides for the humans whose work and paychecks are getting augmented or replaced.
"Voice actors, their voice is intellectual property, it's their own, or at least that's the idea," says David Rosenthal, CEO and coach at the Global Voice Acting Academy. But he warns against companies forcing performers to sign agreements that allow those companies to synthesize performers' voices and use them for whatever they want in perpetuity. "They can't say: 'Now I own this voice because you did a job for me.' With AI, it's unfortunately like the Wild, Wild West here."
Some voice acting advocates have been watching the lawsuit filed by Canadian voice-over performer Bev Standing, who alleged that TikTok took her voice for a text-to-speech feature for users' videos without notice or compensation.
Her lawsuit, filed in federal court in New York, said she had performed the voice-over work for a Scottish company - and she did not have a contract permitting the sale to another company.
Attorneys for TikTok parent company ByteDance have signaled they will argue the lawsuit should be dismissed. "TikTok is a free platform," the attorneys wrote in a letter to the judge requesting a hearing, saying that the text-to-speech function "makes videos more accessible to disabled users." They also argued that Standing's voice just isn't recognizable enough to argue her likeness was stolen.
Some companies have preemptively contacted SAG-AFTRA, which represents some voice-over performers, about setting up fair compensation systems for performers whose voices will be re-created with AI.
At other companies, however, "a lot of nonunion performers signed away rights that they had no idea about," said Katie Watson, the union's national director for voice-over contracts.
SAG-AFTRA contracts for "very low-budget productions" require that digital re-creations of performers can't be used "without coming to us for our consent," says Danielle Van Lier, an assistant general counsel for the union who focuses on intellectual property and contracts.
The union says protections are needed.
"The AI-generated voices, in particular, can be used to put words into our members' - including our broadcast journalists' - mouths and make them say things they never said," the union said in a statement. "At best, it denies them the right and ability to control their image; at the other end, it is exploitative and may cause actual harm to their reputation, their earning potential, or worse, to the individual themselves."
Qureshi said that her company has implemented a compensation system for performers whose voices have been synthesized so that "every time their voice gets used, they get a profit share."
She also raises the prospect that the actors' AI voices could allow performers to do multiple projects at once. "If they want to work in theater, they want to work somewhere very niche, [and] their AI can work for them on the side doing voice work for games and films and things like that."
As a voice coach, Rosenthal thinks that AI technology has not quite advanced to the point where it could trick a listener over a long period of time with a fully synthetic voice.
"Anything to do with animation, cartoons, video games, that kind of stuff, there's a certain kind of physicality involved in those particular genres, not just in a human vocal ability, but also a physicality that AI has not mastered at all yet," Rosenthal said. "Imagine fighting while you're talking, punching or receiving a punch."
But he sees the writing on the wall, as the technology will only get better - and more tempting to use. "AI is really cheap, right? Costs less than having to pay a person," he said.
"Ultimately it's our job to educate those people as to the value of the human voice."
Los Angeles Times (TNS)