This is not the conversation Morgan Neville wanted. The documentarian’s 2021 film exploration of Anthony Bourdain’s life, Roadrunner, was meant to help fans navigate the celebrity’s 2018 death. “I wanted the film to be cathartic, in a way,” Neville told The New Yorker’s Helen Rosner. “Not to have good answers, but to at least help people process their feelings.”
Maybe it has been helpful for some fans, but the movie also touched off a firestorm of ethics concerns about voice deepfakes: unauthorized synthetic speech that is indistinguishable from a targeted speaker’s real voice. Neville hired an unidentified text-to-speech (TTS) company to clone Bourdain’s voice and used the AI model to narrate small portions of the film, without onscreen disclosure, including a heartrending final email Bourdain sent to a friend. ReadSpeaker was not involved in production of the film or the cloning of Anthony Bourdain’s voice, but we welcome the conversation about ethical issues surrounding AI, including TTS-related topics like voice deepfakes.
The Bourdain Deepfake Voice Backlash
When The New Yorker broke the story, Twitter erupted in protest. “When I wrote my review I was not aware that the filmmakers had used an A.I. to deepfake Bourdain’s voice for portions of the narration,” tweeted critic Sean Burns. “I feel like this tells you all you need to know about the ethics of the people behind this project.” Reporter Dave Weigel was more succinct. “Thanks I hate it,” he tweeted in response to The New Yorker article.
Neville told GQ he obtained the appropriate permissions before emulating Bourdain’s voice. “I checked, you know, with his widow and his literary executor, just to make sure people were cool with that,” he told the magazine’s Brett Martin. “And they were like, Tony would have been cool with that. I wasn’t putting words into his mouth. I was just trying to make them come alive.”
Ottavia Busia, the widow in question, responded to this quote via tweet. “I certainly was NOT the one who said Tony would have been cool with that,” she wrote—and so the controversy deepened. “We can have a documentary-ethics panel about it later,” Neville told The New Yorker. But in the media, the ethics panel is well underway, as it has been for years in the TTS community where we at ReadSpeaker conduct our professional lives.
AI Synthetic Speech: Our Ethical Framework
As TTS developers with more than 20 years in the industry, we’ve spent a lot of time thinking about the ethics of “voice cloning,” a popular term that describes the creation of a synthetic voice using AI technologies. (All high-quality synthetic voices start with human voice recordings, and the results sound like the original speaker, so the “cloning” terminology was probably inevitable; we just call it neural TTS.) Despite the hazards, we remain committed to the creative possibilities of synthetic speech. Our company began with a mission to expand digital accessibility for users with vision impairments, second-language learners, people with reading disorders, and others for whom text alone is a barrier rather than an invitation. That mission continues.
But as the controversy surrounding the Anthony Bourdain movie illustrates, it is our responsibility as TTS providers to control our products, ensuring appropriate usage so these solutions continue to improve lived experiences—from increased internet accessibility to voice navigation of an unfamiliar town—without collateral damage. We work actively toward this goal, with built-in protections for voice actors, clients, end users, and society at large. Here are just a few.
- We do not provide a voice-cloning platform for public usage. Our TTS voices are built entirely in-house, using proprietary deep neural networks. This approach prevents unethical developers from accessing our tools. We operate on a business-to-business (B2B) model that defines use cases narrowly to keep all stakeholders aligned in their expectations and goals, including the voice actors whose work forms the basis of our synthetic voices.
- Our voice actors are protected by detailed legal agreements. Contracts with voice talent, without whom top-quality TTS voices are impossible, clearly delineate appropriate use of the resulting product. If a voice actor were to die after providing recordings, ownership of that contract would default to the legal entity that owns the rights to the actor’s voice and work; it’s up to the estate whether to continue with the project or not. (Thankfully, this is not a common scenario.)
- When our customers enter contracts with voice actors directly, we make sure the agreement is mutual and fair. Sometimes our customers bring their own voice talent to the table. In that case, they may write their own contracts. That doesn’t remove our responsibility to voice actors, our clients, or end users. Our intellectual property (IP) specialists complete due diligence to verify that all parties enter the agreement willingly and with full clarity on approved uses of the resulting TTS voice.
- Our neural TTS voices are available only to licensed business users. We work hard to prevent unauthorized use of our creations. Users can access neural TTS from ReadSpeaker AI only by purchasing a license, and that process includes a contractual agreement that protects us, the user, and the voice actor at once. We track usage of all our technology; if a TTS voice were to fall into the wrong hands, we’d know and take prompt action to remove the unsanctioned access.
- We are active members of the AITHOS Coalition, an industry group devoted to responsible new media. That means we adhere to the AITHOS Pledge “to uphold the core principles of ethical, responsible and equitable media through our individual and collective technological pursuits.” This includes following the AITHOS Ethics in Synthetic Media Guide, a living document, written collectively by members, that seeks to promote “careful reflection and proactive measures to counterbalance misappropriation of advances in synthetic media.”
- We follow the European Commission’s Ethics Guidelines for Trustworthy AI, which require active steps toward human oversight, fairness, accountability, and data privacy for AI systems. As an independent TTS provider, we differ from Big Five tech companies particularly with regard to this last value: We never collect data from TTS users or our clients. Sales of personal data aren’t part of our business model the way they are for Google or Amazon, so there’s no privacy risk with ReadSpeaker AI.
While we continue to review and update our ethical protocols, the above practices keep us out of controversies like those surrounding Roadrunner. This is not to say there’s no ethical scenario involving the digital reproduction of a beloved voice. The Roadrunner scandal focuses on two main objections: First, that Neville failed to disclose the use of deepfake audio to viewers, and second, that his permissions remain murky. Whoever produced the Bourdain deepfake is learning a very public lesson about voice cloning ethics; all of us in the industry must pay attention.
Neural TTS is a powerful tool for accessibility, open communications, entertainment, and, as any user of a voice-controlled smart home device will tell you, day-to-day convenience. Our challenge as TTS creators is to provide these benefits without allowing our technology to be misused in the process. What’s the best way to do so? That’s a conversation worth having, and it’s one in which every voice has value. Tweet us to share your thoughts about the Anthony Bourdain movie, preventing deepfake voices, and the ethical use of neural TTS in any application.