AI-Generated Voices Can Do for Singers What Synths Did for Pianists (Guest Column)
Written by djfrosty on September 28, 2023
YouTube recently launched an AI Music incubator with artists and producers from Universal Music Group. The purpose of the group, according to Universal CEO Lucian Grainge, is to explore, experiment, and offer feedback on the AI-related musician tools and products the Google team is researching — with the hope that more artists will benefit from YouTube’s creative suite.
This partnership demonstrates a clear desire to involve the industry in the development stages of AI products and to protect the human component of artistry. That desire is heightened in the face of deepfakes. Just last month, Google launched its SynthID watermark, designed to identify AI-generated images (Google DeepMind CEO Demis Hassabis cited the importance of deepfake detection ahead of a contentious election season). “Heart on My Sleeve,” the song created with AI-generated imitations of Drake and The Weeknd’s voices, kicked off the music industry’s scramble to shut down and stamp out any unauthorized use of artists’ voices. Most importantly, though, the viral track proved that AI voice models are here and only improving with each passing day.
As artists, labels, and other rights holders have grown more concerned about AI models learning and profiting from their copyrighted material, fans and creators have discovered new ways to engage with their favorite artists and imagine completely new musical works using their AI voice models. This is prompting other industry executives (myself included) to wonder how these models can continue to be used to explore this new creative frontier of music while protecting artists.
With all of this in mind, the industry needs to mull over a few philosophical questions and consider the distinction between voice cloning and voice synthesis. A singer is much more than timbre, the primary quality that voice models modify. AI voices are not the same as samples, in which the entire vocal element is based on an artist’s full underlying performance, including pitch, emotion, timbre, accent and tone.
Regardless, AI innovations will only reach their maximum potential if the industry faces one foundational issue: artists and their labels need to control the ways in which their image, likeness and voice are used. Whether the industry decides to embrace these innovations or limit AI-powered cloning entirely, the next step begins with synthetic voice detection. Is the artist singing on any given track fake or the real deal?
In the early 2000s, music companies found themselves losing control of their content to the digitization of music. The industry’s initial impulse to crush file-sharing networks like Napster led to the launch of Apple’s iTunes store in 2003 and, eventually, legal streaming. Other digital rights management tools, like Content ID on YouTube, were developed to detect unauthorized use of music. Once the industry learned to embrace digital music and built a foundational infrastructure to support it, streaming revenues soared, breaking the $10 billion mark for the first time in 2022 and making up 84% of the industry’s total revenue, according to the RIAA.
The industry needs synthetic voice detection, but with 120,000 new tracks uploaded to streaming platforms daily (according to Luminate), on top of existing back catalogs, can it be done accurately and at scale? The short answer: yes.
As the industry begins to embrace the responsible use of AI for synthetic voice creation, I strongly believe there should be a corresponding willingness for artists and labels to collaborate in that training process. It’s in their best interests to do this now. AI applications are already scaling in a variety of categories. Well-engineered models are becoming exponentially more efficient and can increasingly manage massive computing tasks. Combined with strategic operational approaches, this is achievable today.
To honor each artist’s decision whether or not to participate in voice modeling, the industry needs an easy, accessible way for artists to build their own voice models and grant fans and creators permission to use them. This type of initiative, paired with synthetic voice detection, ensures that only the voices and works of those who want to be involved in voice cloning and other derivative AI tools are used. Artists who want to create their own voice models can work with voice synthesis platforms to establish the terms of where and how their models can be used, offering more control and even opportunities for monetization.
Geraldo Ramos is the co-founder and CEO of Moises, the AI-driven music platform that is transforming the way artists and businesses incorporate AI technology into their workflows.