Voice-Swap, an ethically-trained AI voice company, and BMAT Music Innovators, a company that indexes music usage and ownership data using machine learning, have partnered to launch a new technical certification for AI voice and music models. It is designed to verify that the audio content used to train voice models does not infringe on any […]
There is no shortage of AI voice synthesis companies on the market today, but Voice-Swap, founded and led by Dan “DJ Fresh” Stein, is trying to reimagine what these companies can be.
The music producer and technologist intends Voice-Swap to act as not just a simple conversion tool but an “agency” for artists’ AI likenesses. He’s also looking to solve the ongoing question of how to monetize these voice models in a way that gets the most money back to the artists, a hotly contested topic since anonymous TikTok user Ghostwriter employed AI renderings of Drake and The Weeknd’s voices without their permission on the viral song “Heart On My Sleeve.”
In an exclusive interview with Billboard, Stein and Michael Pelczynski, a member of the company’s advisory board and former vp at SoundCloud, explain their business goals as well as their new monetization plan, which includes providing a dividend for participating artists and payment to artists every time a user employs their AI voice — not just when the resulting song is released commercially and streamed on DSPs. The company also reveals that it’s working on a new partnership with Imogen Heap to create her voice model, which will arrive this summer.
Voice-Swap sees the voice as the “new real estate of IP,” as Pelczynski puts it — just another form of ownership that can allow a participating artist to make passive income. (The voice, along with one’s name and likeness, is considered a “right of publicity” which is currently regulated differently state-to-state.)
In addition to seeing AI voice technology as a useful tool to engage fans of notable artists like Heap and make translations of songs, the Voice-Swap team also believes AI voices represent a major opportunity for session vocalists with distinct timbres but lower public profiles to earn additional income. On its platform now, the company has a number of session vocalists of varying vocal styles available for use; Voice-Swap sees session vocalists’ AI voice models as potentially valuable to songwriters and producers who may want to shape-shift those voices during writing and recording sessions. (As Billboard reported in August, using AI voice models to better tailor pitch records to artists has become a common use-case for the emerging technology.)
“We like to think that, much like a record label, we have a brand that we want to build with the style of artists and the quality we represent at Voice-Swap,” says Stein. “It doesn’t have to be a specific genre, but it’s about hosting unique and incredible voices as opposed to [just popular artists].”
Last year, we saw a lot of fear and excitement surrounding this technology as Ghostwriter appeared on social media and Grimes introduced her own voice model soon after. How does your approach compare to these examples?
Pelczynski: This technology did stoke a lot of fear at first. This is because people see it as a magic trick. When you don’t know what’s behind it and you just see the end result and wonder how it just did that, there is wonder and fear that comes. [There is now the risk] that if you don’t work with someone you trust on your vocal rights, someone is going to pick up that magic trick and do it without you. That’s what happened with Ghostwriter and many others.
The one real main thing to emphasize is the magic trick of swapping a voice isn’t where the story ends, it’s where it begins. And I think Grimes in particular is approaching it with an intent to empower artists. We are, too. But I think where we differentiate is the revenue stream part. With the Grimes model, you create what you want to create and then the song goes into the traditional ecosystem of streaming and other ways of consuming music. That’s where the royalties are made off of that.
We are focused on the inference. Our voice artists get paid on the actual conversion of the voice. Not all of these uses of AI voices end up on streaming, so this is important to us. Of course, if the song is released, additional money for the voice can be made then, too. As far as we know, we are the first platform to pay royalties on the inference, the first conversion.
Stein: We also allow artists the right to release their results through any distributor they want. [Grimes’ model is partnered exclusively with TuneCore.] We see ourselves a bit like an agency for artists’ voices.
What do you mean by an “agency” for artists’ voices?
Stein: When we work with an artist at Voice-Swap we intend to represent them and license their voice models created with us to other platforms to increase their opportunities to earn income. It’s like working with an agent to manage your live bookings. We want to be the agent for the artists’ AI presence and help them monetize it on multiple platforms but always with their personal preferences and concerns in mind.
What kinds of platforms would be interested in licensing an AI voice model from Voice-Swap?
Stein: It is early days for all of the possible use cases, but we think the most obvious example at the moment is music production platforms [or DAWs, short for digital audio workstation] that want to use voice models in their products.
There are two approaches you can take [as an AI voice company.] We could say we are a SaaS platform, and the artist can do deals with other platforms themselves. But the way we approach this is we put a lot of focus into the quality of our models and working with artists directly to keep improving it. We want to be the one-stop solution for creating a model the artist is proud of.
I think the whole thing with AI and where this technology is going is that none of us know what it’s going to be doing 10 years from now. So for us, this was also about getting into a place where we can build that credibility in those relationships and not just with the artists. We want to work with labels, too.
Do you have any partnerships with DAWs or other music-making platforms in place already?
Pelczynski: We are in discussions and under NDA pending an announcement. Every creator’s workflow is different — we want our users to have access to our roster of voices wherever they feel most comfortable, be that via the website, in a DAW or elsewhere. That’s why we’re exploring these partnerships, and why we’ve designed our upcoming VST [virtual studio technology] to make that experience even more seamless. We also recently announced a partnership with SoundCloud, with deeper integrations aimed at creators forthcoming.
Ultimately, the more places our voices are available, the more opportunities there are for new revenue for the artists, and that’s our priority.
Can some music editing take place on the Voice-Swap website, or do these converted voices need to be exported?
Pelczynski: Yes, Dan has always wanted to architect a VST so that it can act like a plug-in in someone’s DAW, but we also have the capability of letting users edit and do the voice conversion and some music editing on our website using our product Stem-Swap. That’s an amazing playground for people that are just coming up. It is similar to how BandLab and others are a good quick way to experiment with music creation.
How many users does Voice-Swap have?
Pelczynski: We have 140,000 verified unique users, and counting.
Can you break down the specifics of how much your site costs for users?
Pelczynski: We run a subscription and top-up pricing system. Users pay a monthly or one-off fee and receive audio credits. Credits are then used for voice conversion and stem separation, with more creator tools on the way.
How did your team get connected with Imogen Heap, and given all the competitors in the AI voice space today, why do you think she picked Voice-Swap?
Pelczynski: We’re very excited to be working with her. She’s one of many established artists that we’re working on currently in the pipeline, and I think our partnership comes down to our ethos of trust and consent. I know it sounds trite, but I think it’s absolutely one of the cornerstones to our success.
The All Access Audio Summit 2023 began Wednesday, April 26, and ran through Friday, April 28. Bringing together leaders in radio, podcasting, production and more, the virtual convention sparked conversation aimed at optimizing the impact of audio in multiple commercial forms.
Here’s a rundown of highlights from the gathering’s third day, when panels were introduced by YEA Networks’ syndicated host Tino Cochino, who aptly summarized the event as “a whole lot of learning, and catching vibes.”
‘We’re for the Masses’
All Access president and publisher Joel Denver opened the day with a conversation with Dave Milner, Cumulus Media president of operations.
“There’s no silver bullet” to successful radio, Milner mused. “It comes down to great local content, and making sure that content is available in multiple platforms. We have to be available any which way people listen to audio. If you put out good content, they will find it and consume it.
“Individuals are spending more time with audio – the pandemic stretched that,” Milner said. “Whether podcast, streaming or broadcast, people want audio.”
Milner also discussed one of the summit’s recurring topics: artificial intelligence. “There’s going to be a place for it,” he said. “It provides opportunities, from writing copy to traffic reports … weather reports … promotions. I have a hard time thinking it will replace any prime-time, personality-based radio.”
Milner cited a recent episode of SiriusXM’s Friday Night Freak-Out With Drew Carey that surreptitiously used AI. “I violated a rule from Radio 101,” Carey subsequently confessed, adding, “The reason treasured radio stations still make money is because people like the personality of the DJs.”
“You can’t replace that human touch, that soul, that connection with the audience,” Milner said.
Milner additionally touched upon another of the summit’s most prominent angles: finding and nurturing future talent. “The biggest thing we can control is how we mentor,” he said. “We’ve had a couple models where we’ve been able to take the third, fourth, fifth people on a morning show and given them an opportunity to have a more singular voice,” as hosts of their own shifts in different dayparts. “It’s helped them grow, and helped the station cross-pollinate and create a more contiguous audience across the station.”
As for fostering hits, “Radio is not the new music discovery place it used to be,” noted Denver, as streaming services have sliced into that share. “They can go deeper than we do,” Milner conceded of DSPs. “We’re more of a mainstream box store – we’re for the masses. It’s harder for us to take chances on a music level. We’ve got to deliver for all people. But on a day-to-day basis, we have the ability to out-local them all day long. Personalities live in communities – that is something the DSPs will never be able to do in an effective way. They’re trying … they know that’s our advantage.”
Atlanta ‘Radio United’
“You have to be in the daily conversation with your audience,” said Jimmy Steal, vp of branding and content for Hubbard Broadcasting’s WTMX and WSHE Chicago, in the day’s second session.
The discussion led to a rare, but rewarding, occurrence in radio: competing stations working together for a common cause, specifically one spearheaded by panelist Terri Avery, director of branding and programming for Cox Media Group’s WALR Atlanta. In late 2022, Avery helmed Black Radio United for the Vote, encouraging listeners to vote in the then-pending run-off election between U.S. Senator Raphael Warnock (the eventual winner) and challenger Herschel Walker. The initiative – among 11 Atlanta area radio stations – helped prospective voters check their voting status, be informed about requirements for in-person voting, get acquainted with a sample ballot and more.
That Avery could create harmony among so many stations in the same market prompted the session’s panel to agree that she herself “should run for Congress.”
Trolling the Trolls
An All Access Audio Summit panel about social media, led by moderator Lori Lewis, president of marketing firm Lori Lewis Media, had fun taking on trolls.
“They’re just looking for attention,” said Jamien “Melz on the Mic” Green, brand manager and afternoon host at Townsquare Media’s KISX Tyler, Texas. “They’re looking to feel something.” His playful strategy: “I’m gonna give you a rise back!”
His favorite online agitators? Those who take the time to craft an intricate post explaining … that they don’t care about your show. “You’ve helped my algorithm with your comment,” he noted.
Ultimately, he believes in the benefits of social media for radio. “You can lure in one listener at a time,” he said. “It’s free promo.”
Podcasting & Talk Radio (& Cheez-Its)
All Access vp of news, talk, sports and podcasting Perry Michael Simon chatted with Steven Goldstein, CEO of Amplifi Media. “We’re at a third of Americans listening to podcasts weekly – just under 90 million people,” Goldstein said. “I think that’s a giant success.”
Meanwhile, Todd Hollst, evening host on Cox Media Group’s talk station WHIO Dayton, Ohio, feels that the format doesn’t always need to be political-leaning. “There’s nonsense, serious moments … it’s not real-life, but it has that feel,” he said of his show, recapping a passion project of his combining fun and localism, and one not likely to stir a deep divide among listeners, depending on their stance on snacks: as Cheez-Its originated in Dayton in 1921, Hollst started a petition to build a statue in their honor. (No wonder he refers to himself as a wisecracker.)
VO & AI
Kelly “K3” Doherty, president and founder of Imaging House, posed one of the All Access Audio Summit’s most pointed questions, to voice-over and production specialists: Would you take a job recording AI, knowing it could ultimately result in a loss of further work?
“That’s a tough question,” pondered Scott Chambers, president of Scott Chambers VO. “Maybe, if my attorneys looked over the contract really well and I got residuals. The contract would have to be really good and lucrative.”
“I would probably prefer not to,” answered Donovan Corneetz, president of DonCo Productions. “I would not want to contribute to a tool to put me out of work. It wouldn’t serve the industry as a whole very well.”
Said Yinka Ladeinde, president of Yinka’s Voice, “I would like to say I would never do it. I would probably hold out until absolutely necessary.”
Doherty expressed caution that any recorded words could be stitched together to create audio considered offensive, or even incriminating, echoing the need for an airtight contract. Still, she noted that AI would be helpful when realizing a mistake had been made and the voice-over talent wasn’t subsequently available, or when copy is revised. “There are positives and negatives,” she said.
The panel also mused about its side of the business overall, and how sometimes factors are out of a talent’s control, regardless of how well a job is performed. Corneetz recalled once losing out on a gig because, he was later told about a client, “you sound just like her ex-husband … whom she hates.”
‘Our Superpower Is Human Connection’
In the summit’s final session, participants looked to the future of audio, and radio specifically, with another focus on AI.
Thea Mitchem, executive vp of programming for iHeartMedia, stressed the need not to dismiss AI, remembering that, around Y2K, certain executives for whom she then worked didn’t seem concerned enough about the rise of digital audio; even at the time, she thought that they should’ve been. “Technology has always moved things,” she said. “I think all industries have to embrace technology.” Still, she said about radio, repeating a common theme over the convention’s three days, “I think our superpower is human connection. There’s a trust level there.”
Said Kurt Johnson, Townsquare Media senior vp of content, “The concern with AI is no one knows where it’s going, and it’s going really fast. Copyright is a big issue. Like everyone else, we’re learning very quickly. We’re very big at generating local content. AI could contribute to that, but our people are what make our content.”
Added Keith Hastings, brand content director of Hubbard Broadcasting’s WDRV Chicago, of AI, “With rights come responsibilities. With opportunities comes responsibility. We have to study it and be careful with it.”
Agreed Jeff Sottolano, Audacy executive vp of programming, “All of us have a responsibility to experiment with it. I think there’s a lot of upside. Ask ChatGPT to write a 30-second script and I think you’ll be impressed – it might get you 80% of the way there.”
Johnson summed up his optimism about radio going forward (pointing out that the company’s name reflects how air talents in every market “are the town square”). “What radio provides hasn’t changed,” he said. “When you combine multi-platform – digital, radio, live events – you’re going to find people of all age groups. We have powerful tools to do it – that’s the exciting thing.”