Moises, an AI music and audio start-up, has partnered with HYPERREAL, a visual effects company, to create a “proprietary digital human asset” called Hypermodel. The partnership will allow artists to create digital versions of themselves for marketing, creative and fan engagement purposes.

HYPERREAL has been collaborating with musicians since 2021, when it worked with Paul McCartney and Beck on the music video for “Find My Way.” In the video, Beck went undercover as a younger version of 81-year-old McCartney, using HYPERREAL to swap and de-age their faces.

Moises is a popular AI music and audio company that provides a suite of tools for musicians, including stem separation, lyric transcription, and voice synthesis.

According to the press release, Moises and HYPERREAL believe the collaboration will especially help the estates of legacy artists bring those artists’ legacies “to life.” It will also allow artists to sing or speak in another language using AI voice modeling provided by Moises, helping to localize songs and marketing content to specific regions.

Translations and estate or legacy-artist marketing are seen as two of the most sought-after new applications of AI for musicians. Last week, pop artist Lauv collaborated with AI voice start-up Hooky to translate his song “Love U Like That” into Korean as a thank you to his steadfast fanbase in the region. This is not the first time AI has been used to translate an artist’s voice — it was first employed in May by MIDNATT, a Korean artist who used the HYBE-owned voice synthesis company Supertone to translate his debut single into six languages — but Lauv was the first prominent Western artist to try it.

Estates are also starting to leverage AI to essentially bring late artists back to life. On Tuesday, Nov. 14, Warner Music announced plans to use AI to recreate the voice and image of legendary “La Vie En Rose” singer Edith Piaf for an upcoming biopic about her life and career. Over in Korea, Supertone remade the voice of late South Korean folk artist Kim Kwang-seok, and Tencent’s Lingyin Engine made headlines for developing “synthetic voices in memory of legendary artists” such as Teresa Teng and Anita Mui, as a way to revive interest in their catalogs.

“Moises and HYPERREAL are each best-in-class players with a history of pushing creative boundaries enabled by technology while fully respecting the choices of artists and rights holders,” says Moises CEO Geraldo Ramos. “As their preferred partner, we’re looking forward to seeing the ways HYPERREAL can leverage Moises’s voice modeling capabilities to add incredibly realistic voices to their productions.”

“We have set the industry standard and exceeded the expectations of the most demanding directors and producers time and time again,” says Remington Scott, founder and CEO of HYPERREAL. “In addition to Moises’s artist-first approach, the quality of their voice models is the best we’ve heard.”

YouTube recently launched an AI Music incubator with artists and producers from Universal Music Group. The purpose of the group, according to Universal CEO Lucian Grainge, is to explore, experiment, and offer feedback on the AI-related musician tools and products the Google team is researching — with the hope that more artists will benefit from YouTube’s creative suite. 

This partnership demonstrates the clear desire to involve the industry in the development stages of AI products and to protect the human component of artistry. That desire is heightened in the face of deepfakes. Just last month, Google launched its SynthID watermark, meant to spot AI-generated images (Google DeepMind CEO Demis Hassabis cited the importance of deepfake detection ahead of a contentious election season). “Heart on My Sleeve,” the song created with AI-generated voices imitating Drake and The Weeknd, kicked off the music industry’s scramble to shut down and stamp out any unauthorized use of artists’ voices. Most importantly, though, the viral track proved that AI voice models are here and only improving with each passing day.

As artists, labels, and other rights holders have grown more concerned about AI models learning and profiting from their copyrighted material, fans and creators have discovered new ways to engage with their favorite artists and imagine completely new musical works using their AI voice models. This is prompting industry executives (myself included) to wonder how these models can be used to explore this new creative frontier of music while still protecting artists.

With all of this in mind, the industry needs to mull over a few philosophical questions and consider the distinction between voice cloning and voice synthesis. A singer is much more than timbre, the primary quality that voice models modify in a voice. AI voices are not the same as samples, where the whole vocal element is based on an underlying artist’s full performance, which includes pitch, emotion, timbre, accent, tone and more.

Regardless, AI innovations will only reach their maximum potential if the industry faces one foundational issue: artists and their labels need to control the ways in which their image, likeness and voice are used. Whether the industry decides to embrace these innovations or limit AI-powered cloning entirely, the next step begins with synthetic voice detection. Is the artist singing on any given track fake or the real deal?

In the early 2000s, music companies found themselves losing control of their content to the digitization of music. The industry’s initial impulse to crush file-sharing networks like Napster led to the launch of Apple’s iTunes store in 2003 and, eventually, legal streaming. Other digital rights management tools, like Content ID on YouTube, were developed to detect unauthorized use of music. Once the industry learned to embrace digital music and formed a foundational infrastructure to support it, streaming revenues soared — breaking the $10 billion mark for the first time in 2022 and making up 84% of the industry’s total revenue, according to the RIAA.

The industry needs synthetic voice detection, but with 120,000 new tracks uploaded to streaming platforms daily (according to Luminate) on top of the already existing back catalogs, can it be done accurately and at scale? The short answer: yes. 
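To make that answer concrete, here is a deliberately simplified sketch of what the core of a synthetic-voice detector can look like: summarize each vocal clip as a compact feature vector, then train a binary “real vs. synthetic” classifier on labeled examples. The file names, the MFCC features and the tiny model below are illustrative assumptions for intuition only, not a description of Moises’ or anyone else’s production detection system.

```python
# Hypothetical toy sketch of a synthetic-voice detector, not a production system.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(path: str) -> np.ndarray:
    """Load an audio clip and summarize it as averaged MFCC coefficients."""
    audio, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)  # one 20-dimensional vector per clip

# Hypothetical labeled examples: vocals known to be human vs. AI-generated.
real_clips = ["real_vocal_01.wav", "real_vocal_02.wav"]
synthetic_clips = ["cloned_vocal_01.wav", "cloned_vocal_02.wav"]

X = np.stack([clip_features(p) for p in real_clips + synthetic_clips])
y = np.array([0] * len(real_clips) + [1] * len(synthetic_clips))

detector = LogisticRegression(max_iter=1000).fit(X, y)

# Probability that a newly uploaded track's vocal is synthetic.
score = detector.predict_proba(clip_features("new_upload.wav").reshape(1, -1))[0, 1]
print(f"estimated probability the vocal is synthetic: {score:.2f}")
```

A real system would swap the toy classifier for far larger models and richer features, but the pipeline shape — featurize every upload, score it, flag it for review — is what makes screening 120,000 daily uploads tractable.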

As the industry begins to embrace the responsible use of AI for synthetic voice creation, I strongly believe there should be a corresponding willingness for artists and labels to collaborate in that training process. It’s in their best interests to do this now. AI applications are already scaling in a variety of categories. Well-engineered models are becoming exponentially more efficient and can increasingly manage massive computing tasks. Combined with strategic operational approaches, this is achievable today.   

To honor each artist’s decision whether or not to participate in voice models, the industry needs an easy and accessible way for artists to build their own voice models and grant fans and creators permission to use them. This type of initiative, paired with synthetic voice detection, ensures that only the voices and works of those who want to be involved in voice cloning and other derivative AI tools are used. Artists who want to create their own voice models can work with voice synthesis platforms to establish the terms of where and how their voice model can be used — offering more control and even opportunities for monetization.

Geraldo Ramos is the co-founder and CEO of Moises, the AI-driven music platform that is transforming the way artists and businesses incorporate AI technology into their workflows.

Dennis Murcia was excited to get an email from Disney, but the thrill was short-lived. As an A&R and global development executive for the label Codiscos — founded in 1950, Murcia likens it to “Motown of Latin America” — part of his job revolves around finding new listeners for a catalog of older songs. Disney reached out in 2020 hoping to use Juan Carlos Coronel’s zippy recording of “Colombia Tierra Querida,” written by Lucho Bermudez, in the trailer for an upcoming film titled Encanto. The problem was: The movie company wanted the instrumental version of the track, and Codiscos didn’t have one. 

“I had to scramble,” Murcia recalls. A friend recommended that he try AudioShake, a company that uses artificial intelligence-powered technology to dissect songs into their component parts, known as stems. Murcia was hesitant — “removing vocals is not new, but it was never ideal; they always came out with a little air.” He needed to try something, though, and it turned out that AudioShake was able to create an instrumental version of “Colombia Tierra Querida” that met Disney’s standards, allowing the track to appear in the trailer. 

“It was a really important synch placement for us,” Murcia says. He calls quality stem-separation technology “one of the best uses of AI I’ve seen,” capable of opening “a whole new profit center” for Codiscos.

Catalog owners and estate administrators are increasingly interested in tapping into this technology, which allows them to cut and slice music in new ways for remixing, sampling or placements in commercials and advertisements. Often “you can’t rely on your original listeners to carry you into the future,” says Jessica Powell, co-founder and CEO of AudioShake. “You have to think creatively about how to reintroduce that music.”

Outside of the more specialized world of estates and catalogs, stem-separation is also being used widely by workaday musicians. Moises is another company that offers the technology; on some days, the platform’s users stem-separate 1 million different songs. “We have musicians all across the globe using it for practice purposes” — isolating guitar parts in songs to learn them better, or removing drums from a track to play along — says Geraldo Ramos, Moises’ co-founder and CEO.

While the ability to create missing stems has been around for at least a decade, the tech has been advancing especially rapidly since 2019 — when Deezer released Spleeter, which offered up “already trained state of the art models for performing various flavors of separation” — and 2020, when Meta released its own model called Demucs. Those “really opened the field and inspired a lot of people to build experiences based on stem separation, or even to work on it themselves,” Powell says. (She notes that AudioShake’s research was under way well before those releases.)

As a result, stem separation has “become super accessible,” according to Matt Henninger, Moises’ vp of sales and business development. “It might have been buried in Pro Tools five years ago, but now everyone can get their hands on it.” 

Where does artificial intelligence come in? Generative AI refers to programs that ingest reams of data and find patterns they can use to generate new datasets of a similar type. (Popular examples include DALL-E, which does this with images, and ChatGPT, which does it with text.) Stem separation tech finds the patterns corresponding to the different instruments in songs so that they can be isolated and removed from the whole.
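To give a sense of how accessible the tooling has become, the open-source Spleeter library Deezer released (mentioned above) can be driven in a few lines of Python. The file paths below are placeholders, and this sketch shows the generic open-source workflow rather than AudioShake’s or Moises’ proprietary pipelines.

```python
# A minimal stem-separation sketch using Deezer's open-source Spleeter.
# Paths are placeholders; this illustrates the generic workflow only.
from spleeter.separator import Separator

# Load the pretrained 4-stem model: vocals, drums, bass and "other".
separator = Separator("spleeter:4stems")

# Writes one audio file per stem (vocals.wav, drums.wav, bass.wav, other.wav)
# into a subfolder of output/ named after the input track.
separator.separate_to_file("mono_master.wav", "output/")
```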

“We basically train a model to recognize the frequencies and everything that’s related to a drum, to a bass, to vocals, both individually and how they relate to each other in a mix,” Ramos explains. Done at scale, with many thousands of tracks licensed from independent artists, the model eventually gets good enough to pull apart the constituent parts of a song it’s never seen before.
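In rough code terms, the setup Ramos describes can be pictured as a network that looks at the mixture’s spectrogram and predicts, for every time-frequency bin, how much energy belongs to each source. The toy PyTorch model below is a generic mask-estimation sketch with made-up shapes and random stand-in data, not Moises’ actual architecture.

```python
# Toy mask-estimation sketch of source-separation training; shapes and data
# are illustrative assumptions, not any company's real model.
import torch
import torch.nn as nn

N_FREQ_BINS = 513  # frequency bins in a magnitude spectrogram
N_SOURCES = 4      # vocals, drums, bass, "other"

class MaskEstimator(nn.Module):
    """Predicts, per time-frequency bin, each source's share of the mixture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FREQ_BINS, 256),
            nn.ReLU(),
            nn.Linear(256, N_FREQ_BINS * N_SOURCES),
            nn.Sigmoid(),  # masks live in [0, 1]
        )

    def forward(self, mix_mag):
        # mix_mag: (batch, time, freq) magnitude spectrogram of the full mix
        batch, time, _ = mix_mag.shape
        masks = self.net(mix_mag).view(batch, time, N_SOURCES, N_FREQ_BINS)
        # Each source estimate is the mixture scaled by its predicted mask.
        return masks * mix_mag.unsqueeze(2)

model = MaskEstimator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random tensors standing in for licensed training data: the mix spectrogram
# and the true spectrograms of its isolated stems.
mix = torch.rand(8, 100, N_FREQ_BINS)
true_stems = torch.rand(8, 100, N_SOURCES, N_FREQ_BINS)

estimates = model(mix)
loss = nn.functional.l1_loss(estimates, true_stems)  # distance from real stems
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.3f}")
```

Trained over many thousands of licensed tracks, with real mixes and stems in place of the random tensors, a model of this general shape learns to pull apart songs it has never seen.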

A lot of recordings are missing those building blocks. They could be older tracks that were cut in mono, meaning that individual parts were never tracked separately when the song was recorded. Or the original multi-track recordings could have been lost or damaged in storage.

Even in the modern world, it’s possible for stems to disappear in hard-drive crashes or other technical mishaps. The opportunity to create high-quality stems for recordings “where multi-track recordings aren’t available effectively unlocks content that is frozen in time,” says Steven Ames Brown, who administers Nina Simone‘s estate, among others.

Arron Saxe of Kinfolk Management, which includes the Otis Redding Estate, believes stem-separation can enhance the appeal of the soul great’s catalog for sample-based producers. “We have 280 songs, give or take, that Otis Redding wrote that sit in a pot,” he says. “How do you increase the value of each one of those? If doing that is pulling out a 1-second snare drum from one of those songs to sample, that’s great.” And it’s an appealing alternative to well-worn legacy marketing techniques, which Saxe jokes are “just box sets and new track listings of old songs.” 

Harnessing the tech is only “half the battle,” though. “The second part is a harder job,” Saxe says. “Do you know how to get the music to a big-name producer?” Murcia has been actively pitching electronic artists, hoping to pique their interest in sampling stems from Codiscos.

It can be similarly challenging to get the attention of a brand or music supervisor working in film and TV. But again, stem separation “allows editors to interact with or customize the music a lot more for a trailer in a way that is not usually possible with this kind of catalog material,” says Garret Morris, owner of Blackwatch Dominion, a full-service music publishing, licensing and rights management company that oversees a catalog extending from blues to boogie to Miami bass. 

Simpler than finding ways to open catalogs up to samplers is retooling old audio for the latest listening formats. Simone’s estate used stem-separation technology to create a spatial audio mix of her album Little Girl Blue as this style of listening continues to grow in popularity. (The number of Amazon Music tracks mixed in immersive audio has jumped over 400% since 2019, for example.)

Powell expects that the need for this adaptation will continue to grow. “If you buy into the vision presented by Apple, Facebook, and others, we will be interacting in increasingly immersive environments in the future,” she adds. “And audio that is surrounding us, just like it does in the real world, is a core component to have a realistic immersive experience.”

Brown says the spatial audio re-do of Simone’s album resulted in “an incremental increase in quality, and that can be enough to entice a brand new group of listeners.” “Most recording artists are not wealthy,” he continues. “Things that you can do to their catalogs so that the music can be fresh again, used in commercials and used in soundtracks of movies or TV shows, gives them something that makes a difference in their lives.”