Futureverse, a multi-hyphenate AI company, published a new research paper on Thursday (June 9) to introduce its forthcoming text-to-music generator. Called Jen-1, the unreleased model is designed to address shortcomings of currently available music generators like Google’s MusicLM, providing higher-fidelity audio and longer, more complex musical works than what is on the market today.
“Jen is spelled J-E-N because she’s designed to be your friend who goes into the studio with you. She’s a tool,” says Shara Senderoff, co-founder of both Futureverse and Raised in Space, about the model in an exclusive first look with Billboard. Expected to be released in early 2024, Jen can generate songs up to three minutes long and help producers with half-written songs by offering ‘continuation’ and ‘in-painting.’
‘Continuation’ allows a music maker to upload an incomplete song to Jen and direct the model to create a plausible idea of how to finish the song, and ‘in-painting’ refers to a process by which the model can fill in spaces of a song that are damaged or incomplete in the middle of the work. To Aaron McDonald, the company’s co-founder, Jen’s role is to “extend creativity” of human artists.
When asked why Jen is a necessary invention during a time in which producers, songwriters and artists are more bountiful than ever, McDonald replied, “I think musicians throughout the ages have always embraced new technology that expands the way they can create music,” pointing to electronic music as one example of how new tools shape musical evolution. “To imply that music doesn’t need [any new] technology to expand and become better now is kind of silly… and arbitrary.”
He also sees this as a way to “democratize” the “high end of music [quality],” which he says is now only accessible to musicians with the means to record at a well-equipped studio and with trained technicians. With Jen, McDonald and Senderoff hope to satisfy the interests of professional musicians and to encourage newcomers to dabble in songwriting, perhaps for the first time. The two co-founders imagine a world in which everyday people can create music, and have nicknamed the output of this type of user ‘AIGC,’ a twist on the term User Generated Content (or ‘UGC’).
Futureverse was formed piecemeal over the last 18 months, merging eleven existing AI and metaverse start-ups into one company to make a number of creative AI models, including those that produce animations, music, sound effects and more. To power its inventions, the company employs the AI protocol from Altered State Machine, a company that was founded by McDonald and included in the merger.
Senderoff says Jen will also be a superior product because Futureverse, unlike its competitors, created it with the input of some of music’s top business executives and creators. Senderoff does not reveal who the industry partners are or how Jen will be a more ethical and cooperative model for musicians, but she says an announcement with more information will be released soon.
Despite its proposed upgrades, Futureverse’s Jen could face significant challenges from other text-to-music generators named in the new research paper, given some were made by the world’s most established tech giants and have already hit the market, but McDonald is unperturbed. “That forces us to think differently. We don’t have the resources that they do, but we started our process with that in mind. I think we can beat them with a different approach: the key insight is working with the music industry as a way to produce a better product.”
Universal Music Group is in the early stages of talks with Google about licensing artists’ voices for songs created by artificial intelligence, according to The Financial Times. Warner Music Group has also discussed this possibility, The Financial Times reported.
Advances in artificial-intelligence-driven technology have made it relatively easy for a producer sitting at home to create a song involving a convincing facsimile of a superstar’s voice — without that artist’s permission. Hip-hop super-fans have been using the technology to flesh out unfinished leaks of songs from their favorite rappers.
One track in particular grabbed the industry’s attention in April: “Heart On My Sleeve,” which masqueraded as a new collaboration between Drake and the Weeknd. At the time, a Universal Music spokesperson issued a statement saying that “stakeholders in the music ecosystem” have to choose “which side of history… to be on: the side of artists, fans and human creative expression, or on the side of deep fakes, fraud and denying artists their due compensation.”
“In our conversations with the labels, we heard that the artists are really pissed about this stuff,” Geraldo Ramos, co-founder and CEO of the music technology company Moises, told Billboard recently. (Moises has developed its own AI-driven voice-cloning technology, along with the technology to detect whether a song clones someone else’s voice.) “How do you protect that artist if you’re a label?” added Matt Henninger, Moises’ vp of sales and business development.
The answer is probably licensing: Develop a system in which artists who are fine with having their voices cloned clear those rights — in exchange for some sort of compensation — while those acts who are uncomfortable with being replicated by technology can opt out. Just as there is a legal framework in place that allows producers to sample 1970s soul, for example, by clearing both the master and publishing rights, in theory there could be some sort of framework through which producers obtain permission to clone a superstar’s voice.
AI-driven technology could “enable fans to pay their heroes the ultimate compliment through a new level of user-driven content,” Warner CEO Robert Kyncl told financial analysts this week. (“There are some [artists] that may not like it,” he continued, “and that’s totally fine.”)
On the same investor call, Kyncl also singled out “one of the first official and professionally AI-generated songs featuring a deceased artist, which came through our ADA Latin division:” A new Pedro Capmany track featuring AI-generated vocals from his father Jose, who died in 2001. “After analyzing hundreds of hours of interviews, acappellas, recorded songs, and live performances from Jose’s career, every nuance and pattern of his voice was modeled using AI and machine learning,” Kyncl explained.
After the music industry’s initial wave of alarm about AI, the conversation has shifted, according to Henninger. With widely accessible voice-cloning technology available, labels can’t really stop civilians from making fake songs accurately mimicking their artists’ vocals. But maybe there’s a way they can make money from all the replicants.
Henninger is starting to hear different questions around the music industry. “How can [AI] be additive?” he asks. “How can it help revenue? How can it build someone’s brand?”
Reps for Universal and Warner did not respond to requests for comment.
“Fake Drake” and similar controversies have gotten most of the attention, but not all uses of artificial intelligence in music are cause for concern.
When a young Evan Bogart tried his hand at writing a few pop songs for a girl group he managed, he had no idea he would score one of the biggest Billboard hits of 2006.
After the act disbanded, Bogart decided to pitch the songs to labels. One of them landed with a then-fledgling pop artist named Rihanna, who was signed to Def Jam Recordings. Bogart’s song, “S.O.S.,” not only broke Rihanna (it jumped 33 spots to No. 1 on the Billboard Hot 100 in a single week) but also minted his songwriting career.
Multiple hits later, Bogart runs his own publishing company and label, Seeker Music, where he encourages his songwriters to create “pitch records” — songs written by songwriters, recorded as demos, and then shopped to various artists. It’s a common practice that increasingly employs a new — albeit controversial — hack: artificial intelligence voice synthesis, which mimics the voice of the artist being pitched.
Bogart says the technology helps his roster better tailor pitches to talent and enables the artists to envision themselves on the track. At a time when acts are demanding a weightier role in the song creation process, AI voice generation offers a creative way to get their attention.
“Producers and writers have always tried to mimic the artists’ voice on these demos anyway,” says attorney Jason Berger, whose producer and songwriter clients are beginning to experiment with AI vocals for their pitches. “I feel like this technology is very impactful because now you can skip that step with AI.”
Traditionally, songwriters will either sing through the track themselves for a demo recording or employ a demo singer. In cases when writers have a specific artist in mind, a soundalike demo singer may be employed to mimic the artist’s voice for about $250-500 per cut. (One songwriter manager said there are a few in particular who make good money imitating Maroon 5’s Adam Levine, Justin Bieber, and other top tier acts. In general, however, nearly all demo singers hold other jobs in music like background singing, writing, producing or engineering.)
The emerging technology doesn’t generate a melody and vocal from scratch but instead maps the AI-generated tone of the artist’s voice atop a prerecorded vocal. Popular platforms include CoversAI, Uberduck, KitsAI, and Grimes’ own voice model, which she made available for public use in May. Still, these models yield mixed results.
Some artists’ voices might be easier for AI to imitate because they employ Auto-Tune or other voice-processing technology when they record, normalizing the voice and giving it an already computerized feel. A large catalog of recordings also helps because it offers more training material.
“Certain voices sound really good, but others are not so good,” Bogart says, but he adds that he actually “likes that it sounds a little different from a real voice. I’m not trying to pretend the artist is truly on the song. I’m just sending people a robotic version of the artist to help them hear if the song is a good fit.”
Training is one of the most contentious areas of generative AI because the algorithms are often fed copyrighted material, like sound recordings, without owners’ knowledge or compensation. The legality of this is still being determined in the United States and other countries, but any restrictions that arise probably won’t apply to pitch records because they aren’t released commercially.
“I really haven’t had any negative reactions,” Bogart says of his efforts. “No one’s said ‘did you just pitch your song with my artists’ voice on it to me?’”
Stefán Heinrich, founder and CEO of CoversAI creator mayk.it, says voice re-creation tools could even democratize the songwriting profession altogether, allowing talented unknown writers a chance at getting noticed. “Until now, you had to have the right connections to pitch your songs to artists,” he says. “Now an unknown songwriter can use the power of the technology and the reach of TikTok to show your skills to others and get invited into those rooms.”
While Nick Jarjour — founder and CEO of JarjourCo, advisor to mayk.it and former global head of song management at Hipgnosis — supports the ethical use of this technology, he believes that the industry should take a different approach to applying AI voices on pitches. “The solution is letting the artist who is receiving the demos decide to put their AI voice onto it themselves,” he says, as opposed to publishers and writers sending over demos with the AI treatment already provided. To do this, artists can create their own personal voice models that are more accurate and tailored to their needs, much like Grimes has already done, and then apply those to pitches they receive.
Still, as Berger says, “this is evolving by the day.” Most publishers haven’t put this technology into everyday practice yet, but now more are discussing the idea publicly. At the Association of Independent Music Publishers (AIMP) annual conference in New York City last month, Katie Fagan, head of A&R for Prescription Songs Nashville, said that she recently saw AI vocals on a pitch record for the first time. One of her writers had tested AI to add the voice of Cardi B to the demo. “It could be an interesting pitch tool in the future,” she said, noting that this technology could be used even more simply to change the gender of the demo singer when pitching the same demo to a mix of male and female artists.
“I really don’t see why you wouldn’t pitch a song with a voice that sounds as close as possible to the artist, given the goal is helping the artist hear themselves on the track,” says Berger. “My guess is that people will get used to this pretty quick. I think in six months we are going to have even more to talk about.”
In the more distant future, Bogart wonders what might happen if, as the technology advances, pitch records become the final step in the creative process. “What would be really scary is if someone asks the artist, ‘Hey, do you want to cut this?’ And they reply, ‘I don’t have to, that’s me.’”
Dennis Murcia was excited to get an email from Disney, but the thrill was short-lived. As an A&R and global development executive for the label Codiscos — founded in 1950, Murcia likens it to “Motown of Latin America” — part of his job revolves around finding new listeners for a catalog of older songs. Disney reached out in 2020 hoping to use Juan Carlos Coronel’s zippy recording of “Colombia Tierra Querida,” written by Lucho Bermudez, in the trailer for an upcoming film titled Encanto. The problem was: The movie company wanted the instrumental version of the track, and Codiscos didn’t have one.
“I had to scramble,” Murcia recalls. A friend recommended that he try AudioShake, a company that uses artificial intelligence-powered technology to dissect songs into their component parts, known as stems. Murcia was hesitant — “removing vocals is not new, but it was never ideal; they always came out with a little air.” He needed to try something, though, and it turned out that AudioShake was able to create an instrumental version of “Colombia Tierra Querida” that met Disney’s standards, allowing the track to appear in the trailer.
It was “a really important synch placement” for the label, Murcia says. He calls quality stem-separation technology “one of the best uses of AI I’ve seen,” capable of opening “a whole new profit center” for Codiscos.
Catalog owners and estate administrators are increasingly interested in tapping into this technology, which allows them to cut and slice music in new ways for remixing, sampling or placements in commercials and advertisements. Often “you can’t rely on your original listeners to carry you into the future,” says Jessica Powell, co-founder and CEO of AudioShake. “You have to think creatively about how to reintroduce that music.”
Outside of the more specialized world of estates and catalogs, stem-separation is also being used widely by workaday musicians. Moises is another company that offers the technology; on some days, the platform’s users stem-separate 1 million different songs. “We have musicians all across the globe using it for practice purposes” — isolating guitar parts in songs to learn them better, or removing drums from a track to play along — says Geraldo Ramos, Moises’ co-founder and CEO.
While the ability to create missing stems has been around for at least a decade, the tech has been advancing especially rapidly since 2019 — when Deezer released Spleeter, which offered up “already trained state of the art models for performing various flavors of separation” — and 2020, when Meta released its own model called Demucs. Those “really opened the field and inspired a lot of people to build experiences based on stem separation, or even to work on it themselves,” Powell says. (She notes that AudioShake’s research was under way well before those releases.)
As a result, stem separation has “become super accessible,” according to Matt Henninger, Moises’ vp of sales and business development. “It might have been buried in Pro Tools five years ago, but now everyone can get their hands on it.”
Where does artificial intelligence come in? Generative AI refers to programs that ingest reams of data and find patterns they can use to generate new datasets of a similar type. (Popular examples include DALL-E, which does this with images, and ChatGPT, which does it with text.) Stem separation tech finds the patterns corresponding to the different instruments in songs so that they can be isolated and removed from the whole.
“We basically train a model to recognize the frequencies and everything that’s related to a drum, to a bass, to vocals, both individually and how they relate to each other in a mix,” Ramos explains. Done at scale, with many thousands of tracks licensed from independent artists, the model eventually gets good enough to pull apart the constituent parts of a song it’s never seen before.
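As a rough illustration of the idea of isolating parts of a mix by their frequency content, here is a minimal Python sketch. It is not how Moises, AudioShake, Spleeter or Demucs work: those systems use trained models, while this toy uses a fixed frequency cutoff and only succeeds because the two synthetic "instruments" occupy separate frequency ranges. The function name `split_by_frequency`, the tones and the 300 Hz cutoff are all assumptions made up for the example.

```python
import numpy as np

def split_by_frequency(mix, sample_rate, cutoff_hz):
    """Split a mono signal into a low band and a high band with a hard spectral mask.

    Toy stand-in for stem separation: a trained model would learn a mask per
    instrument, whereas this function applies one fixed cutoff.
    """
    spectrum = np.fft.rfft(mix)
    freqs = np.fft.rfftfreq(len(mix), d=1.0 / sample_rate)
    low_mask = freqs < cutoff_hz
    low = np.fft.irfft(spectrum * low_mask, n=len(mix))
    high = np.fft.irfft(spectrum * ~low_mask, n=len(mix))
    return low, high

# Build one second of a synthetic two-part "mix" (hypothetical values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
bass = 0.8 * np.sin(2 * np.pi * 80 * t)     # low "bass" tone at 80 Hz
lead = 0.5 * np.sin(2 * np.pi * 1000 * t)   # higher "lead" tone at 1000 Hz
mix = bass + lead

# Recover the two parts from the mix alone.
bass_est, lead_est = split_by_frequency(mix, sample_rate, cutoff_hz=300)
```

Real instruments overlap heavily in frequency, which is why a fixed cutoff like this fails on actual recordings and why the companies described here train models on large sets of licensed tracks instead.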
A lot of recordings are missing those building blocks. They could be older tracks that were cut in mono, meaning that individual parts were never tracked separately when the song was recorded. Or the original multi-track recordings could have been lost or damaged in storage.
Even in the modern world, it’s possible for stems to disappear in hard-drive crashes or other technical mishaps. The opportunity to create high-quality stems for recordings “where multi-track recordings aren’t available effectively unlocks content that is frozen in time,” says Steven Ames Brown, who administers Nina Simone‘s estate, among others.
Arron Saxe of Kinfolk Management, whose clients include the Otis Redding Estate, believes stem-separation can enhance the appeal of the soul great’s catalog for sample-based producers. “We have 280 songs, give or take, that Otis Redding wrote that sit in a pot,” he says. “How do you increase the value of each one of those? If doing that is pulling out a 1-second snare drum from one of those songs to sample, that’s great.” And it’s an appealing alternative to well-worn legacy marketing techniques, which Saxe jokes are “just box sets and new track listings of old songs.”
Harnessing the tech is only “half the battle,” though. “The second part is a harder job,” Saxe says. “Do you know how to get the music to a big-name producer?” Murcia has been actively pitching electronic artists, hoping to pique their interest in sampling stems from Codiscos.
It can be similarly challenging to get the attention of a brand or music supervisor working in film and TV. But again, stem separation “allows editors to interact with or customize the music a lot more for a trailer in a way that is not usually possible with this kind of catalog material,” says Garret Morris, owner of Blackwatch Dominion, a full-service music publishing, licensing and rights management company that oversees a catalog extending from blues to boogie to Miami bass.
Simpler than finding ways to open catalogs up to samplers is retooling old audio for the latest listening formats. Simone’s estate used stem-separation technology to create a spatial audio mix of her album Little Girl Blue as this style of listening continues to grow in popularity. (The number of Amazon Music tracks mixed in immersive-audio has jumped over 400% since 2019, for example.)
Powell expects that the need for this adaptation will continue to grow. “If you buy into the vision presented by Apple, Facebook, and others, we will be interacting in increasingly immersive environments in the future,” she adds. “And audio that is surrounding us, just like it does in the real world, is a core component to have a realistic immersive experience.”
Brown says the spatial audio re-do of Simone’s album resulted in “an incremental increase in quality, and that can be enough to entice a brand new group of listeners.” “Most recording artists are not wealthy,” he continues. “Things that you can do to their catalogs so that the music can be fresh again, used in commercials and used in soundtracks of movies or TV shows, gives them something that makes a difference in their lives.”
President Joe Biden said Friday that new commitments by Amazon, Google, Meta, Microsoft and other companies that are leading the development of artificial intelligence technology to meet a set of AI safeguards brokered by his White House are an important step toward managing the “enormous” promise and risks posed by the technology.
Biden announced that his administration has secured voluntary commitments from seven U.S. companies meant to ensure their AI products are safe before they release them. Some of the commitments call for third-party oversight of the workings of commercial AI systems, though they don’t detail who will audit the technology or hold the companies accountable.
“We must be clear eyed and vigilant about the threats emerging technologies can pose,” Biden said, adding that the companies have a “fundamental obligation” to ensure their products are safe.
“Social media has shown us the harm that powerful technology can do without the right safeguards in place,” Biden added. “These commitments are a promising step, but we have a lot more work to do together.”
A surge of commercial investment in generative AI tools that can write convincingly human-like text and churn out new images and other media has brought public fascination as well as concern about their ability to trick people and spread disinformation, among other dangers.
The four tech giants, along with ChatGPT-maker OpenAI and startups Anthropic and Inflection, have committed to security testing “carried out in part by independent experts” to guard against major risks, such as to biosecurity and cybersecurity, the White House said in a statement.
That testing will also examine the potential for societal harms, such as bias and discrimination, and more theoretical dangers about advanced AI systems that could gain control of physical systems or “self-replicate” by making copies of themselves.
The companies have also committed to methods for reporting vulnerabilities to their systems and to using digital watermarking to help distinguish between real and AI-generated images known as deepfakes.
They will also publicly report flaws and risks in their technology, including effects on fairness and bias, the White House said.
The voluntary commitments are meant to be an immediate way of addressing risks ahead of a longer-term push to get Congress to pass laws regulating the technology. Company executives plan to gather with Biden at the White House on Friday as they pledge to follow the standards.
Some advocates for AI regulations said Biden’s move is a start but more needs to be done to hold the companies and their products accountable.
“A closed-door deliberation with corporate actors resulting in voluntary safeguards isn’t enough,” said Amba Kak, executive director of the AI Now Institute. “We need a much more wide-ranging public deliberation, and that’s going to bring up issues that companies almost certainly won’t voluntarily commit to because it would lead to substantively different results, ones that may more directly impact their business models.”
Senate Majority Leader Chuck Schumer, D-N.Y., has said he will introduce legislation to regulate AI. He said in a statement that he will work closely with the Biden administration “and our bipartisan colleagues” to build upon the pledges made Friday.
A number of technology executives have called for regulation, and several went to the White House in May to speak with Biden, Vice President Kamala Harris and other officials.
Microsoft President Brad Smith said in a blog post Friday that his company is making some commitments that go beyond the White House pledge, including support for regulation that would create a “licensing regime for highly capable models.”
But some experts and upstart competitors worry that the type of regulation being floated could be a boon for deep-pocketed first-movers led by OpenAI, Google and Microsoft, as smaller players are elbowed out by the high cost of making their AI systems, known as large language models, adhere to regulatory strictures.
The White House pledge notes that it mostly only applies to models that “are overall more powerful than the current industry frontier,” set by currently available models such as OpenAI’s GPT-4 and image generator DALL-E 2 and similar releases from Anthropic, Google and Amazon.
A number of countries have been looking at ways to regulate AI, including European Union lawmakers who have been negotiating sweeping AI rules for the 27-nation bloc that could restrict applications deemed to have the highest risks.
U.N. Secretary-General Antonio Guterres recently said the United Nations is “the ideal place” to adopt global standards and appointed a board that will report back on options for global AI governance by the end of the year.
Guterres also said he welcomed calls from some countries for the creation of a new U.N. body to support global efforts to govern AI, inspired by such models as the International Atomic Energy Agency or the Intergovernmental Panel on Climate Change.
The White House said Friday that it has already consulted on the voluntary commitments with a number of countries.
The pledge is heavily focused on safety risks but doesn’t address other worries about the latest AI technology, including the effect on jobs and market competition, the environmental resources required to build the models, and copyright concerns about the writings, art and other human handiwork being used to teach AI systems how to produce human-like content.
Last week, OpenAI and The Associated Press announced a deal for the AI company to license AP’s archive of news stories. The amount it will pay for that content was not disclosed.
LONDON — When the European Union announced plans to regulate artificial intelligence in 2021, legislators started focusing on “high risk” systems that could threaten human rights, such as biometric surveillance and predictive policing. Amid increasing concern among artists and rights holders about the potential impact of AI on the creative sector, however, EU legislators are also now looking at the intersection of this new technology and copyright.
The EU’s Artificial Intelligence Act, which is now being negotiated among politicians in different branches of government, is the first comprehensive legislation in the world to regulate AI. In addition to banning “intrusive and discriminatory uses” of the technology, the current version of the legislation addresses generative AI, mandating that companies disclose content that is created by AI to differentiate it from works authored by humans. Other provisions in the law would require companies that use generative AI to provide details of copyrighted works, including music, on which they trained their systems. (The AI Act is a regulation, so it would pass directly into law in all 27 member states.)
Music executives began paying closer attention to the legislation after the November launch of ChatGPT. In April, around the time that “Heart on My Sleeve,” a track that featured AI-powered imitations of vocals by Drake and The Weeknd, drove home the issue posed by AI, industry lobbyists convinced lawmakers to add the transparency provisions.
So far, big technology companies, including Alphabet, Meta and Microsoft, have publicly stated that they, too, support AI regulation, at least in the abstract. Behind the scenes, however, multiple music executives tell Billboard that technology lobbyists are trying to weaken these transparency provisions by arguing that such obligations could put European AI developers at a competitive disadvantage.
“They want codes of conduct” — as opposed to laws — “and very low forms of regulation,” says John Phelan, director general of international music publishing trade association ICMP.
Another argument is that summarizing training data “would basically come down to providing a summary of half, or even the entire, internet,” says Boniface de Champris, Brussels-based policy manager at the Computer and Communications Industry Association Europe, which counts Alphabet, Apple, Amazon and Meta among its members. “Europe’s existing copyright rules already cover AI applications sufficiently.”
In May, Sam Altman, CEO of ChatGPT developer OpenAI, emerged as the highest-profile critic of the EU’s proposals, accusing the bloc of “overregulating” the nascent business. He even said that his company, which is backed by Microsoft, might consider leaving Europe if it could not comply with the legislation, although he walked back this statement a few days later. OpenAI and other companies lobbied — successfully — to have an early draft of the legislation changed so that “general-purpose AI systems” like ChatGPT would no longer be considered high risk and thus subject to stricter rules, according to documents Time magazine obtained from the European Commission. (OpenAI didn’t respond to Billboard’s requests for comment.)
The lobbying over AI echoes some of the other political conflicts between media and technology companies — especially the one over the EU Copyright Directive, which passed in 2019. While that “was framed as YouTube versus the music industry, the narrative has now switched to AI,” says Sophie Goossens, a partner at global law firm Reed Smith. “But the argument from rights holders is much the same: They want to stop tech companies from making a living on the backs of their content.”
Several of the provisions in the Copyright Directive deal with AI, including an exception in the law for text- and data-mining of copyrighted content, such as music, in certain cases. Another exception allows scientific and research institutions to engage in text- and data-mining on works to which they have lawful access.
So far, the debate around generative AI in the United States has focused on whether performers can use state laws on right of publicity to protect their distinctive voices and images — the so-called “output side” of generative AI. In contrast, both the Copyright Directive and the AI Act address the “input side,” meaning ways that rights holders can either stop AI systems from using their content for training purposes or limit which ones can in order to license that right.
Another source of tension created by the Copyright Directive is the potential for blurred boundaries between research institutions and commercial businesses. Microsoft, for example, refers to its Muzic venture as “a research project on AI music,” while Google regularly partners with independent research, academic and scientific bodies on technology developments, including AI. To close potential loopholes, Phelan wants lawmakers to strengthen the bill’s transparency provisions, requiring specific details of all music accessed for training, instead of the “summary” that’s currently called for. IFPI, the global recorded-music trade organization, regards the transparency provisions as “a meaningful step in the right direction,” according to Lodovico Benvenuti, managing director of its European office, and he says he hopes lawmakers won’t water that down.
The effects of the AI Act will be felt far outside Europe, partly because it will apply to any company that does business in the 27-country bloc and partly because it will be the first comprehensive set of rules on the use of the technology. In the United States, the Biden administration has met with technology executives to discuss AI but has yet to lay out a legislative strategy. On June 22, Senate Majority Leader Chuck Schumer, D-N.Y., said that he was working on “exceedingly ambitious” bipartisan legislation on the topic, but political divides in the United States as the next presidential election approaches would make passage difficult. China unveiled its own draft laws in April, although other governments may be reluctant to look at legislation there as a model.
“The rest of the world is looking at the EU because they are leading the way in terms of how to regulate AI,” says Goossens. “This will be a benchmark.”

Universal Music Group general counsel/executive vp of business and legal affairs Jeffrey Harleston spoke as a witness in a Senate Judiciary Committee hearing on AI and copyright on Wednesday (July 12) to represent the music industry. In his remarks, the executive called for a “federal right of publicity” (currently a state-by-state right that protects artists’ likenesses, names, and voices), as well as for “visibility into AI training data” and for “AI-generated content to be labeled as such.”
Harleston was joined by other witnesses including Karla Ortiz, a conceptual artist and illustrator who is waging a class action lawsuit against Stability AI; Matthew Sag, professor of artificial intelligence at Emory University School of Law; Dana Rao, executive vp/general counsel at Adobe; and Ben Brooks, head of public policy at Stability AI.
“I’d like to make four key points to you today,” Harleston began. “First, copyright, artists, and human creativity must be protected. Art and human creativity are central to our identity.” He clarified that AI is not necessarily always an enemy to artists, and can be used in “service” to them as well. “If I leave you with one message today, it is this: AI in the service of artists and creativity can be a very, very good thing. But AI that uses, or, worse yet, appropriates the work of these artists and creators and their creative expression, their name, their image, their likeness, their voice, without authorization, without consent, simply is not a good thing,” he said.
Second, he noted the challenges that generative AI poses to copyright. In written testimony, he raised the concern of “AI-generated music being used to generate fraudulent plays on streaming services, siphoning income from human creators.” And while testifying at the hearing, he added, “At Universal, we are the stewards of tens of thousands, if not hundreds of thousands, of copyrighted creative works from our songwriters and artists, and they’ve entrusted us to honor, value and protect them. Today, they are being used to train generative AI systems without authorization. This irresponsible AI is violative of copyright law and completely unnecessary.”
Training is one of the most contentious areas of generative AI for the music industry. In order to get an AI model to learn how to generate a human voice, a drum beat or lyrics, the AI model will train itself on up to billions of data points. Often this data contains copyrighted material, like sound recordings, without the owner’s knowledge or compensation. And while many believe this should be considered a form of copyright infringement, the legality of using copyrighted works as training data is still being determined in the United States and other countries.
The topic is also the source of Ortiz’s class action lawsuit against Stability AI. Her complaint, filed in California federal court along with two other visual artists, alleges that the “new” images generated by Stability AI’s Stable Diffusion model used their art “without the consent of the artists and without compensating any of those artists,” which they feel makes any resulting generation from the AI model a “derivative work.”
In his spoken testimony, Harleston pointed to today’s “robust digital marketplace” — including social media sites, apps and more — in which “thousands of responsible companies properly obtained the rights they need to operate. There is no reason that the same rules should not apply equally to AI companies.”
Third, he reiterated that “AI can be used responsibly…just like other technologies before.” Among his examples of positive uses of AI, he pointed to Lee Hyun [aka MIDNATT], a K-pop artist distributed by UMG who used generative AI to simultaneously release the same single in six languages using his voice on the same day. “The generative AI tool extended the artist’s creative intent and expression with his consent to new markets and fans instantly,” Harleston said. “In this case, consent is the key,” he continued, echoing Ortiz’s complaint.
While making his final point, Harleston urged Congress to act in several ways — including by enacting a federal right of publicity. Currently, rights of publicity vary widely state by state, and many states’ versions include limitations, including less protection for some artists after their deaths.
The shortcomings of this state-by-state system were highlighted when an anonymous internet user called Ghostwriter posted a song called “Heart On My Sleeve” that apparently used AI to mimic the voices of Drake and The Weeknd. The track’s uncanny rendering of the two major stars immediately went viral, forcing the music business to confront the new, fast-developing concern of AI voice impersonation.
A month later, sources told Billboard that the three major label groups — UMG, Warner Music Group and Sony Music — have been in talks with the big music streaming services to allow them to cite “right of publicity” violations as a reason to take down songs with AI vocals. Removing songs based on right of publicity violations is not required by law, so the streamers’ reception to the idea appears to be voluntary.
“Deep fakes, and/or unauthorized recordings or visuals of artists generated by AI, can lead to consumer confusion, unfair competition against the artists that actually were the original creator, market dilution and damage to the artists’ reputation or potentially irreparably harming their career. An artist’s voice is often the most valuable part of their livelihood and public persona. And to steal it, no matter the means, is wrong,” said Harleston.
In his written testimony, Harleston went deeper, stating UMG’s position that “AI generated, mimicked vocals trained on vocal recordings from our copyrighted recordings go beyond Right of Publicity violations… copyright law has clearly been violated.” Many AI voice uses circulating on the internet involve users mashing up a previously released song with a different artist’s voice. These types of uses, Harleston wrote, mean “there are likely multiple infringements occurring.”
Harleston added that “visibility into AI training data is also needed. If the data on AI training is not transparent, the potential for a healthy marketplace will be stymied as information on infringing content will be largely inaccessible to individual creators.”
Another witness at the hearing raised the idea of an “opt-out” system so that artists who do not wish to be part of an AI’s training data set will have the option of removing themselves. Already, Spawning, a music-tech start-up, has launched a website to put this possible remedy into practice for visual art. Called “HaveIBeenTrained.com,” the service helps creators opt out of training data sets commonly used by an array of AI companies, including Stability AI, which previously agreed to honor the HaveIBeenTrained.com opt-outs.
Harleston, however, said he did not believe opt-outs are enough. “It will be hard to opt out if you don’t know what’s been opted in,” he said. Spawning co-founder Mat Dryhurst previously told Billboard that HaveIBeenTrained.com is working on an opt-in tool, though this product has yet to be released.
Finally, Harleston urged Congress to require that AI-generated content be labeled. “Consumers deserve to know exactly what they’re getting,” he said.
From ChatGPT writing code for software engineers to Bing’s search engine sliding in place of your bi-weekly Hinge binge, we’ve become obsessed with the capacity for artificial intelligence to replace us.
Within creative industries, this fixation manifests in generative AI. With models like DALL-E generating images from text prompts, the popularity of generative AI challenges how we understand the integrity of the creative process: When generative models are capable of materializing ideas, if not generating their own, where does that leave artists?
Google’s new text-based music generative AI, MusicLM, offers an interesting answer to this viral terminator-meets-ex-machina narrative. As a model that produces “high-fidelity music from text descriptions,” MusicLM embraces moments lost in translation that encourage creative exploration. It sets itself apart from other music generation models like Jukedeck and MuseNet by inviting users to verbalize their original ideas rather than tinker with existing music samples.
Describing how you feel is hard
AI in music is not new. But from recommending songs for Spotify’s Discover Weekly playlists to composing royalty-free music with Jukedeck, applications of AI in music have evaded the long-standing challenge of directly mapping words to music.
This is because, as a form of expression on its own, music resonates differently with each listener. In the same way that different languages struggle to perfectly communicate the nuances of their respective cultures, it is difficult (if not impossible) to exhaustively capture all dimensions of music in words.
MusicLM takes on this challenge by generating audio clips from descriptions like “a calming violin melody backed by a distorted guitar riff,” even accounting for less tangible inputs like “hypnotic and trance-like.” It approaches this thorny question of music categorization with a refreshing sense of self-awareness. Rather than focusing on lofty notions of style, MusicLM grounds itself in more tangible attributes of music with tags such as “snappy” or “amateurish.” It broadly considers where an audio clip may come from (e.g. “YouTube Tutorial”) and the general emotional responses it may conjure (e.g. “madly in love”), while integrating more widely accepted concepts of genre and compositional technique.
What you expect is (not) what you get
Piling onto this theoretical question of music classification is the more practical shortage of training data. Unlike its creative counterparts (e.g. DALL-E), MusicLM cannot draw on an abundance of readily available text-to-audio captions.
MusicLM was trained on ‘MusicCaps,’ a library of 5,521 music samples captioned by musicians. Bound by the very human limitation of capacity and the almost-philosophical matter of style, MusicCaps offers finite granularity in its semantic interpretation of musical characteristics. The result is occasional gaps between user inputs and generated outputs: the “happy, energetic” tune you asked for may not turn out as you expect.
However, when asked about this discrepancy, MusicLM researcher Chris Donahue and research software engineer Andrea Agostinelli celebrate the human element of the model. They describe primary applications such as “[exploring] ideas more efficiently [or overcoming] writer’s block,” quick to note that MusicLM does offer multiple interpretations of the same prompt — so if one generated track fails to meet your expectations, another might.
“This [disconnect] is a big research direction for us, there isn’t a single answer,” Andrea admits. Chris attributes this disconnect to the “abstract relationship between music and text,” insisting that “how we react to music is [even more] loosely defined.”
In a way, by fostering an exchange that welcomes moments lost in translation, MusicLM’s language-based structure positions the model as a sounding board: as you prompt the model with a vague idea, the approximations it generates help you figure out what you actually want to make.
Beauty is in breaking things
With their experience producing Chain Tripping (2019) — a Grammy-nominated album entirely made with MusicVAE (another music generative AI developed by Google) — the band YACHT chimes in on MusicLM’s future in music production. “As long as it can be broken apart a little bit and tinkered with, I think there’s great potential,” says frontwoman Claire L. Evans.
To YACHT, generative AI exists as a means to an end, rather than the end in itself. “You never make exactly what you set out to make,” says founding member Jona Bechtolt, describing the mechanics of a studio session. “It’s because there’s this imperfect conduit that is you,” Claire adds, attributing the alluring and evocative process of producing music to the serendipitous disconnect that occurs when artists put pen to paper.
The band describes how the misalignment of user inputs and generated work inspires creativity through iteration. “There is a discursive quality to [MusicLM]… it’s giving you feedback… I think it’s the surreal feeling of seeing something in the mirror, like a funhouse mirror,” says Claire. “A computer accent,” band member Rob Kieswetter jokes, referencing a documentary about the band’s experience making Chain Tripping.
However, in discussing the implications of this move to text-to-audio generation, Claire cautions against the rise of taxonomization in music: “imperfect semantic elements are great, it’s the precise ones that we should worry about… [labels] create boundaries to discovery and creation that don’t need to exist… everyone’s conditioned to think about music as this salad of hyper-specific genre references [that can be used] to conjure a new song.”
Nonetheless, both YACHT and the MusicLM team agree that MusicLM, as it currently is, holds promise. “Either way there’s going to be a whole new slew of artists fine-tuning this tool to their needs,” Rob contends.
Engineer Andrea recalls instances where creative tools weren’t popularized for their intended purpose: “the synthesizer eventually opened up a huge wave of new genres and ways of expression. [It unlocked] new ways to express music, even for people who are not ‘musicians.’” “Historically, it has been pretty difficult to predict how each piece of music technology will play out,” researcher Chris concludes.
Happy accidents, reinvention, and self-discovery
Back to the stubborn, unforgiving question: Will generative AI replace musicians? Perhaps not.
The relationship between artists and AI is not a linear one. While it’s appealing to prescribe an intricate and carefully intentional system of collaboration between artists and AI, as of right now, the process of using AI in producing art resembles more of a friendly game of trial and error.
In music, AI gives us room to explore the latent spaces between what we describe and what we really mean. It materializes ideas in a way that helps shape creative direction. By outlining these acute moments lost in translation, tools like MusicLM set us up to produce what actually ends up making it to the stage… or your Discover Weekly.
Tiffany Ng is an art & tech writer based in NYC. Her work has been published in i-D Vice, Vogue, South China Morning Post, and Highsnobiety.