A few weeks ago, OpenAI released Jukebox, “a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.” The results are impressive, showing an ability to mimic specific artists’ style and voices. My personal favorite is this Simon & Garfunkel sample: when people say Jukebox is around 2014-15 in the GAN face timeline, that sample falls somewhere near Garfunkel-quality lyrics and Simon-quality vocals.
As more and more generative models are released for language, images, code, and music, it is important to ask questions about their legal footing and the wide-ranging impacts they may have, both on the incentives to invest in AI innovation and on the creative efforts of the artists who develop the inputs that partially enable that innovation.
A number of people, including Ben Sobel in a wide-ranging survey of the space and OpenAI in a comment released with Jukebox, make the reasonable case that training a model is fair use. However, I want to take a different approach to the fair use debate. Even if there is solid legal footing for the right to train a model, much of the public response to generative models has been reactionary: How is this legal? How will this impact labor markets? How will we control monopoly power in this area? This suggests that more work is needed to make the fair use argument for complete end-to-end models rather than just for the training of a model, and either to bring more people on board with those arguments or to explore new norms and regulations for this space.
I want to suggest that fair use doctrine offers an interesting lens through which to consider these questions and to direct future research. While we may not yet have developed precise frameworks to regulate generative models, we can use existing legal structures to inform the discussion. Normally, a finding of fair use serves to promote innovation and protect competition; in this case, however, given the scale required to train large models, a finding of fair use actually protects monopoly power. That is, the tenets of fair use may provide a lens to understand and address the scale of distributional concerns around generative models. For example, suppose we only allowed training an end-to-end model on music for which the developer owns the copyright: individual record companies could train, but only on music they owned the rights to. The outputs from these models would be worse, leaving consumers worse off than under unrestricted training, but the models might still be good enough to erode the value that the original artists derive from their work.
This post will focus on two tenets of fair use as they apply to the end-to-end outputs of generative models: “the amount and substantiality of the portion taken” and “the effect of the use upon the potential market.” Along the way, I collect some of my open questions, particularly those linked to the core initiatives of OpenAI. Expressing and discussing these in layman’s terms will be key to democratizing and distributing AI technologies and surpluses.
The amount and substantiality of the portion taken
The finding of fair use hinges on whether a piece generated by Jukebox is a true “transformation.” In the past, this has always been a function of a human taking in some piece of content, understanding it, applying some intelligence or creativity specific to the type and quality of the input, and creating something new. The extent to which this describes the mechanism a generative model uses to produce output, however, is up for debate and remains an area of active technical research. I want to propose three areas of ongoing research that may shed light on the question of whether a piece generated by Jukebox is a true “transformation” of the original inputs.
- The Copycat Test: How likely is a model to come up with a direct copy of an existing Simon and Garfunkel song that it has been trained on? Differential privacy research frames this as the inverse: how do we prevent a model from ever replicating its input, and thus potentially de-anonymizing training data? Similarly, this feeds into work in controllable generation: if we can force a model to plagiarize, does that change how we see the model’s originality? Sam Bowman’s lab does work on this for text models similar to Jukebox. A further research question is how to “score” the amount of force required to make a model plagiarize. The less force required, the more one would be inclined to conclude that the model has inherently taken something of value from the original artist.
- The Preponderant Input Test: Alternatively, suppose Jukebox had been trained on the exact same training set minus Simon and Garfunkel, and the output were substantively different and potentially worse. Is that sufficient to credit Simon and Garfunkel as writers on the resulting track? “Retraining” a model without the offending input isn’t feasible for models that take weeks to train. However, data deletion methods do essentially the same thing and may be used to address these concerns. These methods are motivated as part of the differential privacy literature, where they allow firms to continue using large models trained on many users’ data even after an individual user revokes access to their private data.
- The Cite Your Sources Test: Finally, we could ask how much Simon and Garfunkel’s oeuvre contributed to Jukebox’s Simon and Garfunkel sample. Attribution and interpretability methods help link parts of a model’s output to individual aspects of its input. While this doesn’t link all the way back to training data, it offers a possible mechanism to test similarity (or detect plagiarism) when there is suspicion of copyright infringement. Moreover, we can also build models that are themselves trained to identify copyright infringement or similarity between works, and use these to inform citation scores or co-writer thresholds.
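To make the idea of a similarity or "citation" score a little more concrete, here is a minimal sketch of one very crude way to measure how much of a generated piece overlaps with each work in a training set. This is my own toy illustration, not anything Jukebox or the research mentioned above actually does: it assumes music has already been reduced to a sequence of discrete tokens (here, note names), and the function names `copy_score` and `most_similar`, the toy songs, and the choice of 4-token windows are all hypothetical.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple


def ngrams(tokens: Sequence[Hashable], n: int) -> Set[Tuple[Hashable, ...]]:
    """Return the set of all length-n windows in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copy_score(generated: Sequence[Hashable],
               reference: Sequence[Hashable],
               n: int = 4) -> float:
    """Fraction of the generated n-grams that also occur in the reference.

    1.0 means every n-token window of the output appears in the reference;
    0.0 means no window does (or the output is shorter than n tokens).
    """
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & ngrams(reference, n)) / len(gen)


def most_similar(generated: Sequence[Hashable],
                 training_set: Dict[str, Sequence[Hashable]],
                 n: int = 4) -> Tuple[str, float]:
    """Return the (name, score) of the training item with the highest overlap."""
    scores = {name: copy_score(generated, tokens, n)
              for name, tokens in training_set.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: note-name sequences standing in for real songs.
training_set = {
    "song_a": ["E", "G", "A", "G", "E", "D", "C", "D"],
    "song_b": ["C", "D", "E", "F", "G", "A", "B", "C"],
}
generated = ["E", "G", "A", "G", "E", "D", "C"]

print(most_similar(generated, training_set))
```

A score like this only catches near-verbatim copying of token sequences. It says nothing about stylistic borrowing, which is exactly where the harder attribution questions above begin, and a real system would need a representation of raw audio rather than note names.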
While none of these are fully developed, they represent valuable research directions that may offer a computational basis for providing credit to authors of the original inputs of a work. In practice, deploying them would require a substantial rethink of how large tech companies usually train and deploy generative models. Moreover, those firms have limited incentives to undertake this research if a finding of fair use is preemptively upheld. Yet all of these methods are connected to core tenets of safe and fair AI because they teach us how a model learns and makes decisions.
The effect of the use on the potential market
OpenAI cites the decision in Authors Guild v. Google that snippets shown by Google Books “did not substitute” for the books themselves to argue that, similarly, the snippets employed in training Jukebox do not substitute for the works themselves. This is clearly true at training time, but it’s less clear whether fair use would cover using a generative model to, well, generate.
Imagine a Bizarro-world-Spotify made up entirely of generated samples. You could listen to “AI versions” of all of your favorite artists, and Bizarro-Spotify would avoid paying any royalties. Jukedeck, a start-up that promised to compose music given a video clip, is an example that’s already on the market (it has since been acquired by ByteDance). These applications do not clearly transform “the purpose and character of the use” of the original training data and may have a market-encroaching effect on “the potential market for or value of the copyrighted work.”
Are people going to Jukedeck/Spotify-alternative because they want to listen to new music, or to ersatz cover versions of their favorite artists? Seth Stephens-Davidowitz finds that songs released in your early teens are the most popular among your age group in adulthood. Using 2018 Spotify data, he shows that Radiohead’s “Creep” was the 164th most popular song among 38-year-old men (who were pre-bar mitzvah when the song came out) but did not crack the top 300 for men 10 years younger or older. Additionally, there’s some evidence that people slow down listening to new music in their 30s, and plenty that shows streaming’s increasing use as a provider of background music. This is part of what made Pandora a reasonable alternative to iTunes at one point. Pandora was free but offered less control over what you heard and when, whereas iTunes let you control listening to your favorite artists at a price. Most people don’t just listen to their favorite album on repeat, but they do tune into their favorite radio network that happens to play their favorite album on occasion.
While some market for real music may remain, it’s likely that this Bizarro-Spotify model would be able to displace at least part of the market (and royalties) currently captured by the original artists. That said, the effect may just be that we drive a bigger wedge between “superstar” artists and hobbyists making music in their bedrooms. We’ll place a high value on the “authenticity” of both, but the run-of-the-mill popstar (Rita Ora, Bebe Rexha) will be replaced by music tailored to the listener.
The bottom line is that a Bizarro-Spotify would likely affect the market value of original artists, so examining that economic impact through a fair use lens would be a valuable avenue of research. Indeed, while the literature on the impact of automation on labor markets is still quite under-developed, Michael Webb’s research suggests that the impact may be minimal and in fact focused first on largely “high-skilled” professions. Webb’s research relies on a novel technique that compares patent filings with job requirements to identify automatable skills. More research should be done on the effects of limited labor supply and, in particular, on how our perception of high-skilled and low-skilled work informs investment decisions in machine learning.
Fair use doctrine is a good starting place for understanding the potential social and economic ramifications of generative methods, even if the doctrine itself may not effectively police their use or address the ethical concerns. Those with the resources to develop, train, and deploy large-scale generative models should, in parallel, invest in the research discussed here, focusing on developing language and technical tools that allow artists to enter the discussion.