AI v. Human Reporters: The New York Times Sues OpenAI

On December 27, 2023, The New York Times (“The Times” or “Times”) officially filed suit against OpenAI and Microsoft, OpenAI’s principal investor. The complaint alleges that OpenAI obtained content published by The Times and used it to train its AI models without the consent of The Times. As alleged in the complaint, OpenAI’s generative AI models (like ChatGPT) are now competing with The Times as a news source in violation of the Copyright Act by generating output that “recites Times content verbatim, closely summarizes it, and mimics its expressive style.”

ChatGPT, perhaps the most famous generative AI (genAI) model owned by OpenAI, is “trained” to create outputs by scanning millions of websites and documents. It then compiles the information obtained through these scans and reproduces the information in response to prompts from end-users. The Times complaint alleges that this use violates its journalists’ exclusive rights over their works by reproducing them as part of the “training” process, redistributing them in whole or in part as “output,” and using them for a commercial purpose by profiting off the proliferation of ChatGPT and other models. A “snapshot” of the material used to train one of OpenAI’s models, taken in 2019, shows www.nytimes.com as the fourth most highly represented source of proprietary content used to train the model. Along with the complaint, The Times filed several exhibits, at least one of which provides a side-by-side comparison of ChatGPT output and a Times piece. The text is nearly identical.

ChatGPT is not the only AI model implicated in this suit. Bing Chat, a Microsoft innovation that shows AI-generated sample web searches in a chat window, is also named in several exhibits. The complaint alleges that Bing Chat shows significantly larger portions of Times articles in its “sample” web searches than do other search engines, which have licensing agreements with The Times. The Times argues that these larger portions enable would-be patrons to bypass the paywall, subscription, or license required by The Times, thereby depriving The Times of profit and infringing its copyright in much the same way as OpenAI’s GPT model.

OpenAI maintains that AI “training” constitutes “fair use,” an affirmative defense to the allegations put forth in the Times complaint. A piece on the OpenAI blog, titled “OpenAI and Journalism,” cites a range of opinions from professionals across various industries in support of this view. Interestingly, the same blog post accuses The Times of misrepresenting its case in the complaint. Specifically, the blog claims The Times “intentionally manipulated prompts” to obtain the nearly identical text output mentioned earlier.

It is too early in this case to guess how a suit of this magnitude could shake out in court. However, we could imagine a few possibilities. If The Times prevails on each of its claims, it could stunt the growth of genAI models by forcing developers to enter into licensing agreements with copyright holders, thereby raising the barriers to entry and innovation in the AI space. On the other hand, if OpenAI prevails, it could sound the death knell of not only The Times, but perhaps the entire field of traditional journalism. Only time will tell what the courts have to say, but this will be a compelling case to watch as it winds its way through the federal courts, with far-reaching consequences no matter the outcome.

References

Complaint at 4, The New York Times Company v. Microsoft Corporation et al., No. 1:23-cv-11195 (S.D.N.Y. 2023).

“Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus”

OpenAI
