Journalism of Courage

OpenAI accused of deleting potential evidence in New York Times copyright lawsuit

OpenAI had agreed to let the publishers’ lawyers look through its AI training datasets for any of their copyrighted content.

By: Tech Desk
New Delhi | Updated: November 22, 2024 06:09 PM IST

The New York Times sued OpenAI in December last year. (File photo)

Engineers working for OpenAI accidentally erased potential evidence in a copyright infringement lawsuit filed against the AI startup by news publishers The New York Times and Daily News.

As part of the legal proceedings, OpenAI had agreed to let the publishers’ lawyers look through its AI training datasets for any of their copyrighted content. From November 1, a team of lawyers and experts started searching OpenAI’s training data on virtual machines set up by the company.

However, on November 14, lawyers for the publishers alleged that search data stored on one of the virtual machines, representing 150 hours of work, had vanished. While OpenAI managed to retrieve most of the deleted data, the lawyers said that the recovered data did not include file names or folder structure. As a result, it “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models,” the lawyers said in a letter filed in a US district court on Wednesday, November 20.

“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” the letter read.

“The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today,” it added.

While the publishers’ lawyers accepted that OpenAI had not erased the data on purpose, they pointed out that the company was “in the best position to search its own datasets.”

Faced with multiple lawsuits by publishers alleging copyright infringement, OpenAI has argued that training its AI models on publicly available data such as news articles posted by The New York Times constitutes fair use of such content.

At the same time, the ChatGPT maker has struck content licensing deals with a slew of major publishers including Reuters, Associated Press, Financial Times, and Axel Springer, the parent company of Business Insider and Politico.
