
Engineers working for OpenAI accidentally erased potential evidence in a copyright infringement lawsuit filed against the AI startup by news publishers The New York Times and Daily News.
As part of the legal proceedings, OpenAI had agreed to let the publishers’ lawyers search its AI training datasets for any of their copyrighted content. Starting November 1, a team of lawyers and experts began searching OpenAI’s training data on virtual machines set up by the company.
“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” the publishers’ lawyers said in a letter filed in the case.
“The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today,” it added.
While the publishers’ lawyers accepted that OpenAI did not erase the data on purpose, they stressed that the company was “in the best position to search its own datasets.”
Faced with multiple lawsuits by publishers alleging copyright infringement, OpenAI has argued that training its AI models on publicly available data such as news articles posted by The New York Times constitutes fair use of such content.
At the same time, the ChatGPT maker has struck content licensing deals with a slew of major publishers, including Reuters, the Associated Press, the Financial Times, and Axel Springer, the parent company of Business Insider and Politico.