An Indian-American researcher who formerly worked for OpenAI, Suchir Balaji, was found dead in his San Francisco apartment on November 26, according to a CNBC report. Police ruled the death a suicide.
Balaji had previously been the subject of a profile by The New York Times in which he alleged that the company had violated US copyright law while developing the ChatGPT chatbot.
The 25-year-old grew up in Cupertino, California. He told The NYT in October that his fascination with artificial intelligence dated back to 2013 when DeepMind, a London-based startup, introduced AI that had learned to play Atari games on its own. This spurred him to study neural networks, a machine-learning technique that mimics the human brain in analysing digital data and was the basis of DeepMind’s AI technology.
He studied Computer Science at the University of California, Berkeley, graduating in 2021, and joined OpenAI as a member of its technical staff. A year later, he began working on the company’s GPT-4 project, for which he gathered vast amounts of digital data for analysis.
According to the NYT report, he had initially assumed that the company was free to use any internet data, regardless of its copyright. At the time, the GPT-4 project was not expected to “compete with existing internet services”, as GPT-3 itself was “not a chatbot” but “a technology that allowed businesses and computer coders to build other software apps.”
The release of ChatGPT in November 2022 reportedly prompted a rethink: he became convinced that it, along with other chatbots, had unwittingly become a competitor to the very services that provided the data used to train the AI systems.
In a blog post on his personal website, he made a case for how OpenAI had flouted the provisions of fair use under US copyright law. He also claimed the company had made unauthorised copies of copyrighted data for its generative models to study and analyse. In the process, he argued, generative models could imitate online data and serve as a substitute for “basically anything” on the internet.
Generative AI models were becoming the new preferred point of access for users, he contended, even though they were prone to producing nonsensical and false information, known as ‘hallucinations’.
“If you believe what I believe, you have to just leave the company,” he told NYT.
In December 2023, NYT and other publishers lodged a series of cases against OpenAI and Microsoft over the alleged use of their copyrighted material as training data for AI systems.
OpenAI has dismissed Balaji’s claims, telling NYT in a statement: “We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”