An Indian-American researcher who formerly worked for OpenAI, Suchir Balaji, was found dead in his San Francisco apartment on November 26, according to a CNBC report. Police ruled the death a suicide.
Balaji had previously been the subject of a profile by The New York Times in which he alleged that the company had violated US copyright law while developing the ChatGPT chatbot.
The 25-year-old grew up in Cupertino, California. He told The NYT in October that his fascination with artificial intelligence dated back to 2013 when DeepMind, a London-based startup, introduced AI that had learned to play Atari games on its own. This spurred him to study neural networks, a machine-learning technique that mimics the human brain in analysing digital data and was the basis of DeepMind’s AI technology.
He studied Computer Science at the University of California, Berkeley, graduating in 2021, and joined OpenAI as a member of its technical staff. A year later, he began working on the company’s GPT-4 project, for which he gathered vast amounts of digital data for analysis.
According to the NYT report, he had initially assumed that the company was free to use any internet data, regardless of its copyright. At the time, the GPT-4 project was not expected to “compete with existing internet services”, as GPT-3 itself was “not a chatbot” but “a technology that allowed businesses and computer coders to build other software apps.”
The release of ChatGPT in November 2022 reportedly prompted a rethink: he became convinced that it, along with other chatbots, had unwittingly become a competitor to the very services that provided the data used to train the AI systems.
In a blog post on his personal website, he made a case for how OpenAI had flouted the provisions of fair use under US copyright law. He also claimed the company had made unauthorised copies of copyrighted data for its generative models to study and analyse. In the process, he argued, generative models could imitate online data and serve as a substitute for “basically anything” on the internet.
Generative AI models were becoming the new preferred point of access for users, he contended, even though they were prone to producing nonsensical and false information, known as ‘hallucinations’.
“If you believe what I believe, you have to just leave the company,” he told NYT.
In December 2023, NYT and other publishers lodged a series of cases against OpenAI and Microsoft over the alleged use of their copyrighted material as training data for AI systems.
OpenAI has dismissed Balaji’s claims, telling NYT in a statement: “We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”