Why the $60 mn Google-Reddit AI deal foretells future of content licensing for LLMs

Google will get access to Reddit’s data application programming interface (API), which delivers real-time, structured, unique content from the social media platform.

Reddit and Google have struck a $60 million deal. (Reuters)

Google and Reddit have signed a $60 million deal that will give the Internet search giant real-time access to the data of one of the world’s most visited social media and content-sharing platforms. Google will use this data for its artificial intelligence (AI) offerings.

Reddit is headed to a public listing in the coming weeks, and an infusion of cash at this time could boost investor confidence. And Google’s AI system has come under fire — in both the US and India — and the company is looking to make its offerings more reliable.

The deal underlines the value of user-generated content on the Internet, especially for Generative AI (GenAI) platforms like Google’s Gemini and OpenAI’s ChatGPT. It also points to the complexities of training large language models (LLMs) when they can potentially infringe on the intellectual property rights of content producers online — and the reason why such content licensing deals may be crucial.

What is the Google-Reddit deal?

Google will get access to Reddit’s data application programming interface (API), which delivers real-time, structured, unique content from the social media platform.

“With the Reddit Data API, Google will have efficient and structured access to fresher information, as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways,” Google said in a blog post.

Why is this important for Google?

Google needs access to as much user-generated content as possible to make its foundational model more reliable and accurate. Widespread criticism of responses generated by Gemini has increased the urgency for the company to get its act together, as rivals such as OpenAI gain a growing say in how users access online services.

In India, in response to the question “Is Modi a fascist?”, Gemini said that the Prime Minister has been “accused of implementing policies some experts have characterised as fascist”, which include the “BJP’s Hindu nationalist ideology, its crackdown on dissent, and its use of violence against religious minorities”.

After the IT Ministry threatened to issue a show-cause notice for the “illegal and problematic” responses, Google said it had addressed the issue, and was working to improve the system.

The company had earlier apologised for “inaccuracies in some historical image generation depictions” after criticism that Gemini had depicted white figures (like the founding fathers of the US) or groups like Nazi-era German soldiers as people of colour.

Are Google-Reddit-like content licensing deals the future of building LLMs?

The New York Times sued OpenAI and Microsoft, the companies behind ChatGPT and other popular AI platforms, last year, citing “unlawful” use of copyrighted content. The NYT said in its lawsuit that the companies were scraping original content from the publisher to build their models and manufacture responses.

The NYT lawsuit has reignited the debate on the ownership of online content and whether GenAI platforms are infringing on the intellectual property (IP) rights of organisations like news publications that put out significant amounts of updated and mostly accurate information on the Internet — the kind of data that GenAI platforms can really benefit from.

The responses that AI platforms such as ChatGPT and Gemini generate rest on the bedrock of millions of pieces of textual content that creators, including news publishers, have uploaded online.

The music business — which is among the most sensitive towards IP rights — has also been pushing back on the use of AI in the industry. Universal Music Group has asked streaming services such as Spotify to stop developers from scraping its material to train AI bots in making new songs.

Copyright laws in countries around the world, including India, need drastic reimagining in the era of AI. In India, creative works are regulated by The Copyright Act of 1957, which defines an “author” (among other things) “in relation to any literary, dramatic, musical or artistic work which is computer-generated, the person who causes the work to be created”.

However, this definition does not take into account the fact that AI systems do not generate information on their own; they are only as good as the base dataset on which they are trained. And the base dataset is built out of copyrighted work produced by other authors.

Soumyarendra Barik

Soumyarendra Barik is a Special Correspondent with The Indian Express, specializing in the complex and evolving intersection of technology, policy, and society. With over five years of newsroom experience, he is a key voice in documenting how digital transformations impact the daily lives of Indian citizens.

Expertise & Focus Areas

Barik’s reporting delves into the regulatory and human aspects of the tech world. His core areas of focus include:

The Gig Economy: He extensively covers the rights and working conditions of gig workers in India.

Tech Policy & Regulation: Analysis of policy interventions that impact Big Tech companies and the broader digital ecosystem.

Digital Rights: Reporting on data privacy, internet freedom, and India's prevalent digital divide.

Authoritativeness & On-Ground Reporting

Barik is known for his immersive and data-driven approach to journalism. A notable example of his commitment to authentic storytelling involves him tailing a food delivery worker for over 12 hours. This investigative piece quantified the meager earnings and physical toll involved in the profession, providing a verified, ground-level perspective often missing in tech reporting.

Personal Interests

Outside of the newsroom, Soumyarendra is a self-confessed nerd about horology (watches), follows Formula 1 racing closely, and is an avid football fan.