Journalism of Courage

Google to use publicly available information to train its AI models

Google's updated privacy policy suggests that the company will utilise public internet data to train language models.

By: Tech Desk
Bengaluru | Updated: July 5, 2023 08:43 AM IST

Bard is Google's generative AI model.

Google, the search engine behemoth, is building its own large language model, Bard, and is using publicly available data to train it. According to a report by Gizmodo, Google’s updated privacy policy states that the company will use publicly available information to build and train its products and services, such as Google Bard, Google Translate, and Cloud AI capabilities.

“Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

The report also suggests that, under the new policy, the entire internet becomes an AI playground for companies that train large language models on publicly available data.

While this approach will help Google develop superior generative tools, it is also seen as taking advantage of the openness of the internet. Elon Musk has claimed that Twitter's read limit, which restricts data access for both individuals and companies, was imposed to prevent companies from scraping the platform's data to train their AI models.

Reddit’s new API charges are likewise said to be intended to prevent companies from freely harvesting data from subreddits. The use of publicly available data to train AI models has sparked debate over copyright and over how just a few entities control the whole of the internet.
