
Google, the search engine behemoth, is building its own large language model, Bard, and is using publicly available data to train it. According to a report by Gizmodo, Google’s updated privacy policy states that the company will use publicly available information to build and train its products and services, such as Google Bard, Google Translate, and Cloud AI capabilities. The updated policy reads:
“Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”
The report also suggests that, under the new policy, the entire internet becomes an AI playground for companies that train large language models on publicly available data.
While this approach will help Google develop better generative tools, it has been criticized as taking advantage of the openness of the internet. Elon Musk claims that Twitter's read limit was imposed to stop companies from scraping the platform's data to train their AI models, a measure that restricts data access for individuals and corporations alike.
Reddit’s new API charges are likewise said to be intended to stop companies from freely harvesting data from subreddits. The use of publicly available data to train AI models has sparked debate over copyright and over how just a few entities control the whole of the internet.