Why AI companies are giving subscriptions for free in India

AI companies see India as an ideal source for training data for their models. However, there is a lack of transparency regarding these datasets, raising concerns and legal scrutiny over their practices.

Giving away AI subscriptions worth thousands of rupees for free is not a new tactic. (Image: Unsplash)

When I opened the Airtel app on my smartphone over the weekend to recharge my mobile plan, I was surprised to see a Perplexity Pro AI subscription worth Rs 17,000 included with my plan. I double-checked whether I had accidentally subscribed to Perplexity AI.

Before I started to panic about being subscribed to another service, I realised that the Perplexity AI subscription was free with my plan, and that the offer is valid until July 2026. But Perplexity isn’t the only company offering its AI subscription plan for free to millions in India. Reliance Jio, another telecom operator, has an active partnership with Google to provide select users a free Google AI Pro subscription for 18 months.

Meanwhile, OpenAI is offering its ‘low-cost’ ChatGPT Go subscription, worth Rs 399 per month, free for one year in the country.

Tech companies have never been averse to offering free services to users who buy their products.

Apple, for example, is known for offering free access to Apple TV for three months with new products like the iPhone and iPad. Even my new Pixel 10 Pro Fold and 10 Pro XL come with a free Google AI Pro plan, which includes premium AI features and 2TB of storage for a year. In both cases, you pay a premium for their hardware products, and these tech companies occasionally offer access to new services to increase popularity and lock users into their ecosystems.

However, Google, OpenAI, and Perplexity are offering long-term, free access to premium AI services to millions of users in India. It makes one wonder why these tech companies are so generous at a time when artificial intelligence is an expensive playground, and the costs of infrastructure investments and training AI models run into the billions of dollars.

Well, this may be less a sweet gesture than a calculated move: offering premium AI tools to millions of Indians for free promises huge long-term rewards.

Home to millions of smartphone and internet users

India is the world’s top consumer of mobile data per user, and its internet users are set to surpass 900 million, creating enormous market potential. The boom is largely thanks to the availability of low-cost internet, the penetration of smartphones even in rural areas, a digitally savvy youth aged 18 to 35, and the growth of digital infrastructure and services such as digital payments. This is why many companies are pumping money into India’s fast-growing internet ecosystem, a prize many investors find too big to ignore.

The role of telecom operators like Airtel and Jio is equally important in setting the stage for India’s AI boom. Together, Airtel and Jio have a massive user base, and they play a decisive role in ensuring effective execution, from deployment to demand conversion, ultimately driving the growth of user adoption for any new service.

…but where does AI training data come from?

No wonder companies like Google and OpenAI see an opportunity to train their most advanced models in India. The thing is, every AI model, from GPT to Gemini, needs vast amounts of human-labelled data, and a country like India may be perfectly placed to become a backbone of training data. But the big question is, where does AI training data come from?

To build large generative AI models, tech companies often turn to the public-facing internet, but there is no single place to download the entire web. Instead, tech companies select their training sets using automated tools that catalogue and extract data from the internet. After all, high-quality data, mainly scraped from the web, is crucial to the performance of AI models.

These tools include web “crawlers,” nicknamed “spiders,” which are automated programs that systematically browse the World Wide Web to index its pages. For example, Alphabet, Google’s parent company, has already built web crawlers to power its search engine and can use its own tools to harvest data and train its AI models. Other companies, however, rely on resources like Common Crawl, a major source of training data for models such as OpenAI’s GPT, which learn from huge amounts of text so that they can respond to a user’s prompt.
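To make the idea of a crawler concrete, here is a minimal sketch in Python. It is not how Google or Common Crawl actually work; it only shows the basic loop of reading a page, collecting its links, and visiting each new link once. The `fetch` callable and the three-page `FAKE_WEB` dictionary are made-up stand-ins for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch` is a hypothetical callable that
    returns a page's HTML as a string, or None if the page is missing."""
    seen, queue, visited = {start_url}, deque([start_url]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # queue each URL only once
                seen.add(link)
                queue.append(link)
    return visited


# A made-up three-page "web" used instead of real network requests.
FAKE_WEB = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/">home</a>',
}

pages = crawl("https://example.com/", FAKE_WEB.get)
print(pages)
```

A real crawler would add polite rate limits, error handling, and a check of each site's robots.txt before fetching anything.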

In addition to publicly available data, AI companies also use their own data for model training. OpenAI, for example, fine-tunes its models based on user interactions with its chatbots. Meta AI is partially trained on public Facebook and Instagram posts. Amazon, too, says it uses some voice data from customers’ Alexa conversations to train its LLM. However, in most cases, AI companies have been secretive about the datasets used for training.

Now comes the hard part

A lack of transparency about training data is the biggest reason AI companies are under scrutiny. The New York Times recently filed a lawsuit against Perplexity, alleging that the startup has illegally copied and distributed its copyrighted content.

The suit, filed in the Southern District of New York last week, accused Perplexity of unlawfully scraping The Times’ stories, videos, podcasts, and other content to formulate responses to user queries. Another publication, The Chicago Tribune, filed a similar copyright lawsuit against Perplexity. The Tribune also argues that Perplexity scraped and distributed its content without authorisation.

Perplexity, founded by Indian-origin Aravind Srinivas, has been the subject of multiple lawsuits. Earlier this year, Cloudflare, a leading digital infrastructure company, accused Perplexity of hiding its web-crawling activities and scraping websites without permission. Perplexity denied the allegations.

In October, social media company Reddit also sued Perplexity in New York federal court, accusing it and three other companies of unlawfully scraping its data to train Perplexity’s AI-based search engine.

News sites and numerous publications have accused AI companies of using copyrighted content without authorisation to build and operate their AI systems. In 2023, The New York Times blocked OpenAI’s web crawler, GPTBot, from using its content to train AI models. Soon, AI companies realised they needed to strike deals with publications, as data cannot be properly used without permission.
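Blocks of this kind are usually declared in a site's robots.txt file, which well-behaved crawlers are expected to read before fetching pages. The sketch below uses Python's standard `urllib.robotparser` module to show how such a rule is interpreted. The rules in `ROBOTS_TXT` are an illustrative assumption, not the actual file of The New York Times or any other publisher, and `SomeSearchBot` is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/story")
other_ok = is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/story")
print(gptbot_ok, other_ok)
```

Note that robots.txt is a voluntary convention: it tells crawlers what a site wants but cannot technically stop one that chooses to ignore it, which is part of why such disputes end up in court.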

This led OpenAI to begin signing agreements with major international media companies to use their copyrighted content as training data. Axel Springer, France’s Le Monde, and Spain’s Prisa Media inked deals with the ChatGPT maker to provide material for training its AI models, followed by the Financial Times, which also cut a deal allowing ChatGPT users to receive summaries, quotes, and links to FT articles.

Later, Reuters and the Associated Press signed deals with OpenAI, as did Hearst, The Guardian, Condé Nast, Vox, TIME, and The Atlantic. Microsoft signed a deal with USA Today. Meanwhile, Perplexity gained access to the work of AdWeek, Fortune, Stern, The Independent, and the Los Angeles Times. Axios, a US news publication, also signed a licensing deal with OpenAI.

However, publishers have no issue with search engines like Google using web crawlers to access their websites. In return, the search engines send traffic directly to the publishers’ content.

That said, the bitterness between content creators, publications, musicians, artists, and AI companies persists, with stakeholders turning to the courts to prevent what they see as AI firms infringing on creative rights. Take Disney and Universal, for example, who recently sued the artificial intelligence firm Midjourney over its image generator. The two Hollywood studios allege that it is a “bottomless pit of plagiarism”.

They claim Midjourney’s tool makes “innumerable” copies of characters, including Darth Vader from Star Wars, Elsa from Frozen, and the Minions from Despicable Me. Transparency about data sources should be given priority, after all. Even if artists and musicians ink a deal with AI companies, there is always the question of the extent to which AI attempts to recreate their style with just a few keystrokes. There is still no concrete answer.

Can Indians protect their data from AI?

Giving away AI subscriptions worth thousands of rupees for free is not a new tactic. Google and others have shown that this strategy has worked in the past, and it could work again with AI services. In fact, companies like Google won many of their customers by offering services for free. Take Google Search, for example, which is essentially free but collects user data and displays ads on the results page, the source of most of the company’s revenue.

But there is always a “catch”: a cost to free online services, and we as consumers ultimately pay the price. A startup like Perplexity only needs one thing: your attention. Its goal is to build a substantial user base, and if we are successfully lured in, it can secure funding and grow even bigger.

Perplexity’s valuation has increased to $20 billion in just three years. The same can be said of OpenAI, which has amassed 800 million weekly ChatGPT users, boosting its valuation to $500 billion. That’s how capitalism works.

All major AI companies are eyeing India, and for good reason. India not only has a large consumer base but is also a hub for the IT outsourcing industry. On one hand, global AI companies are making inroads into a highly diversified consumer market where users speak multiple languages and each region has its own unique culture and dialect, especially in rural towns.

At the same time, they are gaining access to a large pool of users from startups and SMBs. For large tech companies, the more users who engage with their AI services, whether a student, a corporate professional, or a warehouse worker managing systems, the better it is for training their AI models. And no market is better than India at the moment.

If the AI ecosystem develops, it could also create a market for cloud farming in smaller centres across India, which can be used for building datasets to train AI and for content moderation.

One fundamental question that cannot be ignored is whether sensitive personal data can be kept away from AI training. Currently, India does not have a law specifically governing artificial intelligence. While the Digital Personal Data Protection (DPDP) Act, 2023, provides broad protections for personal data, it has yet to come fully into force. Moreover, the Act does not address AI systems or algorithmic accountability.

In states like California in the US, digital privacy laws do give consumers the right to request that companies delete their personal data. In the European Union, the Artificial Intelligence Act imposes controls on “high-risk systems” used in areas such as education, healthcare, law enforcement, and elections. It bans some AI use altogether.

However, there is currently no clear way to make an AI “forget” previously learned data; completely removing copyrighted or sensitive information would require retraining the model from scratch, which could cost tens of millions of dollars.

Anuj Bhatia is a seasoned personal technology writer at indianexpress.com with a career spanning over a decade. Active in the domain since 2011, he has established himself as a distinct voice in tech journalism, specializing in long-form narratives that bridge the gap between complex innovation and consumer lifestyle. Experience & Career: Anuj has been a key contributor to The Indian Express since late 2016. Prior to his current tenure, he served as a Senior Tech Writer at My Mobile magazine and held a role as a reviewer and tech writer at Gizbot. His professional trajectory reflects a rigorous commitment to technology reporting, backed by a postgraduate degree from Banaras Hindu University. Expertise & Focus Areas: Anuj’s reporting covers the spectrum of personal technology, characterized by a unique blend of modern analysis and historical context. His key focus areas include: Core Technology: Comprehensive coverage of smartphones, personal computers, apps, and lifestyle tech. Deep-Dive Narratives: Specializes in composing longer-form feature articles and explainers that explore the intersection of history, technology, and popular culture. Global & Local Scope: Reports extensively on major international product launches from industry titans like Apple and Google, while simultaneously covering the ecosystem of indie and home-grown tech startups. Niche Interests: A dedicated focus on vintage technology and retro gaming, offering readers a nostalgic yet analytical perspective on the evolution of tech. Authoritativeness & Trust Anuj is a trusted voice in the industry, recognized for his ability to de-jargonize trending topics and provide context to rapid technological advancements. His authority is reinforced by his on-ground presence at major international tech conferences and his nuanced approach to product reviews. 
By balancing coverage of the world's most valuable tech brands with emerging startups, he offers a holistic and objective view of the global technology landscape.

 
