Microsoft pilots new content marketplace for AI training: What it means for publishers

Microsoft’s new publisher content marketplace comes amid a long-running stand-off between publishers and big tech companies, which has only intensified with the rise of AI.

Microsoft is looking to onboard Yahoo and other partners as it continues to pilot test the platform. (Image: Reuters)

Microsoft has unveiled a new platform that lets AI developers pay publishers and train their models on ‘premium content’ under licensing terms set by the publishers themselves.

The content licensing hub, called Publisher Content Marketplace (PCM), will serve as a new revenue stream for publishers while giving developers scaled access to AI training data to improve the quality of the responses their models generate, Microsoft said in a blog post on Tuesday, February 3.

PCM will also provide publishers with insights on training data usage to help them understand the value of such content and accordingly set prices as well as licensing terms. The platform will be voluntary and open to all types of publishers. Microsoft also emphasised that publishers will retain ownership of their content along with editorial independence.

Microsoft’s PCM platform comes amid a long-running stand-off between publishers and big tech companies, which has only intensified with the rise of AI. The current AI boom has been largely fuelled by large language models (LLMs) trained on vast amounts of data scraped from all corners of the internet, including publishers’ websites, without authorisation.

In response, publishers such as The New York Times have filed copyright infringement lawsuits against tech companies such as Microsoft and OpenAI. In India, publishers — members of the Digital News Publishers Association (DNPA), including The Indian Express, among others — have mounted a legal challenge against OpenAI over the “unlawful utilisation of copyrighted material”.

On the other hand, several major publishers have inked lucrative deals with AI companies to license their content for AI training purposes.

“The open web was built on an implicit value exchange where publishers made content accessible, and distribution channels – like search – helped people find it. That model does not translate cleanly to an AI-first world, where answers are increasingly delivered in a conversation,” Microsoft said.

“At the same time, much of the authoritative content lives behind paywalls or within specialised archives. As the AI web grows, publishers need sustainable, transparent ways to govern how their premium content is used and to license it when it makes the most sense,” it added.

The Windows maker said its newly unveiled PCM platform has been developed in partnership with US-based publishers such as Vox Media, The Associated Press, Condé Nast, People, and others. To show that training models on premium content improves the quality of their responses, Microsoft said it grounded specific responses of its Copilot AI chatbot with licensed content and ran experiments to validate assumptions before scaling. Its testing revealed that premium content meaningfully improved the quality of Copilot’s responses.

The company further said it is looking to onboard Yahoo and other partners as it continues to pilot test the PCM platform.

India’s proposed licensing regime

Last year, a government-constituted committee led by the Department for Promotion of Industry and Internal Trade (DPIIT) proposed a new framework which would require all AI companies to pay royalties to creators for using copyrighted work under a mandatory blanket licence.

The flat royalty rates would apply retrospectively and would be prescribed by a government-appointed committee, according to the working paper drafted by the DPIIT-led committee. The collection and distribution of royalties to creators would be overseen by a new umbrella industry body called the Copyright Royalties Collective for AI Training (CRCAT).

Notably, the committee rejected direct, voluntary content licensing models between AI developers and individual companies, saying they would lead to high transaction costs, long negotiations, and unequal bargaining power that disadvantage small creators and startups, without offering broad, dependable access to training data.

 
