
‘Indians among most avid users’: Team behind ChatGPT Images 2.0 on multilingual AI image generation

An interview with members of the San Francisco-based team that built Images 2.0, OpenAI’s latest model that is said to be a step change above previous versions, particularly in generating images across languages like Hindi and Bengali.

Abhi Muchhal, product manager at OpenAI (left) and Boyuan Chen, research scientist at OpenAI. (Image: OpenAI)

India is playing a growing role in shaping how AI image generation models are developed, with OpenAI’s ChatGPT Images 2.0 now capable of generating everything from Manga-style panels in Hindi to more realistic depictions of crowded and chaotic Indian streets.

Earlier this week, OpenAI CEO Sam Altman said that Indian users have generated more than one billion visuals using Images 2.0 since its release in April 2026. The milestone comes a year after OpenAI first introduced the ‘Images for ChatGPT’ feature that kicked off the viral Studio Ghibli-style AI images trend.

However, OpenAI is also reportedly undergoing a broader strategic reset, pulling the plug on experimental side projects while redirecting talent and computing resources toward enterprise products. In a surprise move, the company shut down Sora, its popular AI video-generation tool, just six months after releasing it to the public.

In this context, The Indian Express sat down with members of the San Francisco-based team that built Images 2.0 to understand how exactly the latest model is a step change above previous versions and, more importantly, how it was iterated for multilingual, culturally diverse markets like India – an approach that seems to be paying off in terms of adoption and user engagement.

“Previously, most of our work, including model evaluations, were done in English. Our models also struggled with a lot of details, especially in Asian languages. In Chinese, Japanese, Korean, Hindi and others, there are thousands of characters compared to just 26 letters in English,” Boyuan Chen, a research scientist at OpenAI, said.

“However, this time, we spent a lot of time making sure cultures from around the world were covered in our internal iteration process. Whenever we saw that a language was not performing well, we added a lot more data to ensure broader cultural and linguistic coverage,” Chen explained.

With ChatGPT Images 2.0, OpenAI said it has achieved significant gains in non-Latin text rendering, particularly in Japanese, Korean, Chinese, Hindi, and Bengali. The model’s multilingual understanding is said to go beyond simple translation, extending to language embedded in visual outputs such as posters, comics and diagrams.

(AI-generated image using ChatGPT Images 2.0/OpenAI)

Abhi Muchhal, a product manager at OpenAI, offered another example of the model’s India-specific realism. “In the previous model, if you prompted it to make a city scene in India, it wouldn’t be crowded at all. While this model is not perfect, now you can see a realistic representation where there’s rickshaws moving left and right, and there’s a lot of people, there’s hustle and bustle,” he said.

Beyond multilingual capabilities, Images 2.0 can generate images across a wide range of aspect ratios at much higher quality, with support for up to 2K resolution, and is said to demonstrate improved fidelity across many visual styles.

Challenge of multilingual image generation

As recently as 2024, text-to-image generators like DALL-E 3 struggled to spell words accurately inside images. Because diffusion models generate images by reconstructing pixels from noise, small text elements received less attention during training. The issue became more complex with regard to outputs in different languages.

But that limitation has now largely gone the way of the infamous ‘extra fingers’ problem that plagued earlier image generators.

(AI-generated image using ChatGPT Images 2.0/OpenAI)

A key breakthrough came earlier with Images 1.0, which reportedly took an autoregressive approach to generate images sequentially from left to right and top to bottom, similar to how text is written. This differed from the diffusion model technique used by most image generators like DALL-E that create the entire image at once.
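To make the contrast concrete, here is a toy Python sketch of the two generation orders described above. It is purely illustrative and not OpenAI's implementation: the function names, the stand-in `next_token` rule and the toy "denoising" update are all assumptions made up for this example.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions left to right, top to bottom."""
    for row in range(height):
        for col in range(width):
            yield row, col


def generate_autoregressive(height, width, next_token, seed=0):
    """Fill a grid one cell at a time; each cell sees only earlier cells."""
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    history = []  # tokens generated so far, in raster order
    for row, col in raster_order(height, width):
        # next_token is a hypothetical stand-in for a trained model.
        token = next_token(history, rng)
        grid[row][col] = token
        history.append(token)
    return grid


def toy_next_token(history, rng):
    """Toy rule: usually repeat the previous token, otherwise pick a new one."""
    if history and rng.random() < 0.7:
        return history[-1]
    return rng.randrange(4)


def refine_all_at_once(height, width, steps, seed=0):
    """Diffusion-style contrast: start from noise, update every cell per step."""
    rng = random.Random(seed)
    grid = [[rng.random() for _ in range(width)] for _ in range(height)]
    target = 0.5  # toy "clean" value that each step moves halfway toward
    for _ in range(steps):
        grid = [[v + 0.5 * (target - v) for v in row] for row in grid]
    return grid


if __name__ == "__main__":
    print(generate_autoregressive(2, 3, toy_next_token))
    print(refine_all_at_once(2, 3, steps=10))
```

In the first function, a cell's value can depend only on cells already written, which mirrors how text is produced one token after another. In the second, every cell changes at each step, which is the "entire image at once" behaviour attributed to diffusion models.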

With Images 2.0, OpenAI was able to improve the model’s ability to accurately render text in different languages by applying the same advances used to improve its text-based chatbots. Declining to share details for proprietary reasons, Chen said, “It’s similar to text intelligence in ChatGPT. Depending on the prompt, it can respond robotically or more naturally and conversationally. The same idea applies here.”

He further mentioned that the key was training the model to follow instructions from users better. “With this image-generation model, we wanted it to follow the user’s intent. So we trained it on both types of data, publicly available casual data and studio-style images,” he said. “We made sure the model follows what people actually want, instead of simply outputting good-looking images,” Chen added.

Images 2.0 is also OpenAI’s first image generation model that ‘thinks through’ user prompts, as it is built on top of the company’s reasoning models. It can use the web to find relevant information and has a knowledge cut-off date of December 2025. According to Muchhal, it is also more likely to understand context than Images 1.5 was.


Unexpected ways Indians use Images 2.0

Stating that Indians have consistently been one of the most avid users of image generation, Muchhal said, “We were very happy to see the level of adoption in India, but more than the numbers, what surprised me most was the diversity of use cases.”

Not all of the usage trends pertained to generating photorealistic outputs, he said, pointing to the latest trend of asking ChatGPT to turn nice photos into scribbly drawings like the ones done on Microsoft Paint decades ago.

When asked whether viral AI image trends are intentionally shaped by OpenAI or driven organically by user behaviour, Muchhal affirmed that it was a combination of both: “We try to pick a representative set of use cases where we know that either the model has struggled with it in the past or areas that we want to improve, and we try to improve on those. But to be honest, a lot of the things that go viral are also unexpected to us.”

The OpenAI executives also said some of the most unexpected trends in India included AI-generated hair-colour previews, the ‘younger me’ portraits, and Y2K-style romantic portraits.

On enterprise adoption of AI image generators, Muchhal said, “In the past, the model struggled with accurately following instructions which made it very hard for users to be able to use this for a professional use case.” “But what we’ve seen now with Images 2.0 is not only the personal use cases, but there’s been overwhelming enterprise demand because now you’re able to make the creative workflow go so much faster,” he added.


Safety, watermarks, and deepfake risks

Images 2.0 is also able to generate fine-grained elements, including the tiny flaws that add realism to its visuals.

Asked about the risk of photorealistic outputs being used to spread misinformation, Muchhal said that OpenAI looks to strike a constant balance between creative freedom and user safety and transparency. “We have very high standards around copyright infringement, and we make sure there is no misuse in those areas. One thing we care deeply about is ensuring there is nothing deceptive or impersonating in the outputs,” he said.

ChatGPT-generated images support the open C2PA (Coalition for Content Provenance and Authenticity) standard, which adds a clear signal in the metadata that an image was generated by an AI model. A few days ago, OpenAI also announced a partnership with Google to include an invisible watermark called SynthID. But the AI-generated images do not carry a visible watermark so as not to tarnish the output, according to Muchhal.

When asked for comment on the Indian government’s recently notified AI labelling rules, which require social media platforms to attach a prominent label on AI-generated content, Muchhal said, “We believe the system needs to be built in collaboration with stakeholders […] We have shared a lot of what we are doing with government stakeholders, continue to incorporate their input, and are working to find the right balance between giving users control and meeting the trust and safety expectations set by governments.”

Karan Mahadik is a Tech Correspondent for The Indian Express based in Delhi-NCR, specialising in the intersection of technology and public policy. With a focus on how digital infrastructure shapes governance and society, he is a key voice in the publication's coverage of the rapidly evolving tech regulation landscape.

Experience & Career

Karan brings a robust background in digital journalism to his role at The Indian Express. Before joining the organisation, he honed his skills at MediaNama and The Quint.

Expertise & Focus Areas

Karan’s reporting moves beyond product cycles to investigate the broader implications of technology. His work is defined by:

Tech Policy & Regulation: In-depth coverage of legal frameworks, government directives, and internet governance.

Artificial Intelligence: His work is dedicated to demystifying AI developments and their impact on industries and individuals.

Privacy & Security: Reporting on digital rights, data protection (DPDP rules), and platform accountability.

Complex Analysis: Known for his ability to translate dense policy documents and technical shifts into clear, accessible narratives for a general audience.

Authoritativeness & Trust

Karan is recognised for his rigorous approach to sourcing and his commitment to digital privacy, evidenced by his accessibility via secure channels like Signal (Username: karanhm.24). His work is frequently cited for its detailed examination of regulatory overreach and corporate accountability. By anchoring his reporting in verified data and expert commentary, he provides readers with a reliable compass for navigating the "wild west" of modern technology.

 
