In simple terms, what is C2S-Scale, how does it ‘read’ the language of individual cells, and why do you consider it a breakthrough in single-cell analysis?
C2S-Scale is a family of large language models (LLMs) built on Google’s Gemma-2 architecture. Think of it as a specialised AI model that we’ve taught to understand the language of biology, in the form of gene expression inside cells. We do this by taking the complex gene activity inside a single cell, measured by a technique called single-cell RNA sequencing (scRNA-seq), and translating it into a simple “cell sentence”: a list of the most active genes, ordered by their activity.
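For readers who want the mechanics, here is a minimal Python sketch of that translation: rank a cell’s genes by expression and keep the top names. The function name and the top_k cutoff are illustrative choices, not the project’s actual API.

```python
import numpy as np

def to_cell_sentence(expression, gene_names, top_k=100):
    """Rank genes by expression and join the top-k names into a 'cell sentence'."""
    order = np.argsort(expression)[::-1]  # most-expressed genes first
    top = [gene_names[i] for i in order[:top_k] if expression[i] > 0]
    return " ".join(top)

# Toy example: four genes measured in one cell
genes = ["CD3D", "GAPDH", "MS4A1", "ACTB"]
counts = np.array([120.0, 85.0, 0.0, 43.0])
print(to_cell_sentence(counts, genes, top_k=3))  # -> "CD3D GAPDH ACTB"
```

Ordering by activity is what lets a text model treat a transcriptome like a sentence, with the most telling “words” first.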
The model “reads” these sentences across millions of cells and learns the patterns of gene expression that define what a cell is and what it’s doing. The paradigm shift is that this approach bridges the gap between raw genomic data and human language, and allows LLMs to perform complex tasks on cells in natural language.
C2S-Scale generated a new hypothesis about cancer cell behavior, which you then confirmed in living cells. Can you explain that hypothesis?
Our immune system is constantly looking for unhealthy or diseased cells, but cancer cells are often good at hiding. We asked our model to find drugs that could make cancer cells more “visible” to the immune system by acting as a conditional amplifier: increasing antigen presentation in cancer cells, but only in the presence of low levels of interferon (a key immune signaling protein).
Our model predicted that a drug called silmitasertib would significantly boost antigen presentation in the immune-context-positive setting. This prediction serves as a promising hypothesis that now requires rigorous validation through research and clinical trials.
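As a rough illustration of what such an in-silico screen could look like, the sketch below scores each candidate drug by how much it boosts predicted antigen presentation with low-dose interferon versus on its own. The `predict` callable is a hypothetical stand-in for querying the model, not C2S-Scale’s real interface.

```python
def conditional_amplifier_score(drug, predict, baseline_threshold=0.1):
    """Score a drug by how much it boosts predicted antigen presentation
    only in the presence of low-dose interferon (IFN)."""
    effect_alone = predict(drug, interferon=False)     # drug by itself
    effect_with_ifn = predict(drug, interferon=True)   # drug + low-dose IFN
    # A conditional amplifier does little alone but synergises with IFN
    if effect_alone < baseline_threshold:
        return effect_with_ifn - effect_alone
    return 0.0

def screen(drugs, model_predict, top_n=10):
    """Rank a virtual drug library by conditional-amplifier score."""
    scored = [(d, conditional_amplifier_score(d, model_predict)) for d in drugs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

In spirit, this is the filter described above: a useful candidate does little on its own but synergises with the interferon context.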
Single-cell RNA sequencing lets scientists peek inside individual cells, but the data is massive and complicated. How does C2S-Scale make sense of all that information and understand what’s happening inside a cell?
The key is in its training. Before we asked it to do a complex task like drug screening, we put C2S-Scale through a rigorous pre-training phase. We trained it on a massive dataset of over 50 million cells from public repositories like the Human Cell Atlas, covering a wide range of human and mouse tissues, diseases, and conditions.
During this pre-training, we gave it a series of fundamental tasks, like predicting a cell’s type based on its “cell sentence,” identifying its tissue of origin, or even generating a realistic new cell from scratch. By mastering these foundational tasks, the model learns the fundamental patterns of gene expression. This biological intuition is what allows it to make sense of new, complex information and perform sophisticated reasoning in later stages.
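To make those tasks concrete, here is a hypothetical sketch of how they could be posed as prompt/completion pairs for a language model; the actual C2S-Scale templates may differ.

```python
# Illustrative pre-training examples for the tasks described above;
# prompts, labels, and formats are assumptions, not the real templates.
cell_sentence = "CD3D CD3E TRAC IL7R GAPDH"

tasks = [
    {   # cell-type prediction
        "prompt": f"Cell sentence: {cell_sentence}. What is the cell type?",
        "completion": "T cell",
    },
    {   # tissue-of-origin prediction
        "prompt": f"Cell sentence: {cell_sentence}. Which tissue is this cell from?",
        "completion": "peripheral blood",
    },
    {   # conditional generation of a realistic new cell
        "prompt": "Generate a cell sentence for a human T cell.",
        "completion": cell_sentence,
    },
]

for task in tasks:
    print(task["prompt"], "->", task["completion"])
```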
This model has 27 billion parameters, which is huge. Why does the scale of the AI matter when it comes to discovering new biology?
Scale is critical because biology is unimaginably complex. A large model, like our 27 billion-parameter C2S-Scale, has a greater capacity to learn and remember the countless subtle relationships between genes, cells, and tissues. There’s a well-known phenomenon in AI called “scaling laws”, where larger models don’t just get incrementally better; they often develop entirely new, emergent capabilities that smaller models lack. For a problem as vast as understanding life at the cellular level, that massive scale is essential to uncover genuinely new biological insights.
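For context, the “scaling laws” mentioned here are usually summarised as a power law (Kaplan et al., 2020), in which test loss falls predictably as parameter count grows. The generic form is shown below; the fitted constants for C2S-Scale itself are not public.

```latex
% Generic neural scaling law (Kaplan et al., 2020); constants are fitted per model family.
% L(N): test loss, N: parameter count, N_c and \alpha_N: empirical constants.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```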
The model predicted that a drug called silmitasertib could make certain cancer cells more visible to the immune system, but only under very specific conditions.
How did you test this in actual cells, and how did you confirm that the AI’s prediction really works in the lab?
To validate the AI’s prediction, we took it to the lab. We used human neuroendocrine cancer cell lines that the model had never seen before, and set up a controlled experiment with two scenarios: cells treated with silmitasertib alone, and cells treated with a low dose of the immune signal (interferon) along with silmitasertib.
The results confirmed the AI’s prediction. The drug by itself had no effect on the cells’ visibility markers. But when we combined it with low levels of interferon signaling, we saw a marked and significant increase in the molecules that make cancer cells visible to the immune system. It was a clear demonstration of the synergy the model had predicted, moving an AI-generated hypothesis from the computer to a real biological outcome.
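A toy version of that comparison, with numbers invented purely for illustration (not the study’s data), might look like this: compute fold changes of a visibility marker over untreated cells, then check whether the combination exceeds the sum of the individual effects.

```python
import statistics

# Hypothetical marker readouts (e.g., surface MHC-I, arbitrary units);
# all values below are made up to illustrate the analysis, not real results.
readouts = {
    "untreated":     [1.00, 0.95, 1.05],
    "silmitasertib": [1.02, 0.98, 1.01],  # drug alone: ~no change
    "low_dose_IFN":  [1.40, 1.35, 1.45],
    "IFN + drug":    [3.10, 2.90, 3.20],  # combination: marked increase
}

baseline = statistics.mean(readouts["untreated"])
for condition, values in readouts.items():
    fold = statistics.mean(values) / baseline
    print(f"{condition:>15}: {fold:.2f}x over untreated")

# Simple Bliss-style synergy check: is the combined effect larger than
# the sum of the individual effects over baseline?
excess = (statistics.mean(readouts["IFN + drug"]) - baseline) - (
    (statistics.mean(readouts["silmitasertib"]) - baseline)
    + (statistics.mean(readouts["low_dose_IFN"]) - baseline)
)
print(f"excess over additive: {excess:.2f} (positive suggests synergy)")
```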
It’s important to note the limitations of this validation: these experiments were conducted in vitro, not in a living organism. Furthermore, this was observed in a specific neuroendocrine cancer cell line. While these results are highly promising, significant further research and clinical trials would be required to understand if this effect translates into a safe and effective therapy for patients.
If C2S-Scale can find ways to make cancer cells more visible to the immune system, what does that mean for developing new treatments or speeding up drug discovery?
Traditional drug discovery involves physically screening thousands of compounds in a lab, which is incredibly slow, expensive, and often misses the mark. C2S-Scale allows us to perform these massive screening experiments in silico — inside the computer — at a scale and speed that would be impossible in the real world. This shows AI can be a powerful accelerator for science.
This doesn’t replace scientists, but it empowers them. It allows us to rapidly identify and prioritise the most promising and often non-obvious drug candidates. By narrowing the search space, AI can help researchers focus their lab experiments where they’re most likely to succeed, dramatically shortening the timeline from an initial idea to a potential new therapy.
AI can connect different sources of knowledge to come up with new ideas. In this case, C2S-Scale didn’t just look at cell data; it also read the biological context around it. How does it combine all that information to generate something new?
This gets to the heart of our multimodal approach. During its training, C2S-Scale wasn’t just fed raw cell sentences. It saw them alongside the human-generated context they came from — things like scientific annotations, tissue and disease labels, and even summaries from the research papers where the data was published.
By being trained on this rich mixture of biological data and natural language simultaneously, the model learns to connect the dots. It understands that a certain pattern of genes is not just a list, but corresponds to a “T-cell in a kidney from a patient with this disease,” as described in a scientific abstract. This ability to bridge the world of cellular data with the world of human knowledge is what allows it to generate novel hypotheses.
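A hypothetical sketch of one such multimodal training record: the field names and the template below are assumptions for illustration, not the actual C2S-Scale data format.

```python
# Illustrative record pairing a cell sentence with its human-generated context.
record = {
    "cell_sentence": "CD3D CD3E TRAC IL7R GZMB",
    "cell_type": "T cell",
    "tissue": "kidney",
    "disease": "lupus nephritis",
    "abstract": "We profiled immune infiltrates in kidney biopsies ...",
}

def to_training_text(r):
    """Interleave natural-language context with the cell sentence so the
    model learns to connect the two during training."""
    return (
        f"Study summary: {r['abstract']}\n"
        f"Tissue: {r['tissue']}. Disease: {r['disease']}.\n"
        f"Cell sentence: {r['cell_sentence']}\n"
        f"Cell type: {r['cell_type']}"
    )

print(to_training_text(record))
```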