Next door to Delhi, a ‘bank’ to store country’s digitised biological data

The ‘Indian Biological Data Bank’ has come up at the Regional Centre for Biotechnology in Faridabad. The digitised data will be stored on a four-petabyte supercomputer called ‘Brahm’. A petabyte equals 10,00,000 gigabytes (GB).

Regional Centre for Biotechnology will house the digital repository

The government has for the first time set up a digitised repository where Indian researchers will store biological data from publicly funded research, reducing their dependency on American and European data banks.

The government has mandated that data from all publicly funded research be stored in this central repository. It will not only give researchers a platform to store their data securely within the country, but will also provide access to a large database of indigenous sequences for analysis.

Such databases have traditionally played a key role in determining the genetic basis of various diseases and finding targets for vaccines and therapeutics.

“At present, most Indian researchers depend on the European Molecular Biology Laboratory (EMBL) and National Center for Biotechnology Information databases for storing the biological data. There are other smaller datasets available with some institutes, but those are not accessible to all. This will be the first national data repository, where the data will not only be submitted from across India but can be accessed by researchers from across India,” said Dr Sudhanshu Vrati, director of the Regional Centre for Biotechnology.

At the inauguration of the centre on Thursday, Union Science Minister Jitendra Singh said the bio-bank will create “Indian data for Indian solutions”.

“Many of our researchers still depend on other countries for such large databases, but the Indian phenotype is very different and solutions based on others’ data might not be optimal. We also need to look beyond. We can even provide our data to Western countries. You go to any of our public hospitals and you can find a patient with any disease you want to study; Western countries hardly see cases of tuberculosis or many other tropical diseases,” said Singh.

The bio-bank, which cost about Rs 85 crore to set up, currently accepts nucleotide sequences, the digitised genetic makeup of humans, plants, animals, and microbes. It now holds 200 billion base pairs of data, including 200 human genomes sequenced under the ‘1,000 Genomes Project’, an international effort to map genetic variation in people. The project will also focus on populations that are predisposed to certain diseases.

The database also contains most of the 2.6 lakh Sars-CoV-2 genomes sequenced by the Indian Sars-CoV-2 Genomics Consortium (INSACOG). These sequences, which are also uploaded to a global database, have helped the consortium keep track of Sars-CoV-2 variants circulating in the country and warn authorities about any emerging variant that might lead to more cases. For instance, the government learnt from this data that the Omicron sub-variant BA.2.75 was being overtaken by XBB, a recombinant of two Omicron sub-lineages, BJ.1 and BA.2.75.

Besides human and Sars-CoV-2 genomes, the database will also store the 25,000 Mycobacterium tuberculosis genomes that another national consortium is working to sequence. This will help not only in understanding the spread of multidrug-resistant and extensively drug-resistant TB in the country, but also in the search for targets for new therapies and vaccines.

The database also currently stores the genomic sequences of crops such as rice, onion, tomato and mustard, among others. With genomes of humans, animals, and microbes present in the same database, it will also help researchers study zoonotic diseases, that is, diseases that jump from animals to humans.

Department of Biotechnology Secretary Dr Rajesh Gokhale said: “Take for example the BRCA gene that we know is associated with breast cancer. If we have a geographically representative database, we can actually determine the prevalence of breast cancer risk in the country by region. Or, if we compare our genes with sequences available from other parts of the world, we may detect mutations that are present only in our population.”

Although the database currently accepts only such genomic sequences, it is likely to expand later to store protein sequences (strings of amino acids that join together to form the various proteins found in these organisms) and imaging data such as ultrasound and MRI scans.

The database currently offers researchers two mechanisms for data submission. The first is open access, where uploaded data can immediately be used by other researchers from across the country. The second is controlled access, where the data will not be openly shared for a number of years before being opened up to all.

The bio-bank also has a backup ‘Disaster Recovery’ site at the National Informatics Centre (NIC), Bhubaneswar.

“We are thinking of providing controlled access for a six year period — the government has to take a call on that. During this period the data will be stored on our servers but be accessible to only the researchers who have uploaded it. After the period, it will be made openly available to others,” said Dr Vrati. The data will also be tagged with an accession number that will make it searchable not only in the Indian database but also in international databases.

Anonna Dutt

Anonna Dutt is a Principal Correspondent who writes primarily on health at the Indian Express. She reports on myriad topics ranging from the growing burden of non-communicable diseases such as diabetes and hypertension to the problems with pervasive infectious conditions. She reported on the government’s management of the Covid-19 pandemic and closely followed the vaccination programme. Her stories have resulted in the city government investing in high-end tests for the poor and acknowledging errors in their official reports. Dutt also takes a keen interest in the country’s space programme and has written on key missions like Chandrayaan 2 and 3, Aditya L1, and Gaganyaan. She was among the first batch of eleven media fellows with RBM Partnership to End Malaria. She was also selected to participate in the short-term programme on early childhood reporting at Columbia University’s Dart Centre. Dutt has a Bachelor’s Degree from the Symbiosis Institute of Media and Communication, Pune and a PG Diploma from the Asian College of Journalism, Chennai. She started her reporting career with the Hindustan Times. When not at work, she tries to appease the Duolingo owl with her French skills and sometimes takes to the dance floor.
