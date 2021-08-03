

All viruses, including SARS-CoV-2, the virus that causes Covid-19, mutate over time. Some of these changes have little or no impact on the virus’s properties, but keeping a close watch on the variants and tracking them has been an unenviable task.

For a year and a half now, experts at the Council of Scientific and Industrial Research’s Institute of Genomics and Integrative Biology (CSIR-IGIB), one of the top institutions at the forefront of Covid research, have been analysing genome sequencing data of the SARS-CoV-2 virus to understand the genetic epidemiology of Covid-19 in India. They make extensive use of Pango, a system for naming SARS-CoV-2 variants.

Among these experts are Vinod Scaria, a principal scientist at CSIR-IGIB, and Bani Jolly, a graduate student at CSIR-IGIB. In an interview with Anuradha Mascarenhas, they said that approximately 50,000 genome sequences of the SARS-CoV-2 virus have been assembled in the country.

Covid has taken a huge toll worldwide. How did genome sequencing help?

Over the past year, our lab has been actively using genome sequencing-based analysis of SARS-CoV-2 samples from across the country to understand the evolution and spread of the virus and its variants. This includes an active, pioneering collaborative research effort between CSIR-IGIB and the state governments of Kerala and Maharashtra for genome surveillance of Covid-19, as well as focussed collaborations with medical colleges to understand unique clinical issues. Sequencing SARS-CoV-2 genomes helps us look closely at the mutations that arise as the virus replicates inside the human body during infection.
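As a rough illustration of what "looking at mutations" means computationally, the sketch below compares a sample genome with a reference genome base by base and reports each difference as a substitution in the common reference-base, position, sample-base notation (for example C4T). It assumes the two sequences are already aligned to the same coordinates, which real pipelines achieve with a separate alignment step, and it skips unresolved bases (N) and alignment gaps (-). The function name and the toy sequences are hypothetical. Sequence databases write genomes with T rather than U, so the sketch does too.

```python
def call_substitutions(reference: str, sample: str) -> list[str]:
    """Return substitutions such as 'C4T' (reference base, 1-based position, sample base).

    Assumes both sequences are already aligned to the same coordinates.
    Unresolved bases ('N') and gaps ('-') in the sample are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    substitutions = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1):
        if alt_base in "N-" or ref_base == alt_base:
            continue  # no call at this position, or no change
        substitutions.append(f"{ref_base}{pos}{alt_base}")
    return substitutions


# Toy 10-base fragments (hypothetical, not real SARS-CoV-2 sequence).
ref = "ATGCATGCAT"
smp = "ATGTATGNAC"
print(call_substitutions(ref, smp))  # ['C4T', 'T10C']
```

Position 8 is an N in the sample, so it is ignored rather than reported as a change; treating unresolved bases as "unknown" instead of "mutated" avoids inflating mutation counts from low-quality sequencing.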

Looking at such mutations helps determine whether a different lineage or ‘variant’ of SARS-CoV-2 has emerged in a region. Identifying virus lineages is important from a public health point of view because particular mutations may give the virus an advantage, such as the ability to spread more easily from person to person or to reduce the efficacy of vaccines, as we have seen for the Alpha, Beta and now Delta variants. Tracking such mutations also makes it possible to trace the origin and spread of a specific variant, especially as it moves across geographical areas. For example, we have seen that the Delta variant was predominant during the second wave in India, and its prevalence rose in step with the increase in cases in the country. Multiple studies subsequently suggested that Delta is more transmissible than previous lineages of SARS-CoV-2.
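Tracking how a lineage's prevalence changes over time, as with Delta during the second wave, largely comes down to counting. The minimal sketch below computes, for each month, the fraction of sequenced genomes assigned to each lineage, using B.1.617.2, the Pango lineage of Delta, as the example. The records are invented toy data, not real surveillance counts, and the function name is hypothetical; real analyses also have to account for uneven sampling across regions and over time.

```python
from collections import Counter, defaultdict


def lineage_share_by_month(records):
    """Fraction of sequenced genomes in each month assigned to each lineage.

    `records` is an iterable of (month, lineage) pairs, e.g. ("2021-04", "B.1.617.2").
    """
    counts = defaultdict(Counter)
    for month, lineage in records:
        counts[month][lineage] += 1
    shares = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        shares[month] = {lin: n / total for lin, n in counts[month].items()}
    return shares


# Invented toy records for illustration only.
toy = [("2021-03", "B.1.1.7"), ("2021-03", "B.1.617.2"),
       ("2021-04", "B.1.617.2"), ("2021-04", "B.1.617.2"),
       ("2021-04", "B.1.1.7"), ("2021-04", "B.1.617.2")]
for month, share in lineage_share_by_month(toy).items():
    print(month, share.get("B.1.617.2", 0.0))
# 2021-03 0.5
# 2021-04 0.75
```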

What is a Covid sequence? How many are there across the world and in India?

The SARS-CoV-2 virus has certain genetic instructions that it uses to generate copies of itself. These instructions are coded as 29,903 letters of RNA (ribonucleic acid), written with the four bases A, U, G and C, which together make up what is known as the ‘genome’ of the virus. Sequencing the genome of the virus essentially means determining the order of these 29,903 letters. Approximately 50,000 SARS-CoV-2 genome sequences have been assembled in India by different labs across the country and under different initiatives, including state-wide programmes in Kerala and Maharashtra and national consortia such as INSACOG. Worldwide, more than 2.5 million genome sequences are publicly available, particularly in the public database gisaid.org.
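To make the idea of a genome as a string of letters concrete, here is a small sketch that summarises an assembled genome: its length compared with the 29,903-letter figure above, the count of each base, and the fraction of positions that could not be resolved. It assumes the common convention of writing unresolved positions as N and accepts either U or T, since databases store the sequence in the DNA alphabet. The function name and the toy input are hypothetical.

```python
from collections import Counter

# Genome length quoted in the text. Real assemblies can differ slightly in
# length because of insertions, deletions or incompletely sequenced ends.
REFERENCE_LENGTH = 29_903


def summarise_genome(seq: str) -> dict:
    """Length, base counts and fraction of positions that are not a definite A, C, G or T/U."""
    # Databases write the genome in the DNA alphabet, so map U -> T first.
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq)
    called = sum(counts[base] for base in "ACGT")
    return {
        "length": len(seq),
        "length_vs_reference": len(seq) - REFERENCE_LENGTH,
        "counts": {base: counts[base] for base in "ACGT"},
        # 'N' and any other symbol count as unresolved positions.
        "ambiguous_fraction": (len(seq) - called) / len(seq) if seq else 0.0,
    }


summary = summarise_genome("AUGCNNAUGC")  # toy 10-letter string, not a real genome
print(summary["counts"], summary["ambiguous_fraction"])
# {'A': 2, 'C': 2, 'G': 2, 'T': 2} 0.2
```

A high ambiguous fraction is a quick signal that a sample was poorly sequenced and that mutation or lineage calls from it should be treated with caution.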

Why was Pango developed?

Since the beginning of the pandemic, researchers have emphasised the need for a uniform naming system for the different variants of SARS-CoV-2. Pango, a system for assigning names to different lineages of SARS-CoV-2 genomes, was developed early in 2020 by virologists in the UK and Australia.

The Pango system names lineages hierarchically. For example, the B.1.1.7 lineage, more commonly known as the Alpha variant, emerged from the lineage B.1.1, which in turn emerged from the lineage B.1, a direct descendant of the lineage B. The system is designed to assign lineages dynamically, so that new lineages can be named as the virus evolves.
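Because each dot in a Pango name adds one level of descent, a lineage's ancestry can be read straight off its name. The sketch below does this for B.1.1.7. Pango also uses short alias prefixes for deep lineages; for example, AY stands for B.1.617.2 (Delta), so AY.3 is shorthand for B.1.617.2.3. The alias table here is a hypothetical, hand-coded subset containing only that one entry; the real table is maintained by the Pango team and is much larger. Function names are illustrative.

```python
# Hypothetical one-entry subset of the Pango alias table (illustration only).
ALIASES = {"AY": "B.1.617.2"}


def expand_alias(name: str) -> str:
    """Replace a leading alias prefix (e.g. 'AY' in 'AY.3') with its full lineage name."""
    prefix, _, rest = name.partition(".")
    if prefix in ALIASES:
        return ALIASES[prefix] + ("." + rest if rest else "")
    return name


def ancestors(name: str) -> list[str]:
    """List a lineage and its ancestors, most specific first."""
    parts = expand_alias(name).split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


print(ancestors("B.1.1.7"))  # ['B.1.1.7', 'B.1.1', 'B.1', 'B']
print(ancestors("AY.3"))     # ['B.1.617.2.3', 'B.1.617.2', 'B.1.617', 'B.1', 'B']
```

Reading ancestry from the name alone is what makes the hierarchy useful in practice: a report of AY.3 immediately tells a reader it sits inside Delta.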

A group of genomes is given a new lineage name under the system if it meets a defined set of criteria, such as sharing a common ancestor, sharing a group of mutations, or being linked to an important epidemiological event such as a large outbreak of the disease. The Pango nomenclature, and the tool that can be used to assign lineages to genomes, were initially developed by virologists at the University of Edinburgh, the University of Sydney and the University of Oxford.
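One of the criteria above, a group of shared mutations, can be sketched as a set intersection: the mutations present in every genome of a candidate group. This covers only that single criterion and is not how Pango designates lineages, which also weighs phylogeny and epidemiological context. The mutation labels below are hypothetical placeholders in the usual reference-base, position, sample-base notation, and the function name is illustrative.

```python
def shared_mutations(genomes: dict[str, set[str]]) -> set[str]:
    """Mutations present in every genome of a candidate group (set intersection)."""
    mutation_sets = list(genomes.values())
    if not mutation_sets:
        return set()
    return set.intersection(*mutation_sets)


# Hypothetical group of three genomes with placeholder mutations.
group = {
    "genome_1": {"C100T", "A200G", "G300A"},
    "genome_2": {"C100T", "A200G"},
    "genome_3": {"C100T", "A200G", "T400C"},
}
print(sorted(shared_mutations(group)))  # ['A200G', 'C100T']
```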

AY.3, a new sub-lineage of Delta, is being found in increasing numbers in some states of the USA. Do we need to worry about this in India?

Since Delta emerged, it was expected that sub-lineages of Delta carrying additional mutations would also appear in different regions, because the virus continues to mutate and evolve. Mutations arise as a natural part of evolution, and not all of them are significant. Currently, AY.3 is being reported in significant numbers from the USA. However, Pango lineage assignment works better, and more accurately, when it has more sequences representing a particular lineage to work with.

Because AY.3 numbers are currently small, lineage assignment for AY.3 may not be accurate. For instance, the small number of genomes from India currently assigned as AY.3 lack the mutations that have been reported in the cluster of genomes from the USA. Although the number of such genomes is small, we will continue to track the genome sequences of the virus to see whether AY.3 or any other Delta sub-lineage emerges in India or elsewhere.
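The check described above, whether genomes assigned to a lineage actually carry the mutations reported for that lineage's cluster, can be sketched as a simple set comparison: what fraction of the cluster's defining mutations a genome has, and which ones are missing. The defining set and the genome below are hypothetical placeholders, not the real AY.3 mutations, and the function name is illustrative.

```python
def defining_mutation_support(genome_muts: set[str], defining: set[str]) -> float:
    """Fraction of a cluster's defining mutations that a genome actually carries."""
    if not defining:
        raise ValueError("defining mutation set is empty")
    return len(genome_muts & defining) / len(defining)


# Hypothetical defining set for a sub-lineage cluster (placeholder mutations).
cluster_defining = {"C100T", "A200G", "G300A", "T400C"}
genome = {"C100T", "A200G", "C500T"}

score = defining_mutation_support(genome, cluster_defining)
print(score, sorted(cluster_defining - genome))  # 0.5 ['G300A', 'T400C']
```

A genome labelled with a lineage but scoring low here is a candidate for closer inspection, which mirrors the caution expressed above about small numbers of assigned genomes.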

How much data do you handle in a day, and what are your future plans?

We handle data generated within our lab as well as data from elsewhere in the world that is available in the public domain. The volume we can handle also depends on the questions being asked and the complexity of the analysis involved. The infrastructure at CSIR-IGIB has significant capacity to handle the data volumes currently being generated. The Pango network has also become more collaborative and has recruited volunteers from different institutes to propose and assign new lineages. In addition, epidemiologists and researchers from around the world, like us, can contact the Pango team through different channels, particularly its GitHub issues page, if they believe they have identified a new lineage in a group of genomes.