July 22, 2021 7:33:29 pm
Quantifying excess deaths due to Covid has acquired a passionate appeal. The only objective reality is that the existing data architecture, whether the Civil Registration System (CRS) or the Consumer Pyramid Household Survey (CPHS), is ill-suited for the task.
The challenge with using the CRS data is documented in my Hindustan Times article published last week. This is a major concern that we have highlighted in the past too. The difficulty with using the CPHS data to quantify excess deaths, in turn, is well articulated in a recent report by the Center for Global Development (‘Too many people have died’, IE, July 20) on estimating all-cause excess mortality in India during the Covid-19 pandemic. The key conclusion of this paper concerning the CPHS mortality estimates, and perhaps its only worthwhile one, is the following: “The important caveat on the death estimates from the CPHS is that its pre-Covid mortality does not track closely estimates from other official sources. Perhaps even more important is that the CPHS shows a big and inexplicable spike in mortality in 2019 before Covid. If some of the measurement errors from the CPHS pre-Covid carry over to the Covid period, the reliability of the excess deaths estimates is not assured.”
It would have been wise and prudent for the researchers to emphasise this conclusion rather than produce headline numbers based on data not suited for the task.
In this essay, I wish to highlight significant differences between the Sample Registration System (SRS), which is the basis of official estimates of birth and death rates in India, and the CPHS, which is designed to provide estimates for consumption and labour force participation. This would put in perspective why the official mortality figures from the SRS differed significantly from the CPHS even before Covid. First and foremost, the SRS and the CPHS are conducted for different purposes: the SRS provides reliable estimates at the national and state level for vital statistics such as birth, death, and infant mortality rates, while the CPHS provides estimates at the national and state level for consumption and labour force participation. Given that the objectives of the surveys are different and that deaths are rare events, the sampling methodologies of the SRS and the CPHS are very different, which perhaps explains why the CPHS and the official SRS mortality estimates did not match even before the pandemic.
In the SRS, the ultimate sampling unit in a rural area is a village, or a part of it if the village population exceeds 2,000. In urban areas, the sampling unit is the census enumeration block (CEB) with a population between 750 and 1,000. This means that all households in the sampled village (or segment) or CEB are covered. Furthermore, the SRS follows a dual record system, where a resident part-time investigator or enumerator continuously records the births and deaths of all the households in the sampled unit. This data is matched with an independent retrospective survey conducted by a full-time supervisor after six months. The mismatched and partially matched events are re-verified to arrive at the correct number of events. The enumerators are expected to record all births and deaths in the sampling unit. They are also expected to record all events for usual residents that occur outside the sampling unit. Thus, the recorded events are those associated with usual residents inside the sampling unit, usual residents outside the sampling unit, immigrants present or absent, and visitors inside the sampled unit. To ensure complete coverage of the events, the enumerators take help from village priests, headmen, barbers, midwives, and others to collect information on births and deaths. This highlights the complex nature of collecting accurate birth and death data. Overall, 4,961 villages or segments with approximately 5.9 million people and 3,886 CEBs in urban areas with a population of roughly 2.2 million people are covered to produce national and state-level estimates.
In contrast, for the CPHS, the ultimate sampling unit is a household within a village in rural areas or within a Census Enumeration Block (CEB) in urban areas. For example, from a typical village with 300 households or from a CEB, 16 households are randomly selected for the survey. Therefore, the CPHS sample is only a fraction (roughly one-tenth) of the SRS sample. Moreover, the CPHS does not take the steps that the SRS takes to rigorously record all births and deaths. It is important to re-emphasise that the sampling methodology of the CPHS is not designed to provide estimates of vital events such as births, deaths, or infant mortality. If we were to inadvertently use the CPHS data to estimate vital events (births and deaths), we would encounter non-sampling errors (coverage and measurement) that are not corrected by sample size. Non-sampling errors are very difficult to measure, and their effects on estimates (bias and variance) are largely unknown. At this stage, the CPHS data does not provide any confidence that these errors are minimised. Quantifying excess deaths, whether by using the CRS or the CPHS data, is fraught with problems that make it difficult to put a number on them. The fundamental issue is not that excess deaths did not happen, but that quantifying them scientifically and objectively remains challenging.
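To give a rough sense of what a one-tenth sample means for a rare event such as death, here is a minimal sketch of the sampling error alone. It rests on assumptions that are not in the surveys' documentation: an illustrative crude death rate of 6 per 1,000, a simple binomial model that ignores the clustered design of both surveys, and a CPHS population taken as exactly one-tenth of the SRS figures quoted above. The function name is hypothetical. It says nothing about the non-sampling errors discussed above, which no sample size corrects.

```python
import math


def rate_standard_error(rate_per_1000: float, population: float) -> float:
    """Binomial standard error of a death rate, expressed per 1,000 people.

    Assumes simple random sampling of individuals, which neither survey
    actually uses; real standard errors would be larger because of clustering.
    """
    p = rate_per_1000 / 1000.0
    return 1000.0 * math.sqrt(p * (1.0 - p) / population)


# SRS coverage quoted in the article: rural plus urban sample population.
srs_population = 5_900_000 + 2_200_000

# Assumption: CPHS coverage taken as one-tenth of the SRS, per the article's
# "roughly one-tenth" remark.
cphs_population = srs_population / 10

# Assumption: an illustrative crude death rate, not a figure from the article.
assumed_rate_per_1000 = 6.0

srs_se = rate_standard_error(assumed_rate_per_1000, srs_population)
cphs_se = rate_standard_error(assumed_rate_per_1000, cphs_population)

print(f"SRS standard error (per 1,000):  {srs_se:.4f}")
print(f"CPHS standard error (per 1,000): {cphs_se:.4f}")
print(f"Ratio CPHS / SRS: {cphs_se / srs_se:.2f}")
```

Under these assumptions, the uncertainty band around a CPHS-based death rate is about 3.2 times as wide as the one around an SRS-based rate (the square root of ten), before any coverage or measurement error is considered.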
Using speculative numbers, many journalists and academics have accused the government of hiding the actual number of deaths. I think there is a much simpler explanation: the government does not know. Even prior to Covid, India did not have the data architecture to estimate vital statistics, such as births and deaths, at the level of a district. We have followed an archaic system of survey-based estimates at the national and state level that has not changed in the last 50 years. It might come as a surprise to many, but the sample size of these surveys makes it difficult to compare mortality rates from one year to another. The real problem as I see it is that the government, journalists, and academics (with a few exceptions) were indifferent to the issues of vital statistics prior to Covid, largely because these statistics concerned the weaker sections of society (women and children) and the politics around them was not entertaining. Now that we are faced with a tragedy, we are demanding answers from a system that does not know. The real tragedy here is that because of our collective indifference to vital statistics, we perhaps will never have an honest account of the dead.
The writer is Non-Resident Senior Fellow, Brookings Institution