
There’s a hole in the data

The state has failed to create capacities for a timely, reliable, decentralised data regime.

The state has failed to create capacities that can be devoted to developing and maintaining a timely, reliable and decentralised data regime. (Illustration: CR Sasikumar)

The recent controversy over the employment data of the National Sample Survey has put the credibility of India’s data systems under serious threat. The Census of India and the National Sample Survey Organisation (NSSO) have a good reputation. But when it comes to data on the social sector (health, education, nutrition), even these sources, along with other large data sets, have been deficient on numerous counts.

One, the information collected is not available in real time or even annually. The NSSO collects data through specific rounds (health expenditure, debt etc.) which, unlike the consumption expenditure surveys, don’t have a fixed cycle, and the Census collects data once in 10 years. Budget allocations follow an annual cycle and policy pronouncements are not dovetailed to the years for which data is available. This raises important questions about the basis on which policies and plans are made. In the case of malnutrition, which is a problem needing urgent solutions, there was for a long time no independent data telling us what the trends were: the National Family Health Survey (NFHS-4) report came out in 2017 after a gap of over 10 years (NFHS-3 was in 2006). In the interim, major initiatives were planned for the eradication of malnutrition without any inkling of the situation on the ground or how it was changing. Similarly, data on learning levels was not collected consistently by the government till 2017, and it is not known when the next round will be held or how long it will take for the data to be made available.

Two, there are inconsistencies in definitions and sampling frames across data sources and across time in the same data source. For instance, questions posed by the NSS for obtaining information on out-of-school children vary dramatically from those posed by the Census. As a result, the two arrive at vastly different numbers. Similarly, in the case of malnutrition data, there have been changes in the definitions used by NFHS across different rounds that make comparisons over time difficult. The periodicity of data collection also varies across sources, which makes validation even harder. Data validation plays an important part in improving the quality of data collected and ensuring authenticity, without which departments are basically shooting in the dark.

Three, the data collected in these surveys is not geared towards policy or planning. The education rounds of the NSS are part of the survey on social consumption, which in turn is meant to assess the benefits derived by various sections of society from public expenditure incurred by the government. It provides no information on how the education system is functioning. As a result, several important indicators that would be of interest for planning, or to the people, do not even figure in them. For instance, the different categories of teachers and their salaries are not data points in any data set on education.


In the absence of regular large-scale survey data, what is available is the registry data collected by departments and ministries for monitoring of programmes. Unfortunately, these too suffer from gaps in information and are rarely used for programmatic purposes. At most, they are part of an accounting exercise. For instance, school surveys by the MHRD collect information on broad indicators of infrastructure and teacher availability (only two categories, whereas multiple exist), student enrolment (but not attendance) and distribution of incentives. These take stock of the provisioning in schools, showcasing administrative efforts, but not the functioning of the education system or real changes within it.

Another major problem with departmental data sets is the conflict of interest that results from data being collected by people who are entrusted with ensuring outcomes. Thus, school data for the District Information System for Education (DISE) is collected by school teachers, health workers fill in the information for the Health Management Information System (HMIS), anganwadi workers provide nutrition data and so on. This creates perverse incentives for them to hide the reality on the ground. This came out starkly in a comparison (by N C Saxena) of monitoring data of ICDS, which showed severe malnutrition for the country at 0.4 per cent, whereas NFHS data for a comparable period showed it to be around 16 per cent. Field studies show that anganwadi workers are often penalised by their superiors for reporting severe malnutrition. Similarly, teachers fear losing their jobs if enrolment or attendance falls below a certain level.

Data collection also suffers because the data is not used in any meaningful manner. The anganwadi worker who fills numerous registers each month never receives any feedback on the data collected. Cluster and Block Resource Persons in the education system routinely collect enormous amounts of information in multiple formats. But no action is taken on it. This lack of feedback acts as a huge disincentive to the data collectors, reducing the quality of what they collect. The shift to mobile reporting has not changed the situation on the ground, because the introduction of technology did not improve the feedback mechanism, which continues to be a missing link.


In effect, the state has failed to create capacities that can be devoted to developing and maintaining a timely, reliable and decentralised data regime. This inadequacy pervades the system from top to bottom. DISE, for instance, has barely a handful of people manning the entire operation of developing and maintaining the official database for education. At the sub-national level, they rely on data entry operators to collate and digitise data manually collected by teachers in complex formats. There are no statisticians in the system and few inputs are received from educationists. Data in a usable or useful form is unavailable at local levels, severely hampering transparency, accountability and decentralised planning.

The paucity and unreliability of government data has given rise to a plethora of non-government data sources in the social sectors, similar to the Centre for Monitoring Indian Economy for industry and employment data. In education, the Annual Status of Education Report and the India Human Development Survey are commonly used. While these sources have been useful in highlighting neglected issues, relying on them raises the question of data neutrality. Which source will, or should, the government use in making its policies and plans? Should not a large country of India’s complexity and growth strengthen its own data regime to ensure independence and neutrality? Doing so will also go a long way in ensuring that its policies and plans are on track.

Bhatty is senior fellow with Centre for Policy Research, New Delhi, and Sinha is assistant professor (Economics), Ambedkar University

First published on: 11-02-2019 at 12:18:11 am