
The reason behind the lack of appreciation from a member of the Economic Advisory Council to the PM vis-a-vis large-scale sample surveys is unclear. Indeed, university-level courses on statistics and economics do not teach sample surveys in sufficient detail, especially in terms of practical applications, and there is therefore a knowledge gap amongst researchers using National Sample Survey (NSS) data. For the last couple of years, attempts have been made in various print and digital media to create ripples around the issue without a sound understanding of the subject.
The article by Shamika Ravi (‘The sample is wrong’, IE, July 7) and her rejoinder (‘Statisticians can be wrong’, IE, July 13) to Pronab Sen’s rebuttal (‘Statisticians aren’t stupid’, IE, July 10) are examples of this. Ravi’s argument centres around two broad points. First, the sample design of various NSS surveys, the NFHS and the PLFS is defective since these do not estimate rural and urban populations correctly and are biased against urban areas in favour of rural ones. As a result, they do not accurately capture the impact of the work done by the government. The second point concerns the inability to collect data from wealthier respondents: as more and more people become better off, they stop giving information to such surveys. The evidence given on this point is limited to the NFHS, and that too from just one union territory, Chandigarh, and NCT-Delhi, where the non-response rate has been much higher and possibly non-random, vitiating the estimates.
The more important question is: Why are we using NSS data to estimate the population? Are NSS surveys carried out to estimate the population of rural and urban areas? The answer is a clear “no”. NSS surveys are carried out with a specific subject of enquiry, which may be to assess consumption patterns and changes in consumer expenditure, employment, unemployment, etc. In all such enquiries, the main effort is to measure the variable under study through scientific sampling techniques, with an appropriate sample size to achieve the highest level of precision of parameters estimated. Moreover, the population estimates given in the NSS reports are essentially sampling-design-based control totals that may assist users in applying multipliers for survey-based estimates. Any researcher using NSS data needs to understand the complete ecosystem, which includes the sampling methodology, its design and coverage while using the survey results. The NSS provides ratio estimates on various characteristics, which need to be applied to the population projections for arriving at estimated numbers.
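The last step described above, applying a survey ratio to an external population projection, can be sketched as follows. This is a minimal illustration only: the function name and all figures are hypothetical and are not taken from any NSS report.

```python
def estimated_total(survey_ratio: float, projected_population: float) -> float:
    """Scale a survey-based ratio to an absolute number.

    survey_ratio: a proportion estimated from the survey (between 0 and 1).
    projected_population: an external population projection for the same
        domain (for example, urban India), not the survey's own control total.
    """
    if not 0.0 <= survey_ratio <= 1.0:
        raise ValueError("survey_ratio must be a proportion between 0 and 1")
    if projected_population < 0:
        raise ValueError("projected_population must be non-negative")
    return survey_ratio * projected_population


# Hypothetical figures, chosen only to show the arithmetic.
share_with_characteristic = 0.25          # ratio estimate from the survey
projected_urban_population = 400_000_000  # external projection (assumed)

total = estimated_total(share_with_characteristic, projected_urban_population)
print(f"Estimated persons with the characteristic: {total:,.0f}")
```

The point of the sketch is that the survey supplies the ratio while the population figure comes from outside the survey, so a shortfall in the survey's own population control total does not by itself carry over to the estimated number.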
On the second point regarding non-response by wealthier households in various surveys, any kind of non-response is scientifically adjusted for in the NSS design, which is robust and accepted in the scientific community. A household, rich or poor or classified by any other variable, that is found unwilling to respond to the predefined questionnaire is substituted with a similar household. Outliers are excluded during the data cleansing phase. Therefore, a statement about missing “Ambanis” is more a gimmick than a reflection of reality, as non-response does not come only from the rich but also arises from various other socio-political factors.
Moreover, non-response in NSS surveys is much lower than in other countries and is not significant enough to produce erroneous estimates. It is worth appreciating that the information regarding non-response is in the public domain and hence, the user should approach the data with a clear understanding of its deficiencies, as in the cases cited by Ravi for Chandigarh and Delhi. In any case, stating that all the data is prejudiced or biased against urban areas, or is not reliable due to the sampling design, is puerile, akin to throwing out the baby with the bathwater. Non-sampling errors are always present in large-scale sample surveys, and the NSS makes systematic and continuous efforts to reduce them through appropriate strategies. Continuous research is always carried out to improve the response rate for all such surveys.
The quality of NSS estimates is further confirmed by low Relative Standard Errors (RSEs), which are published with the estimates. RSEs allow users to construct confidence intervals around the estimates, and publishing them is an internationally accepted best practice. Therefore, before questioning the NSS design and the quality of its estimates, and drawing conclusions, we expect the author to “drink deep” from the documents related to NSS sampling design and learn the system for the conduct of surveys, especially the NSS.
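For readers unfamiliar with how a published RSE translates into a confidence interval, here is a minimal sketch. It assumes the RSE is expressed as a percentage of the estimate and uses a normal approximation with a multiplier of 1.96 for a 95 per cent interval; the function name and the figures are hypothetical and do not come from any NSS publication.

```python
def confidence_interval(estimate: float, rse_percent: float, z: float = 1.96):
    """Return (lower, upper) bounds for an estimate given its RSE.

    RSE (%) = 100 * standard_error / estimate, so the standard error is
    recovered as estimate * rse_percent / 100. The default z = 1.96
    corresponds to a 95% interval under a normal approximation (assumed).
    """
    if rse_percent < 0:
        raise ValueError("rse_percent must be non-negative")
    standard_error = abs(estimate) * rse_percent / 100.0
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical example: an estimate of 2,000 with an RSE of 2%.
lower, upper = confidence_interval(2000.0, 2.0)
print(f"95% interval: {lower:.1f} to {upper:.1f}")
```

A lower RSE gives a narrower interval around the same estimate, which is why low published RSEs are read as a sign of precise estimates.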
The writer is Distinguished Fellow, Pahle India Foundation; Senior Fellow, NITI Aayog, Government of India; and former Director General, Ministry of Statistics & Programme Implementation, Government of India