Sample surveys are important to validate administrative databases
https://indianexpress.com/article/opinion/columns/nsso-organised-sector-cso-services-jobs-5747314/

The National Sample Survey Organisation’s Technical Report on the survey of services sector enterprises is in the news for its findings on the accuracy of the list of companies in the services sector, sourced from the Ministry of Corporate Affairs. In the atmosphere of mistrust of official data following the suppression of the employment report, the conclusions of this technical report are not surprising. The clarification from the government has done little to clear the air. Some misgivings expressed by experts overlook the usual weaknesses of administrative data, though some questions still need to be answered by the Central Statistical Office (CSO).

A regular annual survey of manufacturing establishments, based on the list of registered factories, has long been in place. There was no similar survey for the services sector. The need for a regular system of surveys of enterprises in the services sector was strongly articulated by the Rangarajan Commission. The reason for not conducting such a survey was the absence of a dependable list of services sector enterprises from which to draw samples.

Successive Economic Censuses have failed to produce a reliable list of establishments. The NSSO did a large pilot survey in selected states and metros during 2012-13 using a “list frame” of enterprises having 10 or more workers from the 2005 Economic Census. This survey showed almost 50 per cent substitution of the original units, which was attributed to the imperfect coverage of establishments in the Economic Census. Around this time, the MCA-21 data came to be used for the new GDP series. Subsequently, the National Statistical Commission (NSC) suggested a survey of services sector enterprises based entirely on list frames in the NSS 74th round (July 2016-June 2017), as a prelude to a regular Annual Survey on Services Sector (ASSSE). This survey was expected to generate estimates of various operational and economic characteristics of services sector enterprises.

Three types of lists were used for data collection in the 74th round: the 2013 Economic Census (EC) list, the Business Register (BR) available with 11 states, and the list of active private non-financial companies of 2013-14, sourced from the MCA database by the CSO and updated for 2014-15. Postal addresses for these companies were taken from the Ministry of Corporate Affairs records by matching the Company Identification Number.

In the first phase, the units selected from the EC and BR lists were verified through field visits. Of the 1.35 lakh establishments drawn from these lists, only about 63,000 were found eligible for the survey. Of the 3.5 lakh enterprises in the MCA list, the survey was to cover 35,456 units. Of all units selected for the survey, only 67 per cent were found to be in operation, a major setback to the survey. The shortfall was due to non-response, closure of units, units falling outside the survey coverage and units that could not be traced.

The current debate is about the large number of MCA companies that could not be surveyed. Of the 35,000 companies, data could be collected from only 54.5 per cent. Of the remaining, 7 per cent did not respond or did not agree to the survey. About 12 per cent could not be traced at their registered addresses. Four and a half per cent were found closed and over 21 per cent were outside the survey coverage, that is, they were not non-financial service companies. In view of the large truncation of the planned sample, the NSC under this writer’s chairmanship concluded that no meaningful estimates could be prepared. It recommended only a short technical report giving the survey experiences along with the sample-based indicators.
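To give a sense of scale, the percentages above can be turned into approximate unit counts. The short Python sketch below assumes the shares apply to the planned sample of 35,456 units mentioned earlier; the article gives rounded shares ("about", "over"), so the counts are indicative only, and the function name is chosen purely for illustration.

```python
# Outcome shares for the MCA sample as quoted in the article (rounded figures).
PLANNED_SAMPLE = 35_456  # assumption: the quoted shares apply to this base

outcome_shares = {
    "surveyed": 54.5,
    "non-response or refusal": 7.0,
    "not traceable at registered address": 12.0,
    "closed": 4.5,
    "out of survey coverage": 21.0,
}


def approximate_counts(sample_size, shares):
    """Convert percentage shares into rounded unit counts."""
    return {name: round(sample_size * pct / 100) for name, pct in shares.items()}


counts = approximate_counts(PLANNED_SAMPLE, outcome_shares)
total_share = sum(outcome_shares.values())

for name, n in counts.items():
    print(f"{name:38s} {outcome_shares[name]:5.1f}%  ~{n:,} units")
print(f"Shares listed add up to {total_share:.1f}% (the source figures are rounded)")
```

On these assumptions, roughly 19,000 companies yielded data and more than 7,000 turned out not to be non-financial service companies at all.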

Administrative data from government databases are notoriously imperfect. They require sustained cleaning to be usable. The use of the MCA database was a major shift in the compilation of GVA in the new national accounts series, besides the shift from the establishment approach to the enterprise approach. This has implications for economic activities under different sectors. For instance, trade carried out by manufacturing companies becomes part of “manufacturing”, but was earlier covered in “trade” because of the establishment approach.

While non-responding and untraceable units will not affect GDP computation based on the data actually filed by them, the presence of over 20 per cent of units in a business other than services raises questions. Computing GVA from the filings of these companies may not affect the overall GVA, but the sectoral GVA will be wrong if the companies are not put in the correct activity category. This is important when the new GDP series shows changes in the shares of the manufacturing and services sectors.
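The point about sectoral GVA can be illustrated with a toy calculation. The Python sketch below uses three hypothetical companies with invented GVA figures; none of the numbers come from the MCA database or the technical report. It shows that reclassifying a company leaves the total unchanged while shifting the sector totals.

```python
# Hypothetical companies and GVA values (arbitrary units), for illustration only.
companies = [
    {"name": "A", "registered_as": "services", "actual": "services", "gva": 50.0},
    {"name": "B", "registered_as": "services", "actual": "manufacturing", "gva": 30.0},
    {"name": "C", "registered_as": "manufacturing", "actual": "manufacturing", "gva": 20.0},
]


def gva_by_sector(records, key):
    """Sum GVA per sector, using the classification stored under `key`."""
    totals = {}
    for record in records:
        sector = record[key]
        totals[sector] = totals.get(sector, 0.0) + record["gva"]
    return totals


as_registered = gva_by_sector(companies, "registered_as")
as_verified = gva_by_sector(companies, "actual")

print("By registered activity:", as_registered)
print("By verified activity:  ", as_verified)
print("Total GVA either way:  ", sum(as_registered.values()))
```

In this made-up case the services share falls from 80 to 50 out of 100 once company B is placed in its actual activity, even though the overall total does not move.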

A major drawback of the MCA data has been its inability to produce regional estimates for computing state-level GDP. GVA estimates for the private corporate sector based on MCA data cannot be readily distributed among states, and have to be allocated to states using other information. A field survey based on the MCA list of companies can help identify the geographical distribution of establishments and strengthen the computation of GSDP.

Administrative data are only as good as the administration that produces them. The report shows the importance of sample surveys to validate administrative databases and the need to restrain the tendency to be overly confident in administrative data, be it EPFO data or MCA-21 data, without adequate scrutiny of the processes that generate the databases.

This article first appeared in the print edition on May 25, 2019 under the title ‘Double-check that data’.