BY: Gary Marcus and Ernest Davis
Big data is not all it’s made out to be.
Big data is suddenly everywhere. Everyone seems to be collecting it, analysing it, making money from it and celebrating (or fearing) its powers. Whether we’re talking about analysing zillions of Google search queries to predict flu outbreaks, or zillions of phone records to detect signs of terrorist activity, or zillions of airline stats to find the best time to buy plane tickets, big data is on the case. By combining the power of modern computing with the plentiful data of the digital era, it promises to solve virtually any problem — crime, public health, the evolution of grammar, the perils of dating — just by crunching the numbers.
Or so its champions allege. “In the next two decades,” the journalist Patrick Tucker writes in the latest big data manifesto, The Naked Future, “we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference.”
Is big data really all it’s cracked up to be? The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: both went down sharply. But it’s hard to imagine there is any causal relationship between the two.
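The point about shared trends can be made concrete with a small sketch. The two series below are synthetic and generated independently; they are not the actual murder-rate or browser figures, and the function names are ours, chosen for illustration. Because both series drift downward, their raw correlation comes out high. One common check, comparing period-to-period changes instead of levels, strips out the shared trend.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def declining_series(start, drop_per_step, noise, steps, rng):
    """A straight downward trend plus independent Gaussian noise (synthetic data)."""
    return [start - drop_per_step * t + rng.gauss(0, noise) for t in range(steps)]

def differences(xs):
    """Step-to-step changes, which remove a steady trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two unrelated quantities that both happen to fall over 72 periods.
series_a = declining_series(100.0, 1.0, 2.0, 72, rng)
series_b = declining_series(50.0, 0.5, 1.0, 72, rng)

raw_r = pearson(series_a, series_b)
diff_r = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {raw_r:.2f}")
print(f"correlation of changes: {diff_r:.2f}")
```

The levels correlate strongly only because both lines slope the same way. Once the trend is removed, little of the relationship should remain, which is what one would expect of two quantities with no causal link.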
Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement. Molecular biologists, for example, would very much like to be able to infer the three-dimensional structure of proteins from their underlying DNA sequence, and scientists working on the problem use big data as one tool among many. But no scientist thinks you can solve this problem by crunching data alone, no matter how powerful the statistical analysis; you will always need to start with an analysis that relies on an understanding of physics and biochemistry.
Third, many tools that are based on big data can be easily gamed. For example, big data programs for grading student essays often rely on measures like sentence length and word sophistication, which are found to correlate well with the scores given by human graders. But once students figure out how such a program works, they start writing long sentences and using obscure words, rather than learning how to actually formulate and write clear, coherent text.
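A toy illustration shows how little it takes to game such a measure. The scorer below is hypothetical: it is not any real grading program, and its formula and weights are assumptions made for this sketch. It rewards only the two surface features mentioned above, sentence length and word length.

```python
import re

def toy_score(essay):
    """Hypothetical scorer: rewards long sentences and long words, nothing else."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    # The weight of 2.0 is arbitrary, chosen only for illustration.
    return avg_sentence_len + 2.0 * avg_word_len

clear = "Big data finds patterns. It does not explain them. We still need theory."

gamed = (
    "Notwithstanding multitudinous considerations, voluminous informational "
    "agglomerations perpetually engender correlational manifestations whose "
    "epistemological significance remains indeterminate."
)

print(f"clear essay: {toy_score(clear):.1f}")
print(f"gamed essay: {toy_score(gamed):.1f}")
```

The padded sentence says roughly what the clear one says, yet it scores far higher, because the scorer measures features that merely correlate with good writing rather than the writing itself.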
Fourth, even when the results of a big-data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem. Consider Google Flu Trends, once the poster child for big data. In 2009, Google reported, to considerable fanfare, that by analysing flu-related search queries, it had been able to detect the spread of the flu as accurately as, and more quickly than, the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter; for the last two years it has made more bad predictions than good ones.
Big data is here to stay, as it should be. But let’s be realistic: It’s an important resource for anyone analysing data, not a
Gary Marcus is professor of psychology at New York University and an editor of the forthcoming book ‘The Future of the Brain’. Ernest Davis is professor of computer science at New York University.