A Blog by Jonathan Low

 

Mar 3, 2013

Data Without Context Is Often Misleading

There is a good reason why horror films do so well: we like the thrill of scaring ourselves.

We also have a tendency to believe data. It just looks so organized and official with all those numbers and columns and stuff. Who could make that up? 

Which is why we are so susceptible to bad news packaged around data. And that could be perfectly harmless, just like a good horror flick. But in the real world, data gets converted into narratives with political or business overtones - and then it becomes policy or strategy. If the underlying presumption was incorrect or misguided or misinterpreted or intentionally misleading, the impact can be harmful and potentially disastrous.

As we become more data-centric, at least ostensibly, the implications become more significant. Health care, economic growth, infrastructure quality, international trade: all are becoming increasingly dependent on trend data - and the decisions made around each of them could have far-reaching consequences. Data - accurate or misleading - could either enhance humanity's ability to craft solutions or send us stumbling in a mistaken direction from which it could take years to find our way back.

Until we learn to provide useful context for information and for the decisions based on it, we are at risk in ways that may be as bad as or worse than those resulting from ignorance. JL

Nick Bilton reports in the New York Times:

In today’s digitally connected world, data is everywhere: in our phones, search queries, friendships, dating profiles, cars, food, reading habits. Almost everything we touch is part of a larger data set. But the people and companies that interpret the data may fail to apply background and outside conditions to the numbers they capture.
Several years ago, Google, aware of how many of us were sneezing and coughing, created a fancy equation on its Web site to figure out just how many people had influenza. The math works like this: people’s location + flu-related search queries on Google + some really smart algorithms = the number of people with the flu in the United States.
So how did the algorithms fare this wretched winter? According to Google Flu Trends, at the flu season’s peak in mid-January, nearly 11 percent of the United States population had influenza.
Yikes! Take vitamins. Don’t leave the house. Wash your hands. Wash them again!
But wait. According to an article in the science journal Nature, Google’s disease-hunting algorithms were wrong: their results were double the actual estimates by the Centers for Disease Control and Prevention, which put the coughing and sniffling peak at 6 percent of the population.
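To see how big the gap really was, the two figures quoted above can be compared directly. This is only a back-of-the-envelope sketch: it treats "nearly 11 percent" as exactly 11, which is an assumption, and the function name is made up for illustration.

```python
# Peak figures quoted in the article, as percent of the U.S. population.
google_flu_trends_pct = 11.0  # assumption: "nearly 11 percent" taken as 11
cdc_pct = 6.0                 # C.D.C. estimate cited from Nature


def overestimate_ratio(estimate: float, reference: float) -> float:
    """Return how many times larger the estimate is than the reference."""
    return estimate / reference


ratio = overestimate_ratio(google_flu_trends_pct, cdc_pct)
gap = google_flu_trends_pct - cdc_pct

print(f"ratio: {ratio:.2f}x, gap: {gap:.0f} percentage points")
# prints "ratio: 1.83x, gap: 5 percentage points"
```

On these numbers the algorithm's estimate was a little under twice the C.D.C. figure, which is the "double" the article refers to.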
Kelly Mason, a public affairs spokeswoman for Google, said the company’s Flu Trends site was meant to be only one source in addition to the C.D.C. and other flu surveillance methods. “We review and potentially update our model each season,” she said.
Scientists have a theory about what went wrong, as well.
“Several researchers suggest that the problems may be due to widespread media coverage of this year’s severe U.S. flu season,” Declan Butler wrote in Nature. Then add social media, which helped news of the flu spread quicker than the virus itself.
In other words, Google’s algorithm was looking only at the numbers, not at the context of the search results.
“Data inherently has all of the foibles of being human,” said Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University. “Data is not a magic force in society; it’s an extension of us.”
Society has encountered similar situations for centuries. In the 1600s, Dr. Hansen said, an early census was recorded in England as the Great Plague of London killed tens of thousands of Britons. To calculate the spread of the disease, officials started recording every christening and death in the city. And although this helped quantify the mortality rate, it also created other problems. There was now an astounding collection of statistical information for scientists to review and understand, but it took time to develop systems that could accurately assess the information.
Now, as we enter a world of big data, we have to learn how to apply context to these numbers.
Dr. Hansen said the problem of data without context could be summed up in a quote from the playwright Eugène Ionesco: “Of course, not everything is unsayable in words, only the living truth.”
I experienced this firsthand in the spring of 2010, when I was an adjunct professor at New York University teaching graduate students in the Interactive Telecommunications Program.
I created a class called “Telling Stories With Data, Sensors and Humans,” with the goal of determining whether sensors and data could become reporters and collect information. Students built little electronic contraptions with $30 computers called Arduinos, and attached several sensors, including ones that could detect light, noise and movement.
We wondered if we could use these sensors to determine whether students used the elevators more than the stairs, and whether that changed throughout the day. (Esoteric, sure, but a perfect example of a computer sitting there taking notes, rather than a human.)
We set up the sensors in some elevators and stairwells at N.Y.U. and waited. To our delighted surprise, the data we collected told a story, and it seemed that our experiment had worked.
As I left campus that evening, one of the N.Y.U. security guards who had seen students setting up the computers in the elevators asked how our experiment had gone. I explained that we had found that students seemed to use the elevators in the morning, perhaps because they were tired from staying up late, and switch to the stairs at night, when they became energized.
“Oh, no, they don’t,” the security guard told me, laughing as he assured me that lazy college students used the elevators whenever possible. “One of the elevators broke down a few evenings last week, so they had no choice but to use the stairs.”
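The guard's correction can be sketched in a few lines of code. Everything below is hypothetical: the trip counts are invented for illustration (the class's actual readings are not given), and the `elevator_working` field stands in for the piece of context the sensors had no way to record.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    period: str             # "morning" or "evening"
    elevator_trips: int
    stair_trips: int
    elevator_working: bool  # context the sensors could not capture


# Invented counts, for illustration only.
readings = [
    Reading("morning", 40, 10, True),
    Reading("evening", 5, 45, False),  # an evening with the elevator broken
    Reading("morning", 38, 12, True),
    Reading("evening", 36, 9, True),
]


def stair_share(rows):
    """Fraction of all trips that used the stairs; None if there were no trips."""
    stairs = sum(r.stair_trips for r in rows)
    total = stairs + sum(r.elevator_trips for r in rows)
    return stairs / total if total else None


def evening_stair_share(rows, require_working_elevator):
    """Evening stair share, optionally dropping periods when the elevator was out."""
    evening = [r for r in rows if r.period == "evening"]
    if require_working_elevator:
        evening = [r for r in evening if r.elevator_working]
    return stair_share(evening)


naive = evening_stair_share(readings, require_working_elevator=False)
in_context = evening_stair_share(readings, require_working_elevator=True)
morning = stair_share([r for r in readings if r.period == "morning"])

print(f"morning: {morning:.0%}, evening (naive): {naive:.0%}, "
      f"evening (elevator working): {in_context:.0%}")
```

With these made-up numbers, the naive reading says most evening trips are by stairs, while the reading that accounts for the outage looks much like the morning. The data did not change; only the context did.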
