Big Data with a Grain of Salt*


By Rachel Grabenhofer, Managing Editor, Cosmetics & Toiletries

Big Data is everywhere, tracking click-throughs and time stamps of our daily lives like the digital paparazzi. It can be used for good, revealing patterns in the most minuscule of details and leading to discoveries beyond our wildest imaginations. But in the wrong hands, it’s dangerous: unraveling identities, draining bank accounts or worse.

From a scientific standpoint, Big Bad Data is greedy and hungry, swallowing everything in its path to chew it up and spit it out, making something from nothing. This can lead to false correlations and, as many industries have seen, junk science. And with today’s scientific community tallying results of everything from gene up- and down-regulation to neuron response patterns and more, we must be careful.

Statisticians remind us, “Correlation does not indicate causation,” and vast amounts of data can easily mislead the misguided or faint of heart. That’s why Susan Etlinger, a data analyst recently featured on NPR’s “TED Radio Hour,” underlined that context is crucial.¹

People Make Meaning, So Think Critically

“When it comes to Big Data and the challenges of interpreting it, size isn’t everything,” said Etlinger, who explained that its speed and its variety of types (images, text, video and audio) matter, too. What unites these types is that they’re created by people and they require context.

Etlinger expanded, “Facts are vulnerable to misuse, willful or otherwise. Why? Because data doesn’t create meaning. People do. And now, with the capability to process exabytes of data at lightning speed, we have the potential to make bad decisions far more quickly, efficiently and with far greater impact than we did in the past,” she said.

This makes it more important to spend time on the humanities (sociology, rhetoric, philosophy, ethics, etc.) because they give context. “They help us become better critical thinkers,” Etlinger said, adding that they also teach us to spot confirmation biases and false correlations. “[Just] because something happens after something doesn’t mean it happened because of it.”

“As my high school algebra teacher used to say, ‘show your math’ because if I don’t know what steps you took, I don’t know what steps you didn’t take. And if I don’t know what questions you asked, I don’t know what you didn’t ask.”

She added that this means asking the hardest question of all: Does the data really show us what we think it does? Or does the result simply make us feel more successful?

This is a hard question indeed. While a product may be built on good, factual science—and logic dictates we’d be shooting ourselves in the foot to make false claims and mislead or harm the very consumers we serve—humans ultimately want to create meaning. Especially skeptical ones. And when they base that meaning on Big Bad Data, it can create something from nothing, which fuels skepticism. This makes it even more important to ask this difficult question.

Putting Big Data into Practice

Thankfully, we’re not alone. In fact, we’d be hard-pressed to find an industry that’s not collecting Big Data and trying to connect the right dots for good use.

Richard Shiffrin, of Indiana University, studied² how others are attempting to draw causal inferences from Big Data. “The age of Big Data poses enormous challenges because collecting and storing the data are only a minimal first step.”

He described the work that follows collection in three stages: 1) finding interesting patterns in the data; 2) explaining those patterns, e.g., with experimental manipulations of variables and additional data; and 3) using the patterns and explanations for a variety of purposes.

“Finding interesting patterns is itself a daunting task because a hallmark of Big Data is the fact that it vastly exceeds human comprehension.”

He continued, “[Then,] how does one judge what is a significant pattern or correlation?” To a large extent, it’s a matter of statistical practice and implementation. Then again, as Shiffrin noted, traditional statistics were developed to deal with 2×2 tables, which come nowhere near the complexity of Big Data.
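To see why judging significance gets harder as data grows, consider a minimal Python sketch. It is an illustration only, not a method from Etlinger or the Indiana University study. It assumes 40 unrelated variables with 30 observations each, all filled with random noise, and uses 0.361 as the cutoff, which is the standard two-tailed critical value of the correlation coefficient at the 5 percent level for 30 observations. The function names and parameters are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def count_chance_hits(n_vars=40, n_obs=30, r_crit=0.361, seed=0):
    """Count pairs of pure-noise variables whose correlation passes the cutoff."""
    rng = random.Random(seed)
    # Every variable is independent random noise: no real relationships exist.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(range(n_vars), 2))
    hits = sum(1 for i, j in pairs if abs(pearson(data[i], data[j])) > r_crit)
    return hits, len(pairs)


hits, total = count_chance_hits()
print(f"{hits} of {total} unrelated pairs look 'significant'")
```

Forty variables produce 780 possible pairs, so a 5 percent false-alarm rate implies dozens of "findings" in data that contains none. Tightening the cutoff as the number of comparisons grows is one common response, though it is only a partial answer to the problem described here.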

“Most Big Data [is] formed as a nonrandom sample taken from the infinitely complex real world: Pretty much everything in the real world interacts with everything else, to at least some degree,” wrote Shiffrin. Needless to say, statisticians are working on this, too.
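The point about nonrandom samples can be illustrated with another small Python sketch. This is a hypothetical example under stated assumptions, not an analysis from the study: two measurements are generated independently, so they are unrelated in the full population, but records are kept only when their sum exceeds a cutoff. The function names, the cutoff of 1.0 and the sample size are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def selection_demo(n=20000, cutoff=1.0, seed=1):
    """Compare correlation in a full population with a nonrandom subset of it."""
    rng = random.Random(seed)
    # Two independent measurements per record: unrelated by construction.
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Nonrandom sampling: only records with a high combined score are kept.
    kept = [(a, b) for a, b in population if a + b > cutoff]
    r_all = pearson(*zip(*population))
    r_kept = pearson(*zip(*kept))
    return r_all, r_kept, len(kept)


r_all, r_kept, n_kept = selection_demo()
print(f"full population r = {r_all:.2f}")
print(f"selected sample r = {r_kept:.2f} ({n_kept} records kept)")
```

In the full population the correlation sits near zero, while the selected records show a clearly negative one. The pattern is created entirely by how the records were chosen, which is why knowing where a dataset came from matters as much as what it contains.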

In the end, it seems the best thing we can do is remember our methods and data have limits. Whatever the results appear to tell us should be reconsidered—and more than once. We don’t want to feed into Big Bad Data; we need to see it in context and serve it with a grain of salt.


*Adapted from Cosmetics & Toiletries, Oct 2016
