
Collaborating with computers to parse “big data”

Posted on Jan 18, 2010 in Blog, CAR, sql, theory, web

Picture it: You’ve been given a new story assignment, and you have to leave for the interview in five minutes. You’ve got to have enough background to ask the right questions, but there’s no time to do research. Somehow, pulling that all-nighter on an English term paper in college seems like a cakewalk. Luckily, you’re a seasoned journalist (which is where I hope to be in 10 years), and you’re used to figuring it out fast.

Enter the new reporter. He can come up with millions of questions in a millisecond, and assimilate information ten times faster than that. He doesn’t take no for an answer, and he doesn’t even need to spend travel time on his interviews. Oh, and he’s the pinnacle of efficiency – doesn’t take lunch or bathroom breaks, doesn’t even stop to sleep.  He’s the new ace reporter on staff: a computer.

Time to be afraid? No, because the computer has a serious Achilles heel: it has no flexibility, no understanding of nuance. It’s either black or white, right or wrong. The best way to work with your new colleague, so to speak, is to let him focus on his strengths while you focus on yours. I’m talking about using the machine as an analytical tool for the details, but leaving the big-picture exploration to the humans. This concept is buzzing around the Internet these days: the idea of parsing “big data,” data sets that are too big for human brains alone to comprehend.

This idea was explored in a large-scale report (66-page PDF ahead!) from the Aspen Institute, titled “The Promise and Peril of Big Data.” I’m not surprised by a title like that, having long believed that all technology is amoral and, like much of life, brings its own good and bad aspects. The computer can examine these large sets, but we have to use our own judgment to apply journalistic and statistical ethics: just because there is a correlation doesn’t mean there is cause and effect, for example. But just as we go to expert sources to fill in our own knowledge gaps, the computer serves as a certain type of expert that can aid reporting.
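To make that correlation-versus-causation caveat concrete, here’s a minimal Python sketch using invented numbers (the ice cream and drowning figures are made up for illustration only): two series can track each other almost perfectly while a third factor, absent from the data, drives them both.

# A toy illustration of the "correlation is not causation" caveat:
# two variables can move together closely without one causing the other.
# All numbers below are invented for demonstration purposes.

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical figures: monthly ice cream sales and drowning incidents.
ice_cream_sales = [20, 25, 40, 60, 80, 95, 100, 90, 70, 45, 30, 22]
drownings       = [2, 2, 3, 5, 7, 9, 10, 9, 6, 4, 3, 2]

r = pearson_r(ice_cream_sales, drownings)
print(f"Pearson r = {r:.2f}")  # strongly correlated, but neither causes the other;
                               # a lurking variable (summer heat) drives both.

The machine will happily report the strong correlation; deciding what it means, or whether it means anything at all, is still the reporter’s call.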

Here’s the computer’s specialty: it helps us analyze information and then display it so it’s fit for human consumption. This is the essence of journalism; the computer just processes more than the human brain can. But we still have to make sense of it. He’ll take care of the trees; we have to deal with the forest.
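As a rough sketch of that division of labor, here is a short Python example with a hypothetical file name and column names: the machine tallies every detail row, and the reporter is left with a ten-line summary to interpret.

# A minimal sketch of the "computer handles the trees" idea: tallying
# many detail rows into a small summary a reporter can interpret.
# The file name and column names here are hypothetical.

import csv
from collections import Counter

totals = Counter()
with open("inspections.csv", newline="") as f:  # hypothetical data file
    for row in csv.DictReader(f):
        # Count violations per neighborhood; the machine does the rote work.
        totals[row["neighborhood"]] += int(row["violations"])

# The human's job: look at the top of the list and ask why.
for neighborhood, count in totals.most_common(10):
    print(f"{neighborhood}: {count} violations")

The loop over all those rows belongs to the computer; the question of why one neighborhood tops the list belongs to the journalist.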

As for statistical analysis, it’s been big in the business community for years. The idea is simple: Parse the information so it can help people make intelligent decisions. The field is progressing rapidly, and the community is talking about using data for storytelling. Roger Magoulas, director of research for O’Reilly Media, recently said, “You need more sophisticated ways of distilling what you know down. A good visualization, if you turn something into a good story, it’s just going to resonate a lot more…than a simple chart that gets delivered to everyone.”

Wait, isn’t that what journalists do? The only difference is that instead of making data accessible to a few decision makers, it’s about making data accessible to a much bigger group of people. Many call it data analysis; I call it opening up democracy.
