« « Visual confections are more than mere presentation

Data Delver: Lisa Pickoff-White, California Watch » »

Visualizing networking: When it doesn’t work

So much in journalism, and in life, is about striving for perfection: the best story idea, the best presentation. As individuals we compete to be the best, to get to the scene before someone else, to write tighter and edit faster. But sometimes it’s just as important to recognize when something just didn’t work. I think that’s still an achievement. The real problem is when you think you succeeded but in reality you have failed your audience. That was my lesson of the week.

Social network analysis is a topic that my independent study advisor, Rich Gordon, and I have been discussing often during our weekly meetings. Through the materials he’s pointed me to, as well as my own reading and observation, I think it has great potential for journalistic applications. Whether it’s about getting a job or pushing your political agenda, so much in this world is about who you know.

That’s why I was so excited that this week’s programming exercise in Ben Fry’s Visualizing Data, the book I’m using to learn Processing, focused on visualizing networks. The chapter mapped the connections between adjacent words in a text, and showed that this was not an effective visualization technique. I couldn’t agree more: this was the wrong application for it.

The book mapped the words in Mark Twain’s Huckleberry Finn, but because I don’t like replicating exact projects when I can avoid it, I decided to hop over to Project Gutenberg and find something more interesting. Upon searching for journalism, because why not, I found a book on “Journalism and Women,” written by a man. The book, which supposedly tries to help women, is worth a post of its own. Parts are condescending, and some of the advice is more relevant today than ever, but for both genders. Today’s subject isn’t the book’s content as a whole, but the connections between its words, to see whether they help illuminate certain points. Hint: they didn’t, because this was a failed experiment.

But I learned how to use a network map to visualize connections, which could be valuable in the future. It’s really difficult to make sense of thousands, or even hundreds, of nodes. I think this type of visualization demands a strong interactive component so the user can make sense of what’s happening, and a filtering mechanism would be even better.

As I was writing this, Rich sent me a link to “News Dots,” a Slate visualization of how news stories connect to each other. The content is great and the interface works, using different colors for different categories and offering many levels of filtering and exploration. You can even go back and look at other days. The more you roll over and the more you click, the more you want to explore, yet I can understand what’s important at a glance. I just got sidetracked for 20 minutes in deep exploration, which is the mark of good work, in my eyes.

Back to my visualization. Here’s the final product, which I coded to output as a PDF. It looks like a jumbled mess. The bigger the bubble, the more times the word appears. And there are still so many common words with no meaning that the visualization has started to represent how many times “than” was used, which is utterly unhelpful. (There are 126 pages’ worth of connections, by the way, when you force the program to output them, and that’s after eliminating duplicate nodes.) A user can’t get much out of this, no matter how long he or she stares at it, other than the fact that there are a lot of complicated connections.
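To make the idea concrete, here is a minimal sketch of counting adjacent-word connections while dropping common words. It is written in Python rather than Processing, it is not the code from the book, and the function name `adjacent_word_network` and the tiny `STOP_WORDS` list are hypothetical choices made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "than", "that"}


def adjacent_word_network(text, stop_words=STOP_WORDS):
    """Return (node_sizes, edges) for the adjacent words in `text`.

    node_sizes maps each kept word to how often it appears (the bubble size).
    edges maps an alphabetically sorted word pair to how often the two words
    sit next to each other in the text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    node_sizes = Counter(w for w in words if w not in stop_words)

    edges = Counter()
    # Pair words that are adjacent in the ORIGINAL text, then discard any
    # pair that touches a stop word, so no false adjacencies are created.
    for first, second in zip(words, words[1:]):
        if first in stop_words or second in stop_words or first == second:
            continue
        edges[tuple(sorted((first, second)))] += 1
    return node_sizes, edges


sample = "The press is the press and women write. Women write more than men."
node_sizes, edges = adjacent_word_network(sample)
print(node_sizes.most_common(3))
print(edges.most_common(3))
```

Filtering on the edges rather than on the word list is a deliberate choice here: removing stop words first would make words look adjacent when they were really separated by a “than” or a “the.”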

The book then instructed us to try to make more sense of the information by bringing it into Graphviz, an open source program that looks at the nodes and sorts them hierarchically. The program froze on my 126 pages of connections. When I limited it to 10 pages, the first two chapters of my book, it produced this rather complicated GIF file, which takes a long time to load.
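For anyone who wants to try a similar handoff to Graphviz, below is a rough Python sketch that writes word connections in Graphviz’s DOT format and thins them out first, which is one way to avoid feeding the program more than it can handle. The function name `to_dot`, the `min_count` threshold and the sizing formula are assumptions of mine, not anything taken from the book.

```python
def to_dot(node_sizes, edges, min_count=2):
    """Build a Graphviz DOT string from word counts and word-pair counts.

    node_sizes: dict mapping word -> number of appearances.
    edges: dict mapping (word_a, word_b) -> number of adjacencies.
    min_count: hypothetical filter; pairs seen fewer times are dropped.
    Assumes words contain no double-quote characters.
    """
    lines = ["graph words {", "  node [shape=circle];"]
    kept_words = set()
    for (a, b), count in sorted(edges.items()):
        if count < min_count:
            continue
        kept_words.update((a, b))
        # Thicker lines for pairs that occur together more often.
        lines.append(f'  "{a}" -- "{b}" [penwidth={count}];')
    for word in sorted(kept_words):
        # Bigger circles for more frequent words (width is in inches).
        width = 0.3 + 0.1 * node_sizes.get(word, 1)
        lines.append(f'  "{word}" [width={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    {"women": 2, "write": 2, "more": 1},
    {("women", "write"): 3, ("more", "write"): 1},
)
print(dot_text)
```

Saved to a file such as `words.dot`, the output should render with a command like `dot -Tsvg words.dot -o words.svg`. Raising `min_count` is the crude filtering step; it will not make a huge graph meaningful, but it can keep it small enough to load.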

It makes a little more sense, though frankly not much, and at least you can see the individual nodes. And if you zoom in, you can scroll around, which is an interesting way to provide interactivity within a very large static graphic. But this is nowhere near ready for professional publication.

This section has taught me a lot about the importance of optimizing complicated visualizations, probably by coding them so that only some parts load at once.

The chapter also looked at using a network visualization the right way: looking at a Web server and, instead of drawing individual nodes, letting the data create a shape. The shape depicts a network, but without the same amount of detail. It works because it doesn’t overwhelm the reader. But since I have no Web server log of my own (I’d have to dig into the innards of my GoDaddy site, or I might be able to do it with my Django server once I get a project up and running in the coming week or so), I opted to treat it as an interesting reading exercise, not one to put into practice, for now anyway.
