« « My next move: LA Times!

Self-teaching data and programming skills » »

Bringing data journalism into curricula

Posted on Mar 24, 2010 in Blog, CAR, programming, theory

As a recently graduated Medillian (yay for entering the “real world,” boo for having to leave such a nurturing and wonderful place), I’ve been thinking a lot about data journalism and my generation.  Why were there so few students at NICAR?  Yes, it costs money to get to a conference, but I’m not even seeing all that many on NICAR-L.  Perhaps it’s not as snazzy or attractive as video or Flash, but I maintain that the Web cries out for data, and with the right mentality, it’s even more interesting than these other journalism subsets.

Data journalism ought to be something every journalist is familiar with, and at least considers as a possible specialization.  I think part of the issue is that it’s just not being taught as well as it could be in journalism schools.  First, let me be clear.  My strongest recommendation to anyone about graduate education — not just in journalism — is to seek out your own education.  Yeah, keep up with blogs and tutorials in your field, engage in social media.  But find professors whose work sounds interesting, and bug them.  Just because it’s not covered in the syllabus doesn’t mean they can’t help you.  It might have been their speciality in the past, or they might like to learn it with you.  When you work together closely with professors, you get individual attention and gain new skills. Here’s the not-so-secret secret: Most educators love to work with interested students.

But schools are there to help you; you shouldn’t have to seek everything out for yourself.  Our curricula must move forward, teach us more of what we need to know, and show us that these possibilities exist.  So, I put fingers to keyboard to outline some of my thoughts on what I wish journalism schools could do.

I originally wrote this at the request of Rich Gordon, as Medill seeks to ramp up its data options.  After teaching myself, I supposedly have some sense of what would have made it easier.  Let’s not kid ourselves: There’s still a long way to go.  But six months of obsession has to yield something, right?

I make this public because it’s a path any journalism school should consider. Hey, you don’t even have to be a school.  Any journalist would benefit from considering these concepts.  Let me know your thoughts.  I consider anything I’ve done so far just Data 101; there’s much more to do.

First, understand the distinction between computer-assisted reporting, data-driven Web development and integrating data into stories.  The areas overlap, but I would argue they are not all the same.

Integrating data into stories: This is the quickest, and as of now, most common use of data in journalism.  Use a stat to fill in the blank, add a little additional context to a story.  Often, it doesn’t hurt, but lacks the punch of the other options.  And using the wrong stat is like a fact error.  This is the easiest technique for general reporters to implement.  But you’ve got to know what you’re doing.  It’s easy to do it right, but it’s also easy to do it wrong.

Computer-assisted reporting: Get databases from public sources, or roll your own database from original reporting or Web scraping.  Interview this database as you would a source, ask it questions. Find the outliers, patterns that are strange.  Figure out why that is so by traditional shoe-leather reporting. Here, data serves as the backbone to a story.  Using a database to answer a question that comes from your beat reporting also falls under this category.
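
To make "interviewing" a database a bit more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. Everything in it is an assumption for illustration: the `inspections` table, the restaurant names and the violation counts are invented, and "more than three times the average" is just one rough way to flag an outlier, not a standard rule.

```python
import sqlite3

# Hypothetical data, invented for illustration. These are not real records.
ROWS = [
    ("Restaurant A", "North", 2),
    ("Restaurant B", "North", 3),
    ("Restaurant C", "South", 1),
    ("Restaurant D", "South", 4),
    ("Restaurant E", "South", 30),
]


def build_db(rows):
    """Load the rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inspections (name TEXT, district TEXT, violations INTEGER)"
    )
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)", rows)
    return conn


def find_outliers(conn, multiple=3.0):
    """Ask the database: who has far more violations than average?

    Returns rows whose count exceeds `multiple` times the overall average.
    The threshold is an assumption; pick one that suits your data.
    """
    query = """
        SELECT name, district, violations
        FROM inspections
        WHERE violations > ? * (SELECT AVG(violations) FROM inspections)
        ORDER BY violations DESC
    """
    return conn.execute(query, (multiple,)).fetchall()


if __name__ == "__main__":
    conn = build_db(ROWS)
    for name, district, violations in find_outliers(conn):
        # Each hit is a lead to check with shoe-leather reporting, not a finding.
        print(name, district, violations)
```

The query only surfaces the strange record. Figuring out why it is strange is still the reporter's job.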

Data-driven Web development: You have a cool or newsworthy data set you want to share with the world.  Instead of reporting specific findings from data in a written story, display much or all of the data in an application that users can navigate.  This takes advantage of non-linear storytelling concepts and helps us give the public access to more of our data.  Instead of picking a few choice factoids and examples, we let the user find the records that are most relevant to them.  If done incorrectly, it’s information overload.  Done right, it’s personalized news making the most of the power of the Web.

Essential skills

I would like to see some sort of survey course that encompasses both theoretical and practical skills.  This may seem a tall order, but I think it’s doable.  I believe every journalism student in this era should be exposed to:

  • Basic math – be able to complete this test from UNC’s Phil Meyer
  • Understanding when to use absolute numbers or percents, know why per capita values are important and how to calculate them
  • Learning what kind of data sets are available, esp. from govt
  • The fact that unstructured text can be broken down into components to create numbers
  • The fact that Web scraping exists – even if they don’t know how, they can probably find someone who can
  • The fact that Access and SQL exist, what each of those can do that Excel can’t, or why they make life easier
  • How to add numbers and text together in a spreadsheet program
  • Examples of computer-assisted reporting-based stories and data-driven applications
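
As a rough illustration of why per capita values matter, here is a small Python sketch. The place names and numbers are made up, and `per_capita` and `percent_change` are hypothetical helper names, not functions from any library.

```python
def per_capita(count, population, per=100_000):
    """Return the rate of `count` per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population


def percent_change(old, new):
    """Return the change from `old` to `new` as a percentage of `old`."""
    if old == 0:
        raise ValueError("percent change from zero is undefined")
    return (new - old) / old * 100


# Hypothetical figures, invented for illustration: (incidents, population).
places = {
    "Big City": (1200, 2_000_000),
    "Small Town": (90, 50_000),
}

# Rate per 100,000 residents for each place.
rates = {name: per_capita(count, pop) for name, (count, pop) in places.items()}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.1f} per 100,000 residents")
```

With these invented numbers, Big City has far more incidents in absolute terms, but Small Town has the higher rate per resident. Which figure belongs in the story depends on the question being asked.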


I would suggest a multi-course sequence for those who truly want to specialize in this field:

  • One course focused on the above, but knowing how to actually do it, not just be aware it exists. Use tools like OutWit Hub to simplify the scraping process.  There are ways to do programming-like tasks without actually knowing programming.  This would include understanding APIs and how to use them.  Also start talking about the tools that exist, and the best way to start self-teaching.  Where do you find good tutorials/blogs, how do you find the discipline?  Because in reality, even a full data sequence isn’t going to teach you everything you need to know, but it will help you figure out what you need to know, and how to get up to speed quickly and accurately with new tools as they emerge.
  • One course that is mainly focused on producing stories on deadline, both daily and long-term, and integrating these concepts into them. Data shouldn’t operate in a bubble.  In the Medill program, this could be an additional requirement for those in the downtown newsroom.  This could replace “alternative story forms” or maybe even one of the video requirements.
  • One advanced course that moves into introduction to programming and data-driven apps. I would recommend Python, and then a transition to Django.  (Yes, Aron Pilhofer, Ruby and Rails would also be fine.)  I say Python because it made the most sense to me, and you could also use the excellent Head First Programming book as a key reference point.  While it uses Python, what it’s really teaching is programming, and once you understand that structure, changing languages or frameworks will become simpler.

It’s certainly difficult to fit another thing into the rapidly expanding world of journalism education.  And for a one-year curriculum like Medill’s, it’s extraordinarily rough.  But let’s make it an option.

Brian Boyer, my fellow Medill alum and soon-to-be-colleague, if I may call him that, of the News Apps team at the Chicago Tribune, refers to journalist-programmers as “hacker journalists.”  I adore his tagline: “Like a photo journalist, but with a laptop.”  So let’s treat data journalism as a whole the same way we treat photo and videojournalism.  Everyone should know something about it, and people should have the option to specialize in it.

People go to grad school to get the skills that will help them get jobs.  This area of data is a growing field.  It’s too tough, I argue, for people to enter it just for a better chance at employment, but if it’s something you love, it might be a fantastic fit!  Every journalism student should get the chance to try it on for size, early on in their program.
