« « Which states have been hit hardest by unemployment in the last decade?

Importance of combining data analysis with context (reflections on readings from week two) » »

A list of 40 CAR-friendly news organizations
(my adventures in parsing the IRE directory)

Posted by on Jan 11, 2010 in Blog, CAR | 2 Comments

Sure, data-driven investigative reporting sounds good, but how does an aspiring journalist know which organizations are most supportive of that kind of work? I don’t believe there’s one all-encompassing way to tell, but I would argue that one measure might be the number of card-carrying members of Investigative Reporters and Editors (IRE) in a news organization. And why not approach my first large-scale HTML parsing project, using Python and the Beautiful Soup library, by examining IRE membership in the United States? As with all data, it’s merely one piece of information, not the only one needed to make a decision, but it certainly can’t hurt.

Here’s the list of the 40 news organizations and universities that appeared most frequently in the directory. The numbers in parentheses show how many IRE members belong to each organization. I left off the top result, which was “Freelancer”: 159 of the 1,719 total members self-identified as such. That isn’t an organization, so it doesn’t belong in this ranking, but it is a testament to the fact that work like this can be done by anyone with a blog and some computer, web and analytical skills.

  1. Associated Press (24)
  2. Columbia University (19)
  3. The Washington Post (17)
  4. The New York Times (16)
  5. University of Missouri (14)
  6. Arizona State University (12)
  7. The Boston Globe (12)
  8. The Seattle Times (12)
  9. Bloomberg News (11)
  10. Newsday (11)
  11. The Philadelphia Inquirer (9)
  12. USA TODAY (9)
  13. University of Maryland (9)
  14. CNN (8)
  15. California Watch (8)
  16. Houston Chronicle (8)
  17. Los Angeles Times (8)
  18. Milwaukee Journal Sentinel (8)
  19. Northwestern University (8)
  20. San Antonio Express-News (8)
  21. St. Petersburg Times (8)
  22. Star Tribune (8)
  23. The Dallas Morning News (8)
  24. The Orange County Register (8)
  25. The Wall Street Journal (8)
  26. American University (7)
  27. Center for Public Integrity (7)
  28. Chicago Tribune (7)
  29. Detroit Free Press (7)
  30. ProPublica (7)
  31. St. Louis Post-Dispatch (7)
  32. NBC News (6)
  33. National Public Radio (6)
  34. The Plain Dealer (6)
  35. The Sacramento Bee (6)
  36. The Salt Lake Tribune (6)
  37. The Tampa Tribune (6)
  38. University of California – Berkeley (6)
  39. University of Illinois (6)
  40. University of Minnesota (6)

Find a CSV here of the IRE membership, with a member count for each of the 944 news organizations represented in the directory.

The directory itself is a publicly accessible web page here. Parsing web pages seems to be an important part of expanding the data we have access to as journalists. Sure, we can use the datasets given to us by federal agencies, and we even know how to copy and paste HTML tables and make them into spreadsheets. But in cases like this one, where the information is laid out as a jumble of text, isolating what we’re looking for into separate columns becomes more difficult. Luckily, Beautiful Soup allows the programmer to iterate through the tags in a web page, and even select the element that follows a given tag. So, for example, I parse out the bit that comes after “Affiliation” everywhere on the page to get my list of all the affiliations. Bring that list into SQL, run a count grouped on the affiliation column, and now there’s some interesting analysis that tells us quite a bit about where the investigative gurus are.
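The two steps described above (pull out whatever follows each “Affiliation” label, then count in SQL) can be sketched roughly as follows. This is a minimal illustration, not the script used for this post. The HTML snippet, the placeholder member and organization names, the `members` table and both function names are assumptions made up for the example, and the real directory's markup may well differ.

```python
import sqlite3

from bs4 import BeautifulSoup

# Hypothetical markup: each entry labels its fields with <b> tags.
# The real directory page may be structured differently.
SAMPLE_HTML = """
<div>
  <p><b>Name:</b> Reporter One<br><b>Affiliation:</b> Example Daily<br></p>
  <p><b>Name:</b> Reporter Two<br><b>Affiliation:</b> Example Daily<br></p>
  <p><b>Name:</b> Reporter Three<br><b>Affiliation:</b> Freelancer<br></p>
</div>
"""


def extract_affiliations(html):
    """Return the text that follows every 'Affiliation' label on the page."""
    soup = BeautifulSoup(html, "html.parser")
    affiliations = []
    for label in soup.find_all("b"):
        if label.get_text(strip=True).rstrip(":") != "Affiliation":
            continue
        # The value is assumed to be the text node right after the label.
        value = label.next_sibling
        if isinstance(value, str) and value.strip():
            affiliations.append(value.strip())
    return affiliations


def count_affiliations(affiliations):
    """Load the affiliations into an in-memory SQLite table and count them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (affiliation TEXT)")
    conn.executemany(
        "INSERT INTO members (affiliation) VALUES (?)",
        [(name,) for name in affiliations],
    )
    rows = conn.execute(
        "SELECT affiliation, COUNT(*) AS n FROM members "
        "GROUP BY affiliation ORDER BY n DESC, affiliation"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in count_affiliations(extract_affiliations(SAMPLE_HTML)):
        print(f"{name} ({total})")
```

SQLite stands in here for whatever SQL database is at hand. The `GROUP BY` query is the “run a count on the column” step, and sorting by the count in descending order gives a ranked list like the one above.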

As a journalist just starting out, I’d prefer to surround myself with more, rather than fewer, of these investigative types. It’s important to note that this data is from a list of people who identify as belonging to IRE, not NICAR. They don’t all necessarily have a CAR emphasis. But it seems to me that those interested in investigative work would have a stake in increasing the number of CAR-specializing journalists, and perhaps that stake would be greater than that of editors who haven’t invested the time and/or money to join IRE.


Of the 948 total organizations listed as affiliations of IRE members, at least 160 of them — or about 17 percent — are television stations.  That’s more than I expected to see, actually, which is a good thing.  I strongly believe in the potential of TV news stations.  They have great ability to investigate serious issues, and use visuals along with the familiar personality of their anchors and reporters to bring issues into people’s living rooms. I see some stations doing well with this, but I think this possibility is overlooked far too often.

I see that CAR skills for broadcast media is one of the many themes listed as a focus for March’s CAR conference in Phoenix. (I’ll be attending, by the way, which has both the journalistic and geeky parts of my personality very excited!) I’ll be interested to hear how those reporters’ challenges differ from those at print publications, how they’re addressing those challenges and how they’re making the most of all that technology has to offer. It’s got to be more difficult to drive TV viewers to a web app than print consumers, who are often already reading our content on the web. But in the end, I believe that we’re all fundamentally online news services, whatever our other platforms may be, and that’s only going to become more true in the years to come.

  • http://cronkite.asu.edu/faculty/doigbio.php Steve Doig

    Michelle: I’m glad to see ASU ranked so high — though almost certainly because I make my students join IRE instead of buying a text book. But you have linked to the University of Arizona rather than Arizona State University. Very definitely different places!


  • http://www.michelleminkoff.com Michelle Minkoff

    Steve: So sorry about that. I must have been powering through those links a bit too quickly. I should have known that — you’re the home base of the conference in March, if I’m not mistaken. You’ll find it’s fixed now.

    Glad you make your students join. The contacts I’ve already made through IRE, and the opportunity to see the queries of actual reporters at NICAR-L, have been invaluable as I strive to ramp up my skills, and I’d highly recommend it to any student striving to figure out if this is the career path for them.