« « Data Delver: William Hartnett, Palm Beach Post

Data Delver: Mo Tamman, Wall Street Journal » »

Data Delver: David Donald, Center for Public Integrity

Posted on Feb 13, 2010 in Blog, CAR, data delvers

What is it we love about computer-assisted reporting? Why is the NICAR-L list full of people eager and willing to diagnose problem queries and discuss the merits of mapping software? What draws people to it? The first time I saw a Python script perform batch geocoding, as numbers spun out of a Terminal console quicker than I could read them, my eyes widened with wonder, and I think my mouth dropped open a little. But that process isn’t journalism; that’s computer science. It’s what you do with the information that matters. It’s the computer’s ability to parse, analyze and display information that helps us find and tell better stories.

It’s not news to many of my readers and mentors, I suppose, but CAR is the means to an end: it makes investigative journalism possible. The “R” is the important part, supplemented by the “C.” I’ve been on such a whirlwind journey, discovering so much in the past months, that I don’t think I’ve stopped to remember that as often as I should. That’s why I really appreciated the chance to remember why we do what we do when I spoke with David Donald, Data Editor for the Center for Public Integrity, an organization whose main goal is to produce powerful journalistic investigations. CAR in all its aspects (finding data, crunching data, posting it online) helps the Center pursue its mission.

This profile of Donald is a part of my continuing series I’m calling “Data Delvers,” where I pass on summaries, quotes and audio clips from conversations with journalists using technology to find, analyze and convey data-driven stories and/or projects to the modern audience.

Data’s role in the Center’s newsroom

Here are some statistics for you: David Donald makes up one-half of the Center’s CAR team.  The CAR team makes up 1/15 of the total staff working at the Center.

Donald’s fellow data cruncher is Computer-Assisted Reporting Specialist Mike Pell, a graduate of the University of Missouri who had a fellowship at NICAR. The two of them are integrated into the overall newsroom, something Donald feels is important. In past years, before Donald joined the Center in 2008, the three- or four-person CAR team worked in a separate room, which Donald felt cut the data team off from the rest of the newsroom. Sometimes Donald and Pell look up data at reporters’ request, but more often their work is a central part of a project, and they are often involved from the first planning meetings. Donald said it’s key to stay in constant communication across the reporting, data, Web and graphics teams; otherwise, it becomes too easy for the CAR team to turn into a lookup service that merely fills requests rather than doing reporting.

The Center primarily uses SQL Server, largely because posting the data online often falls to the man responsible for IT, who is most familiar with SQL Server. Donald pointed out that even free, open source software carries setup overhead, and said he doesn’t have a strong software preference. “I’m more of a pragmatist than an idealist,” he said.

Bringing CAR training to newsrooms

Before joining the Center, Donald was IRE’s training director, an experience he reflects on often in his current work. As training director, he traveled to newsrooms throughout the country. In fact, there was so much travel that some at IRE nickname the job OTR, or “On the Road.” He listened to what newsroom staff said they needed in terms of data, and then worked with reporters to help them improve those skills.

“Having been a teacher years ago, it was a kind of a nice five-year period for me to combine my CAR interests with being able to, hopefully, inspire some people through teaching,” said Donald.

It all started with one small book

Donald didn’t get into journalism until his late 30s; before that, he worked for some time as a teacher. Around age 37, he decided he wanted to be a reporter and went to Kent State University for graduate school. Near the end of his time there, he was wandering around the library, where he often liked to pick up journalism books that looked interesting. He saw a small one with “the strangest title”: Phil Meyer’s “Precision Journalism.” “I can still remember taking it off of the shelf, starting to read it and going, ‘Wow; this is really interesting.’ So I got hooked,” he said.

Later, as a reporter in Savannah, Georgia, he went to his first NICAR conference, in Raleigh in 1993, and started to see more possibilities. But, Donald said, he knew one conference wasn’t enough. He took a statistics course to update his math skills, and then a social research course to learn how to apply them. During this time, he moved from education reporter to research and projects editor. Continuing his learning through IRE, he went to a statistics boot camp taught by Meyer, the man whose book had first inspired him. There, he met Sarah Cohen, then IRE’s training director, who said his teaching background would make him a good candidate for the position someday.

Ujima Project: International CAR

One of the ongoing data projects at the Center is the Ujima Project.  Started by Ron Nixon, now of the New York Times, the initiative focuses on bringing hard-to-access data sets to international journalists, particularly in Africa.  Ujima is Swahili for “Community and Responsibility.”  The goal is to make it easy to download and analyze data, and also to provide the necessary tools.  Journalists in Africa often don’t have the money for advanced software, or the technical knowledge to learn to work with open source options.

CAR and the Center will stay linked for the future

Meanwhile, the Center has been succeeding in its fundraising efforts and is looking at expanding its staff, including its data team. The new hire will help with traditional projects as well as outreach such as Ujima. “It enables us to expand at a time when a number of news organizations are obviously going in the other direction,” said Donald. “It’s a great place to be.”

Audio: The role of the Center in posting data online.

Donald said the Center has always been involved in posting data and will continue to do so, but only when it serves a journalistic purpose. That means the data is otherwise unavailable or hard to find, or Center staff have enhanced it in some way.

Investigative journalism demands CAR

Donald emphasized that the purpose of CAR is to be used as a tool for investigative reporting.  Other uses for data are important, but for him, CAR is a means to the end of reporting investigations.

“The heart of CAR has always been in data analysis,” he said.  “That has been the bedrock of investigative reporting.”

Advanced CAR skills, such as mapping, posting data online and web scraping, should be available in every newsroom. That means at least one person should know how to do them, but not everyone needs to. The fundamental point for Donald is that CAR must remain a priority in journalism, because it makes journalism what it is.

“It’s not really a question of why is CAR important, but the real question is if you think investigative reporting is important. If you believe investigative reporting is necessary for a functioning democracy,  then it’s something we should pay attention to.  You can do investigative reporting without CAR, but if you don’t have CAR in your investigative team, you are going to miss all kinds of stories in your reporting,” said Donald.  “Is there CAR work that isn’t investigative? Sure, but that’s not why we do it.”

Extended interview transcript

Keep reading for more details, including Donald’s perspective on the new trend of locally based investigative journalism groups.

I can see how CAR applies to so much of what CPI does, so I am curious as to how you integrate with the rest of the team.

That’s a good question. CAR is set up as its own team within the newsroom. Here, it’s a very collegial newsroom and it’s not that large.  There are around 30 folks, maybe a few more, that work at CPI. That’s everyone; it’s not just on the journalism side. We have people that do fundraising, since we’re non-profit. So the newsroom is relatively small. We might have 20 reporters. One of the things I think is key is that we’re not assigned to any individual team or any individual topic. There are two of us on the CAR team.

One thing that was most important, when I first got here, was to make sure that the CAR team was visible where all the reporters work, for a couple of reasons. It makes it seem like it’s nothing unusual that we do computer-assisted reporting. We’re in with everybody else doing their usual work. I think it helps for the CAR team to listen to what other people are talking about. It makes us very approachable since the reporters feel they are not going to another area of the newsroom just to talk about CAR.

In the past, the CAR team, which might have been three or four people at various times, was in its own room. I decided not to go that route. There was a period, in about ’06-’07, when the newsroom had shrunk, and there were various issues going on. It was a difficult transition for the Center when its founder, Chuck Lewis, left. Bill Buzenberg became executive director and put it back on really solid financial footing. So the CAR team had to be built back up. When I got here, Mike Pell, who had recently been in the University of Missouri’s graduate school, was already here. He had an assistantship or fellowship at the Data Library at the National Institute for Computer-Assisted Reporting. I was hired from IRE. I worked as the training director for NICAR for about five years before joining the Center here.

I had some ideas of not just CAR projects and analysis that I wanted to do, but how to integrate the team. The other thing is that I think it’s important when story ideas are discussed, that members of the CAR team are present, so we’re not deemed as a service organization to bring in at some point in the process, but to be there early on, right from the start. I think another important thing is to build a strong rapport with whoever your graphics people are, because that’s crucial to data visualization, to integration and presentation, and sometimes even on the analytical end. Be aware of what people are up to and their strengths.

Let’s say there’s a story idea and the CAR team is at the meeting where it’s floated. Do you then just take the data set and explore, or do the rest of reporters tell you to look for something?

It depends. We can also pitch our own ideas. If we know of data that we think would make a good story, we take it to a reporter who might be interested and say, “Look at what I’ve got. Would you be interested if we looked at this? This is the kind of thing I think we’d find.” If they have an idea, we’ll listen and tell them whether we’re aware of relevant data. If they know of the data, we’ll say what we think of it. The key in all that discussion is to try to feel out the reporters. Are they thinking of using the data simply to support an idea or thesis, simply wanting evidence for something they’re already reporting on, or are they really looking to data analysis to generate that thesis or idea? That’s really the great dichotomy in CAR. If you’re doing the latter, that’s real original investigative work. If it’s the first, it’s almost a research service type of thing: a certain set of facts, a certain kind of numbers. If the first is mostly used, then CAR is really being underutilized.

At the Center, what is the percentage of using each of those two approaches to CAR?

At the Center we have the great fortune of being given a lot of time. We’re seen as a scarce resource now, with only the two of us. I would say maybe 5% of my time is just finding things for people. Most of it goes to original data work. That’s one of the reasons I chose to work here: they have a history of investigations. This is what the director and current editors wanted from me. This is a good place to be.

What was your work over at IRE?

Basically, I did not do a lot of original reporting or data analysis. The training director at IRE was conceived as a travel job. In other words, taking IRE out into newsrooms, universities, our own workshops, whatever, but also taking it around the country and also internationally. IRE’s offices are at the University of Missouri. But I didn’t live in Columbia, Missouri; I lived in Georgia, because as training director when I was generating revenue for IRE, I was always on the road traveling somewhere. Internally in IRE the training director is known as OTR, which stands for “On The Road.” It was a lot of travel. But that gets a little old after a while. It may sound a little romantic upfront, but travel today, as anybody who has experience with an airport knows, has lost the glamour. But the good part was that I got into lots of newsrooms.

From ’04 to ’08 I was probably in as many newsrooms as almost anybody, getting an idea of what was going on. And it did include some international travel, which was quite rewarding: getting to train journalists in China, South Africa and other places. You could say that was an added perk. I taught some skills, but more of my time was spent listening to what people needed in terms of training and what they were trying to do, and then helping. You get to see which CAR programs were more successful than others. Having been a teacher years ago, it was a kind of a nice five-year period for me to combine my CAR interests with being able to, hopefully, inspire some people through teaching.

What is it that you discovered as you were visiting these newsrooms?

In most instances they brought me in because they needed to jumpstart some kind of computer-assisted reporting. Editors had to approve this, so they knew there was some value, but they just didn’t know enough about how to get it off the ground, what needed to be done and how to maintain a program. They would have a pretty tight budget for getting it done; that was probably the norm. So I would train, I would try to give advice, I would try to follow up. Were there places doing it right? Of course; all you have to do is look around. Interestingly, I left and came to the Center right about the time that some of the great newspaper newsrooms and local station newsrooms were dealing with cut-backs, lay-offs, buy-outs, just the general shrinkage that’s been going on; I missed that while I was with IRE. I’m sure that now it would be very different if I were out there.

What tools are you using to do the analysis at the Center?

The legacy here has been our server system, how we store data, distribute it through the newsroom, work with it and archive it: essentially it’s been Microsoft SQL Server. We do use some open source MySQL, and that’s fine, but our IT support has a lot of background in Microsoft SQL Server and I’m comfortable using SQL Server, so it made a lot of sense, since we didn’t have a budget when I came on board for ripping everything out and starting over. You might think with open source you wouldn’t need that budget because it’s all free, but you still have to think about actual servers and how you’d go about that.

Quite frankly, I’m somewhat software-neutral, maybe more than the average CAR person. If it can get your job done, then that’s the right software. I see pros and cons with both off-the-shelf products like Microsoft SQL Server and open source. So philosophically, I’m much more of a pragmatist than an idealist when it comes to that particular issue.

I can’t see how you can do this job without a decent spreadsheet. Almost every computer these days comes with Excel. Sure, there are lots of bells and whistles, but it really gets the job done for me. It’s that all-purpose tool. I have something of a background in statistics and statistical analysis, and a little in survey work and the social sciences. One of the first tools I learned early in my development as a CAR specialist, if you will, was SPSS. With the new graphics person who’s come on board, we’re doing more with ArcView GIS.

How did you get interested in IRE? How did your career develop?

It’s not a way that I’d necessarily recommend. Up until the time I was in my late 30s, I did lots of other things that had nothing to do with journalism. Somewhere around the age of 37 or 38 I got a hare-brained idea that I wanted to be a reporter. I went to graduate school at Kent State in Ohio because I figured that I’d better get some training in it. Near the end of that program, which was very good, I was in the research library on campus. One of the things I like to do when I’m in a library like that is find the shelves of subject areas I’m interested in. So I was wandering around the journalism shelves and came upon what I thought was the strangest title for a journalism book, a reporting book: Precision Journalism by Phil Meyer. I can still remember taking it off of the shelf, starting to read it and going ‘Wow; this is really interesting.’ So I got hooked. But I was at the end of my grad school period, so I really didn’t explore very much at Kent State.

When I got my second reporting job, in Savannah, Georgia, that’s what I wanted to do: go down that path. I went to the very first NICAR/IRE-sponsored CAR conference in Raleigh, North Carolina, in 1993, so I got an idea of what was possible, and it was very fascinating. I met my future boss there: Brant Houston, who ran IRE and later hired me there. But one conference wasn’t sufficient. One of the things I knew I wanted was more of the data analysis and statistical side, so I took a stats course from a math department and did well. Now I knew the math, but didn’t know what to do with it. So I took a social research course from a sociology graduate program, non-degree. I just wanted these classes, these skills, and they accommodated me. So I started doing surveys of our community in the newsroom. By that point I had become the research and projects editor. I had originally started as an education reporter, and I did education reporting for a few years. I was getting pigeon-holed into that because I had been a high school teacher.

IRE and NICAR ran, and still run, a statistics boot camp. It used to be that Phil Meyer taught it for one week in North Carolina. I went to that in 1998, and it was the best training I ever had. The training director at that time was Sarah Cohen. She was there running the boot camp with Phil. On a break, Sarah and I were talking, and she said with my teaching background I would be a really good candidate to train for IRE. I filed that away. When my son grew up, I felt it would be easier to be away from home as much as you need to be as IRE training director. When Ron Nixon left that position in 2003, it opened up, and I decided I wanted to work for IRE and leave the daily newsroom, so I applied and got the job. I started at the beginning of ’04 and worked there nearly five years.

There are a lot of investigative non-profits springing up. In terms of the future of CAR, how do you see these fitting in? Do you think they’re taking away from mainstream newspapers?

No. There’s just a handful, two or three, of those non-profits that go way back and have been reporting at the national level. You have the Center for Investigative Reporting, out of San Francisco, which has not had a big CAR tradition through the years. They were founded in the ’70s. The Center was founded around ’89 or ’90 by Chuck Lewis, and he did get in and embrace the CAR stuff early on. That was part of what eventually helped to put the Center on the map. Of course you have ProPublica, and a good friend of mine, Jennifer LaFleur, does major CAR work there now; she’s also a former IRE training director. There are just a lot fewer investigative reporters out there doing work now. The reporters who remain have much less time to do the digging. There’s more pressure on them; quite frankly, they have more delivery methods, meaning they potentially have to spend time creating a greater variety of reports than we used to. With all those time pressures, it’s hard to do something as time-intensive as investigative reporting or the CAR aspect of it.

While many of these new non-profits may break some national news, most of them are state or regionally focused. I think that’s a good thing, because that kind of investigative work is one of the main things we’re losing. They aren’t there to compete with the Center or ProPublica, but to do what’s now missing in traditional newsrooms. I think that’s very healthy.

The question is, “How are they going to sustain themselves?” “How do they find the money and time?” They’ve taken on the challenge. We don’t see them as competing, but as complementing. We shouldn’t be trying to outscoop anyone, and shouldn’t be replicating each other’s work. ProPublica has done a lot on the stimulus money, and that’s great. But it wouldn’t make sense for us to replicate that effort.

When there were more CAR people, were you able to do more stories?

I haven’t been here that long. There have only been two of us here my entire time. Will we possibly hire more? There’s been some discussion, and it looks like we will be bringing on one more CAR person. Then yes, we will be doing more. Because the Center went through a more difficult time in ’06 and ’07, there was just less revenue, and that was one of the reasons the CAR program shrank. So now the success that Bill Buzenberg and his team have had in rebuilding the Center’s fundraising and support enables us to expand at a time when a number of news organizations are obviously going in the other direction.

What are your thoughts about the movement to post more data online that we’re seeing in news organizations?

That used to be one of the Center’s primary roles, particularly with Federal Election Commission data. Every four years, I think starting in 1996, we would do the ‘Buying of the President’ series, which looked at who was giving money to all the candidates running. So there was a tradition of putting that online. I think a number of things have changed the model quite a bit. One: some non-profit organizations have made data distribution a bigger part of their mission than analysis. Certainly the Sunlight Foundation has done a great job of getting public data online, often in a more usable form than the government offers it. Even the Center for Responsive Politics, which does some great reporting, still has its niche in campaign finance, lobbying and that kind of thing.

So when we post data to the web, it should be data that no one else has, because either a) we discovered it and analyzed it first. It’s not our data, it’s public data, but nobody’s making it available. Or b) we enhanced the data. We had built some very specific lobbying databases, because to track certain issues in lobbying you can’t just download the data and do some sorting and a little bit of math. You actually have to go in and do hand coding and read documents, because the data has flaws. Our databases on these topics tend to have more detail, offering the public more than the standard lobbying data you might get from the Center for Responsive Politics or directly from the Senate’s or House’s Web sites.

In terms of making the databases talk to each other and posting it online: Are the technical abilities behind that mostly on you and your colleague that make up the CAR team?

Yes. Besides the two of us on the CAR team, there’s the IT guy. He helps with that quite a bit because he likes the work. So it’s really the three of us, with help from the web team when they can, depending on what we’re doing.

If we’re creating a Google map, IT or our graphic artist might get involved. We don’t post data just to post data. The other thing we’ve gotten into is a project started by Ron Nixon, who used to be training director at IRE and is currently at The New York Times. This is not a New York Times project. He started it on his own: a project called Ujima, which distributes databases with local, regional and countrywide data for countries in Africa. Part of the Ujima Project is to identify data that would be useful to African-based journalists. The data needs to be easy to download so they can grab it and put it into a basic tool like Excel. At this point, they often don’t have the money to buy, or the technical skill to operate, a database manager. The site’s up and running, but it’s going to be expanded quite a bit. Ujima is Swahili, and it means something like ‘Community and Responsibility.’

What are your thoughts on web scraping as it relates to CAR?

I think it’s a necessary skill. I don’t think every CAR reporter needs to have it, unless they are what I would call a one-person operation. Here at the Center we have a team, so we don’t need everyone on the team to be able to do that. But we need to have the skill in-house so we can do it when we need it. You also need to be able to build your own database from paper documents. That’s time-consuming, and when we have to, we will pay companies to do manual data entry. The way I see it, computer-assisted reporting is an extension of what investigative reporters have always done. Sometimes you need scraping skills to build a database. Then we need to think about how to get what we find into people’s brains, and that’s where there’s a role for data visualization in CAR. But the heart of CAR remains data analysis, which is a fancy way of saying digging through the database looking for evidence. That has been the bedrock of investigative reporting.

Why do you see computer-assisted reporting as important to journalism?

The reason I got involved is that I realized it could lead to better investigations. That’s why I think it’s important to journalism: it can really add a lot to an investigative team by supplementing stories with evidence to prove your point. It’s not really a question of why is CAR important, but the real question is if you think investigative reporting is important. If you believe investigative reporting is necessary for a functioning democracy, then it’s something we should pay attention to. You can do investigative reporting without CAR, but if you don’t have CAR in your investigative team, you are going to miss all kinds of stories in your reporting. Is there CAR work that isn’t investigative? Sure, but that’s not why we do it.
