
Data Delver: Anthony DeBarros, USA Today

Posted on Mar 8, 2010 in Blog, CAR, data delvers

It’s one thing to say we’re interested in the convergence of journalism and technology now, but it was a completely different story decades ago, when writers and computer geeks were just beginning to meld. And as much as things were different then, newsrooms still wondered how best to integrate the new technology. Then, just as now, those interested in combining journalism and technology had fascinating challenges to tackle. And for Anthony DeBarros, Senior Database Editor at USA Today, the transformation of journalism aligned with his career, changing how he did his job, the education he pursued and the career path he followed. It’s a fascinating story of seizing opportunities when they arise, and following your dreams.

This profile of DeBarros is part of my continuing series, “Data Delvers,” where I pass on summaries, quotes and audio clips from conversations with journalists using technology to find, analyze and convey data-driven stories and projects to the modern audience.

Path to CAR

DeBarros first became interested in journalism while in college, and was considering going into the radio business. He worked at a local station while in school, but ultimately decided against it as a career due to the low pay.  So, he started work as a cops reporter.  At about the same time, he bought his first personal computer and enjoyed experimenting with all the different tools it had to offer.  “I reformatted floppy disks for fun,” said DeBarros.

So, when the time came for newsrooms to start more thoroughly integrating technology into their daily routines, DeBarros said he was ready and willing to get his hands dirty with any gadget he could. He served as systems editor, a common position in the ’90s, helping the Poughkeepsie Journal transition to color printing and retool its page layouts. He also edited the paper’s Life section for a time.

Audio: DeBarros realizes he can pass data between computers.

Technology and journalism: it was a combination that DeBarros continued to notice and enjoy. So much so that he went back to school for a master’s degree in computer science, thinking it would be interesting and help his career, he said. While earning that degree, he learned about databases, and he continued working at the Poughkeepsie Journal. In 1997, he went to USA Today on a loanership program, through which Gannett properties such as the Journal sent staffers to Washington to gain experience at the national paper. DeBarros met the other database editors, including Census guru Paul Overberg, and he never left.

USA Today’s multi-section approach to CAR

When DeBarros first started at USA Today, the four database editors who made up the CAR team all worked together. But by 1998, they had spread out into various sections. For example, Barbie Hansen moved to Money, where she remains today, and DeBarros focused on Life, especially entertainment. He remained there until 2008, when he became Senior Database Editor; he now works with other CAR specialists across beats. DeBarros said USA Today has made a commitment to spreading CAR across topical areas, since there are data sets applicable to a wide variety of stories. “We’ve had the relative luxury of having more than one CAR person or database editor,” said DeBarros, “so we made the decision early on not to focus all of our people on those traditional news section topics, but spread out into the other areas as well.”

Data’s essential to the story

Interview a data source the way you would a human source, said DeBarros. Not all data is good data, and it’s up to you as a journalist to figure out which is which. “There are some data sets out there that are rich and deep and valuable, and you’re going to go to them again and again,” he said, “and there are some where you’re like, ‘Meh, doesn’t really do much for me.’ But the only way to discover that is to really interview them and find out what they’ve got, and what they’re really telling you.”
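To make that idea concrete, here is a minimal sketch of what “interviewing” a data set might look like in Python with pandas. The file name and column names are hypothetical, and the specific checks are my own illustration rather than DeBarros’s actual workflow; the point is simply to ask the data about its breadth, its gaps and its credibility before trusting it.

```python
# A minimal "data interview" sketch. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("teacher_salaries.csv")   # hypothetical data set

# Breadth: how much does this source know?
print(df.shape)            # number of rows and columns
print(df.dtypes)           # what kind of answer each field gives

# Limitations: where is it vague or silent?
print(df.isna().sum())     # missing values per column

# Credibility: does it contradict itself or contain impossible values?
print(df.describe())                      # ranges and outliers in numeric fields
print(df["school_year"].value_counts())   # hypothetical column: coverage by year
print(df.duplicated().sum())              # duplicate records
```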

He’s concerned about news organizations providing uncontextualized data sets, but if you approach the use of data philosophically, following in the footsteps of many NICARians, you can’t really separate the data from the context.  “We’ve never really divorced the notion of context from data, because to us, data was never just a means to an end in itself, it was always the driver of a story.”

Changing platforms

DeBarros sees the rising popularity of all sorts of technological tools for analysis and presentation.  He said a lot of that is currently falling to the CAR specialist, and it’s an extremely broad spectrum for one person to cover.

“I think it’s very hard for people to be an expert in mapping and an expert in frameworks and an expert in statistics and an expert in data parsing and an expert in writing – there’s a lot there,” he said.  “Right now, I think we have a lot of people who are doing their very best to try to cover all those bases, but people being people, we all gravitate toward different parts of that spectrum.”

But the most important part is to remember that no matter what the tool — it’s still always about conveying information and telling a story: “I just think we always need to think about: What is the story that we’re telling? There always has to be a story. And that holds true no matter what we’re doing.”

Extended transcript

Keep reading for more about the dovetailing of DeBarros’ career and the development of computers as well as the type of work CAR reporters do at USA Today.

Trace your career path, from how you got started in CAR and journalism to how you got to USA Today.

I was always focused on the arts and liberal arts. Even when I was in high school, I gravitated toward the school newspaper, and then the radio station. In fact, I was very much set on a career in radio and broadcasting. After my first two years of college, I wound up getting a job at a local radio station, working as a disc jockey at a big rock station in the early 1980s. I did that for a little while, and I knew that I was going to go and finish off my degree. I was trying to decide, because I had studied communications for the first two years of college, whether I wanted to continue as a communications major, or whether I wanted to declare a major in something more specific, because after working in radio for a little while, I realized it was an extremely low-paying job. In fact, I was making more working at a local McDonald’s than I was working in radio.

I had always gravitated toward writing, and I’d always really loved telling stories, I put my own little newspaper together as a kid. So, I decided to go back to school, and declare myself an English major, and do a concentration in journalism. So, I was going to Marist College in upstate New York at the time, and what I also did, while I was going to school and doing that degree, was the radio station let me start doing some work for them in their news operation. It was a small radio station, and they had two people who were their basic full-time news staff, but they needed somebody to go out and cover stories at night and go to town board meetings. So, that’s how I really got into journalism.

I graduated from college and got my first job as a police reporter and obit writer at the Poughkeepsie Journal in upstate New York, which is a Gannett paper, it’s still there. That was in 1986. It wasn’t so long after that, that I went out and bought my first personal computer. That was in 1987 or 1988, it was an Epson. It came with no hard drive, but it did come with two five-and-a-quarter-inch floppy disk drives. So any program that you wanted to load, you had to insert the disc for that particular program in the drive, and let it load that way. This fascinated me, because I had taken some computer programming courses in college, and I had a bit of an aptitude for it. I really liked typing in codes, and seeing what the computer would do. As soon as I had that PC, I really started to devour the manual that came with it. The operating system was DOS. I started going through the manual, and going through all the different commands. I reformatted floppy disks for fun.

Then, I went out and bought a hard drive, and put that into the system. I started learning how to use the little spreadsheet program that came with it to put numbers in and add them together. And I just was like, “Wow, this is really cool. You can do a lot with computers.” As my career was developing as a reporter, and as an editor, in the newsroom, I naturally started to gravitate toward any kind of technology that the newsroom was implementing. I found myself on committees to put in new desktop publishing systems.

There was a whole job function that developed in journalism in the 1990s. A lot of newspapers had something called a systems editor. They don’t really have them anymore. But the systems editor was the person who helped the newsroom figure out what to do with emerging technologies. You have to understand that newspapers were going through a transition from black and white to color printing. USA Today had come along with color, but a lot of newspapers didn’t follow suit for many, many, many years. They needed printing press upgrades. The Poughkeepsie Journal didn’t go to color printing until some time in the mid-1990s. At the time, newspapers were also moving from the mainframe, dumb terminal model of systems to the desktop publishing system. I wound up being the guy who set up our Macintosh network, and programmed all of the templates for our page design and all that. I was really, really interested in anything technological that the newspaper would let me get my hands on.

In the meantime, I was the life editor for a while, I was on the copy desk for a number of years, I was doing all different kinds of jobs. As I started to get more interested in technology, I decided, “You know, this actually could be a career path for me, and I’d like to get some formal training in it,” so I went back to college part-time. I said, “You know what, I could get a master’s degree in computer science, and that would probably be a good career move.” I went back to Marist College on a part-time basis. It took me about seven years altogether, but I got a master’s degree in computer science. In the course of that, I learned a lot about programming, a lot about databases and systems design, and all of that. It was during that time that I started to get really involved in doing demographics research for our newspaper.

There were some coverage things we were doing, we wanted to expand into different markets, and the publisher and the editor came to me and said, “Hey, could you do a market study for us, and could you figure out where the population is growing, and where it’s not? What might be some opportunities?” So, I started to get familiar with all the data that was out there in terms of the Census and economic data. I started to do some profiling of our area, and that very naturally led to me taking over one of the only personal computers in the newsroom at that point, and turning it into a real basic computer-assisted reporting workstation. I stuck a copy of Paradox on there, which is the database program that a lot of us used before we moved into Access and SQL Server. Paradox was very kludgy; it wasn’t the type of thing you wanted to let your life revolve around for very long. A lot of things came together between my natural training as a journalist, and my interest in technology, and a lot of different opportunities that kept coming my way.

It wasn’t too long after doing all of that, that a reporter in the newsroom came to me and said, “Hey, it would be really good if we could figure out what the most valuable properties are in the city of Poughkeepsie.” And I thought to myself, “You know, this might be a good opportunity for me to go and make friends with the IT guy over in City Hall.” I went over and visited him, he was down in the basement of City Hall, in the computer room. Back in those days, they all had big mainframe computers in an air-conditioned room. Actually, what I first did was I went to the tax assessor’s office, and I said, “I want a list of all the properties in the city of Poughkeepsie, and how much they’ve been assessed for.” And they pointed me over to the corner, where there were these big books, filled with computer printout, and they said, “Well, all the numbers are there, and you can just start copying them down.” And I thought to myself, “If they were printed on this piece of paper that looks like computer paper, then certainly they are in a computer somewhere in this building. And I can get that data on a disk that I can bring over and put into my computer.” And that’s how I really started figuring out that we can do computer-assisted reporting by going to the government and getting data.

That’s what I did. I went to visit that guy in City Hall, and I said, “Look, I know you’ve got a file on your computer. I’d love to have you put it on this floppy disk for me.” And he had to check with the local attorneys, and get their permission, and I called up a sunshine advocate in New York state and got him to weigh in, and they agreed that, “Yeah, the law says we can do this.” The next thing I know, I had that data on the computer, and was going through it in Paradox. We wound up writing a couple of stories about different properties.
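DeBarros ran that analysis in Paradox; as a rough modern equivalent, here is a hedged sketch of the same kind of query in Python. The file name and column names are hypothetical, invented only to illustrate the idea of ranking assessment records once they are in a structured format.

```python
# Hypothetical modern equivalent of the Paradox query: rank city parcels by assessed value.
import pandas as pd

# Assumed columns: "owner", "address", "assessed_value" (names are illustrative).
parcels = pd.read_csv("poughkeepsie_assessments.csv")

# The ten most valuable properties in the city.
top10 = parcels.sort_values("assessed_value", ascending=False).head(10)
print(top10[["owner", "address", "assessed_value"]])

# Total assessed value by owner, to spot the largest landholders.
by_owner = parcels.groupby("owner")["assessed_value"].sum().sort_values(ascending=False)
print(by_owner.head(10))
```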

I had the opportunity to come down to USA Today in 1997, as part of a loanership program that Gannett used to have, they would bring reporters and editors from the various Gannett newspapers into Washington D.C., into USA Today for four months. They’d get the experience of working for a larger newspaper and then go back home. So I came, and never left. I got involved with what was then called the enterprise department. There were three other database editors, one was Paul Overberg, he’s our Census guru, and there were two other people. They hired me, and that’s where I’ve been ever since. The loanership program started pretty early on, around the inception of USA Today in 1982. They ended it years ago, probably around 1999, 2000 or 2001.

After you arrived at USA Today, how did you get to be the head database editor? How has the structure of CAR changed?

When I started, we were all in one section, and we were called enterprise, and we were basically a special projects team. There was myself, Paul Overberg, Barbie Hansen, who’s still at USA Today, and another woman. The four of us basically worked on the projects as they arose, but within about a year, the editor of USA Today at the time decided to end that department, and sent each of us to one of the four sections of USA Today. Because I had a long history with covering entertainment and features and education, I volunteered to get attached to the Life section. In about 1998, all of us left that projects department and went into the various sections. Paul went to News, Barbie Hansen went to Money where she remains to this day, and the other person went to Sports. They left that position eventually.

I spent from 1998 until 2008, a good ten years, focused on developing computer-assisted reporting around the topics of health, education, religion and all things entertainment. By that I mean movies, music, books, television, Broadway. And really, nobody at that time, and nobody since, has really done a lot of CAR around entertainment. I think there have been a few things, but nobody on a consistent basis. I think USA Today’s been somewhat unique in that. Because we’ve had four database editors, and now we have five, we’ve been able to have people focus on those different topical areas.

How do projects get started under the CAR team’s current structure?

It’s a mix of things. Sometimes, just the news demands that we do something. The Census is a really good example of that. The Census Bureau releases data on a very regular basis, and that’s something that we pay very close attention to. We always know what’s coming, and we always are planning well ahead to make sure that we do things around that topic. That’s an example of an event dictating that we do something. Another thing would be the NFL draft, if we’re going to do some analysis around what players have been selected and where they’ve come from, or do an interactive around that, like we did last year. There’s that driver, news and events, and then there’s the driver of things that reporters bring along, or editors bring along, but I would say even in those cases, it’s still very much driven by topics that are in the news, or things that have risen to prominence on the radar screen or the surveillance that reporters and editors are always doing off of their beats.

What enables USA Today to maintain a large-enough data staff to practice CAR across multiple sections?

The decision was made a number of years ago that we would really spread our focus around the entire news operation when it came to developing CAR. I think we’ve known intrinsically that there really aren’t too many beats that journalists cover where CAR cannot play a role. Never mind that many, many journalists cover things where the federal government or a state or local agency would play some kind of role where data gets generated by those agencies and would be of interest for analysis. Even in cases where that’s not the case, there are very often sources of data or the ability to create your own data sets, off of just about any beat that there is. I think that’s something we’ve recognized, for many years, and we’ve really tried to do our best to dig in places that sometimes journalists don’t always dig in. And I think we have to recognize that when you’re talking about a news organization that has one CAR person, then very typically they’re going to focus on things that are generally what you would find a Metro desk or a news desk focusing on. Those tend to be the things that journalists want to hit first: education, budgets, spending, and that sort of thing. But we’ve had the relative luxury of having more than one CAR person or database editor, so we made the decision early on not to focus all of our people on those traditional news section topics, but spread out into the other areas as well.

Do you ever do breaking news, as opposed to the long-term projects?

Certainly, when the Census Bureau dumps a new round of the American Community Survey on the world, that’s a breaking dump of data. And we will take that and turn around stories pretty quick. Up until recently, the Census Bureau was releasing data under embargo, so they would give us two or three days. They’d put the data out, and journalists would be able to get it and have two or three days to analyze it and write some stories, and do some maps, or some interactives. So, we would do it that quick.

But, just in the last while, the Census Bureau has made the decision to stop that whole embargo process, and just release the data. Not only are they doing that, but they’ve decided to release the data at midnight. A couple of weeks ago, they were releasing some new data, and Paul Overberg was at his computer at midnight waiting for it to happen, and I think they had a story up on our Web site within an hour or two. I don’t know if they got it into print that day, or followed up the next day. For the NFL draft last year, we expanded an interactive that we had and built some backend database programming that allowed our NFL desk to enter data into a Web form every time another player was drafted: they entered the player’s name, and what school they came from, and what position they played, and what team picked them. It automatically updated our interactive. But much more of our effort is on non-breaking news, on projects, or longer-term analysis.

This week, Jack Gillum did a story with one of our education reporters, Greg Toppo, where they got some data on Advanced Placement test taking, and they were able to do some analysis and show that the number of people who were taking AP tests has increased, but even as that number has increased, the percentage of test takers who get a failing grade has also increased. They worked for a couple weeks on obtaining that data, cleaning it up, and running statistical tests on it, doing reporting. Sometimes those are things you have a really hard time doing on deadline. It takes a couple days, or sometimes a couple weeks, to really gather the data and do a good analysis on it.

What do you think of the posting of uncontextualized data centers?

We CAR people, those of us who have been a part of IRE, and going to NICAR conferences for years, we’ve been talking about data for – for me, 15 years, for other people, like the Steve Doigs of the world, probably for 10 years or who knows how many more than that. CAR people have been talking about data for a very long time, and we’ve always talked about data along with the context of the data in the stories that come out of the data. We’ve never really divorced the notion of context from data, because to us, data was never just a means to an end in itself, it was always the driver of a story.

I’ve been to many CAR conferences where people said something along the lines of, “Data analysis doesn’t always give you the story, it gives you the questions to ask that lead to the story. Or it gives you the specifics that help you tell the story.” But I think what happened, and I wouldn’t want to point the finger at Gannett, because I think it’s happened all across journalism in general, is that all of a sudden a lot of non-CAR people discovered data and databases, and said, “Hey, this is great, let’s put these up on our Web site.” In doing so, I think we run the risk sometimes of presenting data without that context, and without the story that goes along with it. It’s one thing, for example, to post the salary of every teacher in the county that you live in. Certainly, there’s the nosy factor, where somebody wants to go and look and see what Mrs. Magillacutty makes as principal of that high school. But I would much rather see five or six or ten years worth of that salary data analyzed, and then presented in a way that tells me what the real story is behind that data as a whole. What are the salaries of first-year teachers, and how have they changed over time? How has the recession impacted what teachers are getting paid? What’s happening to teacher salaries as baby-boomer teachers start to retire? These are all the kinds of questions that I think a smart data analyst is going to ask about that series of numbers, and look upon the mere presentation of them with a bit of a skeptical eye, because it’s very hard to just look at a series of numbers and discern what they mean.

If I were to look up Mrs. Magillacutty’s salary, and find out that she makes $65,000, what does that mean? Is it high? Is it low? Has it changed over time? What year is she in? I think we need to tell the readers more. I’m not saying that there’s no value at all to presenting data on Web sites. I just think we always need to think about: What is the story that we’re telling? There always has to be a story. And that holds true no matter what we’re doing. Whether we’re writing a two-paragraph brief, or analyzing five million records of something, or creating a spectacular interactive out of Django, the Python framework.
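As an editorial aside, the kind of analysis DeBarros is describing (first-year teacher pay tracked over several years, rather than one salary looked up in isolation) might look roughly like the sketch below in Python. The file name and column names are hypothetical; the calculation is only meant to show how a trend question differs from a lookup.

```python
# Hypothetical sketch: trend in first-year teacher salaries, rather than one lookup.
import pandas as pd

# Assumed columns: "school_year", "years_of_experience", "salary" (illustrative names).
salaries = pd.read_csv("county_teacher_salaries.csv")

# Median pay for first-year teachers, by school year.
first_year = salaries[salaries["years_of_experience"] == 0]
trend = first_year.groupby("school_year")["salary"].median()
print(trend)

# Percent change from the first year in the data to the last.
change = (trend.iloc[-1] - trend.iloc[0]) / trend.iloc[0] * 100
print(f"First-year median salary changed {change:.1f}% over the period")
```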

What are your thoughts on data-driven applications as being distinct from the data centers?

I think Politifact tells a story. I think that does make it different from just a data center. When I go to Politifact, and I just start poking through the various statements that different people have said, and whether or not they’re “Liar, liar, pants on fire” or whether the truth meter is pegged over into the truth zone, that starts to let me tell a story. See, that’s the great thing about really well-done interactives, is that it lets the reader tell his or her own story. And it lets the reader discover the context. It lets the reader discover more than just a list.

For example, the New York Times did a visualization around Netflix. Now, there’s a certain gee whiz factor to that, because they did a really good job with the mapping, and I think anybody who’s kind of a mapping nerd would look at that with a little bit of “Wow, that’s pretty cool.” But if you get beyond that and you really start to play with it, and I like movies a lot, and I kind of understand because I did CAR related to entertainment for many years, I understand how the movie industry works. Once I got past that gee whiz factor, I started to realize that it actually was telling me some stories.

For example, there were some movies that played very well in urban cores that did not play well in the suburban ring around the core. There were some movies that seemed to do much better in the South than they did in the North or the Midwest. And see, that starts to tell me a story. The neat thing is that I was discovering it as I went. Now, I never did see the story that they did around that, but if I were going to jump off and write a story based on the data, boy, that gives me some great things to explore. I think there’s great potential and great promise in really, really good data-driven interactives.

We’ve done several at USA Today that I think allow the reader to tell a story. Paul Overberg keeps a database of every soldier who’s died in the Iraq and Afghanistan wars, and we have an interactive where you can see, pictorially represented, each one of those deaths. Then you can go through, and you can filter and select out people by various demographic characteristics. So if I just wanted to find female soldiers who had been killed, I can do that. And the way the interactive is set up, I get a real visual sense right away of what proportion of soldier deaths are women. And I could do that for anything. White vs. Hispanic vs. black. Or age groups. I can quickly look and find everybody over the age of 50.

Do you see a role for CAR specialists in the future?

There’s certainly a lot of interest right now in people learning how to program in frameworks, whether we’re talking about Django or Ruby on Rails, or even like us, we do a lot with ASP.net. There’s no doubt that we really need to develop that area of expertise in newsrooms to present data interactively in ways that tell stories. But I don’t think we’re going to lose the need for people that can mine data to find the trends, whether or not we’re going to present those trends visually or use them very simply to guide the reporting of a topic. I think we’re going to need both. But I think it’s very hard for one person to specialize very well in everything.

I think it’s very hard for people to be an expert in mapping and an expert in frameworks and an expert in statistics and an expert in data parsing and an expert in writing – there’s a lot there. Right now, I think we have a lot of people who are doing their very best to try to cover all those bases, but people being people, we all gravitate toward different parts of that spectrum, because some of us are more interested in coding, and some of us are more interested in getting public documents and parsing them. The ideal situation is to build teams, where you have people who can specialize in different components and then combine them into the power of a team. For smaller news organizations, there’s a real need for the news organizations to identify and train people to get those skills and to be able to be a part of those teams. The days of having the CAR person in the newsroom, and expecting the CAR person to do all this work, they’re gone. Ten years ago, the amount of data that we had access to and that we were expected to deal with, that was like the drip of a leaky faucet compared to the firehose that’s out there now. We’ve got the Obama administration’s Open Government Directive and every government agency directed to put three high-value data sets up online. We desperately need for more people in newsrooms to get in on this.

Do you see the explosion of data in the modern era as a good thing?

Definitely. I’ve always told people to treat data just like you would treat a source, a human being source. Especially when I teach basic CAR to students, I always tell them when you get a data set, you have to interview that data in the same way you would interview a source. You have to find out if it’s credible, you have to find out what its breadth of knowledge and limitations are, you have to find out what its characteristics and its behavior is. As reporters and editors, we all know that there are some people out there that you can call up, and they’re like an encyclopedia about a topic, and there’s some people you call up and they really don’t have much to offer you. Well, it’s the same way with data. There are some data sets out there that are rich and deep and valuable, and you’re going to go to them again and again, and there are some where you’re like, “Meh, doesn’t really do much for me.” But the only way to discover that is to really interview them and find out what they’ve got, and what they’re really telling you.
