Using Google spreadsheets as your database
UPDATE 2/7/12: The whole, copyable script has been added to the bottom of this post, at the request of a few folks. Hopefully that’s helpful.
The database is a key part of data journalism. We spend a good amount of time making structured of the unstructured, and then we need to shove it somewhere so a web app can serve up relevant pieces, perform calculations to help us understand nuance. Straight text files just have their limits. But does that mean you can’t build a data app that dynamically updates if you don’t know the difference between MySQL and PostgreSQL and the newest Harry Potter movie? — Oh wait, that last one’s a sequel, not SQL! *Wow, that was a bad pun. Sorry. It’s a sickness.* No, no. no. You can use Google Spreadsheets, a fantastic tool already being used in many newsrooms to foster collaboration. I wouldn’t trust it with my multi-year investigation, but it can serve your data into an HTML table on your website, and will adjust as you (or other approved contributors) add to the spreadsheet.
Would you like an example? While I was at my LA Times internship, there was this thing called the Bell scandal. If you read this blog, you’re probably plugged in enough to have heard of it, but for the uninitiated, in the city of Bell, an LA Times report on the high salaries of the city manager and other top administrators generated…interest. Turns out Bell wasn’t the only city where this was an issue. So, people started asking, “How much does my city manager make?” Sounds like a job for the data folks!
We came up with this. I can’t take much of the credit here, reporting interns and staffers did the legwork to get the data, and Web producer Anthony Pesce did much of the design and did the front-end implementation, but through Tribune’s proprietary content management system, as opposed to our “guerilla warfare” separate server approach that houses the Data Desk projects.
The part for us Data Desk devs was how to pull this collaborative spreadsheet into the HTML template housed in the CMS. I churned out a rough version of the quick backend stuff overnight, and Ken Schwencke and Ben Welsh helped make it run lighter and smoother, and made many improvements I couldn’t have thought of.
Here’s a walkthrough, in case it’s of use to anyone out there, and also because I’d love to hear about related tips, tricks and improvements you have. It’s not any big secret, all available by viewing source on the Bell database web page.
A caveat: This is ABSOLUTELY NOT to say you don’t need to use a database backend anymore. You have more control over a database, it can take bigger chunks of information, do more complicated analysis, not to mention that you know exactly when it’s going public, and what going public means. (Maybe I’m just a control freak.) But if your data set is small, and you don’t have a full development server, that shouldn’t be a complete road block. There are cool, and better yet, useful, things you can do.
This quick project will depend on Javascript, a key way to get interactivity into your website without resorting to technologies that are less compatible across platforms because of their newness (HTML5) or their proprietary platforms (Flash). I’m actually spending more time on this language than Python/Django at the moment, for PBS-related reasons.
Here we go!
1. First, create your Google Spreadsheet, add in all your data, and then hit Publish in the publication settings. At this point, it will be open to the public. This is necessary because the computer needs to be able to “see” your spreadsheet, which it can’t if the spreadsheet is privacy protected.
2. In your blank doc, create an area to type Javascript. Tell the computer what to expect by typing:
<script type="text/javascript">// <![CDATA[ <pre lang="JAVASCRIPT">var pre_html = ' <table id = "salaries"> <thead id="anthonyHeader"> <tr> <th>City</th> <th>Name</th> \ <th>2009 taxablecompensation*</th> \ <th>Note</th> <th>Pending request</th> <th>Population</th> <th>Median income</th> <th>Contract</th> </tr> </thead> <tbody>'; |
4. Create an empty Javascript variable which I called actual_html. This is the html that correspodns to each record in the spreadsheet, that is, each row. Leave it as an empty string (two quotes next to each other, indicating no text) for now, we’ll get back to it.
var actual_html=''; |
5. Create a variable to hold the end of the table which will run after all records are completed. Close that table up.
var post_html = '</tbody> </table> ' |
6. Calculate the length of filled-out information in the spreadsheet. Json refers back to the spreadsheet, feed refers to how we’re pulling the spreadsheet, entry tells the computer we’re looking at each row as a separate item or entry, and length…well…you get that one.
Json.var len = json.feed.entry.length |
7. Loop through the spreadsheet as many times as there are records. After each loop, increase the counter variable by one, so the count moves forward.
for (var i=0; i<len; i++) { |
8. First item of business, try to grab something from the rollover description field on the given row. These are notes that show up if LAT reporters had additional details to add for context.
var d = json.feed.entry[i].gsx$rolloverdescription.$t; |
9. Check if d exists, in this case, to see if there is text for a rollover or not. Depending on whether it exists, we then generate a row with a whole block of HTML. This isn’t a super efficient way to do it, but got the job done on a deadline. You can go look it all up in our source (just go to the link, view source, and search for ‘function displayContent(json)’ and you’ll get to the right place), but here’s a general structure, built from an excerpt of the code.
actual_html += [' <tr> <td>', \ '<a target="_blank" title="Mapping L.A." href="',\ json.feed.entry[i].gsx$mappinglink.$t, '">', json.feed.entry[i].gsx$city.$t, '</a>', '</td> </tr> '].join(''); |
I cut some lines out, but that starts the table row, calls in a url, generates a tooltip for it, brings in the attribute from the spreadsheet, close the row, all between the []. Then, there’s a .join(“), which means combine everything into one large block of text, and separate them with nothing. It makes more sense if you think of joining ten items as part of a list where you might want to combine into one string of text, but shove commas in between. One more note: The actual_html += means we append, or add, each version of this variable to the last, so as we loop through the spreadsheet, we’re recording the code for all the rows. Otherwise, you would only see the last row.
document.getElementById('salary_table_container').innerHTML = pre_html + actual_html + post_html |
// ]]></script> |
Here’s a copyable version of the complete script:
<html> <head> <title>Completed sample for turning a Google Spreadsheet into an HTML Table</title> <!-- Add a div where the table will go --> <script type='text/javascript'> //Write a function that allows us to add commas to raw numbers in spreadsheet for display. 2413 becomes 2,413. function addCommas(nStr) { nStr += ''; x = nStr.split('.'); x1 = x[0]; x2 = x.length > 1 ? '.' + x[1] : ''; var rgx = /(\d+)(\d{3})/; while (rgx.test(x1)) { x1 = x1.replace(rgx, '$1' + ',' + '$2'); } return x1 + x2; } //this will be executed after we "fetch" the contents of a spreadsheet. The json parameter will hold our spreadsheet's data function displayContent(json) { //start an html table and write out our headers. Using <td> tags so it isn't bold and centered, which is the <th> default. var pre_html = '<table id = "salaries"><tr><td>City</td><td>Name</td>\ <td>2009 taxable<br />compensation*</td>\ <td>Population</td><td>Median income</td></thead><tbody>'; //Create an empty string to hold the HTML. We will put table data here. var actual_html=''; //After we grab the table, close the HTML table. var post_html = '</tbody></table>' //figure out how many rows our spreadsheet has var len = json.feed.entry.length //loop through the spreadsheet, gathering data for (var i=0; i<len; i++) { //for each row, add the following to actual HTML, grabbing it as a list, and then joining the list together as one long string. //Uses HTML for table cells, and then grabs attributes from the spreadsheet, using appropriate syntax. Enter your table header in the Google spreadsheet between //the gsx$ and .$t. actual_html += [ '<tr><td>', '<a target="_blank" title="Mapping L.A." href="', json.feed.entry[i].gsx$mappinglink.$t, '">', json.feed.entry[i].gsx$city.$t, '</a>', '</td><td>', '<a target="_blank" title="City website" href="', json.feed.entry[i].gsx$personwebsite.$t, '">', json.feed.entry[i].gsx$name.$t, '</a>', '<br /><span>', json.feed.entry[i].gsx$title.$t, '</span>', '<a target="_blank" href="', json.feed.entry[i].gsx$email.$t, '"></a>', '</td><td>', addCommas(json.feed.entry[i].gsx$compensation.$t), '</td>', '<td>', addCommas(json.feed.entry[i].gsx$population.$t), '</td>', '<td>', addCommas(json.feed.entry[i].gsx$medianincome.$t), '</td>','</tr>' ].join(''); } //put all three of our HTML strings into our div we made at the top of the page document.getElementById('salary_table_container').innerHTML = pre_html + actual_html + post_html } </script> </head> <body> <div id="salary_table_container"></div> <!-- Grab the data from our google spreadsheet, using the callback to call the function we just created --> <script src="http://spreadsheets.google.com/feeds/list/0AlO_LanGCFltdGlsa2hSNHhOdHlCYmpFeXdlY0ZNS2c/od6/public/values?alt=json-in-script&callback=displayContent" type="text/javascript"></script> </body> </html> |
Related posts you might enjoy:
-
http://apps.chicagotribune.com
-
-
-
-
-
Pingback: Michelle Minkoff » Learning how to…sort a JavaScript/JSON object with Underscore
-
http://www.facebook.com/people/Ben-Marwick/520471502
Mark Ng Reply:
November 23rd, 2010 at 9:01 am
You can also combine with the google charts API to create charts that go with the data, too. I knocked together a proof of concept a while back : ( http://hack.markng.co.uk/broadband/ )
[Reply]