APPROACH

  • Look at outliers and overall trends in three crops, compare across time to 2007 for one crop when data is available
  • Take geographic environment into consideration when seeing how states’ output of various crops is doing
  • Note ratio of planting/harvesting, that is, how much usable crop is being output from farmland itself vs. how many acres of crops the state started with.  This shows how much is being wasted, and wasted crops could point to dollars in the agriculture industry being poorly spent – story idea!
  • Measure states by how much they produce at the end, after all, that’s how they make their money.  It would be great if we could compare final output to what was initially planted, but those are different units.  So, I think it’s worthwhile to look at these two values separately: how much usable product is there, and how much of the original plants are being wasted?
  • I opted not to examine this information by county, as it was just too much detail.  But I suppose once I found interesting states to focus in on, then we could look at the counties to get a better sense of whether the change/anomaly occurred in a specific region of the state.  If it was a specific region, you could localize the story, talk to farmers in the area, etc.

OVERALL QUESTIONS

  • What states are producing the most of each crop?
  • For peanuts (because I have data for two years), how has production changed between the two years? Any anomalies?
  • How does production of oat and wheat differ? They are both grains, and require similar growing environments, so logic dictates that these numbers should be close.
  • How much crop isn’t being directly used (compare planted vs. harvested)?  If it’s a lot (and in some states such as Calif, it is, then an avenue for more reporting might be to investigate if any taxpayer dollars are being spent on agriculture, or even if that waste is being passed on to the retail price of the crops that people are paying in the stores.

WHAT’S INTERESTING ABOUT THE RESULTS

  • Indiana’s loss of oat crop from planting to harvesting really fascinates me, definitely not what I expected.
  • The differences between states that have a high production of oats and wheat.  I’d love to learn more about the growing process and requirements, and see if it really is that different.  I would have thought that states producing a lot of one would be producing a lot of the other.
  • More states are having problems with oats than the other crops I looked at. That is, the states losing the largest percentage from planting to harvest are losing in the 80th and 90th percentiles, which is not echoed in the other crops I looked at.  What is so difficult about harvesting oats?  Has the problem gotten worse over time?  How is affecting the food industry and the nation’s oat supply?
  • How did Georgia double its production of peanuts in a year?  I’m very curious about this.
  • Overall trend of a good amount of crops being lost from planted to harvest – it looked like states were doing well when losing more than 6 percent.  That’s interesting from a reporting standpoint, what can be done to reduce the amount that are never harvested?  How does global warming, and the changes of climates, affect agriculture?  There are obviously a lot of variables, but that’s part of why looking at average temps and rainfall would add a whole additional dimension to this data.

DATA WEAKNESSES

  • Would have liked to see unique ids associated with each record in the data set.  This did not exist, so I was unable to join tables across years, where I had multiple sets.  This would have been helpful where there were crops with two years, and I had to do my comparison across these tables by eyeballing.  Definitely less than ideal.  (NOTE 1/2010: I now know that I could have used multiple fields to perform a join, a concept we had not thoroughly covered at this point.)
  • Would have been helpful to include square footage of states and counties in which the various crops were planted.
  • Lack of years to compare with for much of the crops listed, outside of peanuts and a few others.  This would have been quite helpful in giving the numbers context.

ADDITIONAL INFORMATION NEEDED

  • Square footage of various counties and states listed.  Would have been interesting to examine the amount of crops yielded in comparison to size
  • Some data about weather would have been helpful.  If a climate was unusually or unseasonably warm or cool, that could have affected the crops, and it would be nice to have a better understanding of that
  • We’re missing some states, and the states missing vary from crop to crop.   I would love to know why certain ones aren’t reported, and it would be helpful to get that info.  Without it, it’s not really possible to say a state had the highest production of oats in 2008, let’s say, because one of the missing states might have had a higher one.

LOG

  • Downloaded zip file of CSVs, realized there are 30 small files
  • Thought about combining all of them into one and sorting through it on SQL
  • Decided it would be better to look at them separately, so it would be more manageable
  • Put 30 different tables into a SQL database
  • Thought about joining tables within SQL
  • But before I wrote the query, realized certain tables are missing various columns (in most cases, stats and units on what was planted)
  • Attempted to calculate percent change from planted to harvested, where applicable.  Emailed Derek for help, formula didn’t work.
  • In class
    • Asked about being overwhelmed by all 30 tables, was told to focus on a few specific ones.  This was echoed while working on the KML project the next week, first I tried to map every body of water in the country, then the 10 states surrounding the Mississippi River, finally Louisiana.  Point being, sometimes it’s better to go for quality than quantity, there is such a thing as biting off more than I can chew. Lesson learned.
    • Asked Derek about formula for percent change, we figured it out.
    • Decided to mainly focus on a few tables, very specifically and in-depth (WILL FOCUS ON oats08, allwheat08 and peanuts07 and peanuts08) These were chosen as oats and wheat are multipurpose grains that I think would be interesting to more people, and could be valuable to compare against each other.  Then I chose peanuts because it’s one of the few commodities that had tables for 07 and 08, and I’d like to do some comparisons across time
  • Second week
  • Read through frequently asked questions
  • Sought to understand what various column names mean
    • What’s STFIPS and COFIPS? – State and county versions of a federal information processing standard, used to “ensure uniform identification of geographic entities”
    • PracCode – Cropping Practice Code
    • CommCode – table for decoding these exists on the site, but since my tables are already divided into commodities by table, and there is another actual field listing commodity name, this doesn’t really concern me
    • Planted, Harvested, Yield, Production – what’s the distinction among these?
      • § Answer: “Planted acreage: Acreage planted for all purposes includes: (a) acreage planted that has been or will be harvested; (b) acreage planted and replanted to the same crop (only the first planting is included); ©) acreage planted and later plowed down, grazed, or abandoned; (d) volunteer acreage, only if the acres will be harvested; and (e) acreage planted on land enrolled in Government diversion programs.
      • § Harvested acreage: Acreage harvested includes: (a) all acres already harvested or intended for harvest and (b) the same crop acres (such as hay) harvested two or more times for the same utilization. Acres with multiple harvests from the same planting are included only once.
      • § Net Harvested Yield: The portion of total crop production removed from the field, expressed as a quantity per unit of area, and derived by deducting harvesting and other losses from the biological yield.
      • § Production: The total quantity of an agricultural commodity recovered or removed from the field. In other words, net harvested production computed as harvested acres times net harvested yield.”
      • § What I probably care the most about is planted and production, not harvested as I initially imagined.  Production, in the end, is the total amount of product acquired from the field, makes sense. (http://www.nass.usda.gov/Education_and_Outreach/Understanding_Statistics/yldfrcst2006.pdf)
      • Realized I forgot to set data types on all 30 tables – duh!  Decided to only fix this on the four tables I’m working with, and only as needed when doing calculations.
  • So, looking at peanuts first?  In ’07, what counties/ had the most peanut production?
  • SELECT * FROM Peanuts07 ORDER BY Production DESC
    • Texas, North Carolina and South Carolina all have several up pretty high
  • SELECT * FROM Peanuts07 WHERE County=’State Total’ ORDER BY Production DESC
    • Peanuts07 – only 10 states are reported here – where are the rest of them?
  • We know Georgia is famous for being the peanut state, but I think it’s interesting that Georgia has almost twice as many as Texas, which has the second highest production.  At the same time, Texas is substantially larger than Georgia. Interesting, huh?
  • SELECT * FROM Peanuts07 WHERE County=’State Total’ ORDER BY Production DESC
  • Shows that Georgia is far above Texas in ’08 as well.  But I noticed the number here  for Georgia is 1,638,000,000 in 2007 and 2,329,000,000 in 2008 (How did that doubling happen?)
  • Looked at diff between plant and harvest for peanuts in 08 (SELECT (planted-harvested)/planted*100,  * FROM peanuts08 WHERE county=”State Total”)
  • Nothing too illuminating, no state had more than 6 percent of peanuts they couldn’t harvest.
  • Running the same query with 2007 yielded similar numbers.  Only intriguing exception: Florida went from 8.46 percent to 6.67 percent in 2008.  What did they do to cut down their losses?  Or was there something to do with the weather that gave them more problems in 2007?  Mapping any of this data onto weather patterns would be interesting.
  • Next, I decided to look at % change between planting and harvesting.  Even though we figured this out in class, I then lost the query (brilliant, Michelle, just brilliant).  So, I set to rediscover it, got stuck getting it to display decimals, changed it to FLOAT, and all was well. Got the idea from an Internet tutorial, but I now remember float from when we learned al little of SQL during our Objective C course in the spring.  Anyway…the final statement was:
  • SELECT (planted-harvested)/planted*100 AS percent,  * FROM AllWheat08 WHERE county=”State Total” ORDER BY percent DESC
  • Looking at wheat, and percentage changed in 2008 between planting and harvesting, New Mexico lost 67 percent of what it planted.  That’s 20 percent more than Nevada, which lost 47 percent.  Why is New Mexico losing so much? Was that true in the past? (I don’t have data going back any further for wheat?)  Why are big states losing more (Nevada, Texas and California are all losing in the 30s or 40s?)
  • Continuing to look at wheat (SELECT * FROM AllWheat08 WHERE County=’State Total’ ORDER BY Production DESC), Kansas, the Dakotas, and North Carolina are producing the most, no big surprises here., it’s agriculture in the heartland, and they are growing a lot of wheat, just what I expected.
  • And following the same procedure for Oats, (SELECT (planted-harvested)/planted*100 AS percent,  * FROM Oats08 WHERE county=”State Total” ORDER BY percent DESC)
  • we find California, Utah, Colorado, Texas and Oklahoma all 80 percent or more of what they planted.  Again, what’s going on with California and Texas, and why they are losing so much?  90 percent of a crop in a state as large as Calif. would seem to be a lot.  According to this intensity map from Google Fusion tables (http://tables.googlelabs.com/DataSource?dsrcid=89333/89333) it becomes more obvious that higher percentages of losing crops from planting to harvest are in the Southwest.  (NOTE: You have to set the map to United States, or the state list confuses it, and all it recognizes is Georgia.  Wrong Georgia, computer!)  Maybe oats don’t do well in a really hot climate?  But the map shows me an even more intriguing question: What’s up with Indiana?  The rest of the Midwest states have a relatively low intensity, but Indiana’s is up higher.  It’s hard to believe it has to do with climates when nearby states are having much less problems.  A great example of why it’s important to look at this stuff with a map.  I would have done many more maps and graphs, but the time….
  • Now, looking at the ranking of who produces the most — SELECT * FROM Oats08 WHERE County=’State Total’ ORDER BY Production DESC.  Minnesota and Wisconsin come out on top.  Why are these states in the top for oats, but not wheat?  Do the growing conditions vary that much?  I would think if you are in the right climate for grain, you are in the right climate for grain.  Is it that they aren’t growing that much wheat, since they are putting more effort into oats?   But the Dakotas rank in the top five for oats and wheat.  How are they able to do it, yet others can’t?