This post is one of the first in a series covering things I wish I had known before coming to graduate school. This one is about sources of data for research. The next will look at some types of data and the skills needed to manipulate (in a neutral way) and analyze that data.
The best primer I have found so far on finding data comes from Nathan over at FlowingData. I highly recommend checking out the sources he points to. Here I offer a few additional suggestions specific to the areas of research I am interested in.
For education data there are lots of good choices, but I would suggest starting at the National Center for Education Statistics. While this data isn’t always easy to get into a usable form, it is official and fairly comprehensive. It is especially good for looking at achievement scores, demographics, and other characteristics of school districts over time.
Another interesting source is the US Census of Governments, if you are looking for information on government entities in the US. It provides a wealth of information about the many governments that exist within the country's federal structure.
For a great source on federal expenditures, I recommend the FAADS (Federal Assistance Award Data System) dataset from the US Census Bureau.
Finally, I have found a couple of data mining initiatives put on the web by different organizations. The first is Sunlight Labs and the second is DataMasher. Of the two, DataMasher is the more interesting because it has a geographic mapping component, though only at the state level. It might be a really useful way to make sense of state-level data and generate interesting graphics, but for serious academic analysis it might not be all that useful.