The census has a lot of data. Probably the biggest issue that I have with using census data is that there is so darn much of it. Also you can get it in a heck of a lot of ways.

This particular story is going to be about the five-year American Community Survey data because that is reported down to the “block group” level and has information about an enormous number of variables. If you don’t know what a block group is, you can think of it as a collection of roughly 1000 people who all live near each other. A block group is a subset of a census tract.

First off, maybe you don’t want to use the API. Maybe your data are so precious that you don’t want to risk having them logged on some server in the Department of Commerce. In that case, you probably want to download a CSV from the American FactFinder Download Center. The Download Center is really nice! And it has a lot of information organized by ZIP Code Tabulation Area (ZCTA), which is the Census’s answer to the ZIP code. If you already have street addresses with ZIP codes in your precious data set, you don’t even need to geocode anything. You will, however, still need to deal with the fact that there are a heck of a lot of variables and they all have names like HC03_VC54. Definitely get any metadata that the Census offers you alongside your data.

And what if your data are not precious and you want to use the API? Well, R is my tool of choice, so I’m going to tell you about how this works with R, but you can easily adapt this to your tool of choice as well. Long story short: you construct a URL that passes the variables of interest, and the Census will send back the information that you asked for.
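To make the "construct a URL" idea concrete before getting into the Census specifics, here is a minimal sketch in base R. The `build_query_url` helper and the `https://example.com/data` address are both made up for illustration; they are not part of any Census service.

```r
# Hypothetical helper: glue named parameters onto a base URL as a query string.
build_query_url <- function(base_url, params) {
  # Percent-encode each value; with reserved = FALSE, spaces become %20
  # while characters such as "," and ":" are left alone.
  encoded <- vapply(params, utils::URLencode, character(1), reserved = FALSE)
  paste0(base_url, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Placeholder address and parameters, for illustration only.
example_url <- build_query_url(
  "https://example.com/data",
  c(get = "B00001_001E,NAME", "for" = "block group:1")
)
print(example_url)
```

The space in "block group" is the reason the query strings later in this post contain %20.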

Since I’m talking about Census data, I’m thinking about questions of the form, “Tell me about this feature relating to the people in this location.” So I need to specify a location and a feature. The location can be a state, a county, a census tract, a block group, or one of several other less-well-known political boundaries. The feature can be something as straightforward as the total number of people who live in the location or it can be something pretty complicated, like the number of people of a certain combination of race, ethnicity, and age who rely on a specific mode of transportation to commute to work. You’ll find the names of the variables in the first column of this table. For example, B00001_001E tells you the unweighted sample count of the population in the location.

So now that we know how to specify a variable, we also need to know how to specify a location. A block group is identified by combining the two-digit state code, the three-digit county code, the six-digit census tract code, and the one-digit block group code. Where do we get these codes?
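As a small illustration of how those four pieces fit together, here is a sketch that pastes them into one 12-digit identifier. The state and county codes are the ones for Oneida County, NY; the tract and block group values are made-up placeholders, and `make_block_group_id` is a hypothetical helper of my own, not something the Census provides.

```r
# Hypothetical helper: combine the four codes into one 12-digit identifier.
make_block_group_id <- function(state, county, tract, block_group) {
  # Keep the codes as character strings so that leading zeros survive.
  id <- paste0(state, county, tract, block_group)
  stopifnot(nchar(id) == 12)  # 2 + 3 + 6 + 1 digits
  id
}

# "36" is New York and "065" is Oneida County; the tract and block group
# below are placeholders for illustration only.
example_id <- make_block_group_id("36", "065", "020000", "1")
print(example_id)
```

Treating the codes as text rather than numbers matters: a county code like 065 would silently lose its leading zero as a number.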

We can use the Census Geocoder API!

In my example, I am going to feed in a latitude/longitude pair; the census geocoder can also work with street addresses.

Here’s the sample code, which should be pretty self-explanatory.


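library(stringr)   # for str_c()
library(jsonlite)  # for fromJSON(); assumed here, since the indexing below relies on its parsing

latitude <- "43.1010304"
longitude <- "-75.2919624"

# The empty string is where the geocoder endpoint URL goes; longitude is passed first, then latitude as y.
geo_url <- str_c("", longitude, "&y=", latitude, "&benchmark=Public_AR_Current&vintage=Current_Current&layer=10&format=json")

# Send the request and parse the JSON response into R objects.
geo_info <- fromJSON(geo_url)

# The first element of "geographies" holds the layer that was requested.
geography <- geo_info[["result"]][["geographies"]][[1]]
block_group <- geography[["BLKGRP"]]
state <- geography[["STATE"]]
county <- geography[["COUNTY"]]
tract <- geography[["TRACT"]]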

I plugged the latitude and longitude in for the correct variables in the URL, sent the request, parsed the JSON, and then extracted the information. If you read through the raw response, you can learn that this location is in Oneida County, NY.
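If you would rather pull all four codes out in one step, something like the following sketch might do. It runs against a hand-built stand-in for the parsed response, so it works offline; the shape of the stand-in, the "Census Block Groups" label, and the tract and block group values are my assumptions for illustration, not a copy of a real reply.

```r
# Mocked stand-in for the parsed geocoder response (assumed shape, made-up values).
geo_info_mock <- list(
  result = list(
    geographies = list(
      "Census Block Groups" = data.frame(
        STATE = "36", COUNTY = "065", TRACT = "020000", BLKGRP = "1",
        stringsAsFactors = FALSE
      )
    )
  )
)

# Take the first (and only) geography layer, then the four code columns of its first row.
layer <- geo_info_mock[["result"]][["geographies"]][[1]]
codes <- unlist(layer[1, c("STATE", "COUNTY", "TRACT", "BLKGRP")])
print(codes)
```

The result is a named character vector, so `codes[["TRACT"]]` gives you the tract code on its own.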

Next I can ask the Census about the population of this block group, using the B00001_001E variable from above (the unweighted sample count). For this I will need an API key (do note that you can geocode without an API key, in case you were wondering where to find some free geocoding). It is not hard to request an API key.

Adding to the code from above:

census_api_key <- "your_key_goes_here"

variable <- "B00001_001E"

# The empty string is where the Census data API endpoint URL goes.
query_url <- str_c("", variable, ",NAME&for=block%20group:", block_group, "&in=state:", state, "%20county:", county, "%20tract:", tract, "&key=", census_api_key)

my_data <- fromJSON(query_url)

And then the my_data variable will hold the result of the call; the important part of the payload is in my_data[2, 1].
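Since the payload arrives as a character matrix with the column names in its first row, it can be handy to turn it into a data frame. Here is a sketch of one way to do that, using a mocked matrix so it runs without an API key. Every value in the mock is made up, and the column layout is my assumption based on the query above.

```r
# Mocked stand-in for my_data: row 1 is the header, row 2 is the data (made-up values).
my_data_mock <- rbind(
  c("B00001_001E", "NAME", "state", "county", "tract", "block group"),
  c("100", "Example Block Group", "36", "065", "020000", "1")
)

# Drop the header row, keep the matrix shape, and promote the header to column names.
result <- as.data.frame(my_data_mock[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(result) <- my_data_mock[1, ]

# Everything comes back as text, so convert the count to a number.
result[["B00001_001E"]] <- as.numeric(result[["B00001_001E"]])
print(result)
```

The geography codes are left as text on purpose so that their leading zeros are preserved.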

There are R packages that will take care of assembling the URLs and extracting the data for you, but so far I have not found any tool that makes it easier to figure out the names of the variables that report the information that I care about. Watch out that you do not get sucked into an afternoon of sharing with everyone within earshot the median ages of people from various income groups who live in your neighborhood and who bicycle to work.