This piece was originally published on Thomas Gomes’ Medium.
Do you often get tired of grabbing data directly off of data.census.gov? Or has the Census API been throwing errors in your code? Maybe you are just like the rest of us, wanting to streamline your workflows as much as possible.
Well, Dr. Kyle Walker had all of us Census Data users in mind when developing TidyCensus, an R package that makes obtaining Census data so easy, it is actually unbelievable.
Let’s walk through how to obtain Census data in under five minutes, using TidyCensus.
Before you can begin unlocking the hidden superpowers of TidyCensus, you must first acquire a free API Key from the US Census Bureau. Click here to do so.
You should receive an email within a few minutes that includes your new key. This is crucial because the TidyCensus package is built off of the Census API, meaning that none of the functions in the package will work without one.
Luckily, TidyCensus has a neat little function to quickly install the key onto your computer. But first, we need to get the package installed. Here’s how:
As with any other package in R, the first step to using its functions is to install it, which you can do from your IDE of choice (my personal preference is RStudio).
install.packages("tidycensus")
library(tidycensus)  # load the package for the current session
Now that we have the package loaded, we can make use of that nifty function for installing the API Key that I mentioned earlier.
census_api_key("YOUR KEY GOES HERE", install = TRUE)
Notice how there are two pieces to this function: the API Key itself (make sure to put it inside of quotes) and the install argument, which in this case is set to TRUE. You will only need to run this line of code once, when you load the key for the first time.
The install = TRUE argument tells your computer to remember this key and use it every time you make an API call. This saves you from having to repeat this process on this device, so long as your key remains valid.
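If you want to confirm that the key was stored, a minimal sketch along these lines may help. It assumes the key is saved in the .Renviron file in your home directory under the name CENSUS_API_KEY, which is where census_api_key() with install = TRUE writes it; the variable names renviron_path and key_available are only illustrative.

```r
# Assumption: the key lives in ~/.Renviron as CENSUS_API_KEY.
renviron_path <- file.path(Sys.getenv("HOME"), ".Renviron")

# Re-read the file so a freshly installed key is visible without restarting R.
if (file.exists(renviron_path)) {
  readRenviron(renviron_path)
}

# TRUE when a non-empty key is available in this session.
key_available <- nzchar(Sys.getenv("CENSUS_API_KEY"))
message("Census API key found: ", key_available)
```

Restarting R after installing the key has the same effect as re-reading the file.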
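There are two core functions that will be the basis of working with TidyCensus: get_decennial(), for the Decennial Census, and get_acs(), for the American Community Survey.
Both operate very similarly, and share arguments such as geography, state, variables, table, year, and summary_var to build the proper API requests.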
Variable Selection
One of the best parts about this package is how easy it is to identify your desired variable names using the load_variables() function.
This eliminates the need to manually search online for every variable name you want. With a few simple lines of code, you can have a searchable table full of variable names (along with their more detailed labels for reference).
Use the following code to create objects containing the list of variable names from a few different surveys:
# 2020 Decennial Census variables
decennial_2020_vars <- load_variables(
  year = 2020,
  dataset = "pl",
  cache = TRUE
)

# 2010 Decennial Census variables
decennial_2010_vars <- load_variables(
  year = 2010,
  dataset = "pl",
  cache = TRUE
)

# 2016-2020 5-Year American Community Survey (ACS) variables
acs_20_vars <- load_variables(
  year = 2020,
  dataset = "acs5",
  cache = TRUE
)
You can now access these tables and use the search function in RStudio to quickly identify the variable names you want.
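If you would rather search in code than in the RStudio viewer, something like the sketch below could work. To stay runnable offline it uses a small hand-made stand-in, vars_sample, instead of a real load_variables() result. The name, label, and concept columns mirror what load_variables() returns, but the labels and concept text here are simplified for illustration and are not the exact Census wording.

```r
# Hypothetical stand-in for a load_variables() table (labels simplified).
vars_sample <- data.frame(
  name = c("P2_001N", "P2_002N", "P2_005N"),
  label = c(
    "Total",
    "Total: Hispanic or Latino",
    "Total: Not Hispanic or Latino: White alone"
  ),
  concept = "Hispanic or Latino, and not Hispanic or Latino by race",
  stringsAsFactors = FALSE
)

# Keep only the rows whose label mentions the search term, ignoring case.
white_vars <- vars_sample[
  grepl("white alone", vars_sample$label, ignore.case = TRUE),
]
print(white_vars$name)
```

The same grepl() filter should apply to the real tables, for example decennial_2020_vars, once they are loaded.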
Once you have your variables picked out, you can save them all as a named vector and pass it to TidyCensus to retrieve their corresponding values in a tidy data frame (and even rename them in the process). Let’s take a look:
desired_vars = c(
all = "P2_001N",
hisp = "P2_002N",
white = "P2_005N",
baa = "P2_006N",
amin = "P2_007N",
asian = "P2_008N",
nhopi = "P2_009N",
other = "P2_010N",
multi = "P2_011N"
)
Passing them through the get_decennial() function:
census_data = get_decennial(
  geography = "county",
  state = "NC",
  variables = desired_vars,  # here is where I am using the variables defined above
  summary_var = "P2_001N",   # creates a column with the 'total' variable
  year = 2020,
  sumfile = "pl"
)
The above code returns a data table containing all of the variables defined in the “desired_vars” object. In addition, the “summary_var” argument creates a new column, named summary_value in the output. This column holds the summary variable, or the total that the individual sub-variables add up to.
In other words, if you total up all of the race and ethnicity categories, that would equal the summary variable for Race & Ethnicity data.
(This comes in handy when you want to show composition, because it lets you quickly roll up percentages.)
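As a rough illustration of rolling up percentages, the sketch below divides each value by its summary value. The data frame census_sample is a made-up stand-in with invented counts for a fictional county. Only its column names (GEOID, NAME, variable, value, summary_value) are meant to mirror the tidy output of get_decennial() when summary_var is set.

```r
# Made-up rows shaped like get_decennial() output; the counts are invented.
census_sample <- data.frame(
  GEOID = c("00001", "00001", "00001"),
  NAME = "Example County, North Carolina",
  variable = c("hisp", "white", "baa"),
  value = c(200, 500, 300),
  summary_value = c(1000, 1000, 1000),
  stringsAsFactors = FALSE
)

# Share of the total population for each group, as a percentage.
census_sample$pct <- round(
  100 * census_sample$value / census_sample$summary_value,
  1
)
print(census_sample[, c("variable", "value", "summary_value", "pct")])
```

With real data, the same two lines that compute pct should work on census_data directly.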
When searching for ACS data, there is another neat trick up TidyCensus’ sleeve — the table argument.
Using the table argument (for example, table = "B19001"), one can easily acquire an entire table from the ACS, rather than typing out a list of variable names one by one:
# Income data by county for North Carolina
nc_county_income = get_acs(
  geography = "county",
  state = "NC",
  table = "B19001"
)
## Note that leaving the 'year' argument blank tells TidyCensus to use its default year,
## which is typically the most recent year the package supports. As of writing this,
## that is 2020 for both the ACS and Decennial Census.
Now that we have covered the basics of TidyCensus, let’s gather some data with it. Here is an example, from start to finish, of how to gather race and ethnicity data for every county in New York State:
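The sketch below is one way such a workflow might look. It is assembled from the pieces covered above and is not the author's original code. The helper get_ny_race_eth(), the ny_vars vector, and the can_run guard are illustrative names. The total (P2_001N) is left out of ny_vars because summary_var already supplies it. The request only runs when tidycensus is installed and a key is stored as CENSUS_API_KEY.

```r
# Race and ethnicity variables from the 2020 redistricting ("pl") file.
ny_vars <- c(
  hisp = "P2_002N", white = "P2_005N", baa = "P2_006N", amin = "P2_007N",
  asian = "P2_008N", nhopi = "P2_009N", other = "P2_010N", multi = "P2_011N"
)

# Hypothetical helper: county-level counts for New York State.
get_ny_race_eth <- function(vars = ny_vars) {
  tidycensus::get_decennial(
    geography = "county",
    state = "NY",
    variables = vars,
    summary_var = "P2_001N",  # total population, returned as summary_value
    year = 2020,
    sumfile = "pl"
  )
}

# Only call the API when the package and a stored key are both available.
can_run <- requireNamespace("tidycensus", quietly = TRUE) &&
  nzchar(Sys.getenv("CENSUS_API_KEY"))

if (can_run) {
  ny_data <- get_ny_race_eth()
  # Each group's share of the county total, as a percentage.
  ny_data$pct <- round(100 * ny_data$value / ny_data$summary_value, 1)
  print(head(ny_data))
}
```

Wrapping the request in a function keeps the API call separate from the percentage step, so either piece can be swapped out for another state or another set of variables.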
And there you have it, folks! Census data easily acquired in less than 5 minutes with TidyCensus.
Here is a link to the GitHub repository containing all of the code from this post.