Published on 5.16.22 in Story Recipe

This piece was originally published on Thomas Gomes’ Medium.

Do you often get tired of grabbing data directly from data.census.gov? Or has the Census API been throwing errors in your code? Maybe you are just like the rest of us, wanting to streamline your workflows as much as possible.

Well, Dr. Kyle Walker had all of us Census Data users in mind when developing TidyCensus, an R package that makes obtaining Census data so easy, it is actually unbelievable.

Let’s walk through how to obtain Census data in under five minutes, using TidyCensus.

Acquiring an API Key From the US Census Bureau

Before you can begin unlocking the hidden superpowers of TidyCensus, you must first acquire a free API Key from the US Census Bureau. You can request one on the Census Bureau's key signup page (https://api.census.gov/data/key_signup.html).

You should receive an email within a few minutes that includes your new key. This is crucial because the TidyCensus package is built on top of the Census API, meaning that none of the functions in the package that request data will work without a key.

Luckily, TidyCensus has a neat little function to quickly install the key onto your computer. But first, we need to get the package installed. Here’s how:

Install TidyCensus

As with any other package in R, the first step to using its functions is to install it (my personal preference for an IDE is RStudio).

install.packages("tidycensus")

Initializing Your API Key

Now that we have the package installed, we can load it and make use of that nifty function for installing the API Key that I mentioned earlier.

library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)

Notice how there are two pieces to this function: the API Key itself (make sure to put it inside quotes) and the install argument, which in this case is set to TRUE. You only need to run this line of code once, when you load the key for the first time.

The install = TRUE argument tells your computer to remember this key and use it every time you make an API call. This saves you from having to repeat this process on this device, so long as your key remains valid.
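
Behind the scenes, install = TRUE writes the key to the .Renviron file in your home directory under the name CENSUS_API_KEY. Here is a minimal sketch for checking that the key is visible without restarting R, assuming the default .Renviron location:

```r
# Re-read ~/.Renviron so a freshly installed key is available in the
# current session (assumes the default location in your home directory).
renviron_path <- path.expand("~/.Renviron")
if (file.exists(renviron_path)) {
  readRenviron(renviron_path)
}

# TRUE when a key is set; the key itself is not printed.
key_is_set <- nzchar(Sys.getenv("CENSUS_API_KEY"))
print(key_is_set)
```

Restarting R has the same effect as re-reading the file.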

The Core Functions

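There are two core functions that will be the basis of working with TidyCensus:

  • get_decennial(): retrieves data from the Decennial Census.
  • get_acs(): retrieves data from the American Community Survey (ACS).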

Both operate very similarly, and utilize the following arguments to execute the proper API requests:

  • geography: the geographic level at which you would like your data to be broken out. See the TidyCensus documentation for the available geographies for each survey.
  • state: the state you are selecting data from. Note: if you set geography = "state" you can leave the state argument out of the call entirely, resulting in data at the state level for the entire US. Similarly, setting geography = "us" without a state returns data at the national level.
  • variables: here is where you enter the variables you would like to select — hold tight for a crash course on how to make easy use of this argument.
  • year: the year of data you would like to obtain. The ACS happens every year, while the Decennial Census happens only once every 10 years.
  • sumfile: unique to the get_decennial() function, this argument tells the API which summary file to ask for.
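
To see how geography and state interact, here is a small sketch built on get_acs(). The helper name get_total_pop is hypothetical, and the variable B01003_001 (assumed here to be total population in the ACS) does not come from the examples in this post. The sketch also assumes the package and your key are already installed.

```r
# Hypothetical helper: ACS total population at a chosen geographic level.
# B01003_001 is assumed here as the ACS total population variable.
get_total_pop <- function(geography, state = NULL, year = 2020) {
  tidycensus::get_acs(
    geography = geography,
    state = state,
    variables = c(total_pop = "B01003_001"),
    year = year
  )
}

# Each call below makes an API request, so they are left commented out:
# get_total_pop("us")                    # national level
# get_total_pop("state")                 # every state
# get_total_pop("county", state = "NC")  # every county in North Carolina
```

Only the geography and state arguments change between the three calls; everything else stays the same.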

Variable Selection

One of the best parts about this package is how easy it is to identify your desired variable names using the load_variables() function.

This eliminates the need to manually search online for every variable name you want. With a few simple lines of code, you can have a searchable table full of variable names (along with their more detailed labels for reference).

Use the following code to create objects containing the list of variable names from a few different surveys:

# 2020 Decennial Census Variables
decennial_2020_vars <- load_variables(
  year = 2020,
  dataset = "pl",
  cache = TRUE
)

# 2010 Decennial Census Variables
decennial_2010_vars <- load_variables(
  year = 2010,
  dataset = "pl",
  cache = TRUE
)

# 2016 - 2020 5 Year American Community Survey (ACS) Variables
acs_20_vars <- load_variables(
  year = 2020,
  dataset = "acs5",
  cache = TRUE
)

You can now access these tables and use the search function in RStudio to quickly identify the variable names you want.
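
If you prefer to search in code rather than in the RStudio viewer, a sketch like the following works on the tables load_variables() returns, which have name, label, and concept columns. The helper search_vars is hypothetical, and the toy table stands in for a real result so the example runs offline; its labels are illustrative rather than copied from the Census.

```r
# Hypothetical helper: keep rows whose label or concept matches a pattern.
search_vars <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Toy stand-in for the output of load_variables(); labels are illustrative.
toy_vars <- data.frame(
  name = c("P2_001N", "P2_002N", "P2_005N"),
  label = c(
    "Total",
    "Total: Hispanic or Latino",
    "Total: Not Hispanic or Latino: White alone"
  ),
  concept = "HISPANIC OR LATINO, AND NOT HISPANIC OR LATINO BY RACE",
  stringsAsFactors = FALSE
)

# Rows whose label or concept mentions "white"
search_vars(toy_vars, "white")
```

With a real table you would call, for example, search_vars(decennial_2020_vars, "hispanic").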

Once you have your variables picked out, you can save them all in a named vector and pass that through TidyCensus to retrieve their corresponding values in a tidy data frame (and even rename them in the process). Let’s take a look:

desired_vars = c(
        all = "P2_001N",
        hisp = "P2_002N",
        white = "P2_005N",
        baa = "P2_006N",
        amin = "P2_007N",
        asian = "P2_008N",
        nhopi = "P2_009N",
        other = "P2_010N",
        multi = "P2_011N"
       )

Passing them through the get_decennial() function:

census_data = get_decennial(
  geography = "county",
  state = "NC",
  variables = desired_vars,  # the named vector defined above
  summary_var = "P2_001N",   # adds a summary_value column holding the total
  year = 2020,
  sumfile = "pl"
)

The above code would return a data table containing all of the variables defined in the “desired_vars” object. In addition, the “summary_var” argument creates a new column, summary_value. It holds the summary variable, which here is the total across all of the sub-variables.

In other words, if you total up all of the race and ethnicity subcategories, the result would equal the summary variable for Race & Ethnicity data.

(This comes in handy for showing composition, since it lets you quickly calculate percentages.)
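
As a sketch of that roll-up, the share for each group is value divided by summary_value. The toy data frame below mimics the columns get_decennial() returns when summary_var is set (GEOID, NAME, variable, value, summary_value); the county and its numbers are made up for illustration.

```r
# Made-up example rows shaped like get_decennial() output with summary_var set.
toy_census <- data.frame(
  GEOID = "00000",
  NAME = "Example County",
  variable = c("hisp", "white", "baa"),
  value = c(200, 500, 300),
  summary_value = 1000,
  stringsAsFactors = FALSE
)

# Share of the total for each group, as a percentage.
toy_census$percent <- 100 * toy_census$value / toy_census$summary_value

toy_census[, c("variable", "value", "percent")]
```

On real output, the same calculation works with census_data in place of toy_census.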

ACS Tables

When searching for ACS data, there is another neat trick up TidyCensus’ sleeve — the table argument.

Using “table = ‘enter table name here’,” one can easily acquire an entire table from the ACS, rather than typing out a list of variable names one by one:

# Income Data by County for North Carolina
nc_county_income = get_acs(
  geography = "county",
  state = "NC",
  table = "B19001"
)

# Note that leaving the 'year' argument blank tells the API to return the most recent year available.
# As of writing this, that is 2020 for both the ACS and Decennial Census.

Putting it All Together

Now that we have covered the basics of TidyCensus, let’s gather some data with it. Here is an example, from start to finish, of how to gather race and ethnicity data for every county in New York State:
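
A minimal sketch of that workflow, assembled from the steps above, might look like the following. The helper name get_county_race_eth is hypothetical, the percent column is an addition for convenience, and the sketch assumes the same P2 variable codes used earlier, with the package and your key already installed.

```r
# Hypothetical helper: county-level race and ethnicity counts from the
# 2020 Decennial Census redistricting (PL) file, plus a percent column.
get_county_race_eth <- function(state) {
  race_eth_vars <- c(
    hisp = "P2_002N",
    white = "P2_005N",
    baa = "P2_006N",
    amin = "P2_007N",
    asian = "P2_008N",
    nhopi = "P2_009N",
    other = "P2_010N",
    multi = "P2_011N"
  )

  counts <- tidycensus::get_decennial(
    geography = "county",
    state = state,
    variables = race_eth_vars,
    summary_var = "P2_001N",  # total population, returned as summary_value
    year = 2020,
    sumfile = "pl"
  )

  # Each group's share of the county total, as a percentage.
  counts$percent <- 100 * counts$value / counts$summary_value
  counts
}

# Makes an API request, so it is left commented out here:
# ny_race_eth <- get_county_race_eth("NY")
```

Passing a different state abbreviation returns the same table for that state's counties.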

And there you have it, folks! Census data easily acquired in less than 5 minutes with TidyCensus.

Here is a link to the GitHub repository containing all of the code from this post.


Republish our content for free under a Creative Commons license.
