I found that the US Census API is difficult to work with and even LLMs don’t provide working code for it. So I thought it might be helpful to share some techniques that did work. In this post, I’m going to focus on both raw API calls and the Python wrapper.

API key

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from census import Census

You need to get an API key. Fortunately, this part is really easy—all you need to do is sign up on their website.

API_KEY = os.getenv('CENSUS_API_KEY')

There are different ways to access the census data, including through their website, through direct API calls, and through a Python wrapper.

Understanding Tables

Census data is organized into tables. You can browse the tables available for the 2021 ACS 5-year data here: https://api.census.gov/data/2021/acs/acs5/groups/

In the census data, you’ll also see IDs like B19013A_001E. Let’s take a moment to understand what it means. It’s in the following format:

[TABLE_ID][SUBGROUP]_[LINE][SUFFIX]

So, in this case:

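  • B19013 is the table ID. This specific table is titled “Median Household Income in the Past 12 Months (in 2021 Inflation-Adjusted Dollars)”.
  • A is the subgroup. For this table, it restricts the data to households with a householder who is White alone.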
  • 001 refers to the line number within that table, corresponding to a specific row (e.g., “Median household income”).
  • E stands for Estimate — as opposed to M, which would be the Margin of Error for that estimate.
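
As a rough illustration of the format above, a variable ID can be split into its parts with a regular expression. The pattern and the `parse_variable_id` name are my own assumptions; they cover the IDs used in this post, not necessarily every variable name the Census publishes.

```python
import re

# Assumed pattern: letters + digits for the table, optional subgroup letters,
# an underscore, a 3- or 4-digit line number, then a letter suffix (E, M, ...).
VARIABLE_ID = re.compile(
    r"^(?P<table>[A-Z]+\d+)(?P<subgroup>[A-Z]*)_(?P<line>\d{3,4})(?P<suffix>[A-Z]+)$"
)


def parse_variable_id(variable_id):
    """Split a Census variable ID into table, subgroup, line, and suffix."""
    match = VARIABLE_ID.match(variable_id)
    if match is None:
        raise ValueError(f"Unrecognized variable ID: {variable_id!r}")
    return match.groupdict()


print(parse_variable_id("B19013A_001E"))
# {'table': 'B19013', 'subgroup': 'A', 'line': '001', 'suffix': 'E'}
```

An ID without a subgroup, such as S0201_214E, comes back with an empty string for `subgroup`.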

One table with a lot of data is S0201. S0201 refers to the Selected Population Profile (SPP) table series. This is used for detailed demographic, social, economic, and housing data by race, Hispanic origin, tribal group, or ancestry.

Direct API Calls

We’re going to look at the American Community Survey (ACS) Selected Population Profiles (SPP) data.

You need to provide fields and iteration codes. You can find which population is associated with which iteration code here: https://www2.census.gov/programs-surveys/decennial/datasets/summary-file-2/attachment.pdf. For example, it tells you that 013 is the iteration code for Asian Indians.

For fields, we’re using S0201_214E. S0201 is the table, 214 is the line number within that table, and E marks the value as an estimate. This line is how we get median household income.

YEAR     = 2022  # latest year that I could find that had everything I was looking for
DATASET  = f"https://api.census.gov/data/{YEAR}/acs/acs1/spp"
FIELDS   = "NAME,S0201_214E"                             # median household income
POP_CODE = "013"                                         # <-- Asian Indian alone
URL      = (f"{DATASET}?get={FIELDS}"
            f"&for=us:1&POPGROUP={POP_CODE}&key={API_KEY}")

resp = requests.get(URL, timeout=30)
rows = resp.json()
print(resp.status_code)
print(resp.text[:500])
200
[["NAME","S0201_214E","POPGROUP","us"],
["United States","152341","013","1"]]
df = pd.DataFrame(rows[1:], columns=rows[0])
df["S0201_214E"] = pd.to_numeric(df["S0201_214E"])
print("Median HH income (Asian-Indian-American, 2022):",
      f"${int(df.at[0,'S0201_214E']):,}")
Median HH income (Asian-Indian-American, 2022): $152,341
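
If you make this kind of request often, the URL construction can be pulled into a small helper. This is only a sketch: `build_spp_url` is a hypothetical name, and it assumes the same endpoint and `for=us:1` geography used above.

```python
from urllib.parse import urlencode


def build_spp_url(year, fields, popgroup=None, api_key=None):
    """Build a national-level ACS 1-year SPP query URL (hypothetical helper)."""
    base = f"https://api.census.gov/data/{year}/acs/acs1/spp"
    params = {"get": ",".join(fields), "for": "us:1"}
    if popgroup is not None:
        params["POPGROUP"] = popgroup  # iteration code, e.g. "013"
    if api_key:
        params["key"] = api_key
    # Keep commas and colons readable instead of percent-encoding them
    return f"{base}?{urlencode(params, safe=',:')}"


print(build_spp_url(2022, ["NAME", "S0201_214E"], popgroup="013"))
```

The result can be passed to `requests.get(url, timeout=30)` exactly like the hand-built URL.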

Verifying Results

It’s good to have a way to verify the data as well. For example, you can verify some of the results simply by Googling the number and making sure that’s what other people got. By Googling $152,341 you can see other news sites that use the same value and describe it as the annual median household income of Indian Americans.

Getting More Data

OK, so we can get a single data point from a query, but it’s inefficient to do that for lots of data. Let’s grab data for multiple groups in a single request.

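We still need a field, so we’ll reuse S0201_214E. You can see on the SPP variables table that S0201_214E corresponds to “Median household income (dollars)”. This time we leave the POPGROUP filter out of the URL and add POPGROUP to the requested fields instead, which returns every group at once.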

YEAR     = 2022
DATASET  = f"https://api.census.gov/data/{YEAR}/acs/acs1/spp"
FIELDS   = "NAME,S0201_214E,POPGROUP"  # median household income + population group
URL      = (f"{DATASET}?get={FIELDS}"
            f"&for=us:1&key={API_KEY}")
resp = requests.get(URL, timeout=30)
rows = resp.json()

# Convert to pandas DataFrame
income_df = pd.DataFrame(rows[1:], columns=rows[0])
income_df['S0201_214E'] = pd.to_numeric(income_df['S0201_214E'], errors='coerce')
len(income_df)
347
income_df.head()
NAME S0201_214E POPGROUP us
0 United States 74755 001 1
1 United States 79933 002 1
2 United States 78636 003 1
3 United States 51374 004 1
4 United States 52238 005 1

The codes are not that helpful directly and need to be converted using the link above. You can find the full dictionary here: code_to_population.py (Gist). We’ll use curl to download it:

!curl -o census_popgroup_dict.py https://gist.githubusercontent.com/jss367/44e041c913f87a11b2830e01e295c241/raw/c54c8ffaf838791c1a1c42fc03d493bbb3fe3b84/gistfile1.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6932  100  6932    0     0  28768      0 --:--:-- --:--:-- --:--:-- 28883
from census_popgroup_dict import code_to_population
income_df['POPGROUP_DESC'] = income_df['POPGROUP'].map(code_to_population)
income_df.head()
NAME S0201_214E POPGROUP us POPGROUP_DESC
0 United States 74755 001 1 Total Population
1 United States 79933 002 1 White alone
2 United States 78636 003 1 White alone or in combination with one or more...
3 United States 51374 004 1 Black or African American alone
4 United States 52238 005 1 Black or African American alone or in combinat...
income_df.tail()
NAME S0201_214E POPGROUP us POPGROUP_DESC
342 United States 126414 931 1 NaN
343 United States 135643 932 1 NaN
344 United States 55352 946 1 NaN
345 United States 80191 9Z8 1 NaN
346 United States 78411 9Z9 1 NaN

Unfortunately, many codes have no description in the dictionary, and I’m not sure what the issue is at the moment. For now, we’ll drop the rows without one.
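
A quick way to see exactly which codes are unmapped is a set difference between the codes in the response and the keys of the dictionary. The helper name `unmapped_codes` is hypothetical, and the sample values below are copied from the outputs above for illustration only.

```python
def unmapped_codes(codes, code_to_label):
    """Return the sorted codes that have no entry in the lookup dictionary."""
    return sorted(set(codes) - set(code_to_label))


# Illustrative values taken from the head() and tail() outputs above
codes = ["001", "002", "931", "9Z8"]
lookup = {"001": "Total Population", "002": "White alone"}
print(unmapped_codes(codes, lookup))
# ['931', '9Z8']
```

With the real data this would be `unmapped_codes(income_df['POPGROUP'], code_to_population)`.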

income_df.dropna(inplace=True)
len(income_df)
86
income_df.sample(10)
NAME S0201_214E POPGROUP us POPGROUP_DESC
129 United States 77024 420 1 Peruvian (237)
146 United States 78234 462 1 Some Other Race alone or in combination with o...
147 United States 72601 463 1 Two or More Races, not Hispanic or Latino
0 United States 74755 001 1 Total Population
132 United States 82993 423 1 Spaniard (200-209)
74 United States 85527 117 1 Asian; Native Hawaiian and Other Pacific Islander
145 United States 75631 461 1 Some Other Race alone, not Hispanic or Latino
113 United States 66241 403 1 Cuban (270-274)
46 United States 76421 060 1 Native Hawaiian and Other Pacific Islander alo...
66 United States 95428 107 1 White; Asian

We can see the Asian Indian data again.

income_df[income_df['POPGROUP'] == '013']
NAME S0201_214E POPGROUP us POPGROUP_DESC
8 United States 152341 013 1 Asian Indian alone (400-401)

Getting Population Groups

The API can also return the label for each population group code. Here we request POPGROUP and POPGROUP_LABEL at the national level.

params = {
    "get": "POPGROUP,POPGROUP_LABEL",  # Request codes and labels
    "for": "us:1",                     # National level
    "key": API_KEY
}
year = 2023
base_url = f"https://api.census.gov/data/{year}/acs/acs1/spp"
# Make the request
response = requests.get(base_url, params=params)

# Check if request was successful
response.raise_for_status()

# Parse JSON response
data = response.json()

# Create DataFrame from the response (skip header row)
popgroups_df = pd.DataFrame(data[1:], columns=data[0])

# Convert to appropriate data types
popgroups_df = popgroups_df.convert_dtypes()
len(popgroups_df)
5545
popgroups_df.sample(10)
POPGROUP POPGROUP_LABEL us
1066 1462 Native Village of Buckland alone 1
1935 21H Tlingit alone 1
1233 2124 Skull Valley Band of Goshute Indians of Utah a... 1
1262 2193 Upper Chinook alone 1
1523 3885 Rotuman alone 1
3451 2822 Village of Solomon alone or in any combination 1
4873 2907 Cherokee Alabama alone or in any combination 1
1566 563 African 1
1080 095 Mariana Islander alone 1
3867 2590 Central Council of the Tlingit and Haida India... 1
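
One possible use for this response is as a code-to-label lookup built straight from the API rows. The sketch below assumes the response keeps the header-then-rows shape shown above; `labels_from_rows` is a hypothetical helper, and the sample rows are copied from the output above.

```python
def labels_from_rows(rows):
    """Build {POPGROUP code: label} from a header-plus-rows API response."""
    header, *records = rows
    code_idx = header.index("POPGROUP")
    label_idx = header.index("POPGROUP_LABEL")
    return {record[code_idx]: record[label_idx] for record in records}


# Sample rows in the same shape as response.json()
sample = [
    ["POPGROUP", "POPGROUP_LABEL", "us"],
    ["095", "Mariana Islander alone", "1"],
    ["563", "African", "1"],
    ["21H", "Tlingit alone", "1"],
]
code_to_label = labels_from_rows(sample)
print(code_to_label["095"])
# Mariana Islander alone
```

Something like `income_df['POPGROUP'].map(labels_from_rows(data))` might fill some of the gaps left by the gist dictionary. I haven't checked whether the 2023 labels cover every code in the 2022 income data.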

Using the Census Wrapper

There is also a census wrapper you can use that’s available for download at https://pypi.org/project/census/. Let’s use it to get some income data.

Let’s look at the B19013 table. You’ll note that not all subgroups are available here. If you want to dig deeper into, say, Asian subgroups, you need to look at a different table.

{
  "name": "B19013A",
  "description": "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2021 INFLATION-ADJUSTED DOLLARS) (WHITE ALONE HOUSEHOLDER)",
  "variables": "http://api.census.gov/data/2021/acs/acs5/groups/B19013A.json",
  "universe ": "Households with a householder who is White alone"
}

Now we call the census wrapper.

c = Census(API_KEY)
race_data = c.acs5.get(
    ('NAME', 
     'B19013_001E',  # All households (no race/ethnicity subgroup)
     'B19013A_001E', # White alone
     'B19013B_001E', # Black alone
     'B19013C_001E', # American Indian/Alaska Native alone
     'B19013D_001E', # Asian alone
     'B19013E_001E', # Native Hawaiian/Pacific Islander alone
     'B19013F_001E', # Some other race alone
     'B19013G_001E', # Two or more races
     'B19013H_001E', # White alone, not Hispanic
     'B19013I_001E', # Hispanic/Latino origin (any race)
    ),
    {'for': 'us:*'}
)
race_data
df = pd.DataFrame(race_data)
df

It’s… a little ugly. So we can rename the columns.

df = df.rename(columns={
    'B19013_001E': 'Median_Income_Total',
    'B19013A_001E': 'Median_Income_White_Alone',
    'B19013B_001E': 'Median_Income_Black_Alone',
    'B19013C_001E': 'Median_Income_AmIndian_Alone',
    'B19013D_001E': 'Median_Income_Asian_Alone',
    'B19013E_001E': 'Median_Income_Hawaiian_Alone',
    'B19013F_001E': 'Median_Income_Other_Alone',
    'B19013G_001E': 'Median_Income_TwoOrMore',
    'B19013H_001E': 'Median_Income_White_NonHispanic',
    'B19013I_001E': 'Median_Income_Hispanic',
})
df
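
The variable tuple and the rename dictionary repeat the same IDs, so one option is to generate both from a single mapping of subgroup letter to label. This is a sketch using the same labels as above; the `SUFFIX_TO_LABEL` name is my own.

```python
# Subgroup letter -> column label; "" is the table with no subgroup
SUFFIX_TO_LABEL = {
    "": "Total",
    "A": "White_Alone",
    "B": "Black_Alone",
    "C": "AmIndian_Alone",
    "D": "Asian_Alone",
    "E": "Hawaiian_Alone",
    "F": "Other_Alone",
    "G": "TwoOrMore",
    "H": "White_NonHispanic",
    "I": "Hispanic",
}

# Fields to request, in the same order as the hand-written tuple
variables = ("NAME",) + tuple(f"B19013{s}_001E" for s in SUFFIX_TO_LABEL)

# Column rename map matching the hand-written dictionary
rename_map = {
    f"B19013{s}_001E": f"Median_Income_{label}"
    for s, label in SUFFIX_TO_LABEL.items()
}

print(variables[:3])
print(rename_map["B19013D_001E"])
```

These can then be used as `c.acs5.get(variables, {'for': 'us:*'})` and `df.rename(columns=rename_map)`.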

Using the Website

I don’t find the website particularly easy to use, either. You can see some of the same information there, though. Here is the page for the American Community Survey, which contains a lot of their data:

Here is the ACS data on Asian Indians:

Here’s the same for total population:

You can see in the URL how the iteration codes work. You can either change that value directly or use the filters on the left sidebar.

Errors in the Data

I was surprised to find lots of errors in the data. Here are a couple of examples. Beware, I guess!

Note on Dates

You might have noticed that I used 2022 in the examples above. That’s because the 2023 (and later) data doesn’t seem to be available for every table. Sometimes it is available, though, so I think you just have to check.

YEAR     = 2023 
DATASET  = f"https://api.census.gov/data/{YEAR}/acs/acs1/spp"
FIELDS   = "NAME,S0201_214E"
POP_CODE = "013"
URL      = (f"{DATASET}?get={FIELDS}"
            f"&for=us:1&POPGROUP={POP_CODE}&key={API_KEY}")

empty_resp = requests.get(URL, timeout=30)
print(empty_resp.status_code)
print(empty_resp.text[:500])

You can see that I got a 204 back, indicating that there was no content returned.
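
Since a 204 has an empty body, calling `.json()` on it would fail, so it can help to check the status before parsing. Below is a minimal sketch of that check; `parse_census_response` is a hypothetical helper, and it assumes a successful response is a JSON list with a header row first.

```python
import json


def parse_census_response(status_code, text):
    """Return (header, rows), or None when the API sent no content."""
    if status_code == 204 or not text.strip():
        return None  # empty body: nothing to parse
    if status_code != 200:
        raise RuntimeError(f"Census API returned HTTP {status_code}: {text[:200]}")
    data = json.loads(text)
    return data[0], data[1:]


print(parse_census_response(204, ""))
# None
```

With the request above, this would be called as `parse_census_response(empty_resp.status_code, empty_resp.text)`.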