library(httr)
library(jsonlite)
library(stringr)
library(data.table)
library(withr)
library(sf)
library(kableExtra)
library(ggplot2)
library(RplotterPkg)
library(RcensusPkg)Introduction
The goal of RcensusPkg is to provide easy access to the US Census Bureau’s datasets and collection of TIGER/Line Shapefiles providing plot geometries for states, counties, roads, landmarks, water, enumeration tracts/blocks for the entire United States. The only requirement is for the user to apply for and obtain a free access key issued from the Bureau. See Guidance for Developers for additional information.
The example below illustrates a simple workflow for downloading a dataset, merging the data with shapefile geometries, and plotting the merge to create a choropleth map.
Installation
You can install the development version of RcensusPkg from GitHub with:
# install.packages("pak")
pak::pak("deandevl/RcensusPkg")Using devtools::install_github():
devtools::install_github("deandevl/RcensusPkg")
Also for this example we will need devtools::install_github("deandevl/RplotterPkg")
Example
US Census Bureau API key
All Census Bureau API requests require an access key. Sign-up for a key is free and can be obtained here. The functions will check for a global setting of the key via Sys.getenv(CENSUS_KEY
). Run usethis::edit_r_environ() and edit your .Renviron file with the line: CENSUS_KEY=your key to create the global association.
Setup
We will be using the following packages.
A look at the US Census Bureau’s community resilience estimates (CRE) database.
Among the list of available API, there is the Community Resilience Estimates based on such factors as:
- Income-to-Poverty Ratio (IPR) < 130 percent
- Single or zero caregiver household
- Aged 65 years or older
- No health insurance coverage
- No vehicle access (Household)
- Disability, at least one serious constraint to significant life activity
- No one in the household is employed full-time, year-round
- Households without broadband internet access
- Unit-level crowding with >= 0.75 persons per room
- No one in the household has received a high school diploma
- No one in the household speaks English
very well
The factors are used to estimate the number of people with:
- 0 risk factors (Low risk)
- 1-2 risk factors (Medium risk)
- 3 or more risk factors (High risk)
The workflow for using RcensusPkg is the following:
- Get a database name recognized by the API. Use
RcensusPkg::get_dataset_names()filtering for aresilience
title and vintage of 2022.
datasets_dt <- RcensusPkg::get_dataset_names(
vintage = 2022,
filter_title_str = "resilience"
)| name | vintage | title |
|---|---|---|
| cre | 2022 | Community Resilience Estimates |
| crepuertorico | 2022 | Community Resilience Estimates for Puerto Rico |
The returned dataframe shows a dataset name for CRE 2022 as (surprise) cre
.
- Get the variable names available for the
cre
dataset.
cre_var_names_dt <- RcensusPkg::get_variable_names(
dataset = "cre",
vintage = 2022
) |>
_[, .(name, label)]| name | label |
|---|---|
| COUNTY | Geography |
| GEOCOMP | GEO_ID Component |
| GEO_ID | Geographic Identifier |
| NATION | Geography |
| POPUNI | Population Universe |
| PRED0_E | Estimated number of individuals with zero components of social vulnerability |
| PRED0_PE | Rate of individuals with zero components of social vulnerability |
| PRED12_E | Estimated number of individuals with one-two components of social vulnerability |
| PRED12_PE | Rate of individuals with one-two components of social vulnerability |
| PRED3_E | Estimated number of individuals with three or more components of social vulnerability |
| PRED3_PE | Rate of individuals with three or more components of social vulnerability |
| STATE | Geography |
| SUMLEVEL | Summary Level code |
| TRACT | Geography |
| for | Census API FIPS forclause |
| in | Census API FIPS inclause |
| ucgid | Uniform Census Geography Identifier clause |
We are interested in the percentage of individuals with three or more vulnerabilities (PRED3_PE
)
- Get the regions available for the CRE dataset.
cre_regions_dt <- RcensusPkg::get_geography(
dataset = "cre",
vintage = 2022
)| name | geoLevelDisplay |
|---|---|
| us | 010 |
| state | 040 |
| county | 050 |
| tract | 140 |
So we can get CRE estimates from the entire US, state, county, and tract enumeration levels. We are interested in the counties for the state of Florida.
- Download the data:
florida_fips <- usmap::fips(("FL"))
florida_cre_dt <- RcensusPkg::get_vintage_data(
dataset = "cre",
vintage = 2022,
vars = "PRED3_PE",
region = "county",
regionin = paste0("state:", florida_fips)
) |>
_[, PRED3_PE := as.numeric(PRED3_PE)] |>
data.table::setnames(old = "PRED3_PE", new = "CRE_GE_3")| NAME | CRE_GE_3 | state | county | GEOID |
|---|---|---|---|---|
| Alachua County, Florida | 19.29 | 12 | 001 | 12001 |
| Baker County, Florida | 22.93 | 12 | 003 | 12003 |
| Bay County, Florida | 20.05 | 12 | 005 | 12005 |
| Bradford County, Florida | 24.77 | 12 | 007 | 12007 |
| Brevard County, Florida | 20.67 | 12 | 009 | 12009 |
| Broward County, Florida | 22.44 | 12 | 011 | 12011 |
| Calhoun County, Florida | 31.52 | 12 | 013 | 12013 |
| Charlotte County, Florida | 27.53 | 12 | 015 | 12015 |
- Merge the CRE county data with the Florida county shapefile geographies from the Bureau. Note that we need to establish a folder for receiving the shapefiles.
output_dir <- withr::local_tempdir()
if(!dir.exists(output_dir)){
dir.create(output_dir)
}
express <- parse(text = paste0("STATEFP == ", '"', florida_fips, '"'))
cre_florida_sf <- RcensusPkg::tiger_counties_sf(
output_dir = output_dir,
vintage = 2022,
general = TRUE,
express = express,
datafile = florida_cre_dt,
datafile_key = "county"
)- Plot the choropleth map of the percent of individuals with 3 or more risk factors.
RplotterPkg::create_sf_plot(
sf = cre_florida_sf,
aes_fill = "CRE_GE_3",
hide_x_tics = TRUE,
hide_y_tics = TRUE,
panel_color = "white",
panel_border_color = "white"
)
A more detailed example of the RcensusPkg workflow is available here.