library(findSVI)
library(dplyr)
First introduced in 2011 (Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B.), the CDC/ATSDR Social Vulnerability Index (SVI) assesses the resilience of communities based on socioeconomic and demographic factors. This information helps in preparing for and managing public health emergencies, as it supports effective planning of social services and public assistance. The CDC/ATSDR SVI uses 16 U.S. census variables grouped into 4 domains/themes, and obtains a relative vulnerability level using percentile ranks for each geographic unit within a region. Communities with a higher SVI are considered more vulnerable in a public health crisis. For more details, please refer to the CDC/ATSDR SVI website.
CDC/ATSDR releases SVI every two years in both shapefile and CSV format, at the county/census tract level within an individual state or in the US. While the SVI database is very useful, sometimes we would prefer more up-to-date census data or different geographic levels. For example, if we’d like to address questions about ZCTA-level SVI of Pennsylvania in 2021, or census tract-level SVI within a few counties in Pennsylvania in 2020, we might need to calculate SVI from the census data ourselves.
findSVI aims to support more flexible and specific SVI analysis in these cases with additional options for years (2012-2021) and geographic levels (e.g., ZCTA/places, combining multiple states).
This document introduces you to the datasets and basic tools of findSVI for census data retrieval and SVI calculation.
To retrieve census data and calculate SVI based on CDC/ATSDR documentation, a series of lists and tables containing census variables information are included in the package.
These datasets are documented in ?census_variables and ?variable_calculation.
Currently, tidycensus::get_acs() does not support requests for state-specific ZCTA-level data starting in 2019 (subject tables) or 2020 (all tables). This is likely due to changes in the Census API, as ZCTAs are not subgeographies of states (some ZCTAs cross state boundaries). To obtain state-specific ZCTA-level data, three datasets of ZCTA-to-state crosswalks are included to help select the ZCTAs in the state(s) of interest after retrieving the ZCTA data at the national level.
These crosswalk files are documented in ?zcta_state_xwalk.
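The crosswalk step described above can be sketched with toy data. This is only an illustration of the filtering idea, not the package's internal code: national_zcta and toy_xwalk are hypothetical tables, and the column names ZCTA and state are assumptions that may differ from those in zcta_state_xwalk.

```r
library(dplyr)

# Hypothetical national ZCTA-level data (toy values, not real estimates)
national_zcta <- tibble(
  GEOID = c("02802", "02804", "19104", "08540"),
  B17001_002E = c(154, 130, 900, 75)
)

# Hypothetical crosswalk; column names are assumptions for illustration
toy_xwalk <- tibble(
  ZCTA = c("02802", "02804", "19104", "08540"),
  state = c("RI", "RI", "PA", "NJ")
)

# Look up the ZCTAs that the crosswalk assigns to the state(s) of interest
ri_zctas <- toy_xwalk %>%
  filter(state %in% c("RI")) %>%
  pull(ZCTA)

# Keep only those ZCTAs from the national table
ri_data <- national_zcta %>%
  filter(GEOID %in% ri_zctas)
```

Because the national table has to be downloaded before it can be filtered, this approach trades a larger request for the ability to select ZCTAs by state.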
get_census_data()
get_census_data() uses tidycensus::get_acs() with a pre-defined list of variables to retrieve ACS data for SVI calculation. The list of census variables is built into the function and changes according to the year of interest. Importantly, a Census API key is required for this function to work; it can be obtained online and set up with tidycensus::census_api_key("YOUR KEY GOES HERE"). The arguments are largely the same as those of tidycensus::get_acs(), including year, geography and state.
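Before making a request, it can help to confirm that a key is available in the current session. The following is a minimal sketch, assuming the key is stored in the CENSUS_API_KEY environment variable (where tidycensus keeps it); has_census_key() is a hypothetical helper and not part of findSVI or tidycensus.

```r
# Hypothetical helper: TRUE when a non-empty key string is available.
# Assumes the key lives in the CENSUS_API_KEY environment variable.
has_census_key <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  nzchar(key)
}

# Warn early instead of failing in the middle of a data request
if (!has_census_key()) {
  message("No Census API key found; set one with tidycensus::census_api_key().")
}
```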
For example, we can retrieve ZCTA-level data for Rhode Island for 2018:
data <- get_census_data(2018, "zcta", "RI")
data[1:10, 1:10]
#> # A tibble: 10 × 10
#> GEOID NAME B17001_002E B17001_002M B19301_001E B19301_001M B06009_002E
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 02802 ZCTA5 02802 154 190 24925 14640 80
#> 2 02804 ZCTA5 02804 130 91 39065 6412 56
#> 3 02806 ZCTA5 02806 520 183 61534 3820 383
#> 4 02807 ZCTA5 02807 73 33 39287 7937 19
#> 5 02808 ZCTA5 02808 162 166 29356 3819 272
#> 6 02809 ZCTA5 02809 1619 368 34252 2269 2077
#> 7 02812 ZCTA5 02812 31 52 41718 5771 72
#> 8 02813 ZCTA5 02813 605 271 42612 4889 411
#> 9 02814 ZCTA5 02814 722 253 37750 3056 381
#> 10 02815 ZCTA5 02815 13 21 71975 22744 0
#> # ℹ 3 more variables: B06009_002M <dbl>, B09001_001E <dbl>, B09001_001M <dbl>
(First 10 rows and columns are shown, with the rest of the columns being other census variables.)
Note that for ZCTA-level requests after 2018, retrieving data by state is not supported by the Census API/tidycensus. For such requests, get_census_data() first retrieves ZCTA-level data for the whole country, and then uses the ZCTA-to-state relationship file (crosswalk) to select the ZCTAs in the state(s) of interest. This results in a longer running time for these requests.
get_svi()
get_svi() takes the year and census data (retrieved by get_census_data()) as arguments, and calculates the SVI based on CDC/ATSDR documentation. This function uses the built-in variable_calculation tables to populate the SVI variables with census variables directly, or with basic summation/percentage calculations of census variables. For each SVI variable, a geographic unit is ranked against the others in the selected region; the rankings for variables within each theme are then summed and percentile-ranked again to give the SVI for each theme and the overall SVI.
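The rank, sum, and rank-again procedure can be illustrated on a toy table. This is a sketch under stated assumptions rather than the code inside get_svi(): the GEOIDs are hypothetical, the theme contains only two variables, and dplyr::percent_rank() stands in for the percentile ranking (it rescales the minimum rank to the range 0 to 1).

```r
library(dplyr)

# Hypothetical toy data: two percentage variables for four geographic units
toy <- tibble(
  GEOID = c("A", "B", "C", "D"),
  EP_POV = c(23.0, 6.5, 3.2, 8.8),
  EP_UNEMP = c(6.4, 1.0, 2.9, 4.6)
)

toy_svi <- toy %>%
  mutate(
    # Step 1: percentile rank of each variable across all units
    EPL_POV = percent_rank(EP_POV),
    EPL_UNEMP = percent_rank(EP_UNEMP),
    # Step 2: sum the percentile ranks within the theme
    SPL_theme = EPL_POV + EPL_UNEMP,
    # Step 3: percentile rank of the sum gives the theme-level SVI
    RPL_theme = percent_rank(SPL_theme)
  )
```

In this toy table, unit A has the highest value for both variables and so receives the highest theme-level rank, while units B and C tie on the summed ranks and share the lowest.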
For example, to obtain ZCTA-level SVI for Rhode Island for 2018:
result <- get_svi(2018, data)
glimpse(result)
#> Rows: 77
#> Columns: 60
#> $ GEOID <chr> "02802", "02804", "02806", "02807", "02808", "02809", "0281…
#> $ NAME <chr> "ZCTA5 02802", "ZCTA5 02804", "ZCTA5 02806", "ZCTA5 02807",…
#> $ E_TOTPOP <dbl> 671, 2004, 16192, 827, 2565, 22258, 1208, 7780, 7673, 208, …
#> $ E_HU <dbl> 314, 947, 6393, 1856, 969, 9181, 402, 5173, 3350, 76, 14272…
#> $ E_HH <dbl> 223, 840, 6111, 429, 889, 8442, 402, 3200, 2903, 76, 13304,…
#> $ E_POV <dbl> 154, 130, 520, 73, 162, 1619, 31, 605, 722, 13, 2575, 143, …
#> $ E_UNEMP <dbl> 18, 12, 244, 21, 171, 424, 44, 330, 167, 0, 1016, 123, 459,…
#> $ E_PCI <dbl> 24925, 39065, 61534, 39287, 29356, 34252, 41718, 42612, 377…
#> $ E_NOHSDP <dbl> 80, 56, 383, 19, 272, 2077, 72, 411, 381, 0, 2011, 158, 523…
#> $ E_AGE65 <dbl> 15, 351, 2680, 221, 267, 4578, 144, 1733, 1207, 16, 5520, 8…
#> $ E_AGE17 <dbl> 220, 331, 4375, 143, 598, 3201, 323, 1265, 1489, 74, 6322, …
#> $ E_DISABL <dbl> 194, 200, 1453, 96, 184, 2234, 149, 818, 1172, 53, 5630, 39…
#> $ E_SNGPNT <dbl> 94, 47, 254, 36, 45, 447, 10, 202, 134, 0, 824, 176, 396, 9…
#> $ E_MINRTY <dbl> 87, 0, 1426, 49, 264, 1850, 146, 476, 518, 37, 2058, 606, 2…
#> $ E_LIMENG <dbl> 18, 0, 98, 0, 0, 416, 0, 0, 0, 0, 205, 47, 91, 0, 10, 14, 0…
#> $ E_MUNIT <dbl> 72, 0, 147, 90, 0, 592, 0, 38, 46, 0, 1119, 158, 1163, 60, …
#> $ E_MOBILE <dbl> 0, 13, 0, 37, 0, 0, 0, 232, 174, 0, 841, 98, 100, 231, 8, 0…
#> $ E_CROWD <dbl> 18, 0, 11, 10, 0, 71, 0, 68, 11, 0, 166, 44, 69, 15, 33, 0,…
#> $ E_NOVEH <dbl> 10, 13, 151, 11, 0, 530, 0, 90, 83, 0, 472, 0, 563, 29, 61,…
#> $ E_GROUPQ <dbl> 0, 0, 34, 39, 0, 3559, 0, 49, 10, 0, 452, 33, 59, 288, 20, …
#> $ EP_POV <dbl> 23.0, 6.5, 3.2, 8.8, 6.4, 8.6, 2.6, 7.8, 9.5, 6.3, 8.0, 2.4…
#> $ EP_UNEMP <dbl> 6.4, 1.0, 2.9, 4.6, 11.4, 3.6, 6.7, 7.4, 3.8, 0.0, 5.5, 3.3…
#> $ EP_PCI <dbl> 24925, 39065, 61534, 39287, 29356, 34252, 41718, 42612, 377…
#> $ EP_NOHSDP <dbl> 20.1, 3.9, 3.4, 2.8, 15.7, 14.0, 8.9, 7.0, 6.7, 0.0, 8.4, 3…
#> $ EP_AGE65 <dbl> 2.2, 17.5, 16.6, 26.7, 10.4, 20.6, 11.9, 22.3, 15.7, 7.7, 1…
#> $ EP_AGE17 <dbl> 32.8, 16.5, 27.0, 17.3, 23.3, 14.4, 26.7, 16.3, 19.4, 35.6,…
#> $ EP_DISABL <dbl> 28.9, 10.0, 9.0, 11.6, 7.2, 10.3, 12.5, 10.5, 15.3, 25.5, 1…
#> $ EP_SNGPNT <dbl> 42.2, 5.6, 4.2, 8.4, 5.1, 5.3, 2.5, 6.3, 4.6, 0.0, 6.2, 8.1…
#> $ EP_MINRTY <dbl> 13.0, 0.0, 8.8, 5.9, 10.3, 8.3, 12.1, 6.1, 6.8, 17.8, 6.3, …
#> $ EP_LIMENG <dbl> 3.1, 0.0, 0.6, 0.0, 0.0, 1.9, 0.0, 0.0, 0.0, 0.0, 0.7, 0.8,…
#> $ EP_MUNIT <dbl> 22.9, 0.0, 2.3, 4.8, 0.0, 6.4, 0.0, 0.7, 1.4, 0.0, 7.8, 6.6…
#> $ EP_MOBILE <dbl> 0.0, 1.4, 0.0, 2.0, 0.0, 0.0, 0.0, 4.5, 5.2, 0.0, 5.9, 4.1,…
#> $ EP_CROWD <dbl> 8.1, 0.0, 0.2, 2.3, 0.0, 0.8, 0.0, 2.1, 0.4, 0.0, 1.2, 2.0,…
#> $ EP_NOVEH <dbl> 4.5, 1.5, 2.5, 2.6, 0.0, 6.3, 0.0, 2.8, 2.9, 0.0, 3.5, 0.0,…
#> $ EP_GROUPQ <dbl> 0.0, 0.0, 0.2, 4.7, 0.0, 16.0, 0.0, 0.6, 0.1, 0.0, 1.4, 0.5…
#> $ EPL_POV <dbl> 0.9054, 0.4054, 0.1486, 0.5405, 0.3919, 0.5135, 0.0946, 0.4…
#> $ EPL_UNEMP <dbl> 0.6842, 0.1053, 0.1711, 0.4079, 0.9605, 0.2632, 0.7105, 0.8…
#> $ EPL_PCI <dbl> 0.8684, 0.4605, 0.0263, 0.4211, 0.7763, 0.6711, 0.3158, 0.2…
#> $ EPL_NOHSDP <dbl> 0.9211, 0.2500, 0.1842, 0.1447, 0.8553, 0.8026, 0.5921, 0.4…
#> $ EPL_AGE65 <dbl> 0.0789, 0.5132, 0.4474, 0.9605, 0.1842, 0.7895, 0.2105, 0.8…
#> $ EPL_AGE17 <dbl> 0.9737, 0.2632, 0.9211, 0.3684, 0.8158, 0.1579, 0.9079, 0.2…
#> $ EPL_DISABL <dbl> 1.0000, 0.1867, 0.1467, 0.4000, 0.1067, 0.2267, 0.4667, 0.2…
#> $ EPL_SNGPNT <dbl> 0.9865, 0.4324, 0.2838, 0.7027, 0.3649, 0.3919, 0.1216, 0.5…
#> $ EPL_MINRTY <dbl> 0.6447, 0.0000, 0.4211, 0.2237, 0.5000, 0.4079, 0.5921, 0.2…
#> $ EPL_LIMENG <dbl> 0.8289, 0.0000, 0.4342, 0.0000, 0.0000, 0.7500, 0.0000, 0.0…
#> $ EPL_MUNIT <dbl> 0.9459, 0.0000, 0.2838, 0.3919, 0.0000, 0.4459, 0.0000, 0.2…
#> $ EPL_MOBILE <dbl> 0.0000, 0.7973, 0.0000, 0.8378, 0.0000, 0.0000, 0.0000, 0.9…
#> $ EPL_CROWD <dbl> 1.0000, 0.0000, 0.2973, 0.8243, 0.0000, 0.4865, 0.0000, 0.7…
#> $ EPL_NOVEH <dbl> 0.4054, 0.1757, 0.2162, 0.2297, 0.0000, 0.5946, 0.0000, 0.2…
#> $ EPL_GROUPQ <dbl> 0.0000, 0.0000, 0.2368, 0.8158, 0.0000, 0.9342, 0.0000, 0.4…
#> $ SPL_theme1 <dbl> 3.3791, 1.2212, 0.5302, 1.5142, 2.9840, 2.2504, 1.7130, 1.9…
#> $ SPL_theme2 <dbl> 3.0391, 1.3955, 1.7990, 2.4316, 1.4716, 1.5660, 1.7067, 1.9…
#> $ SPL_theme3 <dbl> 1.4736, 0.0000, 0.8553, 0.2237, 0.5000, 1.1579, 0.5921, 0.2…
#> $ SPL_theme4 <dbl> 2.3513, 0.9730, 1.0341, 3.0995, 0.0000, 2.4612, 0.0000, 2.6…
#> $ RPL_theme1 <dbl> 0.9211, 0.2237, 0.0395, 0.3158, 0.8289, 0.6447, 0.4474, 0.6…
#> $ RPL_theme2 <dbl> 1.0000, 0.1711, 0.3421, 0.7237, 0.2105, 0.2632, 0.3158, 0.3…
#> $ RPL_theme3 <dbl> 0.8026, 0.0000, 0.4868, 0.1447, 0.2763, 0.5921, 0.3158, 0.1…
#> $ RPL_theme4 <dbl> 0.4474, 0.1579, 0.2237, 0.8158, 0.0000, 0.4737, 0.0000, 0.6…
#> $ SPL_themes <dbl> 10.2431, 3.5897, 4.2186, 7.2690, 4.9556, 7.4355, 4.0118, 6.…
#> $ RPL_themes <dbl> 0.8553, 0.1184, 0.1579, 0.5263, 0.2237, 0.5526, 0.1447, 0.4…
Columns include geographic unit information, individual SVI variables (“E_xx” and “EP_xx”), intermediate percentile rankings (“EPL_xx” and “SPL_xx”), and the theme-specific and overall SVIs (“RPL_xx”).
find_svi()
To retrieve census data and compute SVI in one step, we can use find_svi(). While get_census_data() only accepts a single year for year (and multiple states for state), just like tidycensus::get_acs(), find_svi() accepts paired vectors of year and state for the SAME geography level. This allows processing multiple year-state combinations in one function call, with separate data retrieval and SVI calculation for every year-state entry, and returns a summarised SVI table for all pairs of year-state values.
One important difference in data retrieval between find_svi() and get_census_data() is that the year-state combinations will always be evaluated as “one year and one state”; that is, the option to get census data for multiple states at once (for one year) in get_census_data() is disabled in find_svi(). There is one exception to this one-to-one rule: when a single year is supplied to year, you can leave state = NULL (the default) to perform nation-level data retrieval and SVI calculation.
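The pairing behaviour can be pictured with a small sketch. Here svi_for_pair() is a hypothetical stand-in for "retrieve data and calculate SVI for one year and one state"; it returns made-up rows and does not call the Census API, and the loop is an illustration of the semantics rather than the package's implementation.

```r
library(dplyr)

# Hypothetical stand-in for processing a single year-state pair
svi_for_pair <- function(year, state) {
  tibble(
    GEOID = paste0(state, "-", 1:2),  # placeholder IDs
    RPL_themes = c(0, 1),             # placeholder SVI values
    year = year,
    state = state
  )
}

years <- c(2017, 2018)
states <- c("NJ", "PA")

# Elements are matched by position: (2017, NJ) and (2018, PA).
# No cross combinations such as (2018, NJ) are produced.
paired <- bind_rows(Map(svi_for_pair, years, states))
```

Each pair is processed on its own, so percentile ranks in one pair are never computed against units from another pair.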
For SVI table output, find_svi() by default returns a summarised SVI table with only the GEOID, theme-specific SVIs and the overall SVI for all 4 themes for each year-state combination. Alternatively, there’s an option to return a full SVI table with every SVI variable and intermediate ranking value (as in get_svi()) by setting full.table = TRUE. For both options, the corresponding year and state information is included as two separate columns in the table.
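The relationship between the two output shapes can be sketched with dplyr. The toy_full table below is a hypothetical, heavily trimmed stand-in for a full SVI table (values copied from the Rhode Island example in this document); the selection shown is an illustration of what the summarised table keeps, not the package's own code.

```r
library(dplyr)

# Hypothetical, trimmed stand-in for a full SVI table
toy_full <- tibble(
  GEOID = c("02802", "02804"),
  E_POV = c(154, 130),
  EP_POV = c(23.0, 6.5),
  EPL_POV = c(0.9054, 0.4054),
  SPL_themes = c(10.2431, 3.5897),
  RPL_theme1 = c(0.9211, 0.2237),
  RPL_themes = c(0.8553, 0.1184),
  year = 2018,
  state = "RI"
)

# Keep the identifier, the RPL_ columns, and the year/state labels
toy_summary <- toy_full %>%
  select(GEOID, starts_with("RPL_"), year, state)
```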
Using the same example as above, to obtain ZCTA-level census data and calculate SVI for Rhode Island for 2018 in one step:
onestep_result <- find_svi(2018, "RI", "zcta")
onestep_result %>% head(10)
#> # A tibble: 10 × 8
#> GEOID RPL_theme1 RPL_theme2 RPL_theme3 RPL_theme4 RPL_themes year state
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 02802 0.921 1 0.803 0.447 0.855 2018 RI
#> 2 02804 0.224 0.171 0 0.158 0.118 2018 RI
#> 3 02806 0.0395 0.342 0.487 0.224 0.158 2018 RI
#> 4 02807 0.316 0.724 0.145 0.816 0.526 2018 RI
#> 5 02808 0.829 0.210 0.276 0 0.224 2018 RI
#> 6 02809 0.645 0.263 0.592 0.474 0.553 2018 RI
#> 7 02812 0.447 0.316 0.316 0 0.145 2018 RI
#> 8 02813 0.618 0.382 0.171 0.632 0.460 2018 RI
#> 9 02814 0.5 0.487 0.224 0.342 0.382 2018 RI
#> 10 02815 0.0263 0.513 0.342 0 0.0789 2018 RI
These are the first 10 rows of the summarised SVI table, with additional columns indicating the year and state information. By default, the summarised table only keeps the GEOID and SVIs. Set full.table = TRUE for a more complete SVI table with all the individual SVI variables from census data (like the result from get_svi() shown in the previous section).
For multiple year-state combinations, we can supply two vectors to the year and state arguments and they’ll be treated as pairs. For example, to obtain county-level SVI of New Jersey and Pennsylvania for 2017 and 2018, respectively:
summarise_results <- find_svi(
  year = c(2017, 2018),
  state = c("NJ", "PA"),
  geography = "county"
)
summarise_results %>%
  group_by(year, state) %>%
  slice_head(n = 5)
#> # A tibble: 10 × 8
#> # Groups: year, state [2]
#> GEOID RPL_theme1 RPL_theme2 RPL_theme3 RPL_theme4 RPL_themes year state
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 34001 0.95 0.8 0.65 1 0.95 2017 NJ
#> 2 34003 0.2 0.3 0.55 0.45 0.25 2017 NJ
#> 3 34005 0.3 0.5 0.35 0.4 0.3 2017 NJ
#> 4 34007 0.7 0.9 0.55 0.6 0.75 2017 NJ
#> 5 34009 0.65 0.6 0.1 0.55 0.45 2017 NJ
#> 6 42001 0.212 0.242 0.697 0.227 0.182 2018 PA
#> 7 42003 0.136 0.0758 0.742 0.576 0.212 2018 PA
#> 8 42005 0.621 0.530 0.0152 0.167 0.227 2018 PA
#> 9 42007 0.182 0.409 0.530 0.348 0.197 2018 PA
#> 10 42009 0.712 0.606 0.0758 0.288 0.394 2018 PA
As a result, we have a table summarising the county-level SVI of New Jersey for 2017 and that of Pennsylvania for 2018, after retrieving census data for these two year-state pairs (first 5 rows of SVI results for each pair are shown above). Again, here data retrieval and SVI calculation (percentile ranking) are performed separately for 2017-NJ and 2018-PA, and the resulting SVIs are combined into a summarised table.
As with other R functions that accept vectors as arguments, another way to supply year and state pairs is to extract columns from a table. Suppose we have a table called info_table containing the year-state information we’d like to include in the analysis:
#> year state
#> 1 2017 AZ
#> 2 2018 FL
#> 3 2014 FL
#> 4 2018 PA
#> 5 2013 MA
#> 6 2020 KY
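The code that builds info_table is not shown here. One way to construct a table matching the printed output, assuming a plain data frame is sufficient, is:

```r
# Assumed construction of info_table, reproducing the rows printed above
info_table <- data.frame(
  year = c(2017, 2018, 2014, 2018, 2013, 2020),
  state = c("AZ", "FL", "FL", "PA", "MA", "KY")
)

info_table
```

In practice such a table would more likely be read from a file or produced by an earlier analysis step.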
We could extract specific columns of interest from info_table for the year and state arguments:
all_results <- find_svi(
  year = info_table$year,
  state = info_table$state,
  geography = "county"
)
all_results %>%
  group_by(year, state) %>%
  slice_head(n = 3)
#> # A tibble: 18 × 8
#> # Groups: year, state [6]
#> GEOID RPL_theme1 RPL_theme2 RPL_theme3 RPL_theme4 RPL_themes year state
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 25001 0.231 0.462 0.0769 0 0 2013 MA
#> 2 25003 0.769 0.769 0.308 0.692 0.692 2013 MA
#> 3 25005 0.923 0.923 0.615 0.538 0.846 2013 MA
#> 4 12001 0.333 0 0.485 0.727 0.242 2014 FL
#> 5 12003 0.485 0.803 0.0606 0.424 0.454 2014 FL
#> 6 12005 0.242 0.652 0.197 0.394 0.288 2014 FL
#> 7 04001 1 0.929 0.857 0.714 1 2017 AZ
#> 8 04003 0.214 0.714 0.571 0.429 0.357 2017 AZ
#> 9 04005 0.357 0 0.214 0.857 0.286 2017 AZ
#> 10 12001 0.439 0 0.606 0.636 0.242 2018 FL
#> 11 12003 0.485 0.894 0.0758 0.439 0.439 2018 FL
#> 12 12005 0.318 0.803 0.318 0.5 0.470 2018 FL
#> 13 42001 0.212 0.242 0.697 0.227 0.182 2018 PA
#> 14 42003 0.136 0.0758 0.742 0.576 0.212 2018 PA
#> 15 42005 0.621 0.530 0.0152 0.167 0.227 2018 PA
#> 16 21001 0.580 0.109 0.538 0.689 0.445 2020 KY
#> 17 21003 0.664 0.782 0.277 0.353 0.555 2020 KY
#> 18 21005 0.235 0.622 0.487 0.0084 0.118 2020 KY
Although only the first 3 rows of results for each year-state combination are shown here, what we actually get is a table with SVIs for all the counties in the 6 year-state pairs from the columns of info_table. This will likely make things easier, especially when there’s a long list of year-state combinations to process.