Usage
The get_education_data function will return a data.frame from a call to the Education Data API.
library(educationdata)
get_education_data(level, source, topic, by, filters, add_labels, csv)
where:
- level (required) - API data level to query.
- source (required) - API data source to query.
- topic (required) - API data topic to query.
- by (optional) - A list of grouping parameters for an API call. Recent versions of the package warn that by is deprecated in favor of subtopic.
- filters (optional) - A list of query values used to filter the results of an API call.
- add_labels (optional) - Add variable labels as factors (when applicable)? Defaults to FALSE.
- csv (optional) - Download the full csv file? Defaults to FALSE.
This simple example will obtain ‘college-university’ level data from the ‘ipeds’ source for the ‘student-faculty-ratio’ topic:
library(educationdata)
df <- get_education_data(
  level = 'college-university',
  source = 'ipeds',
  topic = 'student-faculty-ratio'
)
head(df)
#>   unitid year fips student_faculty_ratio
#> 1 100654 2009    1                    14
#> 2 100663 2009    1                    17
#> 3 100690 2009    1                    10
#> 4 100706 2009    1                    17
#> 5 100724 2009    1                    17
#> 6 100751 2009    1                    20
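Once the call returns, df is an ordinary data.frame, so base R tools apply to it. The sketch below rebuilds the six printed rows by hand so that it runs without network access, then summarizes the ratio column. The names sample_df, mean_ratio, and above_mean are illustrative only and are not part of the package.

```r
# Rebuild the six rows printed by head(df) above, so this runs offline.
sample_df <- data.frame(
  unitid = c(100654, 100663, 100690, 100706, 100724, 100751),
  year = 2009,
  fips = 1,
  student_faculty_ratio = c(14, 17, 10, 17, 17, 20)
)

# Mean ratio across the six institutions.
mean_ratio <- mean(sample_df$student_faculty_ratio)

# Institutions whose ratio is at or above that mean.
above_mean <- sample_df$unitid[sample_df$student_faculty_ratio >= mean_ratio]
```

The same subsetting and summary calls should work on the full result of get_education_data, since it is returned as a data.frame.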
A somewhat more complex example will obtain ‘schools’ level data from the ‘ccd’ source for the ‘enrollment’ topic, broken out by ‘race’ and ‘sex’. The API query is subset with filters for the ‘year’ 2008, ‘grade’ 9 through 12, and an ‘ncessch’ code of 340606000122. Finally, the add_labels flag will map integer codes to their factor labels (‘race’ and ‘sex’ in this instance, along with ‘fips’ in the output below).
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)
#> Warning in get_education_data(level = "schools", source = "ccd", topic = "enrollment", : The `by` argument has been deprecated in favor of `subtopic`.
#> Please update your script to use `subtopic` instead.
head(df)
#>   year      ncessch ncessch_num grade                             race    sex
#> 1 2008 340606000122 3.40606e+11     9                            Black   Male
#> 2 2008 340606000122 3.40606e+11     9                         Hispanic   Male
#> 3 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native Female
#> 4 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native   Male
#> 5 2008 340606000122 3.40606e+11     9                            Black Female
#> 6 2008 340606000122 3.40606e+11     9                            Asian Female
#>   enrollment       fips   leaid
#> 1         41 New Jersey 3406060
#> 2         39 New Jersey 3406060
#> 3          0 New Jersey 3406060
#> 4          0 New Jersey 3406060
#> 5         46 New Jersey 3406060
#> 6         32 New Jersey 3406060
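The warning above asks for subtopic in place of by. A minimal sketch of the same query with that change follows. It assumes subtopic accepts the same list that by did, which the warning implies but does not spell out. The names query_args and run_query are hypothetical, and the call is wrapped in a function so that nothing is downloaded until it is invoked.

```r
# Arguments for the same query, with `subtopic` replacing the deprecated `by`.
# Assumption: `subtopic` takes the same list of grouping variables.
query_args <- list(
  level = "schools",
  source = "ccd",
  topic = "enrollment",
  subtopic = list("race", "sex"),
  filters = list(year = 2008, grade = 9:12, ncessch = "340606000122"),
  add_labels = TRUE
)

# Hypothetical wrapper: the API is only contacted when this is called.
run_query <- function(args) {
  do.call(educationdata::get_education_data, args)
}

# Not run here, since it needs the package and network access:
# df <- run_query(query_args)
```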
Main Filters
Due to the way the API is set up, the variables listed within ‘main filters’ are often the fastest way to subset an API call.
In addition to year, the other main filters for certain endpoints accept the following values:
Grade
| grade = 'grade-pk' | Pre-K |
| grade = 'grade-k' | Kindergarten |
| grade = 'grade-1' | Grade 1 |
| grade = 'grade-2' | Grade 2 |
| grade = 'grade-3' | Grade 3 |
| grade = 'grade-4' | Grade 4 |
| grade = 'grade-5' | Grade 5 |
| grade = 'grade-6' | Grade 6 |
| grade = 'grade-7' | Grade 7 |
| grade = 'grade-8' | Grade 8 |
| grade = 'grade-9' | Grade 9 |
| grade = 'grade-10' | Grade 10 |
| grade = 'grade-11' | Grade 11 |
| grade = 'grade-12' | Grade 12 |
| grade = 'grade-13' | Grade 13 |
| grade = 'grade-14' | Adult Education |
| grade = 'grade-15' | Ungraded |
| grade = 'grade-16' | K-12 |
| grade = 'grade-20' | Grades 7 and 8 |
| grade = 'grade-21' | Grades 9 and 10 |
| grade = 'grade-22' | Grades 11 and 12 |
| grade = 'grade-99' | Total |
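The grade values above all share a 'grade-' prefix, so they can be built from shorthand rather than typed out. The sketch below is one way to do that. The helper as_grade_filter and the vector valid_grades are hypothetical names, not part of the package, and the set of accepted values is simply transcribed from the table.

```r
# Grade values transcribed from the table above.
valid_grades <- paste0("grade-", c("pk", "k", 1:16, 20:22, 99))

# Hypothetical helper: turn shorthand such as "pk" or 9 into the
# 'grade-' form, stopping on anything the table does not list.
as_grade_filter <- function(grades) {
  values <- paste0("grade-", tolower(as.character(grades)))
  unknown <- setdiff(values, valid_grades)
  if (length(unknown) > 0) {
    stop("Unknown grade value(s): ", paste(unknown, collapse = ", "))
  }
  values
}

high_school <- as_grade_filter(9:12)
early_years <- as_grade_filter(c("PK", "K", 1))
```

The result could then be passed as the grade entry of filters. Checking against the table first means a typo fails locally instead of reaching the API.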
Level of Study
| level_of_study = 'undergraduate' | Undergraduate |
| level_of_study = 'graduate' | Graduate |
| level_of_study = 'first-professional' | First Professional |
| level_of_study = 'post-baccalaureate' | Post-baccalaureate |
| level_of_study = '99' | Total |
Examples
Let’s build up some examples from the following set of endpoints.
| Level | Source | Topic | By | Main Filters | Years Available |
| schools | ccd | enrollment | NA | year, grade | 1986–2020 |
| schools | ccd | enrollment | race | year, grade | 1986–2020 |
| schools | ccd | enrollment | race, sex | year, grade | 1986–2020 |
| schools | ccd | enrollment | sex | year, grade | 1986–2020 |
| schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
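Because the two sources above cover different years, it can help to check a year filter against the table before sending a query. The sketch below does that with base R only. The names available_years and unavailable_years are hypothetical, and the year ranges are copied from the table, not fetched from the API.

```r
# Years transcribed from the endpoint table above.
available_years <- list(
  ccd = 1986:2020,
  crdc = c(2011, 2013, 2015, 2017)
)

# Hypothetical helper: which requested years does the table not list?
unavailable_years <- function(source, years) {
  if (!source %in% names(available_years)) {
    stop("No year range recorded for source: ", source)
  }
  setdiff(years, available_years[[source]])
}

unavailable_years("ccd", 1988:1990)   # nothing missing
unavailable_years("crdc", 2011:2013)  # 2012 is not listed
```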
The following will return a data.frame across all years and grades:
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment')
Note that this endpoint is also callable by certain variables: race, sex, or race and sex together. These variables can be added to the by argument:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'))
You may also filter the results of an API call. In this case year and grade will provide the most time-efficient subsets, and can be vectorized:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
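Vectorized filters like these describe every pairing of the listed years and grades. The sketch below only enumerates those pairings with base R. How the package batches the underlying requests is not covered here, and the names filters and combos are illustrative.

```r
# The same filter list used in the call above.
filters <- list(year = 1988:1990, grade = 6:8)

# One row per year/grade pairing that the filters describe.
combos <- expand.grid(filters)

# Three years by three grades.
nrow(combos)
```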
Additional variables can also be passed to filters to subset further:
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))
The add_labels flag will map variables to a factor from their labels in the API.
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)
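For readers unfamiliar with factors, the sketch below shows in base R what mapping integer codes to labels looks like. The codes 1 and 2 and their pairing with "Male" and "Female" are placeholders chosen for illustration. The API's actual coding is not stated in this document, and this is not the package's internal implementation.

```r
# Placeholder integer codes, as a column might look without labels.
sex_codes <- c(1L, 2L, 2L, 1L)

# Illustrative code-to-label pairing (an assumption, not the API's codebook).
sex <- factor(sex_codes, levels = c(1L, 2L), labels = c("Male", "Female"))

# The labels are now the visible values; the codes remain underneath.
as.character(sex)
as.integer(sex)
```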
Finally, the csv flag can be set to download the full .csv data frame. In general, the csv functionality is much faster when retrieving the full data frame (or a large subset) and much slower when retrieving a small subset of a data frame (especially one with many filters added). In this example, the full csv data for the requested years (1988 through 1990) must be downloaded and then subset to the rows that match the filters.
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)
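One way to see the speed trade-off on a particular query is to time it both ways. The sketch below wraps the call above in a hypothetical helper, time_query, using base R's system.time. It keeps the by argument to mirror the example, and it is only defined here, not run, since both calls need the package and network access.

```r
# Hypothetical helper: elapsed seconds for the query above, with csv on or off.
time_query <- function(csv) {
  timing <- system.time(
    educationdata::get_education_data(
      level = "schools",
      source = "ccd",
      topic = "enrollment",
      by = list("race", "sex"),
      filters = list(year = 1988:1990,
                     grade = 6:8,
                     ncessch = "010000200277"),
      add_labels = TRUE,
      csv = csv
    )
  )
  timing[["elapsed"]]
}

# Not run here, since both calls hit the network:
# c(api = time_query(FALSE), csv = time_query(TRUE))
```

Timings will vary with connection speed and server load, so a single run is only a rough guide.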