Introducing educationdata

The educationdata package allows the user to retrieve data from the Urban Institute’s Education Data API as a data.frame for analysis. The package contains one major function, get_education_data, which will get data from a specified API endpoint and return a data.frame to the user.

NOTE: By downloading and using this programming package, you agree to abide by the Data Policy and Terms of Use of the Education Data Portal. For more information, see https://educationdata.urban.org/documentation/#terms

Usage

The get_education_data function will return a data.frame from a call to the Education Data API.

library(educationdata)
get_education_data(level, source, topic, subtopic, filters, add_labels, csv)

where:

level (required) - API data level to query.
source (required) - API data source to query.
topic (required) - API data topic to query.
subtopic (optional) - List of grouping parameters for an API call. Replaces the deprecated by argument.
filters (optional) - List of query filters used to subset the results of an API call.
add_labels (optional) - Add variable labels as factors (when applicable)? Defaults to FALSE.
csv (optional) - Download the full csv file? Defaults to FALSE.

This simple example will obtain ‘college-university’ level data from the ‘ipeds’ source for the ‘student-faculty-ratio’ topic:

library(educationdata)

df <- get_education_data(level = 'college-university',
                         source = 'ipeds',
                         topic = 'student-faculty-ratio')

head(df)
#>   unitid year fips student_faculty_ratio
#> 1 100654 2009    1                    14
#> 2 100663 2009    1                    17
#> 3 100690 2009    1                    10
#> 4 100706 2009    1                    17
#> 5 100724 2009    1                    17
#> 6 100751 2009    1                    20

A somewhat more complex example will obtain ‘schools’ level data from the ‘ccd’ source for the ‘enrollment’ topic, broken out by ‘race’ and ‘sex’. The API query is subset with filters for the ‘year’ 2008, ‘grade’ 9 through 12, and an ‘ncessch’ code of 340606000122. Finally, the add_labels flag will map integer codes to their factor labels (‘race’ and ‘sex’ in this instance).

library(educationdata)

df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         subtopic = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)

head(df)
#>   year      ncessch ncessch_num grade                             race    sex
#> 1 2008 340606000122 3.40606e+11     9                            Black   Male
#> 2 2008 340606000122 3.40606e+11     9                         Hispanic   Male
#> 3 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native Female
#> 4 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native   Male
#> 5 2008 340606000122 3.40606e+11     9                            Black Female
#> 6 2008 340606000122 3.40606e+11     9                            Asian Female
#>   enrollment       fips   leaid
#> 1         41 New Jersey 3406060
#> 2         39 New Jersey 3406060
#> 3          0 New Jersey 3406060
#> 4          0 New Jersey 3406060
#> 5         46 New Jersey 3406060
#> 6         32 New Jersey 3406060
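
Because the result is an ordinary data.frame, base R tools apply to it directly. The sketch below rebuilds four columns of the six rows printed above by hand, so it runs without calling the API, and totals enrollment by sex. The names df_head and by_sex are illustrative and not part of the package.

```r
# Rows copied by hand from the head(df) output above; only four columns are kept.
df_head <- data.frame(
  grade = rep(9, 6),
  race = c('Black', 'Hispanic', 'American Indian or Alaska Native',
           'American Indian or Alaska Native', 'Black', 'Asian'),
  sex = c('Male', 'Male', 'Female', 'Male', 'Female', 'Female'),
  enrollment = c(41, 39, 0, 0, 46, 32)
)

# Sum enrollment within each value of sex.
by_sex <- aggregate(enrollment ~ sex, data = df_head, FUN = sum)
by_sex
#>      sex enrollment
#> 1 Female         78
#> 2   Male         80
```

The same call should work on the full df returned by get_education_data, since it relies only on the enrollment and sex columns shown in the output.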

Available Endpoints

Level	Source	Topic	By	Main Filters	Years Available
college-university	fsa	90-10-revenue-percentages	NA	year	2014–2017
college-university	fsa	campus-based-volume	NA	year	2001–2017
college-university	fsa	financial-responsibility	NA	year	2006–2016
college-university	fsa	grants	NA	year	1999–2018
college-university	fsa	loans	NA	year	1999–2018
college-university	ipeds	academic-libraries	NA	year	2013–2019
college-university	ipeds	academic-year-room-board-other	NA	year	1999–2020
college-university	ipeds	academic-year-tuition-prof-program	NA	year	1986–2008, 2010–2020
college-university	ipeds	academic-year-tuition	NA	year	1986–2020
college-university	ipeds	admissions-enrollment	NA	year	2001–2019
college-university	ipeds	admissions-requirements	NA	year	1990–2019
college-university	ipeds	completers	NA	year	2011–2019
college-university	ipeds	completions-cip-2	NA	year	1991–2019
college-university	ipeds	completions-cip-6	NA	year	1983–2019
college-university	ipeds	directory	NA	year	1980, 1984–2020
college-university	ipeds	enrollment-full-time-equivalent	NA	year, level_of_study	1997–2018
college-university	ipeds	enrollment-headcount	NA	year, level_of_study	1996–2018
college-university	ipeds	fall-enrollment	residence	year	1986, 1988, 1992, 1994, 1996, 1998, 2000–2020
college-university	ipeds	fall-enrollment	age, sex	year, level_of_study	1991, 1993, 1995, 1997, 1999–2020
college-university	ipeds	fall-enrollment	race, sex	year, level_of_study	1986–2020
college-university	ipeds	fall-retention	NA	year	2003–2020
college-university	ipeds	finance	NA	year	1979, 1983–2017
college-university	ipeds	grad-rates-200pct	NA	year	2007–2017
college-university	ipeds	grad-rates-pell	NA	year	2015–2017
college-university	ipeds	grad-rates	NA	year	1996–2017
college-university	ipeds	institutional-characteristics	NA	year	1980, 1984–2020
college-university	ipeds	outcome-measures	NA	year	2015–2018
college-university	ipeds	program-year-room-board-other	NA	year	1999–2020
college-university	ipeds	program-year-tuition-cip	NA	year	1987–2020
college-university	ipeds	salaries-instructional-staff	NA	year	1980, 1984, 1985, 1987, 1989–1999, 2001–2018
college-university	ipeds	salaries-noninstructional-staff	NA	year	2012–2018
college-university	ipeds	sfa-all-undergraduates	NA	year	2007–2017
college-university	ipeds	sfa-by-living-arrangement	NA	year	2008–2017
college-university	ipeds	sfa-by-tuition-type	NA	year	1999–2017
college-university	ipeds	sfa-ftft	NA	year	1999–2017
college-university	ipeds	sfa-grants-and-net-price	NA	year	2008–2017
college-university	ipeds	student-faculty-ratio	NA	year	2009–2020
college-university	nacubo	endowments	NA	year	2012–2018
college-university	nccs	990-forms	NA	year	1993–2016
college-university	nhgis	census-1990	NA	year	1980, 1984–2017
college-university	nhgis	census-2000	NA	year	1980, 1984–2017
college-university	nhgis	census-2010	NA	year	1980, 1984–2017
college-university	scorecard	default	NA	year	1996–2017
college-university	scorecard	earnings	NA	year	2003–2014
college-university	scorecard	institutional-characteristics	NA	year	1996–2017
college-university	scorecard	repayment	NA	year	2007–2016
college-university	scorecard	student-characteristics	aid-applicants	year	1997–2016
college-university	scorecard	student-characteristics	home-neighborhood	year	1997–2016
school-districts	ccd	directory	NA	year	1986–2020
school-districts	ccd	enrollment	NA	year, grade	1986–2020
school-districts	ccd	enrollment	race	year, grade	1986–2020
school-districts	ccd	enrollment	race, sex	year, grade	1986–2020
school-districts	ccd	enrollment	sex	year, grade	1986–2020
school-districts	ccd	finance	NA	year	1991, 1994–2018
school-districts	edfacts	assessments	NA	year, grade_edfacts	2009–2018
school-districts	edfacts	assessments	race	year, grade_edfacts	2009–2018
school-districts	edfacts	assessments	sex	year, grade_edfacts	2009–2018
school-districts	edfacts	assessments	special-populations	year, grade_edfacts	2009–2018
school-districts	edfacts	grad-rates	NA	year	2010–2018
school-districts	saipe	NA	NA	year	1995, 1997, 1999–2018
schools	ccd	directory	NA	year	1986–2020
schools	ccd	enrollment	NA	year, grade	1986–2020
schools	ccd	enrollment	race	year, grade	1986–2020
schools	ccd	enrollment	race, sex	year, grade	1986–2020
schools	ccd	enrollment	sex	year, grade	1986–2020
schools	crdc	algebra1	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	algebra1	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	algebra1	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-exams	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-exams	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-exams	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-ib-enrollment	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-ib-enrollment	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	ap-ib-enrollment	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	chronic-absenteeism	disability, sex	year	2013, 2015
schools	crdc	chronic-absenteeism	lep, sex	year	2013, 2015
schools	crdc	chronic-absenteeism	race, sex	year	2013, 2015
schools	crdc	credit-recovery	NA	year	2015, 2017
schools	crdc	directory	NA	year	2011, 2013, 2015, 2017
schools	crdc	discipline-instances	NA	year	2015, 2017
schools	crdc	discipline	disability, lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	discipline	disability, race, sex	year	2011, 2013, 2015, 2017
schools	crdc	discipline	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	dual-enrollment	disability, sex	year	2013, 2015, 2017
schools	crdc	dual-enrollment	lep, sex	year	2013, 2015, 2017
schools	crdc	dual-enrollment	race, sex	year	2013, 2015, 2017
schools	crdc	enrollment	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	enrollment	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	enrollment	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	harassment-or-bullying	allegations	year	2013, 2015, 2017
schools	crdc	harassment-or-bullying	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	harassment-or-bullying	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	harassment-or-bullying	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	math-and-science	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	math-and-science	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	math-and-science	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	offenses	NA	year	2015, 2017
schools	crdc	offerings	NA	year	2011, 2013, 2015, 2017
schools	crdc	restraint-and-seclusion	disability, lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	restraint-and-seclusion	disability, race, sex	year	2011, 2013, 2015, 2017
schools	crdc	restraint-and-seclusion	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	restraint-and-seclusion	instances	year	2013, 2015, 2017
schools	crdc	retention	disability, sex	year, grade	2011, 2013, 2015, 2017
schools	crdc	retention	lep, sex	year, grade	2011, 2013, 2015, 2017
schools	crdc	retention	race, sex	year, grade	2011, 2013, 2015, 2017
schools	crdc	sat-act-participation	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	sat-act-participation	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	sat-act-participation	race, sex	year	2011, 2013, 2015, 2017
schools	crdc	school-finance	NA	year	2011, 2013, 2015, 2017
schools	crdc	suspensions-days	disability, sex	year	2015, 2017
schools	crdc	suspensions-days	lep, sex	year	2015, 2017
schools	crdc	suspensions-days	race, sex	year	2015, 2017
schools	crdc	teachers-staff	NA	year	2011, 2013, 2015, 2017
schools	edfacts	assessments	NA	year, grade_edfacts	2009–2018
schools	edfacts	assessments	race	year, grade_edfacts	2009–2018
schools	edfacts	assessments	sex	year, grade_edfacts	2009–2018
schools	edfacts	assessments	special-populations	year, grade_edfacts	2009–2018
schools	edfacts	grad-rates	NA	year	2010–2018
schools	meps	NA	NA	year	2013–2018
schools	nhgis	census-1990	NA	year	1986–2020
schools	nhgis	census-2000	NA	year	1986–2020
schools	nhgis	census-2010	NA	year	1986–2020
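
When scripting against many endpoints, it can help to check a year against the Years Available column before making a request. The helper below is a hypothetical sketch, not part of educationdata: parse_years() expands a string in the table's format into an integer vector, assuming entries are comma-separated and ranges are written with an en dash (or a plain hyphen).

```r
# Hypothetical helper (not part of educationdata): expand a 'Years Available'
# string such as '1980, 1984-2020' into an integer vector of years.
parse_years <- function(x) {
  parts <- strsplit(x, ',\\s*')[[1]]
  years <- lapply(parts, function(p) {
    # Split a range on an en dash (U+2013) or a plain hyphen.
    bounds <- as.integer(strsplit(p, '\u2013|-')[[1]])
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  })
  unlist(years)
}

# Years listed for academic-year-tuition-prof-program, with the en dash
# written as the escape \u2013 so the example does not depend on file encoding.
available <- parse_years('1986\u20132008, 2010\u20132020')
2009 %in% available
#> [1] FALSE
2010 %in% available
#> [1] TRUE
```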

Main Filters

Due to the way the API is set up, the variables listed within ‘main filters’ are often the fastest way to subset an API call.

In addition to year, the other main filters for certain endpoints accept the following values:

Grade

Filter Argument	Grade
`grade = 'grade-pk'`	Pre-K
`grade = 'grade-k'`	Kindergarten
`grade = 'grade-1'`	Grade 1
`grade = 'grade-2'`	Grade 2
`grade = 'grade-3'`	Grade 3
`grade = 'grade-4'`	Grade 4
`grade = 'grade-5'`	Grade 5
`grade = 'grade-6'`	Grade 6
`grade = 'grade-7'`	Grade 7
`grade = 'grade-8'`	Grade 8
`grade = 'grade-9'`	Grade 9
`grade = 'grade-10'`	Grade 10
`grade = 'grade-11'`	Grade 11
`grade = 'grade-12'`	Grade 12
`grade = 'grade-13'`	Grade 13
`grade = 'grade-14'`	Adult Education
`grade = 'grade-15'`	Ungraded
`grade = 'grade-16'`	K-12
`grade = 'grade-20'`	Grades 7 and 8
`grade = 'grade-21'`	Grades 9 and 10
`grade = 'grade-22'`	Grades 11 and 12
`grade = 'grade-99'`	Total
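
The filter strings in this table follow a regular 'grade-N' pattern, so they can be generated rather than typed out. This is a minimal sketch, assuming the string form can be vectorized in the same way as the integer form used elsewhere in this document. grade_filter() is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of educationdata): build the 'grade-N'
# filter strings from the table above for a vector of grade codes.
grade_filter <- function(grades) paste0('grade-', grades)

filters <- list(year = 2008, grade = grade_filter(9:12))
filters$grade
#> [1] "grade-9"  "grade-10" "grade-11" "grade-12"
```

The resulting list can then be supplied as the filters argument of get_education_data for an endpoint whose main filters include grade.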

Level of Study

Filter Argument	Level of Study
`level_of_study = 'undergraduate'`	Undergraduate
`level_of_study = 'graduate'`	Graduate
`level_of_study = 'first-professional'`	First Professional
`level_of_study = 'post-baccalaureate'`	Post-baccalaureate
`level_of_study = '99'`	Total
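
Since level_of_study accepts only the values listed above, a script can check its input before sending a request. The sketch below is an illustration in base R. check_level_of_study() is a hypothetical helper, not part of educationdata, and the year 2016 is an arbitrary choice within the range listed for the enrollment-headcount endpoint.

```r
# Values copied from the Level of Study table above.
valid_levels <- c('undergraduate', 'graduate', 'first-professional',
                  'post-baccalaureate', '99')

# Hypothetical helper (not part of educationdata): stop early on a typo
# instead of sending an invalid filter value to the API.
check_level_of_study <- function(x) {
  bad <- setdiff(x, valid_levels)
  if (length(bad) > 0) {
    stop('Unknown level_of_study value(s): ', paste(bad, collapse = ', '))
  }
  x
}

filters <- list(year = 2016,
                level_of_study = check_level_of_study('undergraduate'))
```

This list could be passed as filters to an endpoint whose main filters include level_of_study, such as the college-university, ipeds, enrollment-headcount endpoint.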

Examples

Let’s build up some examples from the following set of endpoints.

Level	Source	Topic	By	Main Filters	Years Available
schools	ccd	enrollment	NA	year, grade	1986–2020
schools	ccd	enrollment	race	year, grade	1986–2020
schools	ccd	enrollment	race, sex	year, grade	1986–2020
schools	ccd	enrollment	sex	year, grade	1986–2020
schools	crdc	enrollment	disability, sex	year	2011, 2013, 2015, 2017
schools	crdc	enrollment	lep, sex	year	2011, 2013, 2015, 2017
schools	crdc	enrollment	race, sex	year	2011, 2013, 2015, 2017

The following will return a data.frame across all years and grades:

library(educationdata)
df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment')

Note that this endpoint is also callable by certain variables:

race
sex
race, sex

These variables can be passed to the subtopic argument, which replaces the deprecated by argument:

df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         subtopic = list('race', 'sex'))

You may also filter the results of an API call. In this case, year and grade provide the most time-efficient subsets, and both can be vectorized:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         subtopic = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
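
A vectorized filter covers every combination of the values supplied. As a quick way to see the scope of the request above, the combinations can be enumerated in base R. This only illustrates which year and grade pairs are covered and makes no claim about how the package issues requests internally.

```r
# Enumerate the year/grade pairs covered by year = 1988:1990 and grade = 6:8.
combos <- expand.grid(year = 1988:1990, grade = 6:8)
nrow(combos)
#> [1] 9
```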

Additional variables can also be passed to filters to subset further:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         subtopic = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))

The add_labels flag will map integer-coded variables to factors, using their labels in the API.

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         subtopic = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)
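
Conceptually, this is the same operation as converting an integer column to a factor with explicit labels. The sketch below shows that idea in base R. The code-to-label pairs are illustrative assumptions chosen for the example, and the actual mapping comes from the API's metadata.

```r
# Illustrative codes and labels only (an assumption, not taken from the API).
sex_codes <- c(1, 2, 2, 1)

# Map each integer code to its label, in the order given by levels.
sex <- factor(sex_codes, levels = c(1, 2), labels = c('Male', 'Female'))
levels(sex)
#> [1] "Male"   "Female"
```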

Finally, the csv flag can be set to download the full .csv data frame. In general, the csv functionality is much faster when retrieving the full data frame (or a large subset) and much slower when retrieving a small subset of a data frame (especially one with many filters). In this example, the full csv data for 1988 through 1990 must be downloaded and then subset to the rows that match the filters.

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         subtopic = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)