There are two common formats for representing longitudinal (time-dependent) student assessment data: WIDE and LONG. In WIDE format, each case/row represents a unique student and the columns hold variables associated with that student at different times. In LONG format, time-dependent data for a student is spread across multiple rows of the data set. The SGPdata package, installed when one installs the SGP package, includes exemplar WIDE and LONG data sets (sgpData and sgpData_LONG, respectively) to assist in setting up your data.
Deciding whether to format data in WIDE or LONG format is driven by many
conditions. In terms of the analyses that can be performed using the SGP
package, the WIDE data format is used by the lower level functions
studentGrowthPercentiles and studentGrowthProjections, whereas the higher
level wrapper functions utilize the LONG data format. For all but the
simplest one-off analyses, you’re likely better off formatting your data in
the LONG format and using the higher level functions. This is particularly
true if you plan on running SGP analyses operationally year after year,
where LONG data has numerous preparation and storage benefits over WIDE
data.
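If your records start out in WIDE format, moving them to LONG is mostly mechanical. The sketch below is an illustration in base R, not a function from the SGP or SGPdata packages: it stacks a toy WIDE table (three students and three years copied from the head(sgpData) listing in this vignette) into one row per student per year. The output column names YEAR, GRADE and SCALE_SCORE follow sgpData_LONG; the object names and the plain calendar-year labels are assumptions made for the example.

```r
# Toy WIDE data: one row per student (values copied from head(sgpData)).
wide <- data.frame(
  ID         = c(1000185L, 1000486L, 1000715L),
  GRADE_2022 = c(NA, 5, 4),
  GRADE_2023 = c(NA, 6, 5),
  GRADE_2024 = c(7, 7, 6),
  SS_2022    = c(NA, 607, 469),
  SS_2023    = c(NA, 592, 492),
  SS_2024    = c(520, 656, 551)
)

years <- c(2022, 2023, 2024)

# Build one block of rows per year, then stack the blocks.
long <- do.call(rbind, lapply(years, function(y) {
  data.frame(
    ID          = wide$ID,
    YEAR        = as.character(y),
    GRADE       = wide[[paste0("GRADE_", y)]],
    SCALE_SCORE = wide[[paste0("SS_", y)]]
  )
}))

# A student without a score in a given year simply has no row in LONG format.
long <- long[!is.na(long$SCALE_SCORE), ]
long <- long[order(long$ID, long$YEAR), ]
print(long)
```

A LONG data set intended for the higher level functions would also need the VALID_CASE and CONTENT_AREA variables, which this toy example leaves out.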
WIDE format is usually the most “intuitive” longitudinal format for those new to longitudinal/time-dependent data. Each row of the data set provides all the data for an individual case, with the variable names indicating which time period the data is from. Though intuitive, the data is often difficult to work with, particularly in situations where data is frequently added to the data set.
The data set sgpData is an anonymized, panel data set comprising 5 years
of annual, vertically scaled, assessment data in WIDE format. This
exemplar data set models the format for data used with the lower level
studentGrowthPercentiles and studentGrowthProjections functions.
> head(sgpData)
Key: <ID>
        ID GRADE_2020 GRADE_2021 GRADE_2022 GRADE_2023 GRADE_2024 SS_2020 SS_2021 SS_2022 SS_2023 SS_2024
     <int>      <num>      <num>      <num>      <num>      <num>   <num>   <num>   <num>   <num>   <num>
1: 1000185         NA         NA         NA         NA          7      NA      NA      NA      NA     520
2: 1000486          3          4          5          6          7     524     548     607     592     656
3: 1000710          8         NA         NA         NA         NA     713      NA      NA      NA      NA
4: 1000715         NA         NA          4          5          6      NA      NA     469     492     551
5: 1000803         NA          5         NA         NA         NA      NA     558      NA      NA      NA
6: 1000957          5          6          7          8         NA     651     660     666     663      NA
The WIDE data format illustrated by sgpData and utilized by the SGP
package can accommodate any number of occurrences but must follow a
specific column order. Variable names are irrelevant; position in the
data set is what’s important:
In sgpData above, the first column, ID, provides the unique student
identifier. The next 5 columns, GRADE_2020, GRADE_2021, GRADE_2022,
GRADE_2023, and GRADE_2024, provide the grade level of the student
assessment score in each of the 5 years. The last 5 columns, SS_2020,
SS_2021, SS_2022, SS_2023, and SS_2024, provide the scale scores
associated with the student in each of the 5 years. Most students do not
have 5 years of test data, so the missing years show the missing value (NA).
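Because position rather than name determines how the columns are read, it can help to rebuild the column order explicitly before handing WIDE data to the lower level functions. The following is a minimal base R sketch under stated assumptions: the toy data frame, the object names, and the choice of two years are made up for illustration (the values come from the listing above), and the grade and score columns are assumed to run in the same chronological order, as they do in sgpData.

```r
# Toy WIDE data whose columns arrive in an arbitrary order
# (values taken from head(sgpData) above).
wide_raw <- data.frame(
  SS_2024    = c(656, 551),
  GRADE_2023 = c(6, 5),
  ID         = c(1000486L, 1000715L),
  SS_2023    = c(592, 492),
  GRADE_2024 = c(7, 6)
)

# Rebuild the positional layout: ID first, then the grade columns,
# then the scale score columns, each block in the same year order.
years <- c(2023, 2024)
wide_ordered <- wide_raw[, c("ID",
                             paste0("GRADE_", years),
                             paste0("SS_", years))]
print(names(wide_ordered))
```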
Using WIDE format data like sgpData with the SGP package is, in general,
straightforward.
> sgp_g4 <- studentGrowthPercentiles(
+ panel.data=sgpData,
+ sgp.labels=list(my.year=2015, my.subject="Reading"),
+ percentile.cuts=c(1,35,65,99),
+ grade.progression=c(3,4))
Please consult the SGP data analysis vignette for more comprehensive
documentation on how to use sgpData (and WIDE data formats in general)
for SGP analyses.
The data set sgpData_LONG is an anonymized, panel data set comprising
5 years of annual, vertically scaled, assessment data in LONG format for
two content areas (Reading and Mathematics). This exemplar data set
models the format for data used with the higher level functions abcSGP,
prepareSGP, analyzeSGP, combineSGP, summarizeSGP, visualizeSGP, and
outputSGP.
> head(sgpData_LONG)
VALID_CASE CONTENT_AREA YEAR ID LAST_NAME FIRST_NAME GRADE SCALE_SCORE ACHIEVEMENT_LEVEL GENDER ETHNICITY FREE_REDUCED_LUNCH_STATUS ELL_STATUS IEP_STATUS GIFTED_AND_TALENTED_PROGRAM_STATUS SCHOOL_NUMBER SCHOOL_NAME EMH_LEVEL DISTRICT_NUMBER DISTRICT_NAME SCHOOL_ENROLLMENT_STATUS DISTRICT_ENROLLMENT_STATUS STATE_ENROLLMENT_STATUS
<char> <char> <char> <char> <fctr> <fctr> <char> <num> <char> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <int> <fctr> <fctr> <int> <fctr> <fctr> <fctr> <fctr>
1: VALID_CASE MATHEMATICS 2021_2022 1000372 Daniels Corey 3 435 Proficient Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
2: VALID_CASE MATHEMATICS 2022_2023 1000372 Daniels Corey 4 461 Proficient Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
3: VALID_CASE MATHEMATICS 2023_2024 1000372 Daniels Corey 5 444 Partially Proficient Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
4: VALID_CASE READING 2021_2022 1000372 Daniels Corey 3 523 Partially Proficient Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
5: VALID_CASE READING 2022_2023 1000372 Daniels Corey 4 540 Partially Proficient Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
6: VALID_CASE READING 2023_2024 1000372 Daniels Corey 5 473 Unsatisfactory Gender: Male Hispanic Free Reduced Lunch: Yes ELL: Yes IEP: No Gifted and Talented Program: No 1851 Silk-Royal Elementary School Elementary 470 Apple Valley School District Enrolled School: Yes Enrolled District: Yes Enrolled State: Yes
We recommend LONG formatted data for use with operational analyses.
Managing data in LONG format is simpler than managing data in WIDE
format. For example, when updating analyses with another year of data,
the new data is appended onto the bottom of the existing LONG data
set. All higher level functions in the SGP package are designed for use
with LONG format data. In addition, these functions often assume the
existence of state-specific meta-data embedded in SGPstateData. See the
SGP package documentation for more comprehensive documentation on how to
use sgpData_LONG for SGP calculations.
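The yearly update described above amounts to a row bind. The sketch below uses base R and toy records taken from the head(sgpData_LONG) listing; the object names are hypothetical, and only a handful of the LONG variables are included to keep the example short.

```r
# Existing LONG data: two prior years for one student.
long_current <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = c("2021_2022", "2022_2023"),
  ID           = "1000372",
  GRADE        = c("3", "4"),
  SCALE_SCORE  = c(435, 461)
)

# The new year of data, delivered with the same columns.
long_new <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = "2023_2024",
  ID           = "1000372",
  GRADE        = "5",
  SCALE_SCORE  = 444
)

# Guard against silently mismatched layouts, then append the new rows.
stopifnot(identical(names(long_current), names(long_new)))
long_updated <- rbind(long_current, long_new)
print(long_updated)
```

The same update in WIDE format would require adding new grade and score columns in the correct positions, which is part of why LONG data is easier to maintain.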
There are 7 required variables when using LONG data with SGP analyses:
VALID_CASE, CONTENT_AREA, YEAR, ID, SCALE_SCORE, GRADE, and
ACHIEVEMENT_LEVEL (only required if running student growth projections).
LAST_NAME and FIRST_NAME are required if creating individual level
student growth and achievement plots. All other variables are
demographic/student categorization variables used for creating student
aggregates by the summarizeSGP function.
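A quick way to confirm that a LONG data set carries the required variables is to compare its column names against the list above. The helper below, missing_required, is a hypothetical name written for this illustration and is not part of the SGP package; the toy data frame is likewise made up from values in the listing above.

```r
# The 7 required LONG variables named in this vignette.
required_vars <- c("VALID_CASE", "CONTENT_AREA", "YEAR", "ID",
                   "SCALE_SCORE", "GRADE", "ACHIEVEMENT_LEVEL")

# Hypothetical helper: returns the required variables absent from a data set.
missing_required <- function(long_data, required = required_vars) {
  setdiff(required, names(long_data))
}

# Toy LONG data that lacks ACHIEVEMENT_LEVEL.
toy_long <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "READING",
  YEAR         = "2021_2022",
  ID           = "1000372",
  GRADE        = "3",
  SCALE_SCORE  = 523
)

print(missing_required(toy_long))
```

Since ACHIEVEMENT_LEVEL is only needed for student growth projections, a result naming only that variable may be acceptable for percentile analyses alone.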
The sgpData_LONG data set contains data for 5 years across 2 content
areas (Reading and Mathematics).
The data set sgptData_LONG is an anonymized, panel data set comprising
8 windows (3 windows annually) of assessment data in LONG format for
3 content areas (Early Literacy, Mathematics, and Reading). This data set
is similar to the sgpData_LONG data set without the demographic variables
and with an additional DATE variable indicating the date associated with
the student assessment record.
> head(sgptData_LONG)
Key: <VALID_CASE, CONTENT_AREA, YEAR, ID, GRADE>
VALID_CASE CONTENT_AREA YEAR ID GRADE DATE SCALE_SCORE SCALE_SCORE_RASCH COUNTRY STATE SEM ACHIEVEMENT_LEVEL
<char> <char> <char> <char> <char> <Date> <num> <num> <char> <char> <num> <char>
1: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_130 K.2 2015-01-14 622 0.3449 US OH 55 <NA>
2: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1314 1.2 2015-01-08 500 -0.6556 US NJ 49 <NA>
3: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_133 K.2 2015-01-17 566 -0.1010 US OH 57 <NA>
4: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1429 2.2 2015-03-12 621 0.3368 US WI 58 <NA>
5: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1498 K.2 2015-01-09 577 -0.0129 US IL 57 <NA>
6: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1533 K.2 2015-01-23 443 -1.2131 US IL 38 <NA>
The data set sgpData_INSTRUCTOR_NUMBER is an anonymized,
student-instructor lookup table that provides instructor information
associated with each student’s test record. Note that just as each
teacher can (and will) have more than 1 student associated with them, a
student can have more than one teacher associated with their test
record. That is, multiple teachers could be assigned to the student in a
single content area for a given year.
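To see what the one-to-many relationship looks like in practice, the sketch below joins toy test records to toy lookup rows on ID, CONTENT_AREA and YEAR. It uses base R merge purely for illustration and makes no claim about how the SGP package links these tables internally; the object names are hypothetical and the values are copied from the listings in this vignette.

```r
# Two LONG test records for one student (values from head(sgpData_LONG)).
long <- data.frame(
  ID           = "1000372",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = c("2021_2022", "2022_2023"),
  SCALE_SCORE  = c(435, 461)
)

# Matching lookup rows (values from head(sgpData_INSTRUCTOR_NUMBER)).
instructors <- data.frame(
  ID                = "1000372",
  CONTENT_AREA      = "MATHEMATICS",
  YEAR              = c("2021_2022", "2022_2023", "2022_2023"),
  INSTRUCTOR_NUMBER = c("185104002", "185105002", "185105004"),
  INSTRUCTOR_WEIGHT = c(1.0, 0.2, 0.8)
)

# Each test record is repeated once per instructor assigned to it.
linked <- merge(long, instructors, by = c("ID", "CONTENT_AREA", "YEAR"))
print(linked)
```

The 2022_2023 record appears twice in the result, once for each of the two instructors sharing that student.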
> head(sgpData_INSTRUCTOR_NUMBER)
ID CONTENT_AREA YEAR INSTRUCTOR_NUMBER INSTRUCTOR_LAST_NAME INSTRUCTOR_FIRST_NAME INSTRUCTOR_WEIGHT INSTRUCTOR_ENROLLMENT_STATUS
<char> <char> <char> <char> <fctr> <fctr> <num> <fctr>
1: 1000372 MATHEMATICS 2020_2021 185103004 Kang Alexis 1.0 Enrolled Instructor: Yes
2: 1000372 MATHEMATICS 2021_2022 185104002 Mills Karl 1.0 Enrolled Instructor: Yes
3: 1000372 MATHEMATICS 2022_2023 185105002 Intavong Michael 0.2 Enrolled Instructor: Yes
4: 1000372 MATHEMATICS 2022_2023 185105004 Price Angel 0.8 Enrolled Instructor: Yes
5: 1000372 READING 2020_2021 185103003 Mccord Guadalupe 1.0 Enrolled Instructor: Yes
6: 1000372 READING 2021_2022 185104001 Rivera Kailynn 0.7 Enrolled Instructor: Yes
If you have a contribution or topic request for this vignette, don’t hesitate to write or set up an issue on GitHub.