This vignette contains detailed information regarding the automatic and manual cleanup of blinks and artifacts in the pupil size data. The package contains functions which are meant to provide cleaning that is reproducible and that can be done across multiple sessions.
There are two functions (clean_blink
and clean_artifact
) for detecting and removing artifacts related to blinks and other eye movement. The functions, which can be used independently or sequentially, implement differential and distributional detection algorithms (explained in more detail below). The detected data points are then marked with NA. In both cases, the algorithm can be adjusted using parameters which control the scope and sensitivity of the detection. Note that the two functions can either be used in conjunction with one another sequentially, or completely independently.
The example below shows how to use them sequentially
Here will we will focus on Event 16892.8, seen in the figure below.
There is one marked blink (the first) and one un-marked artifact (the second).
The first function clean_blink
operates only on blinks marked by the SR eyetracking software. It first identifies the marked blinks and adds padding around them to create a marked time window within which data may be removed.
This padding is given in BlinkPadding
specifying the number of milliseconds to pad on either side of the blink. Within this padded window the data are examined in two passes.
The first pass calculates a difference in pupil size between subsequent data points. If the difference is larger than the value specified in Delta
, the data point is marked for removal. Note that if Delta
is not specified, the function will attempt to estimate a reasonable value based on the 95th percentile of differences in the data.
The second pass attempts to identify remaining data points or islands of data points (small runs surrounded by NAs) which remain in the window. This is done using MaxValueRun
and NAsAroundRun
. MaxValueRun
controls the longest run (i.e., sequence) of data points flanked by NAs that could be marked for removal NAsAroundRun
controls the number of NAs required on either side of the run in order for it to be removed. The argument LogFile
specifies the path and name of the file into which information will be saved about cleaning status. Please note, for the purposes of this vignette we are saving the file into the vignette’s temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "BlinkCleanupLog.rds"
.
datblink <- clean_blink(dat, BlinkPadding = c(100, 100), Delta = 5,
MaxValueRun = 5, NAsAroundRun = c(2,2),
LogFile = paste0(tempdir(),"/BlinkCleanupLog.rds"))
## Running clean-up based on Eyelink marked blinks.
##
## Writing cleanup info to /tmp/RtmpBoEzMG/BlinkCleanupLog.rds.
Looking again at Event 16892.8, we can see that the function clean_blink
successfully cleaned the marked blink using the default values. The removed data points are marked in red.
Because clean_blink
is an automatic cleaning function, users may want to verify the effect of the cleaning. This may be helpful in determining if cleaning was effective, or to selective revert events for which cleaning was overly aggressive. The function verify_cleanup_app
opens and interactive app and loads the desired log file. The user can quickly scan through the events, and if desired, can click the Revert Event Cleanup
button to revert the event back to its most recent previous state (prior to running the cleanup function). Upon clicking the revert button, the status of the event in the log file will be changed and the file will be rewritten into the working directory.
Importantly, this app only modifies the entry in the log file, not the pupil size data in the data frame. This is to provide both a record of changes as well as control over the processing. In order to carry out any modifications to the cleanup, it is necessary to use the function apply_cleanup_change
. As the changes are based solely on the log file, the specific filename must be provided.
## Loading file /tmp/RtmpBoEzMG/BlinkCleanupLog.rds
## Changing cleanup based on /tmp/RtmpBoEzMG/BlinkCleanupLog.rds
Not all artifacts related to blinks are automatically detected by the SR eyetracking software, see figure above. To detect these unmarked blinks and other artifacts, a second function, clean_artifact
, is provided.
It implements a distributional method (described in more detail below) to detect potentially extreme data points.
This algorithm first divides the times series into windows, the size of which is specified in milliseconds using MADWindow
. Within each window the median absolute deviation (MAD) of the pupil size data is calculated. This is used to detect which windows contain extreme variability (potentially containing outliers). This is determined based on the value provided in MADConstant
, which controls the sensitivity threshold. The higher the constant the more extreme value is needed to trigger cleaning.
Next the identified extreme windows have padding added around them using MADPadding
(again in milliseconds). Within this padded window, a multidimensional distributional distance (specifically Mahalanobis distance) is calculated. This distance can be calculated using one of two methods: Basic or Robust.
The Basic method uses the standard Mahalanobis distance and the Robust uses a robust version of the Mahalanobis distance. The latter is based on Minimum Covariance Determinant (as implemented in the package robustbase), which uses a sampling method for determining multivariate location and scatter. Both the basic and robust calculations are based on multiple variables covarying with pupil size. By default, the calcuation uses the following columns: Pupil
, Velocity_Y
, and Acceleration_Y
. However, the parameter XandY
can be set to TRUE in which case the calculation will additionally include the X-axis: Velocity_X
and Acceleration_X
. This is intended to capture potential horizontal eye movement affecting pupil size. N.B., missing values are automatically excluded from the distance calculation.
The function will inform the user if a particular window is skipped as there are safeguards built in which will skip a given window if: 1) there are not enough data points or 2) there are not enough columns with non-zero data to estimate covariance. To determine whether a given pupil size is extreme, the argument MahaConstant
is used to set the sensitivity. The default value of the parameter is 2 (standard deviations). The higher the constant, the more extreme value of the parameter is needed to trigger cleaning.
Lastly, this function can optionally perform a second pass (setting Second
to TRUE), which is identical to the second pass in clean_blink
. This attempts to identify remaining data points or islands of data points (small runs surrounded by NAs) which remain.
The arguments MaxValueRun
and NAsAroundRun
are identical in function and meaning. As with clean_blink
, the argument LogFile
specifies the path and name of the file into which information about the cleaning status is written. Again, please note that for the purposes of this vignette we are saving the file into the vignette’s temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "ArtifactCleanupLog.rds"
.
datart <- clean_artifact(datblink, MADWindow = 100, MADConstant = 2,
MADPadding = c(200, 200), MahaConstant = 2,
Method = "Robust", XandY = TRUE, Second = T,
MaxValueRun = 5, NAsAroundRun = c(2,2),
LogFile = paste0(tempdir(),"/ArtifactCleanupLog.rds"))
##
## Running cleanup based on MAD (median absolute deviation) and robust Mahalanobis distance.
##
## Writing cleanup info to /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds.
Looking, yet again, at Event 16892.8, we can see that the function clean_artifact
successfully detected and partially cleaned the un-marked artifact, using the default values. Below, we will describe in detail how to manually clean the remainder of the artifact using the functionality provided in the package.
Again, because clean_artifact
is an automatic cleaning function, users may want to verify the effect of the cleaning. It is possible that this automated cleaning procedure, depending on the parameters specified, can remove more or less data points that appear to be “good”. This is part and parcel of automatic detection and cleaning. However, the algorithm is designed to detect extreme values based on the data in as targeted a way as possible. Based on our experience and testing, we have set default values which perform well in most scenarios.
The function verify_cleanup_app
can be used to scan through the events, and if desired, revert an event back to its most recent previous state (in this case to the state of the data in data frame dat4a
, i.e., prior to having run clean_artifact
, but after having run clean_blink
).
Again, any changes made using verify_cleanup_app
must be applied to the data using apply_cleanup_change
and specifying “ArtifactCleanupLog” as the log file.
## Loading file /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds
## Changing cleanup based on /tmp/RtmpBoEzMG/ArtifactCleanupLog.rds
In order to help evaluate the results of the cleanup, we provide a data visualization tool that explicitly displays the difference in the pupil data before and after carrying out the automatic cleanup. The function plot_compare_app
opens an interactive Shiny app for viewing the results of the cleanup. It plots each event and shows which data points are now different. Additionally, it states how much of the data (as a percentage) is missing, i.e., was removed by the cleaning procedure.
Alternatively, the function compare_summary
produces a summary output of the comparison by Event. The data can be returned by setting the argument ReturnData = TRUE.
## There are 4 events with differences between Pupil and Pupil_Previous.
## Set ReturnData to TRUE to output full information.
Automatic cleanup may not capture and clean all artifacts. Thus a manual cleanup function (user_cleanup_app
) is provided. This function opens an interactive Shiny app for viewing Events and specifying which data points to remove. Data can be removed either by specifying a point in time (i.e., removing one specific data point) or a range of time (i.e., removing a sequence of data points), or any combination of these. For example, type in 1550
to remove a data point at time 1550 ms; type in 1600:1700
to remove all data points between 1600 ms and 1700 ms (inclusive); or type in 1550, 1600:1700
to remove the data point at 1550 ms and all data points between 1600 ms and 1700 ms (inclusive).
The user-specified data points are saved into a log file. The path and filename are specified in LogFile
. Again, please note that for the purposes of this vignette we are saving the file into the vignette’s temporary folder; however, users will likely want to save the file into their working directory by simply specifying LogFile = "UserCleanupLog.rds"
.
This allows the user to clean part of the data in one session, and return to cleaning at later point. The function will read the log file from the working directory the next time the app is opened. Additionally, this log file ensures that the manual preprocessing step can be repeated if necessary as long as the log file exists. In the example below we will finish cleaning Event 16892.8.
Here is a brief example of how the Shiny app works.
1835:1995
. The selected data points will be highlighted in red in the middle panel (“Preview”).Note that, while the example above uses the cleanup app to further clean the data already processed with the automatic cleaning functions, the app can be used completely independently (i.e., without first doing automatic cleanup) if the user wishes to manually clean all events.
Once manual cleanup is done, the user must apply the cleanup to the data based on the contents of the log file. This is done with the function apply_user_cleanup
. Note that it is also possible to visualize the results of the applied manual cleanup using plot_compare_app
.
## Loading file /tmp/RtmpBoEzMG/UserCleanupLog.rds
## Applying cleanup based on /tmp/RtmpBoEzMG/UserCleanupLog.rds
Looking, one last time, at Event 16892.8, we can see that both the marked blink and the un-marked artifact are now both fully cleaned.
At this point it is possible to proceed with preprocessing as usual. Please refer back to the Basic Preprocessing vignette and continue by removing events with sparse data.