Tools for Handling Extraction of Features from Time series (theft)
You can install the stable version of theft from CRAN:
install.packages("theft")
You can install the development version of theft from GitHub using the following:
devtools::install_github("hendersontrent/theft")
Please also check out our paper Feature-Based Time-Series Analysis in R using the theft Package, which discusses the motivation and theoretical underpinnings of theft and walks through all of its functionality using the Bonn EEG dataset, a well-studied neuroscience dataset.
theft is a software package for R that provides a consistent, user-friendly interface for extracting time-series features. The package provides a single point of access to more than 1200 time-series features from a range of existing R and Python packages. The packages which theft ‘steals’ features from currently are:
catch22 (see Rcatch22 for the native implementation on CRAN), feasts, tsfeatures, Kats, tsfresh, and TSFEL.
Note that Kats, tsfresh, and TSFEL are Python packages. theft has built-in functionality for helping you install these libraries; all you need to do is install Python 3.9 on your machine. If you wish to access the Python feature sets, please run ?install_python_pkgs in R after downloading theft, or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper An Empirical Evaluation of Time-Series Feature Sets.
As of v0.6.1, users can also supply their own features to theft (see the vignette for more information)!
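As a rough illustration of what a user-supplied feature could look like, the sketch below assumes (as the workflow example in this README suggests) that a custom feature is simply a named R function that takes a numeric vector and returns a single number. The helper apply_features() and the simulated series are hypothetical and use base R only; they are not part of theft.

```r
# Hypothetical custom features: each one is a function of a numeric vector
# that returns a single number.
my_features <- list(
  "mean"  = mean,
  "sd"    = sd,
  "range" = function(x) max(x) - min(x)
)

# Hypothetical helper (not part of theft): apply every feature to one series
# and return a data frame of feature names and values.
apply_features <- function(x, features) {
  values <- vapply(features, function(f) f(x), numeric(1))
  data.frame(names = names(features), values = unname(values))
}

set.seed(123)
x <- rnorm(100)  # simulated series, for illustration only
out <- apply_features(x, my_features)
print(out)
```

A named list like my_features has the same shape as the list passed to the features argument of calculate_features() in the workflow example in this README.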
The companion package theftdlc (‘theft downloadable content’, just like the DLCs and expansions you get for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from theft. Collectively, these packages are referred to as the ‘theft ecosystem’.
A high-level overview of how the theft ecosystem for R is typically accessed by users is shown below. Note that prior to v0.6.1 of theft, many of the theftdlc functions were contained in theft but under other names. To ensure the theft ecosystem is as user-friendly as possible and can scale to meet future demands, theft has been refactored to handle feature extraction only, while theftdlc handles all analysis of the extracted features. The deprecated names (such as tsfeature_classifier(), the outdated version of classify()) are still available for now in theftdlc.
Many more functions and options for customisation are available within the packages and users are encouraged to explore the vignettes and helper files for more information.
theft and theftdlc combine to create an intuitive and efficient tidy feature-based workflow. Here is an example of a single code chunk that calculates features using catch22 and a custom set of mean and standard deviation, and projects the feature space into an interpretable two-dimensional space using principal components analysis:
library(dplyr)
library(theft)
library(theftdlc)
# Extract catch22 plus two custom features, then project to 2D with PCA
calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  project(norm_method = "RobustSigmoid",
          unit_int = TRUE,
          low_dim_method = "PCA") %>%
  plot()
In that example, calculate_features comes from theft, while project and the plot generic come from theftdlc.
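To make the projection step more concrete, here is a minimal base-R sketch of the general idea: rescale each feature, run principal components analysis, and keep the first two components. It is not the theftdlc implementation. The robust_sigmoid() helper and the simulated feature matrix are assumptions for illustration; the formula is a common outlier-robust sigmoid built from the median and interquartile range, followed by rescaling to the unit interval.

```r
# Hypothetical helper: outlier-robust sigmoid rescaling of one feature,
# then min-max rescaling to [0, 1] (loosely mirroring unit_int = TRUE).
robust_sigmoid <- function(x) {
  s <- 1 / (1 + exp(-(x - median(x)) / (IQR(x) / 1.35)))
  (s - min(s)) / (max(s) - min(s))
}

set.seed(123)
# Simulated feature matrix: 30 time series (rows) by 5 features (columns)
feature_matrix <- matrix(rnorm(30 * 5), nrow = 30,
                         dimnames = list(NULL, paste0("feature_", 1:5)))

# Normalise each feature (column) separately
normed <- apply(feature_matrix, 2, robust_sigmoid)

# PCA on the normalised features; keep the first two components
pca <- prcomp(normed, center = TRUE, scale. = FALSE)
low_dim <- pca$x[, 1:2]

# Proportion of variance explained by each retained component
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:2]

print(head(low_dim))
print(var_explained)
```

Each row of low_dim is one time series placed in the two-dimensional space, which is the kind of object a scatterplot of the projection would display.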
We can perform time-series classification with an equally simple workflow to compare the performance of catch22 against our custom set of the first two moments of the distribution:
# Extract features, classify by feature set, and compare each set to its null
calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 5,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "null") %>%
  head()
                 hypothesis   feature_set   metric  set_mean null_mean t_statistic      p.value
1  All features != own null  All features accuracy 0.8400000 0.1688889    9.089132 0.0004062310
2 User-supplied != own null User-supplied accuracy 0.7066667 0.1111111    5.512023 0.0026431488
3       catch22 != own null       catch22 accuracy 0.7066667 0.1600000    7.363817 0.0009059762
In this example, classify and compare_features come from theftdlc.
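The comparison against a null can be sketched in base R as follows. This is only an illustration of the general idea (accuracy over resamples with real labels versus shuffled labels, compared with a t-test); the exact classifier and statistical test used by theftdlc may differ. The data are simulated, and resample_accuracy() is a hypothetical helper using a simple nearest-class-mean rule.

```r
set.seed(123)

# Simulated feature values for two classes (illustration only)
n <- 60
labels <- rep(c("A", "B"), each = n / 2)
feature <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2))

# Hypothetical helper: random 80/20 train/test split, classify each test
# point by the nearest class mean, and return the test accuracy.
resample_accuracy <- function(feature, labels) {
  train <- sample(seq_along(labels), size = floor(0.8 * length(labels)))
  centroids <- tapply(feature[train], labels[train], mean)
  dists <- abs(outer(feature[-train], centroids, "-"))
  pred <- names(centroids)[apply(dists, 1, which.min)]
  mean(pred == labels[-train])
}

n_resamples <- 5

# Accuracy with the real labels
main_acc <- replicate(n_resamples, resample_accuracy(feature, labels))

# Null accuracy: same procedure after shuffling the labels
null_acc <- replicate(n_resamples, resample_accuracy(feature, sample(labels)))

# Paired two-sided t-test of real versus null accuracy across resamples
result <- t.test(main_acc, null_acc, paired = TRUE)
print(c(set_mean = mean(main_acc), null_mean = mean(null_acc)))
print(result$statistic)
```

Shuffling the labels breaks any link between the feature and the class, so the null accuracies estimate what chance-level performance looks like under the same resampling procedure.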
Please see the vignette for more information and the full functionality of both packages.
If you use theft or theftdlc in your own work, please cite both the paper:
T. Henderson and Ben D. Fulcher. Feature-Based Time-Series Analysis in R using the theft Package. arXiv, (2022).
and the software:
To cite package 'theft' in publications use:
Henderson T (2024). _theft: Tools for Handling Extraction of Features
from Time Series_. R package version 0.6.3,
<https://CRAN.R-project.org/package=theft>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {theft: Tools for Handling Extraction of Features from Time Series},
author = {Trent Henderson},
year = {2024},
note = {R package version 0.6.3},
url = {https://CRAN.R-project.org/package=theft},
}
To cite package 'theftdlc' in publications use:
Henderson T (2024). _theftdlc: Analyse and Interpret Time Series
Features_. R package version 0.1.2,
<https://CRAN.R-project.org/package=theftdlc>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {theftdlc: Analyse and Interpret Time Series Features},
author = {Trent Henderson},
year = {2024},
note = {R package version 0.1.2},
url = {https://CRAN.R-project.org/package=theftdlc},
}