Basic Functions

To begin, we’ll load foqat and look at three datasets that ship with it:
aqi is a time series of air quality data with 1-minute resolution.
voc is a time series of volatile organic compounds with 1-hour resolution.
met is a time series of meteorological conditions with 5-minute resolution.

library(foqat)
head(aqi)
#>                  Time        NO     NO2       CO SO2      O3
#> 1 2017-05-01 01:00:00 0.0376578 2.79326 0.256900  NA 56.5088
#> 2 2017-05-01 01:01:00 0.0341483 2.76094 0.254692  NA 57.0546
#> 3 2017-05-01 01:02:00 0.0310285 2.65239 0.265178  NA 57.6654
#> 4 2017-05-01 01:03:00 0.0357016 2.60257 0.269691  NA 58.7863
#> 5 2017-05-01 01:04:00 0.0337507 2.59527 0.273395  NA 59.0342
#> 6 2017-05-01 01:05:00 0.0238120 2.57260 0.276464  NA 59.2240
head(voc)
#>                  Time Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00     0.233    0.1750    0.544          0.020      0.1020
#> 2 2020-05-01 01:00:00     0.376    0.2025    0.704          0.028      0.1045
#> 3 2020-05-01 02:00:00     0.519    0.2300    0.864          0.036      0.1070
#> 4 2020-05-01 03:00:00     0.805    0.2850    1.184          0.052      0.1120
#> 5 2020-05-01 04:00:00     0.658    0.2920    1.304          0.075      0.1230
#> 6 2020-05-01 05:00:00     0.538    0.3700    0.904          0.049      0.1110
head(met)
#>                  Time  TEM  HUM  WS WD
#> 1 2017-05-01 00:00:00 21.4 87.0 3.0 39
#> 2 2017-05-01 00:05:00 21.2 86.7 3.6 68
#> 3 2017-05-01 00:10:00 21.0 86.3 3.5 76
#> 4 2017-05-01 00:15:00 20.9 85.8 3.4 73
#> 5 2017-05-01 00:20:00 20.8 86.0 2.8 68
#> 6 2017-05-01 00:25:00 20.8 86.0 2.3 68

Summarize time series

statdf() computes summary statistics for each variable in a time series (mean, standard deviation, minimum, quartiles, maximum, and integrity):

statdf(aqi)
#>      mean    sd   min   25%   50%   75%    max integrity
#> NO   0.33  0.61 -0.08  0.08  0.13  0.37  19.02     0.765
#> NO2  3.06  2.68 -0.15  1.07  2.21  4.13  20.53     0.786
#> CO   0.30  0.09  0.17  0.25  0.27  0.34   0.73     0.709
#> SO2  1.80  2.76 -0.15  0.25  0.97  2.11  34.08     0.734
#> O3  52.86 19.53  7.95 38.89 49.50 64.38 106.61     0.783
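
To make the columns of this table concrete, here is a minimal base-R sketch of the same kind of summary on a made-up data frame. It is not foqat's implementation: summarise_column and toy are hypothetical names, and treating integrity as the share of non-missing values is an assumption.

```r
# Hypothetical helper, not part of foqat: summarise one numeric column.
summarise_column <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    q25 = q[1], q50 = q[2], q75 = q[3],
    max = max(x, na.rm = TRUE),
    # Assumption: "integrity" is the share of non-missing values.
    integrity = mean(!is.na(x)))
}

# Made-up 1-minute series with some gaps.
toy <- data.frame(
  Time = as.POSIXct("2017-05-01 01:00:00", tz = "UTC") + 60 * 0:4,
  NO = c(1, 2, NA, 4, 5),
  O3 = c(50, NA, NA, 56, 58)
)

# One row per variable, one column per statistic.
stats <- t(sapply(toy[-1], summarise_column))
print(round(stats, 2))
```

Under that assumption, a column with one missing value out of five gets an integrity of 0.8.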

Resample time series

You can resample a time series with trs().
Use bkip to set the new time resolution.
The time series can be clipped with st (start time) and et (end time).
The default resampling function is the mean. Wind data can be handled by setting wind to TRUE and specifying coliws (the column index of the wind speed) and coliwd (the column index of the wind direction).

new_met=trs(met, bkip = "1 hour", st = "2017-05-01 01:00:00", wind = TRUE, coliws = 4, coliwd = 5)
#> Joining with `by = join_by(temp_datetime)`
head(new_met)
#>                  Time      TEM      HUM       WS       WD
#> 1 2017-05-01 01:00:00 21.18333 83.15833 4.555427 72.52891
#> 2 2017-05-01 02:00:00 21.54167 77.62500 4.238292 72.02753
#> 3 2017-05-01 03:00:00 20.71667 80.22500 5.287611 82.34847
#> 4 2017-05-01 04:00:00 20.52500 79.80000 5.653918 89.15400
#> 5 2017-05-01 05:00:00 21.12500 61.41667 7.417430 98.62400
#> 6 2017-05-01 06:00:00 21.30000 51.44167 8.401939 89.26818
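
Wind direction needs special treatment because it wraps around at 360 degrees, so an arithmetic mean of 350 and 30 would give a misleading 190. A common remedy is vector averaging, sketched below in base R. Whether trs() uses exactly this scheme is an assumption, and mean_wind is a hypothetical helper, not a foqat function.

```r
# Hypothetical helper, not a foqat function: vector-average wind.
# ws: wind speeds; wd: wind directions in degrees.
mean_wind <- function(ws, wd) {
  rad <- wd * pi / 180
  u <- mean(ws * sin(rad), na.rm = TRUE)  # east-west component
  v <- mean(ws * cos(rad), na.rm = TRUE)  # north-south component
  c(WS = sqrt(u^2 + v^2),                 # speed of the mean vector
    WD = (atan2(u, v) * 180 / pi) %% 360) # direction of the mean vector
}

# Two equally strong winds on either side of north.
res <- mean_wind(ws = c(2, 2), wd = c(350, 30))
print(res)
```

The averaged direction is the bisector of the two inputs (10 degrees), and the averaged speed is slightly below 2 because the two vectors partly cancel.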

You can also change the resampling function to sum, median, min, max, sd, or quantile. If you choose quantile, you also need to supply probs (e.g., 0.5).
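
As an illustration of resampling with a function other than the mean, the following base-R sketch bins a made-up 5-minute series into hours and takes the 0.5 quantile of each bin. It only mirrors the idea; the data and the names toy, bin, and hourly are hypothetical, and foqat's internals may differ.

```r
# Made-up 5-minute series covering two hours (not the met dataset).
toy <- data.frame(
  Time = as.POSIXct("2017-05-01 00:00:00", tz = "UTC") + 300 * 0:23,
  TEM = 20 + (0:23) / 10
)

# Floor each timestamp to the start of its hour (3600 seconds).
toy$bin <- as.POSIXct(floor(as.numeric(toy$Time) / 3600) * 3600,
                      origin = "1970-01-01", tz = "UTC")

# Summarise each bin with the 0.5 quantile instead of the mean.
hourly <- aggregate(TEM ~ bin, data = toy, FUN = quantile,
                    probs = 0.5, names = FALSE)
print(hourly)
```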

Calculate the variation of time series

svri() helps you compute the variation of time series (e.g. calculate the max value of all values grouped by hours of day).

The parameters bkip, st, et, and fun work the same way as in trs(). Wind data is handled just as in trs().

mode selects the calculation mode, and value is its sub-parameter. There are three modes (recipes, ncycle, and custom), which are introduced below:

mode = recipes

recipes stands for built-in solutions.
The mode recipes accepts three values: day, week, and month. day means the time series is grouped by hour of day, from 0 to 23.
week means the time series is grouped by day of the week, from 1 to 7.
month means the time series is grouped by day of the month, from 1 to 31. Below is an example that calculates the median values of a time series grouped by hour of day (e.g., 0:00, 1:00 …).

new_voc=svri(voc, bkip="1 hour", mode="recipes", value="day", fun="median")
#> Joining with `by = join_by(temp_datetime)`
head(new_voc)
#>   hour of day Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1           0     0.461    0.3555    0.583          0.051      0.1020
#> 2           1     0.581    0.3710    0.704          0.048      0.1045
#> 3           2     0.583    0.4020    0.864          0.041      0.1120
#> 4           3     0.805    0.4530    1.184          0.052      0.1220
#> 5           4     0.658    0.4180    1.304          0.075      0.1230
#> 6           5     0.572    0.5620    0.923          0.049      0.1210
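
The grouping behind value = "day" can be pictured with a short base-R sketch on made-up data: derive the hour of day from each timestamp, then summarise per hour. This is only an illustration of the idea, not foqat's code, and toy and by_hour are hypothetical names.

```r
# Made-up hourly series covering three days (not the voc dataset).
toy <- data.frame(
  Time = as.POSIXct("2020-05-01 00:00:00", tz = "UTC") + 3600 * 0:71,
  Propylene = rep(c(1, 2, 6), each = 24) + ((0:71) %% 24) / 100
)

# Grouping key: hour of day, 0 to 23.
# For day of week or day of month, "%u" or "%d" could be used instead
# (how foqat numbers the days of the week is not specified here).
toy$hour <- as.integer(format(toy$Time, "%H"))

# Median over all days for each hour of day.
by_hour <- aggregate(Propylene ~ hour, data = toy, FUN = median)
print(head(by_hour))
```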

mode = ncycle

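ncycle stands for grouping time series by the position of each row within a cycle; value is the number of rows per cycle.
Below is an example that calculates the median for each position in a 24-row cycle. Because voc has 1-hour resolution and st is set to midnight, this is equivalent to grouping by hour of day (e.g., 0:00, 1:00 …).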

new_voc=svri(voc, bkip="1 hour", st="2020-05-01 00:00:00", mode="ncycle", value=24, fun="median")
#> Joining with `by = join_by(temp_datetime)`
head(new_voc)
#>   cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1     0     0.461    0.3555    0.583          0.051      0.1020
#> 2     1     0.581    0.3710    0.704          0.048      0.1045
#> 3     2     0.583    0.4020    0.864          0.041      0.1120
#> 4     3     0.805    0.4530    1.184          0.052      0.1220
#> 5     4     0.658    0.4180    1.304          0.075      0.1230
#> 6     5     0.572    0.5620    0.923          0.049      0.1210
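
The idea of a position within a cycle can be sketched in base R with the modulo operator. The values and the cycle length of 3 are made up for brevity, and the zero-based numbering is an assumption chosen to match the cycle column shown above.

```r
# Made-up series of 9 values with a cycle length of 3.
x <- c(10, 20, 30, 11, 21, 31, 15, 25, 35)
n <- 3

# Position of each row within its cycle: 0, 1, 2, 0, 1, 2, ...
pos <- (seq_along(x) - 1) %% n

# Median of all values sharing the same position.
cycle_median <- tapply(x, pos, median)
print(cycle_median)
```

Because the grouping depends only on row order, the series must be regular and start at the intended first position, which is why st is set explicitly in the example above.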

mode = custom

custom stands for grouping time series by a reference column in the time series. If you select mode = custom, value is the column index of the reference column. Below is an example that calculates the median values of a time series grouped by hour (e.g., 0:00, 1:00 …).

# Add a new column holding the hour of day.
voc$hour=lubridate::hour(voc$Time)
# Group by the reference column, given by its column index (hour is column 7).
new_voc=svri(voc, bkip = "1 hour", mode="custom", value=7, fun="median")
head(new_voc[,-2])
#>   custom cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1            0     0.461    0.3555    0.583          0.051      0.1020
#> 2            1     0.581    0.3710    0.704          0.048      0.1045
#> 3            2     0.583    0.4020    0.864          0.041      0.1120
#> 4            3     0.805    0.4530    1.184          0.052      0.1220
#> 5            4     0.658    0.4180    1.304          0.075      0.1230
#> 6            5     0.572    0.5620    0.923          0.049      0.1210
# Remove the modified copy of voc so that later examples use the packaged dataset again.
rm(voc)

Calculate the average variation

avri() is a customized version of svri() which helps you to calculate the average variation (with standard deviation) of time series.

The output is a data frame that contains both the average variations and the standard deviations. For example, for a time series of 3 species, the second to fourth columns are the average variations and the fifth to seventh columns are the standard deviations. In the output below, which has five species, the _ave columns come first and the _sd columns follow.

new_voc=avri(voc, bkip = "1 hour", st = "2020-05-01 01:00:00")
#> Joining with `by = join_by(temp_datetime)`
head(new_voc)
#>   hour of day Propylene_ave Acetylene_ave n.Butane_ave trans.2.Butene_ave
#> 1           0      0.735375       0.48525       1.1655             0.0695
#> 2           1      0.737650       0.39920       1.0683             0.0575
#> 3           2      0.831800       0.37320       1.1748             0.0534
#> 4           3      1.420300       0.38060       2.2370             0.0910
#> 5           4      1.051800       0.42100       1.8614             0.0664
#> 6           5      1.133200       0.59140       1.8872             0.0604
#>   Cyclohexane_ave Propylene_sd Acetylene_sd n.Butane_sd trans.2.Butene_sd
#> 1          0.0910    0.5876756    0.2137766   1.1091851        0.04648835
#> 2          0.1034    0.5677007    0.2099156   0.9505957        0.03992493
#> 3          0.1098    0.6452141    0.1634861   0.8482224        0.02864961
#> 4          0.1210    1.5527906    0.1721926   2.1616755        0.06951619
#> 5          0.1392    0.8127953    0.2090957   1.4807749        0.03008820
#> 6          0.1652    0.8916562    0.2569549   1.6192165        0.02475480
#>   Cyclohexane_sd
#> 1     0.02184414
#> 2     0.02442181
#> 3     0.02115892
#> 4     0.02736786
#> 5     0.06293012
#> 6     0.10819057
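
The layout of this output (averages first, standard deviations after, with _ave and _sd suffixes) can be reproduced on made-up data with base R. This is a sketch of the output shape only, not of how avri() is implemented; toy, ave, sdv, and result are hypothetical names.

```r
# Made-up data: three observations for each of two hours of day.
toy <- data.frame(
  hour = rep(0:1, times = 3),
  Propylene = c(1, 2, 3, 4, 5, 6),
  Acetylene = c(2, 2, 4, 4, 6, 6)
)

# Mean and standard deviation of every species per hour.
ave <- aggregate(. ~ hour, data = toy, FUN = mean)
sdv <- aggregate(. ~ hour, data = toy, FUN = sd)

# Suffix the columns, then put averages and deviations side by side.
names(ave)[-1] <- paste0(names(ave)[-1], "_ave")
names(sdv)[-1] <- paste0(names(sdv)[-1], "_sd")
result <- merge(ave, sdv, by = "hour")
print(result)
```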

Convert time series into proportion time series

prop() helps you convert time series into proportion time series (e.g., convert a time series of concentrations of species into a time series of contributions of species).

prop_voc=prop(voc)
head(prop_voc)
#>                  Time Propylene Acetylene  n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00 0.2169460 0.1629423 0.5065177     0.01862197  0.09497207
#> 2 2020-05-01 01:00:00 0.2657244 0.1431095 0.4975265     0.01978799  0.07385159
#> 3 2020-05-01 02:00:00 0.2955581 0.1309795 0.4920273     0.02050114  0.06093394
#> 4 2020-05-01 03:00:00 0.3301887 0.1168991 0.4856440     0.02132896  0.04593929
#> 5 2020-05-01 04:00:00 0.2683524 0.1190865 0.5318108     0.03058728  0.05016313
#> 6 2020-05-01 05:00:00 0.2728195 0.1876268 0.4584178     0.02484787  0.05628803
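
The proportions above are each value divided by the sum of its row. The base-R sketch below reproduces the first two rows from the concentrations printed by head(voc) earlier; conc and share are hypothetical names, and this is a check of the arithmetic rather than foqat's code.

```r
# First two rows of voc as printed by head(voc), without the Time column.
conc <- data.frame(
  Propylene = c(0.233, 0.376),
  Acetylene = c(0.1750, 0.2025),
  n.Butane = c(0.544, 0.704),
  trans.2.Butene = c(0.020, 0.028),
  Cyclohexane = c(0.1020, 0.1045)
)

# Divide every value by the total of its row, so each row sums to 1.
share <- conc / rowSums(conc)
print(round(share, 7))
```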

Batch linear regression analysis for time series

anylm() lets you run linear regressions on time series in batch.
xd gives the indices of the columns to put on the x axis (independent variables).
yd gives the indices of the columns to put on the y axis (dependent variables).
zd gives the indices of the columns to use as color scales. td gives the indices of the columns to use as the basis for grouping.

A simple example is shown below to illustrate the functionality.
This example explores correlations in the built-in dataset aqi. Grouped by day, it examines the correlation of O3 with NO and NO2 for each day, and uses CO as the fill color to explore the effect of CO on those correlations.

df=data.frame(aqi, day=lubridate::day(aqi$Time))
lr_result=anylm(df, xd=c(2,3), yd=6, zd=4, td=7,dign=3)
View(lr_result)
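
To show what a batch regression by group involves, here is a minimal base-R sketch that fits one linear model per day on made-up data and collects the coefficients in a table. It illustrates the general technique only; the columns anylm() actually returns are not shown in this document, and toy, fits, and lr_toy are hypothetical names.

```r
# Made-up data: two days, ten observations each, with a small wiggle.
toy <- data.frame(
  day = rep(1:2, each = 10),
  NO2 = rep(1:10, times = 2)
)
wiggle <- rep(c(0.1, -0.1), times = 10)
toy$O3 <- ifelse(toy$day == 1, 60 - 2 * toy$NO2, 40 - 1 * toy$NO2) + wiggle

# Fit O3 ~ NO2 separately for each day and keep the key numbers.
fits <- lapply(split(toy, toy$day), function(d) {
  fit <- lm(O3 ~ NO2, data = d)
  data.frame(day = d$day[1],
             intercept = unname(coef(fit)[1]),
             slope = unname(coef(fit)[2]),
             r_squared = summary(fit)$r.squared)
})

# One row per group.
lr_toy <- do.call(rbind, fits)
print(lr_toy)
```

With several x or y columns, the same pattern would be repeated for every x and y pair within each group.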