The heatwaveR
package was designed to include many different methods for the creation of climatologies and thresholds for detecting extremes events (e.g. heatwaves and cold-spells) in time series data. To this end we made a very large change in the event detection pipeline when we moved from the RmarineHeatWaves
package to heatwaveR
. This change may primarily be seen with the inclusion of the ts2clm()
function and the removal of the climatology performed in RmarineHeatWaves::detect()
in favour of detect_event()
, which does not calculate climatologies. In this way we have allowed for the introduction of a multitude of more complex climatology calculation and event detection/filtering methods. It is our overarching goal to provide one package that allows climate scientists to calculate these events in both the atmosphere and oceans. But rather than talking about it, let’s walk through some case studies on how this package can be used for diverse applications.
Brought to our attention by Mr. Haouari from the Hydrometeorological Institute of Training and Research (IHFR) in Algeria was the concept of using a flat (e.g. 19\(^\circ\)C) tMin
bottom boundary when calculating events from tMax
with the standard 90th percentile upper threshold. As the authors of the heatwaveR
package are admittedly marine oriented, we tend to work with daily time series that have only one mean value per day (e.g. tMean
, temp
, sst
). This is why there are not arguments in the heatwaveR
suite of functions that call on tMin
and tMax
explicitly, but that does not mean that one cannot do so. Below we will work through the steps one would take to calculate (atmospheric) heatwaves, as per their definition in Perkins and Alexander (2013) (but excluding the calculation of EHF), and with the additional step proposed by Mr. Haouari.
The following sub-sections will show the step-by-step process one may use to calculate atmospheric heatwaves using a 90th percentile threshold created from the tMax
time series for a location, but will only use the days when the corresponding tMin
also exceeds a pre-determined flat bottom boundary on the same days to quantify the heatwave metrics. We will finish by visualising the results with the built-in heatwaveR
graphing functions of event_line()
and lolli_plot
as well as a bubble plot. The data we will use for these examples will be a 45 year time series of daily atmospheric tMin
and tMax
temperatures over Algiers, Algeria contributed by Mr. Haouari.
The first step with any analysis in R should be the loading of the packages to be used.
library(dplyr)
library(ggpubr)
library(heatwaveR)
With our libraries loaded, we will now go about explicitly calling the Algiers
data into our environment. These data are automatically loaded for us when we load the heatwaveR
library, but we perform this step here just for clarity. Anyone following along should feel free to use whatever data they would like. As long as the data have a date (t
), tMin
, and tMax
column this code will function as designed.
<- Algiers Algiers
With our libraries and data loaded, we will now calculate the two thresholds we need to correctly detect the heatwaves and accurately quantify their metrics. The first is the 90th percentile threshold based on the tMax
time series. The second is the flat exceedance of 19\(^\circ\)C based on the tMin
data. We use 19\(^\circ\)C as the bottom threshold as this is roughly the mean tMin
for summer temperatures.
# The tMax threshold
# The current WMO standard climatology period is 1981-01-01 to 2010-12-31 and should be used when possible
# We rather use 1961-01-01 to 1990-01-01 as this is the oldest 30 year period available in the data
<- ts2clm(data = Algiers, y = tMax, climatologyPeriod = c("1961-01-01", "1990-12-31"), pctile = 90)
tMax_clim
# The tMin exceedance
# Note the use here of 'minDuration = 3' and 'maxGap = 1' as the default atmospheric arguments
# The default marine arguments are 'minDuration = 5' and 'maxGap = 2'
<- exceedance(data = Algiers, y = tMin, threshold = 19, minDuration = 3, maxGap = 1)$threshold tMin_exc
Now that the two thresholds have been calculated we use the detect_event()
function as usual, but provide the second threshold to the argument threshClim2
that would normally lay dormant.
# Note that because we calculated our 90th percentile threshold on a column named 'tMax'
# and not the default column name 'temp', we must specify this below with 'y = tMax'
<- detect_event(data = tMax_clim, y = tMax, # The 90th percentile threshold
events threshClim2 = tMin_exc$exceedance) # The flat exceedance threshold
Please note that even though the use of the second threshold does allow for the resultant event metrics to differ, the values themselves are still being calculated against the seasonal climatology and daily temperatures for the time series given to the 90th percentile threshold calculation (in this case tMax
) and so using a second threshold (in this case tMin
) won’t generally have much of an effect on the event metrics. Rather it mostly screens out smaller or larger events depending on how one chooses to set the threshold. In the case when an exceedance threshold is chosen for a temperature that would typically only occur in summer (e.g. 19\(^\circ\)C, as used here), one is also effectively screening events by season. There are many use cases where this would be desirable. For example, if one is only interested in events that would occur during a season in which night time heat stress becomes an issue for the young and elderly.
Even though we have used two thresholds to calculate our events, the results are output the same as though only one threshold (default) were used. This means that we may use the visualisation functions that come with heatwaveR
without any extra fuss.
# The code to create a bubble plot for the heatwave results
<- ggplot(data = events$event, aes(x = date_peak, y = intensity_max)) +
bubble_plot geom_point(aes(size = intensity_cumulative), shape = 21, fill = "salmon", alpha = 0.8) +
labs(x = NULL, y = "Maximum Intensity [°C] ", size = "Cumulative Intensity [°C x days]") +
scale_size_continuous(range = c(1, 10),
guide = guide_legend(title.position = "top", direction = "horizontal")) +
theme_bw() +
theme(legend.position = c(0.3, 0.12),
legend.box.background = element_rect(colour = "black"))
# Don't forget to set 'event_line(y = tMax)'
ggarrange(event_line(events, y = tMax, metric = "intensity_max"),
event_line(events, y = tMax, metric = "intensity_max", category = T),
lolli_plot(events),
bubble_plot,ncol = 2, nrow = 2, align = "hv")
Using a percentile based second threshold is not much different than using a static second threshold. Rather than using exceedance()
to get our second threshold we can use ts2clm
nested within detect_event()
. It must also be pointed out that in addition to using multiple thresholds, we can adjust the minimum duration (minDuration
) and maximum gap (maxGap
) arguments for our multiple thresholds, too. This allows us to provide different ‘flavours’ of criteria for our events. For example, let’s say we are interested in night-time events (tMin
) when the temperatures remain above the 80th percentile threshold (pctile = 80
) for 10 or more days (minDuration = 10
) without dipping below that threshold for more than 2 consecutive days (maxGap = 2
). But on top of that, we are also only interested in those parts of the event when the daytime temperatures exceed the 90th percentile threshold (pctile = 90
) for 3 or more days (minDuration = 3
) non-stop (maxGap = 0
).
Below we will look at how to detect/calculate events that meet these rather specific criteria. We will also calculate events with just the first threshold and compare the results visually. It must be noted here that whichever criteria is the most strict, in this case minDuration = 3
and maxGap = 0
, will be the predominant filter through which the event metrics are quantified.
# Note that because we are not using the standard column name 'temp' we must
# specify the chosen column name twice, once for ts2clm() and again for detect_event()
# First threshold based on tMin
<- ts2clm(data = Algiers, y = tMin, pctile = 80,
thresh_tMin climatologyPeriod = c("1961-01-01", "1990-12-31"))
# Second threshold based on tMax
# Be careful here that you put the arguments within the correct brackets
<- detect_event(ts2clm(data = Algiers, y = tMax, pctile = 90,
thresh_tMax climatologyPeriod = c("1961-01-01", "1990-12-31")),
# These arguments are passed to detect_event(), not ts2clm()
minDuration = 3, maxGap = 0, y = tMax, protoEvents = T)
# Detect/calculate events using the two precalculated thresholds
# Because detect_event() is not able to deduce which arguments we used above,
# we must again tell it explicitly here
<- detect_event(data = thresh_tMin, y = tMin, minDuration = 10, maxGap = 2,
events_two_thresh threshClim2 = thresh_tMax$event, minDuration2 = 3, maxGap2 = 0)
# Or to simply use one threshold
<- detect_event(data = thresh_tMin, y = tMin, minDuration = 10, maxGap = 2) events_one_thresh
Here are the differences in lolliplot format:
ggarrange(lolli_plot(events_one_thresh), lolli_plot(events_two_thresh), labels = c("One threshold", "Two thresholds"))
Here is a brief display of the top few events from each method:
head(events_one_thresh$event)
## # A tibble: 5 × 22
## event_no index_start index_peak index_end duration date_start date_peak
## <int> <int> <int> <int> <dbl> <date> <date>
## 1 1 6135 6137 6144 10 1977-10-18 1977-10-20
## 2 2 10844 10845 10856 13 1990-09-09 1990-09-10
## 3 3 11191 11200 11201 11 1991-08-22 1991-08-31
## 4 4 13292 13300 13301 10 1997-05-23 1997-05-31
## 5 5 15534 15542 15546 13 2003-07-13 2003-07-21
## # … with 15 more variables: date_end <date>, intensity_mean <dbl>,
## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, intensity_var_abs <dbl>,
## # intensity_cumulative_abs <dbl>, rate_onset <dbl>, rate_decline <dbl>
head(events_two_thresh$event)
## # A tibble: 5 × 22
## event_no index_start index_peak index_end duration date_start date_peak
## <int> <int> <int> <int> <dbl> <date> <date>
## 1 1 6135 6137 6137 3 1977-10-18 1977-10-20
## 2 2 10851 10851 10856 6 1990-09-16 1990-09-16
## 3 3 11191 11200 11201 11 1991-08-22 1991-08-31
## 4 4 13292 13294 13294 3 1997-05-23 1997-05-25
## 5 5 15540 15542 15545 6 2003-07-19 2003-07-21
## # … with 15 more variables: date_end <date>, intensity_mean <dbl>,
## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, intensity_var_abs <dbl>,
## # intensity_cumulative_abs <dbl>, rate_onset <dbl>, rate_decline <dbl>
If we look at these results we see that the use of two thresholds did not detected fewer events than the use of one threshold. This is because even though the second threshold was much more ‘difficult’ for the time series to surpass than the first, the heatwaves in the time series are so pronounced that they emerge regardless. This method allows for a lot of flexibility, but users should also be cautious that they understand what exactly they are asking their machines to do. In the case above, it may be that we would actually prefer to calculate our event metrics based entirely on the first threshold, but filter out the events that didn’t meet our second threshold criteria. We will see how to do this in the following section.
The methodology outlined below for the detection and filtering of events with two thresholds is somewhat cumbersome. A potential issue with this technique is that the multiple filters do not affect the calculation of the event metrics (e.g. intensity_cumulative
), as only the primary threshold given to detect_event()
is used when calculating event metrics. This may however be the desired case if one is still interested in knowing the cumulative intensity above the given percentile threshold, but only wants to filter the full event based on some other threshold criteria. I can imagine real-world use cases for both scenarios, which is why this seemingly less sophisticated approach is detailed below.
Because we have already calculated our single threshold events (events_one_thresh
) and our second threshold (thresh_tMax
) we may directly begin filtering the results. Before we do so, let’s pull out the list components of our results into dataframes for easier use down the line.
# Pull out each data.frame as their own object for easier use
<- events_one_thresh$event
events_one_event <- events_one_thresh$climatology events_one_climatology
This is where things may get tricky for some users, and where the default use of the functions in the heatwaveR
package ends. We are now going ‘off-road’ so to speak. But do not despair! The tidyverse
suite of packages makes data wrangling like this much more user friendly than it was in the dark days of Base R coding.
In order to make the filtering of events easier, we will combine the two different dataframes that we are using as threshold/filtering guides to chose the events that meet all of our selection criteria.
# Join the two threshold dataframes
<- left_join(events_one_climatology, thresh_tMax, by = c("t"))
two_thresh
# Remove all days that did not qualify as events in both thresholds
<- two_thresh %>%
two_thresh_filtered filter(event.x == TRUE,
== TRUE) event.y
With our filtering guide created, we may now apply it to events_one_thresh
to get our filtered results.
# Copy data with a new name
<- events_one_thresh
events_one_thresh_filtered
# Then filter
$event <- events_one_thresh_filtered$event %>%
events_one_thresh_filteredfilter(event_no %in% two_thresh_filtered$event_no.x)
# Compare results
head(events_one_thresh_filtered$event)
## # A tibble: 5 × 22
## event_no index_start index_peak index_end duration date_start date_peak
## <int> <int> <int> <int> <dbl> <date> <date>
## 1 1 6135 6137 6144 10 1977-10-18 1977-10-20
## 2 2 10844 10845 10856 13 1990-09-09 1990-09-10
## 3 3 11191 11200 11201 11 1991-08-22 1991-08-31
## 4 4 13292 13300 13301 10 1997-05-23 1997-05-31
## 5 5 15534 15542 15546 13 2003-07-13 2003-07-21
## # … with 15 more variables: date_end <date>, intensity_mean <dbl>,
## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, intensity_var_abs <dbl>,
## # intensity_cumulative_abs <dbl>, rate_onset <dbl>, rate_decline <dbl>
head(events_two_thresh$event)
## # A tibble: 5 × 22
## event_no index_start index_peak index_end duration date_start date_peak
## <int> <int> <int> <int> <dbl> <date> <date>
## 1 1 6135 6137 6137 3 1977-10-18 1977-10-20
## 2 2 10851 10851 10856 6 1990-09-16 1990-09-16
## 3 3 11191 11200 11201 11 1991-08-22 1991-08-31
## 4 4 13292 13294 13294 3 1997-05-23 1997-05-25
## 5 5 15540 15542 15545 6 2003-07-19 2003-07-21
## # … with 15 more variables: date_end <date>, intensity_mean <dbl>,
## # intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## # intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## # intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## # intensity_mean_abs <dbl>, intensity_max_abs <dbl>, intensity_var_abs <dbl>,
## # intensity_cumulative_abs <dbl>, rate_onset <dbl>, rate_decline <dbl>
The event numbers found in events_one_thresh_filtered
are the same as the event numbers found in events_two_thresh
with the important difference that the event metrics in events_two_thresh
were calculated only on the days that exceeded both thresholds, while the events in events_one_thresh_filtered
have had their metrics calculated from all of the days that exceeded only the first threshold.
To better understand how different the results from these two different techniques may be we will use lolliplots to visualise them.
ggarrange(lolli_plot(events_two_thresh, metric = "duration"),
lolli_plot(events_one_thresh_filtered, metric = "duration"),
labels = c("Double threshold", "Filter threshold"))
ggarrange(lolli_plot(events_two_thresh, metric = "intensity_cumulative"),
lolli_plot(events_one_thresh_filtered, metric = "intensity_cumulative"),
labels = c("Double threshold", "Filter threshold"))
One may of course visualise the outputs from the events calculated here with geom_flame()
and geom_lolli()
as well, but this will not differ from the default method of using these functions as outlined in their help files so we will not go into that here.
This vignette serves as a guideline for how to implement multiple methodologies for using two thresholds (tMin
and tMax
) with atmospheric data. We also showed in this vignette a more straight forward approach to using a second threshold through the built-in arguments in detect_event()
. The use of a second threshold in this way, whether it be based on a static threshold or one derived from a percentile, is useful for the consideration of events that may be more specifically relevant to a given season or organism.
I hope the techniques shown in this vignette will be useful both technically and theoretically. The authors of heatwaveR
are very happy to receive any further input on the development of the package as well as other potential methods for calculating heatwaves and cold-spells in air or sea.