This vignette details the use of template-based detection in ohun.Template-based detection method is better suited for highly stereotyped sound events. As it is less affected by signal-to-noise ratio it’s more robust to higher levels of background noise:
The procedure for template-based detection is divided in three steps:
get_templates()
)template_correlator()
)template_detector()
)First, we need to install the package. It can be installed from CRAN as follows:
# From CRAN would be
install.packages("ohun")
#load package
library(ohun)
To install the latest developmental version from github you will need the R package remotes:
# install package
remotes::install_github("maRce10/ohun")
#load packages
library(ohun)
library(tuneR)
library(warbleR)
The package comes with an example reference table containing annotations of long-billed hermit hummingbird songs from two sound files (also supplied as example data: ‘lbh1’ and ‘lbh2’), which will be used in this vignette. The example data can be load and explored as follows:
# load example data
data("lbh1", "lbh2", "lbh_reference")
# save sound files
tuneR::writeWave(lbh1, file.path(tempdir(), "lbh1.wav"))
tuneR::writeWave(lbh2, file.path(tempdir(), "lbh2.wav"))
# select a subset of the data
lbh1_reference <-
lbh_reference[lbh_reference$sound.files == "lbh1.wav",]
# print data
lbh1_reference
Object of class 'selection_table'
* The output of the following call:
`[.selection_table`(X = lbh_reference, i = lbh_reference$sound.files == "lbh1.wav")
Contains:
* A selection table data frame with 10 rows and 6 columns:
| |sound.files | selec| start| end| bottom.freq| top.freq|
|:--|:-----------|-----:|------:|------:|-----------:|--------:|
|10 |lbh1.wav | 10| 0.0881| 0.2360| 1.9824| 8.4861|
|11 |lbh1.wav | 11| 0.5723| 0.7202| 2.0520| 9.5295|
|12 |lbh1.wav | 12| 1.0564| 1.1973| 2.0868| 8.4861|
|13 |lbh1.wav | 13| 1.7113| 1.8680| 1.9824| 8.5905|
|14 |lbh1.wav | 14| 2.1902| 2.3417| 2.0520| 8.5209|
|15 |lbh1.wav | 15| 2.6971| 2.8538| 1.9824| 9.2513|
... and 4 more row(s)
* A data frame (check.results) with 10 rows generated by check_sels() (as attribute)
created by warbleR < 1.1.21
We can also plot the annotations on top of the spectrograms to further explore the data (this function only plots one wave object at the time, not really useful for long files):
# print spectrogram
label_spectro(wave = lbh1, reference = lbh1_reference, hop.size = 10, ovlp = 50, flim = c(1, 10))
The function get_templates()
can help you find a
template closer to the average acoustic structure of the sound events in
a reference table. This is done by finding the sound events closer to
the centroid of the acoustic space. When the acoustic space is not
supplied (‘acoustic.space’ argument) the function estimates it by
measuring several acoustic parameters using the function spectro_analysis()
from warbleR
)
and summarizing it with Principal Component Analysis (after
z-transforming parameters). If only 1 template is required the function
returns the sound event closest to the acoustic space centroid. The
rationale here is that a sound event closest to the average sound event
structure is more likely to share structural features with most sounds
across the acoustic space than a sound event in the periphery of the
space. These ‘mean structure’ templates can be obtained as follows:
# get mean structure template
template <-
get_templates(reference = lbh1_reference, path = tempdir())
The first 2 principal components explained 0.53 of the variance
The graph above shows the overall acoustic spaces, in which the sound
closest to the space centroid is highlighted. The highlighted sound is
selected as the template and can be used to detect similar sound events.
The function get_templates()
can also select several
templates. This can be helpful when working with sounds that are just
moderately stereotyped. This is done by dividing the acoustic space into
sub-spaces defined as equal-size slices of a circle centered at the
centroid of the acoustic space:
# get 3 templates
get_templates(reference = lbh1_reference,
n.sub.spaces = 3, path = tempdir())
The first 2 principal components explained 0.53 of the variance
1 sub-space centroid(s) coincide with the overall centroid and was (were) removed
We will use the single template object (‘template’) to run a detection on the example ‘lbh1’ data:
# get correlations
correlations <-
template_correlator(templates = template,
files = "lbh1.wav",
path = tempdir())
The output is an object of class ‘template_correlations’, with its own printing method:
# print
correlations
Object of class 'template_correlations'
* The output of the following template_correlator() call:
template_correlator(templates = template, files = "lbh1.wav",
path = tempdir())
* Contains 1 correlation score vector(s) from 1 template(s):
lbh1.wav-19
... and 1 sound files(s):
lbh1.wav
* Created by ohun 1.0.2
This object can then be used to detect sound events using
template_detector()
:
# run detection
detection <-
template_detector(template.correlations = correlations, threshold = 0.7)
detection
Object of class 'selection_table'
* The output of the following call:
template_detector(template.correlations = correlations, threshold = 0.7)
Contains:
* A selection table data frame with 2 rows and 6 columns:
|sound.files | selec| start| end|template | scores|
|:-----------|-----:|------:|------:|:-----------|------:|
|lbh1.wav | 1| 0.5703| 0.7287|lbh1.wav-19 | 0.7122|
|lbh1.wav | 2| 4.1549| 4.3133|lbh1.wav-19 | 0.7189|
* A data frame (check.results) with 2 rows generated by check_sels() (as attribute)
created by warbleR 1.1.32
The output can be explored by plotting the spectrogram along with the detection and correlation scores:
# plot spectrogram
label_spectro(
wave = lbh1,
detection = detection,
template.correlation = correlations[[1]],
flim = c(0, 10),
threshold = 0.7,
hop.size = 10, ovlp = 50)
The performance can be evaluated using
diagnose_detection()
:
#diagnose
diagnose_detection(reference = lbh1_reference, detection = detection)
detections true.positives false.positives false.negatives splits merges overlap
1 2 2 0 8 0 0 0.9004082
recall precision f.score
1 0.2 1 0.3333333
The function optimize_template_detector()
allows to
evaluate the performance under different correlation thresholds:
# run optimization
optimization <-
optimize_template_detector(
template.correlations = correlations,
reference = lbh1_reference,
threshold = seq(0.1, 0.5, 0.1)
)
5 thresholds will be evaluated:
# print output
optimization
threshold templates detections true.positives false.positives false.negatives splits
1 0.1 lbh1.wav-19 99 10 89 0 16
2 0.2 lbh1.wav-19 56 10 46 0 16
3 0.3 lbh1.wav-19 25 10 15 0 13
4 0.4 lbh1.wav-19 11 10 1 0 2
5 0.5 lbh1.wav-19 10 10 0 0 0
merges overlap recall precision f.score
1 0 0.8789903 1 0.1010101 0.1834862
2 0 0.8789903 1 0.1785714 0.3030303
3 0 0.8789903 1 0.4000000 0.5714286
4 0 0.8789903 1 0.9090909 0.9523810
5 0 0.8789903 1 1.0000000 1.0000000
Additional threshold values can be evaluated without having to run it
all over again. We just need to supplied the output from the previous
run with the argument previous.output
(the same trick can
be done when optimizing an energy-based detection):
# run optimization
optimize_template_detector(
template.correlations = correlations,
reference = lbh1_reference,
threshold = c(0.6, 0.7),
previous.output = optimization
)
2 thresholds will be evaluated:
threshold templates detections true.positives false.positives false.negatives splits
1 0.1 lbh1.wav-19 99 10 89 0 16
2 0.2 lbh1.wav-19 56 10 46 0 16
3 0.3 lbh1.wav-19 25 10 15 0 13
4 0.4 lbh1.wav-19 11 10 1 0 2
5 0.5 lbh1.wav-19 10 10 0 0 0
6 0.6 lbh1.wav-19 10 10 0 0 0
7 0.7 lbh1.wav-19 2 2 0 8 0
merges overlap recall precision f.score
1 0 0.8789903 1.0 0.1010101 0.1834862
2 0 0.8789903 1.0 0.1785714 0.3030303
3 0 0.8789903 1.0 0.4000000 0.5714286
4 0 0.8789903 1.0 0.9090909 0.9523810
5 0 0.8789903 1.0 1.0000000 1.0000000
6 0 0.8789903 1.0 1.0000000 1.0000000
7 0 0.9004082 0.2 1.0000000 0.3333333
In this case 2 threshold values (0.5 and 0.6) can achieve an optimal detection.
Several templates can be used within the same call. Here we correlate two templates on the two example sound files, taking one template from each sound file:
# get correlations
correlations <-
template_correlator(
templates = lbh1_reference[c(1, 10),],
files = "lbh1.wav",
path = tempdir()
)
# run detection
detection <-
template_detector(template.correlations = correlations, threshold = 0.6)
Note that in these cases we can get the same sound event detected several times (duplicates), one by each template. We can check if that is the case just by diagnosing the detection:
#diagnose
diagnose_detection(reference = lbh1_reference, detection = detection)
detections true.positives false.positives false.negatives splits merges overlap
1 20 10 10 0 20 0 0.9306396
recall precision f.score
1 1 0.5 0.6666667
In this we got perfect recall, but low precision. That is due to the fact that the same reference events were picked up by both templates. We can actually diagnose the detection by template to see this more clearly:
#diagnose
diagnose_detection(reference = lbh1_reference, detection = detection, by = "template")
template detections true.positives false.positives false.negatives splits merges
1 lbh1.wav-10 10 10 0 0 0 0
2 lbh1.wav-19 10 10 0 0 0 0
overlap recall precision f.score
1 0.9241771 1 1 1
2 0.8789903 1 1 1
We see that independently each template achieved a good detection. We
can create a consensus detection by leaving only those with detections
with the highest correlation across templates. To do this we first need
to label each row in the detection using label_detection()
and then remove duplicates using consensus_detection()
:
# labeling detection
labeled <-
label_detection(reference = lbh_reference, detection = detection, by = "template")
Now we can filter out duplicates and diagnose the detection again,
telling the function to select a single row per duplicate using the
correlation score as a criterium (by = "scores"
, this
column is part of the template_detector()
output):
# filter
consensus <- consensus_detection(detection = labeled, by = "scores")
# diagnose
diagnose_detection(reference = lbh1_reference, detection = consensus)
detections true.positives false.positives false.negatives splits merges overlap
1 10 10 0 0 0 0 0.9046387
recall precision f.score
1 1 1 1
We successfully get rid of duplicates and detected every single target sound event.
Please cite ohun like this:
Araya-Salas, M. (2021), ohun: diagnosing and optimizing automated sound event detection. R package version 0.1.0.
Session information
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Costa_Rica
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.5.1 warbleR_1.1.32 NatureSounds_1.0.4 knitr_1.48
[5] seewave_2.2.3 tuneR_1.4.7 ohun_1.0.2
loaded via a namespace (and not attached):
[1] gtable_0.3.5 rjson_0.2.21 xfun_0.47 bslib_0.7.0
[5] vctrs_0.6.5 tools_4.3.2 bitops_1.0-8 generics_0.1.3
[9] parallel_4.3.2 tibble_3.2.1 proxy_0.4-27 fansi_1.0.6
[13] highr_0.11 pkgconfig_2.0.3 KernSmooth_2.23-20 checkmate_2.3.1
[17] lifecycle_1.0.4 farver_2.1.2 compiler_4.3.2 brio_1.1.5
[21] munsell_0.5.1 codetools_0.2-18 htmltools_0.5.8.1 class_7.3-20
[25] sass_0.4.9 RCurl_1.98-1.16 yaml_2.3.10 pillar_1.9.0
[29] jquerylib_0.1.4 MASS_7.3-55 classInt_0.4-10 cachem_1.1.0
[33] iterators_1.0.14 viridis_0.6.5 foreach_1.5.2 Deriv_4.1.3
[37] tidyselect_1.2.1 digest_0.6.37 sf_1.0-16 dplyr_1.1.4
[41] labeling_0.4.3 fastmap_1.2.0 grid_4.3.2 colorspace_2.1-0
[45] cli_3.6.3 magrittr_2.0.3 utf8_1.2.4 e1071_1.7-14
[49] withr_3.0.1 shinyBS_0.61.1 scales_1.3.0 backports_1.5.0
[53] rmarkdown_2.27 Sim.DiffProc_4.9 signal_1.8-1 igraph_2.0.3
[57] gridExtra_2.3 pbapply_1.7-2 evaluate_0.24.0 dtw_1.23-1
[61] fftw_1.0-8 testthat_3.2.1.1 viridisLite_0.4.2 rlang_1.1.4
[65] Rcpp_1.0.13 glue_1.7.0 DBI_1.2.3 rstudioapi_0.16.0
[69] jsonlite_1.8.8 soundgen_2.6.2 R6_2.5.1 units_0.8-5