To work with the whole database at the sapwood or plant level, at least 16 GB of RAM is recommended. Loading all the data objects already consumes about 4 GB, and any operation such as aggregation or metric calculation requires extra memory:
```r
library(sapfluxnetr)

# This will need at least 5GB of memory during the process
folder <- 'RData/plant'
sfn_metadata <- read_sfn_metadata(folder)
daily_results <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c('Temperate forest', 'Woodland/Shrubland'),
    sites = ., metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results, file = 'daily_results.RData')
```
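Saving pays off immediately: in a later R session the tidy results can be restored with `load()` instead of repeating the whole aggregation:

```r
# in a fresh session, recover the previously computed results
load('daily_results.RData')
daily_results
```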
To circumvent this on less powerful systems, we recommend working with small subsets of sites (25-30 at a time) and joining the tidy results afterwards:
```r
library(sapfluxnetr)
library(dplyr) # for bind_rows

folder <- 'RData/plant'
sfn_metadata <- read_sfn_metadata(folder)
sites <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c('Temperate forest', 'Woodland/Shrubland'),
    sites = ., metadata = sfn_metadata
  )

daily_results_1 <- read_sfn_data(sites[1:30], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_2 <- read_sfn_data(sites[31:60], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_3 <- read_sfn_data(sites[61:90], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_4 <- read_sfn_data(sites[91:110], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

daily_results_steps <- bind_rows(
  daily_results_1, daily_results_2,
  daily_results_3, daily_results_4
)

# remove the intermediate objects to free memory
rm(daily_results_1, daily_results_2, daily_results_3, daily_results_4)
save(daily_results_steps, file = 'daily_results_steps.RData')
```
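The four manual steps above can also be generalized so the chunking adapts to any number of sites. A minimal sketch, reusing `sites`, `folder` and `sfn_metadata` from the previous chunk; `chunk_size` and `site_chunks` are illustrative names, not sapfluxnetr arguments:

```r
# split the site codes into chunks of at most 30 sites
chunk_size <- 30
site_chunks <- split(sites, ceiling(seq_along(sites) / chunk_size))

# process one chunk at a time, so only one subset of raw data is
# held in memory, then bind the tidy results together
daily_results_steps <- site_chunks %>%
  lapply(function(chunk) {
    read_sfn_data(chunk, folder) %>%
      daily_metrics(tidy = TRUE, metadata = sfn_metadata)
  }) %>%
  bind_rows()

save(daily_results_steps, file = 'daily_results_steps.RData')
```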
sapfluxnetr can parallelize the metrics calculation when it is performed on an `sfn_data_multi` object. This is done thanks to the furrr package, which uses the future package behind the scenes. By default the code runs in a sequential process, the usual way R code runs. But setting the `future::plan` to `multicore` (on Linux), `multisession` (on Windows) or `multiprocess` (which automatically chooses between the previous plans depending on the system) will run the code in parallel, dividing the sites between the available cores.
> Be advised: parallelization usually means more RAM is used, so on systems
> with less than 16 GB it may not be a good idea.
> Also, the time benefits only start to show when analysing 10 or more sites.
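As an illustration of the plans described above, the choice can also be made explicitly per platform instead of relying on `multiprocess` (a minimal sketch using only the future API):

```r
library(future)

# forked workers ('multicore') are only available on unix-alikes;
# Windows needs separate background R sessions ('multisession')
if (.Platform$OS.type == 'unix') {
  plan('multicore')
} else {
  plan('multisession')
}
```

The full example, letting `multiprocess` make this choice automatically: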
```r
# loading future package
library(future)

# setting the plan
plan('multiprocess')

# metrics!!
daily_results_parallel <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c('Temperate forest', 'Woodland/Shrubland'),
    sites = ., metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results_parallel, file = 'daily_results_parallel.RData')
```
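`future::availableCores()` reports how many workers a parallel plan can draw on, and once the computation is done it is good practice to return to the default sequential plan:

```r
# number of cores the parallel plan can use
availableCores()

# back to the default sequential behaviour
plan('sequential')
```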
When using furrr, even with the `sequential` plan, the future package sets a limit of 500 MB of globals exported to each core. With SAPFLUXNET data this limit is easily exceeded, causing an error. To avoid this we may want to raise the `future.globals.maxSize` option to a higher value (1 GB for example, although the appropriate limit really depends on the plan and the number of sites):
```r
# future library
library(future)

# plan sequential, not really needed, as it is the default, but for the sake of
# clarity
plan('sequential')

# up the limit to 1GB, which in bytes is 1024*1024^2
options(future.globals.maxSize = 1024*1024^2)

# do the metrics
daily_results_limit <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c('Temperate forest', 'Woodland/Shrubland'),
    sites = ., metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results_limit, file = 'daily_results_limit.RData')
```
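Since the option value is given in bytes, adjusting the limit is simple arithmetic; the values below are just illustrative:

```r
500 * 1024^2  # 524288000 bytes: the 500 MB default
1024^3        # 1073741824 bytes: the 1 GB limit set above
options(future.globals.maxSize = 2 * 1024^3)  # e.g. 2 GB for larger subsets
```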