cassandRa Vignette

Chris Terry

2024-06-07

Introduction

This package provides tools to determine the probably effect of undersampling in bipartite ecological networks and tools to determine likely gaps in ecological data following the approaches discussed in Finding missing links in interaction networks by Terry & Lewis (in prep).

While this cannot replace further sampling, by making it as easy as possible to assess the most likely missing links, we hope we can encourage further consideration of the quality of ecological networks.

We provide two ‘high level functions’ that are designed to work as easily as possible out of the box: PredictLinks() and RarefyNetwork(). Both functions are designed to ‘play nicely’ with the bipartite package (Dormann et al., 2008). They both come with accompanying plotting functions that return ggplot objects displaying the key information.

Source code is hosted on github at https://github.com/jcdterry/cassandRa/. Any issues, comments or suggestions are probably best routed through there. If you make use of the link prediction aspect of the package, please cite the paper (either the preprint, or the paper if/when it comes out. Otherwise, if you use resampling, then please just cite the package.

Data format requirements

Both functions assume that you have a quantitative ecological network matrix in the format ready to go with the widely used bipartite package. It should be in bipartite form, i.e. a potentially rectangular matrix of side lengths NxM rather than SxS. There should not be any additional columns with totals or other summaries.

The package bipartite provides some useful tools to help with this such as frame2webs().

Data in this format has rows being each of the focal layer (referred to as ‘Host’ in the underlying function documentation) and columns detailing the response layer (referred to as ‘Wasp’ in the underlying documentation). The distinction is that the focal layer is the layer where you have control over the sampling - for example which plants were watched, which hosts were gathered for parasite-rearing etc.

If present, row names and column names are retained for use. Species that are not observed interacting with any others are removed.

For this vignette, we will use one of the datasets bundled with the bipartite package:

data(Safariland, package= 'bipartite')

Safariland[1:5, 1:2]
#>                       Policana albopilosa Bombus dahlbomii
#> Aristotelia chilensis                 673                0
#> Alstroemeria aurea                      0              154
#> Schinus patagonicus                     0                0
#> Berberis darwinii                       0               67
#> Rosa eglanteria                         0                0

RarefyNetwork() - Metric Responses to Resampling

The purpose of this function is to provide an easy way to get some information on the confidence with which network metrics can be assigned, given a particular level of sampling. It does this by redrawing observations with probabilities proportional to the relative interaction frequency, and assessing the spread of the network metrics. Before starting, I would recommend also looking at capacity built into bipartite, specifically bipartite::nullmodel and the discussion in the Intro2bipartite vignette. These functions include other ways of generating null models that may be more relevant.

This has three main purposes: to (visually) assess whether a metric has stabilised with the level of sampling, to assess (via bootstrapped confidence intervals) the precision with which that metric can be reported, and to allow the rarefaction of different networks to a matched sample size to allow a more equitable comparison of metrics that are known to be correlated with sample size.

The function calls networklevel() from the bipartite package, which offers the vast majority of bipartite network metrics used by ecologists. Note that if the re-sampling procedure means that certain species are no longer present in the network, then they will be dropped by the function by default. This could lead to some problems for some metrics such as overall connectance since the number of possible interactions will decrease with a smaller web size. This can be overcome by passing empty.web=FALSE to the ... argument.

A basic way used to calculate confidence intervals is to sort the metric values from the re-sampled networks and find the value at the 5th and 95th percentile. These values must still be treated with some caution as the re-sampling procedure will not generate unobserved interaction. Nonetheless it can give an indication of the extent to which the sampling is sufficient to make a particular conclusion. Any NA values are excluded. These are only likely to occur for certain quantitative metrics that may fail for small networks (e.g. weighted nestedness can do this). By excluding NA’s a value can still be returned, by since these NA’s are non-random they may skew the results. The underlying networklevel() function should warn which metric and for what size this may be an issue.

Function Arguments

Beware that each metric will be called potentially many 1000s of times, so don’t be too greedy! The degree distribution fit metrics will not work as they cannot be coerced easily into a vector. ?networklevel() discusses time considerations.

and then ONE of:

Then there are two optional arguments if you want to take advantage of parallel computation. If PARALLEL=TRUE, it will use the parallel package to split the network calculations over the number of cores set by cores.

By default it will return a data frame of the raw recalculated metrics, with a separate column for each metric, and the last column specifying the resample size.

This data frame can then be passed to ComputeCI() to return basic confidence intervals (in a data frame containing 5 columns: Metric, LowerCI, UpperCI, Mean & SampleSize), or to PlotRarefaction() to make a simple ggplot. Both these functions have no additional arguments and are pretty basic. The look of the ggplots can be modified as normal.

If you want to skip this step, it is possible to pass an output argument direct to RarefyNetwork(). If 'plot' will return a ggplot object using PlotRarefaction(), and if output='CI' it will return a data frame using ComputeCI().


MetricsAfterReSampling<- RarefyNetwork(Safariland, 
                                       n_per_level = 50, # v. small for the vignette
                                       metrics = c('H2', 'nestedness', 'links per species'))

PlotRarefaction(MetricsAfterReSampling)


ComputeCI(MetricsAfterReSampling)
#>                                   Metric    LowerCI    UpperCI       Mean
#> H2...1                                H2  0.8293225  0.9249341  0.8796841
#> H2...2                                H2  0.8248465  0.9233817  0.8663997
#> H2...3                                H2  0.8350408  0.9090144  0.8724597
#> H2...4                                H2  0.8264402  0.9014620  0.8597829
#> H2...5                                H2  0.8277802  0.8961996  0.8568270
#> links per species...6  links per species  0.7600000  0.9583333  0.8414517
#> links per species...7  links per species  0.8518519  1.0333333  0.9256467
#> links per species...8  links per species  0.9032258  1.0645161  0.9780459
#> links per species...9  links per species  0.9677419  1.0909091  1.0158670
#> links per species...10 links per species  1.0000000  1.0909091  1.0330300
#> nestedness...11               nestedness 22.4026640 38.6126773 27.3835563
#> nestedness...12               nestedness 19.4509950 32.1398845 24.2783014
#> nestedness...13               nestedness 19.0295559 27.3292434 22.6104222
#> nestedness...14               nestedness 18.2888526 26.5684360 22.1101334
#> nestedness...15               nestedness 17.9240325 25.7960163 21.8303458
#>                        SampleSize
#> H2...1                        226
#> H2...2                        452
#> H2...3                        678
#> H2...4                        904
#> H2...5                       1130
#> links per species...6         226
#> links per species...7         452
#> links per species...8         678
#> links per species...9         904
#> links per species...10       1130
#> nestedness...11               226
#> nestedness...12               452
#> nestedness...13               678
#> nestedness...14               904
#> nestedness...15              1130
RarefyNetwork(Safariland,
              n_per_level = 50,
              abs_sample_levels = c(500, 1000, 5000),
              metrics = 'info', 
              output = 'plot')

Session Information

sessionInfo()
#> R version 4.4.0 (2024-04-24 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                           
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] ggplot2_3.5.1        tidyr_1.3.1          dplyr_1.1.4         
#>  [4] purrr_1.0.2          cassandRa_0.2.0      bipartite_2.20      
#>  [7] sna_2.7-2            network_1.18.2       statnet.common_4.9.0
#> [10] vegan_2.6-6.1        lattice_0.22-6       permute_0.9-7       
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    stringi_1.8.4    
#>  [5] digest_0.6.35     magrittr_2.0.3    evaluate_0.23     grid_4.4.0       
#>  [9] fastmap_1.2.0     maps_3.4.2        plyr_1.8.9        jsonlite_1.8.8   
#> [13] Matrix_1.7-0      mgcv_1.9-1        fansi_1.0.6       spam_2.10-0      
#> [17] viridisLite_0.4.2 scales_1.3.0      jquerylib_0.1.4   cli_3.6.2        
#> [21] rlang_1.1.3       munsell_0.5.1     splines_4.4.0     withr_3.0.0      
#> [25] cachem_1.1.0      yaml_2.3.8        tools_4.4.0       parallel_4.4.0   
#> [29] reshape2_1.4.4    coda_0.19-4.1     colorspace_2.1-0  boot_1.3-30      
#> [33] vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.4   stringr_1.5.1    
#> [37] MASS_7.3-60.2     cluster_2.1.6     pkgconfig_2.0.3   pillar_1.9.0     
#> [41] bslib_0.7.0       gtable_0.3.5      glue_1.7.0        Rcpp_1.0.12      
#> [45] fields_15.2       highr_0.11        xfun_0.44         tibble_3.2.1     
#> [49] tidyselect_1.2.1  rstudioapi_0.16.0 knitr_1.46        farver_2.1.2     
#> [53] htmltools_0.5.8.1 nlme_3.1-164      igraph_2.0.3      labeling_0.4.3   
#> [57] rmarkdown_2.27    dotCall64_1.1-1   compiler_4.4.0