Using your own matrix data

Owen Jones

2024-10-16

The utility of Rcompadre extends beyond the use of data from the COMPADRE and COMADRE matrix databases. By coercing user-provided matrix population model (MPM) data (and metadata) into the standardised format used by the Rcompadre package (a CompadreDB object) you can make use of all the functionality of the package. The central function to carry out this is cdb_build_cdb().

This vignette illustrates some simple use cases.

The CompadreDB object

Before illustrating the construction of a CompadreDB object using cdb_build_cdb() is it first necessary to outline the anatomy of the object.

The CompadreDB object consists of four parts: (1) the matrices; (2) the stage information; (3) the metadata describing the matrices; and (4) version information. Much of this information can be generated automatically by the cdb_build_cdb() function.

The matrices

MPM data can exist as A matrices (i.e. the whole MPM model) but can also exist as a series of submatrices that sum to the A matrix. Typically these matrices are based on demographic processes such as growth and survival, sexual reproduction and clonal reproduction. These matrices are commonly denoted as the U, F and C matrices respectively, and A = U + F + C (Caswell 2001, Salguero-Gomez et al. 2015, 2016).

Thus, each MPM in the CompadreDB object is provided as a list object with four elements representing this set of matrices: A, the full MPM and the three demographic process-based submatrices (U, F and C).

In some cases it is not desirable (or perhaps impossible) to provide information for the set of submatrices. For example, it may not be possible to distinguish between sexual (F) and clonal (C) reproduction, or between growth/survival (U) and reproduction (F and/or C). Alternatively, it may simply be that the planned analyses do not require the potentially laborious splitting of the A matrix into these submatrices. Nevertheless, CompadreDB requires the full set of four matrices. Thankfully, the matrices can be provided as NA matrices and can often be generated automatically from the provided data (see below).

Sets of matrices of the same type must be provided as a list for each type. For example, you could provide two lists: one for the U matrices and one for the matching F matrices. The function conducts some error checks to ensure that these lists have the same length, and that all matrices in each set has the same dimensions.

The stage information

Each MPM has a life-cycle divided into two or more discrete stages. The CompadreDB object must include this information, and it is provided as a list of data.frames (one for each MPM).

The metadata

A valid CompadreDB object MUST include a data.frame of metadata with a number of rows equal to the number of MPMs.

This metadata, can be minimal or very extensive, depending on the users’ needs. In the simplest case, for example with simulated data, this might simply be an ID number, or perhaps parameters used in simulation. In cases with empirical MPMs the metadata will typically include taxonomic information on the species, the geographic location, the name of the study site, the year or time-frame of study and so on. Thus, the metadata data.frame can include anything from one to hundreds of columns.

Version information

Finally, some version information must be included. This can simply be a name, or a date intended to help keep track of multiple version of the data.

Data preparation

The main function used for creating a CompadreDB object from user-defined data is cdb_build_cdb(). This function takes the components described above, performs some error checks, and combines them into a single CompadreDB object.

A simple example

First we need to load the library, and the dplyr package.

library(Rcompadre)
library(dplyr)

In this example we generate a series of 2 dimension A matrices using a series of uniform distributions for the U submatrix, and a gamma distribution, to approximate the average of a Poisson process. In this case, the matrices all have the same dimension, but it is not necessary for dimension to be the same. This is a bit long-winded, and there are certainly better ways to simulate these data (e.g. using a Dirichlet distribution), but the example serves a useful purpose here.

nMat <- 20
mort1 <- runif(nMat, 0, 1)
u1 <- runif(nMat, 0, 1 - mort1)
u2 <- 1 - mort1 - u1
mort2 <- runif(nMat, 0, 1)
u3 <- runif(nMat, 0, 1 - mort2)
u4 <- 1 - mort2 - u3

Uvals <- cbind(u1, u2, u3, u4)
Fvals <- rgamma(nMat, rep(1:4, each = 5))
Avals <- Uvals
Avals[, 3] <- Avals[, 3] + Fvals

Alist <- lapply(as.list(as.data.frame(t(Avals))), matrix,
  byrow = FALSE,
  nrow = 2, ncol = 2
)

Next we use cdb_build_cdb() to convert this list of matrices into a COMPADRE object. Here I am adding an identifier to each matrix, and a column for the shape parameter for the Gamma distribution used to simulate the data.

meta <- data.frame(idNum = 1:20, shapeParam = rep(1:4, each = 5))

x <- cdb_build_cdb(mat_a = Alist, metadata = meta)
#> Warning in cdb_build_cdb(mat_a = Alist, metadata = meta): Metadata does not include a `SpeciesAccepted` column, so number
#>               of species not provided when viewing object.
x
#> A COM(P)ADRE database ('CompadreDB') object with ?? SPECIES and 20 MATRICES.
#> 
#> # A tibble: 20 × 3
#>    mat        idNum shapeParam
#>    <list>     <int>      <int>
#>  1 <CompdrMt>     1          1
#>  2 <CompdrMt>     2          1
#>  3 <CompdrMt>     3          1
#>  4 <CompdrMt>     4          1
#>  5 <CompdrMt>     5          1
#>  6 <CompdrMt>     6          2
#>  7 <CompdrMt>     7          2
#>  8 <CompdrMt>     8          2
#>  9 <CompdrMt>     9          2
#> 10 <CompdrMt>    10          2
#> 11 <CompdrMt>    11          3
#> 12 <CompdrMt>    12          3
#> 13 <CompdrMt>    13          3
#> 14 <CompdrMt>    14          3
#> 15 <CompdrMt>    15          3
#> 16 <CompdrMt>    16          4
#> 17 <CompdrMt>    17          4
#> 18 <CompdrMt>    18          4
#> 19 <CompdrMt>    19          4
#> 20 <CompdrMt>    20          4

We can look at the matrices using the normal Rcompadre function matA().

matA(x)[1]
#> [[1]]
#>           [,1]      [,2]
#> [1,] 0.1659634 0.7000140
#> [2,] 0.1794142 0.2390083

Now the matrices are stored in a CompadreDB object they can be manipulated in the same diverse ways as the CompadreDB object downloaded from the COMPADRE/COMADRE database.

For example, filtering based on part of the metadata, in this case, shapeParam.

x %>%
  filter(shapeParam > 2)
#> A COM(P)ADRE database ('CompadreDB') object with ?? SPECIES and 10 MATRICES.
#> 
#> # A tibble: 10 × 3
#>    mat        idNum shapeParam
#>    <list>     <int>      <int>
#>  1 <CompdrMt>    11          3
#>  2 <CompdrMt>    12          3
#>  3 <CompdrMt>    13          3
#>  4 <CompdrMt>    14          3
#>  5 <CompdrMt>    15          3
#>  6 <CompdrMt>    16          4
#>  7 <CompdrMt>    17          4
#>  8 <CompdrMt>    18          4
#>  9 <CompdrMt>    19          4
#> 10 <CompdrMt>    20          4

Including stage descriptions and version information

In the above example, I did not include any information about the stage definitions. Since these information were not provided, cdb_build_cdb() automatically creates some information. You can view that information like this (using square brackets to choose a particular matrix model):

matrixClass(x)[1]
#> [[1]]
#>   MatrixClassOrganized MatrixClassAuthor MatrixClassNumber
#> 1               active                 1                 1
#> 2               active                 2                 2

In the following example I illustrate how one can include descriptions of the stages

First I create a data frame describing the matrix stages.

(stageDescriptor <- data.frame(
  MatrixClassOrganized = rep("active", 2),
  MatrixClassAuthor = c("small", "large"),
  stringsAsFactors = FALSE
))
#>   MatrixClassOrganized MatrixClassAuthor
#> 1               active             small
#> 2               active             large

In this case, all stages are the same and I can simply repeat the stageDescriptor in a list. However, the size of these data frames, and the information within them may vary.

stageDesc <- list()
stageDesc[1:20] <- list(stageDescriptor)
y <- cdb_build_cdb(
  mat_a = Alist, metadata = meta, stages = stageDesc,
  version = "Matrices Rock!"
)
#> Warning in cdb_build_cdb(mat_a = Alist, metadata = meta, stages = stageDesc, : Metadata does not include a `SpeciesAccepted` column, so number
#>               of species not provided when viewing object.

Now you can access the stage/class description information like this, using square brackets to find the information for particular matrices.

matrixClass(y)[5]
#> [[1]]
#>   MatrixClassOrganized MatrixClassAuthor MatrixClassNumber
#> 1               active             small                 1
#> 2               active             large                 2
MatrixClassAuthor(y)[5]
#> [[1]]
#> [1] "small" "large"

You can also obtain the version information.

Version(y)
#> [1] "Matrices Rock!"
DateCreated(y)
#> [1] "2024-10-16"

The newly-created database can be saved like this:

save(y, "myMatrixDatabase.Rdata")

References

Caswell, H. (2001). Matrix Population Models: Construction, Analysis, and Interpretation. 2nd edition. Sinauer Associates, Sunderland, MA. ISBN-10: 0878930965

Salguero-Gomez, R. , Jones, O. R., Archer, C. R., Buckley, Y. M., Che-Castaldo, J. , Caswell, H. , Hodgson, D. , Scheuerlein, A. , Conde, D. A., Brinks, E. , Buhr, H. , Farack, C. , Gottschalk, F. , Hartmann, A. , Henning, A. , Hoppe, G. , Romer, G. , Runge, J. , Ruoff, T. , Wille, J. , Zeh, S. , Davison, R. , Vieregg, D. , Baudisch, A. , Altwegg, R. , Colchero, F. , Dong, M. , Kroon, H. , Lebreton, J. , Metcalf, C. J., Neel, M. M., Parker, I. M., Takada, T. , Valverde, T. , Velez-Espino, L. A., Wardle, G. M., Franco, M. and Vaupel, J. W. (2015), The COMPADRE Plant Matrix Database: an open online repository for plant demography. J Ecol, 103: 202-218. <:10.1111/1365-2745.12334>

Salguero-Gomez, R. , Jones, O. R., Archer, C. R., Bein, C. , Buhr, H. , Farack, C. , Gottschalk, F. , Hartmann, A. , Henning, A. , Hoppe, G. , Romer, G. , Ruoff, T. , Sommer, V. , Wille, J. , Voigt, J. , Zeh, S. , Vieregg, D. , Buckley, Y. M., Che-Castaldo, J. , Hodgson, D. , Scheuerlein, A. , Caswell, H. and Vaupel, J. W. (2016), COMADRE: a global data base of animal demography. J Anim Ecol, 85: 371-384. <:10.1111/1365-2656.12482>