TileDBArray 1.10.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.03834394 0.86230721 -0.58461405 . -1.9089236 0.8738508
## [2,] -0.31297189 -0.29636735 0.94755459 . 0.1585017 0.5333344
## [3,] 0.13759901 0.51049959 0.38687517 . 1.0827049 0.1100603
## [4,] 0.51491654 1.10576931 1.84692665 . -1.1263693 0.5673945
## [5,] -0.05191994 -1.56189999 1.30467951 . -0.3044160 -1.8350364
## ... . . . . . .
## [96,] 0.9860682 0.1921533 1.8045108 . -0.1521114 0.5709975
## [97,] -1.2552504 0.2948000 -0.3396026 . -0.2810650 0.7886316
## [98,] -1.3130589 -0.6875371 1.2188508 . 1.3352788 -2.0186456
## [99,] -1.5490050 0.8560352 1.5085939 . -0.8632906 -1.8273165
## [100,] -0.7047650 1.1472692 1.2136661 . 1.2466240 0.1023914
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.03834394 0.86230721 -0.58461405 . -1.9089236 0.8738508
## [2,] -0.31297189 -0.29636735 0.94755459 . 0.1585017 0.5333344
## [3,] 0.13759901 0.51049959 0.38687517 . 1.0827049 0.1100603
## [4,] 0.51491654 1.10576931 1.84692665 . -1.1263693 0.5673945
## [5,] -0.05191994 -1.56189999 1.30467951 . -0.3044160 -1.8350364
## ... . . . . . .
## [96,] 0.9860682 0.1921533 1.8045108 . -0.1521114 0.5709975
## [97,] -1.2552504 0.2948000 -0.3396026 . -0.2810650 0.7886316
## [98,] -1.3130589 -0.6875371 1.2188508 . 1.3352788 -2.0186456
## [99,] -1.5490050 0.8560352 1.5085939 . -0.8632906 -1.8273165
## [100,] -0.7047650 1.1472692 1.2136661 . 1.2466240 0.1023914
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.03834394 0.86230721 -0.58461405 . -1.9089236 0.8738508
## GENE_2 -0.31297189 -0.29636735 0.94755459 . 0.1585017 0.5333344
## GENE_3 0.13759901 0.51049959 0.38687517 . 1.0827049 0.1100603
## GENE_4 0.51491654 1.10576931 1.84692665 . -1.1263693 0.5673945
## GENE_5 -0.05191994 -1.56189999 1.30467951 . -0.3044160 -1.8350364
## ... . . . . . .
## GENE_96 0.9860682 0.1921533 1.8045108 . -0.1521114 0.5709975
## GENE_97 -1.2552504 0.2948000 -0.3396026 . -0.2810650 0.7886316
## GENE_98 -1.3130589 -0.6875371 1.2188508 . 1.3352788 -2.0186456
## GENE_99 -1.5490050 0.8560352 1.5085939 . -0.8632906 -1.8273165
## GENE_100 -0.7047650 1.1472692 1.2136661 . 1.2466240 0.1023914
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.03834394 -0.31297189 0.13759901 0.51491654 -0.05191994 0.20711895
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.03834394 0.86230721 -0.58461405 0.28257580 0.03030761
## GENE_2 -0.31297189 -0.29636735 0.94755459 -1.09220891 0.42474126
## GENE_3 0.13759901 0.51049959 0.38687517 -0.69451227 -0.91895789
## GENE_4 0.51491654 1.10576931 1.84692665 0.09451804 -1.24921084
## GENE_5 -0.05191994 -1.56189999 1.30467951 0.20145877 -1.07624065
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.07668787 1.72461442 -1.16922811 . -3.8178471 1.7477016
## GENE_2 -0.62594377 -0.59273470 1.89510917 . 0.3170034 1.0666689
## GENE_3 0.27519802 1.02099919 0.77375033 . 2.1654098 0.2201207
## GENE_4 1.02983308 2.21153862 3.69385331 . -2.2527387 1.1347890
## GENE_5 -0.10383989 -3.12379997 2.60935902 . -0.6088320 -3.6700729
## ... . . . . . .
## GENE_96 1.9721364 0.3843066 3.6090216 . -0.3042229 1.1419949
## GENE_97 -2.5105009 0.5895999 -0.6792053 . -0.5621300 1.5772632
## GENE_98 -2.6261178 -1.3750742 2.4377015 . 2.6705576 -4.0372912
## GENE_99 -3.0980100 1.7120704 3.0171879 . -1.7265813 -3.6546330
## GENE_100 -1.4095299 2.2945384 2.4273322 . 2.4932479 0.2047828
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -0.06897837 9.99744414 6.86017653 0.79895663 -2.00328626 -11.56968833
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 23.19639037 -10.03123315 -9.21924860 -2.83429429
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 -0.4334545
## GENE_2 0.1539059
## GENE_3 -0.1227004
## GENE_4 -0.2150128
## GENE_5 -1.4143542
## ... .
## GENE_96 3.70811814
## GENE_97 2.28015087
## GENE_98 0.04413752
## GENE_99 -3.23984321
## GENE_100 2.62167172
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.11943000 -0.29822655 -1.97206285 . -2.54924065 -0.08369754
## [2,] 0.53007863 -0.29934992 -0.52280057 . 0.21592562 0.97282378
## [3,] -1.25484930 0.58885269 -0.82269891 . 0.16252055 0.14531917
## [4,] 0.05322834 0.78375131 0.18008708 . -1.63394189 1.22820045
## [5,] 1.03264454 1.02227519 1.50770654 . -0.12499813 0.81785414
## ... . . . . . .
## [96,] -0.33571852 0.13632443 -1.78436813 . 1.1419033 0.6017303
## [97,] -0.27350471 -0.44581612 0.38047350 . -1.1118260 -0.3723451
## [98,] -0.60035362 -0.33219440 -1.51663882 . -0.3219664 -1.1978307
## [99,] 0.95517836 0.68152437 -0.03518602 . -1.5949784 0.9115162
## [100,] 0.30386001 1.48369246 -1.03874597 . 1.7136956 -0.1398910
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.11943000 -0.29822655 -1.97206285 . -2.54924065 -0.08369754
## [2,] 0.53007863 -0.29934992 -0.52280057 . 0.21592562 0.97282378
## [3,] -1.25484930 0.58885269 -0.82269891 . 0.16252055 0.14531917
## [4,] 0.05322834 0.78375131 0.18008708 . -1.63394189 1.22820045
## [5,] 1.03264454 1.02227519 1.50770654 . -0.12499813 0.81785414
## ... . . . . . .
## [96,] -0.33571852 0.13632443 -1.78436813 . 1.1419033 0.6017303
## [97,] -0.27350471 -0.44581612 0.38047350 . -1.1118260 -0.3723451
## [98,] -0.60035362 -0.33219440 -1.51663882 . -0.3219664 -1.1978307
## [99,] 0.95517836 0.68152437 -0.03518602 . -1.5949784 0.9115162
## [100,] 0.30386001 1.48369246 -1.03874597 . 1.7136956 -0.1398910
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.12 TileDBArray_1.10.0 DelayedArray_0.26.0
## [4] IRanges_2.34.0 S4Vectors_0.38.0 MatrixGenerics_1.12.0
## [7] matrixStats_0.63.0 BiocGenerics_0.46.0 Matrix_1.5-4
## [10] BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] cli_3.6.1 knitr_1.42 rlang_1.1.0
## [4] xfun_0.39 data.table_1.14.8 jsonlite_1.8.4
## [7] zoo_1.8-12 bit_4.0.5 htmltools_0.5.5
## [10] nanotime_0.3.7 sass_0.4.5 rmarkdown_2.21
## [13] grid_4.3.0 evaluate_0.20 jquerylib_0.1.4
## [16] fastmap_1.1.1 yaml_2.3.7 bookdown_0.33
## [19] BiocManager_1.30.20 compiler_4.3.0 Rcpp_1.0.10
## [22] RcppCCTZ_0.2.12 lattice_0.21-8 digest_0.6.31
## [25] R6_2.5.1 tiledb_0.19.0 bslib_0.4.2
## [28] bit64_4.0.5 tools_4.3.0 spdl_0.0.4
## [31] cachem_1.0.7