TileDBArray 1.19.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.13821122 -0.86623927 -2.10998930 . 1.1373180 -0.3090889
## [2,] 0.31001814 -0.34958575 -0.16302431 . 0.3260926 -0.2985764
## [3,] 0.29073928 -0.09901479 0.01328709 . -1.1982264 0.7696583
## [4,] -2.74527409 -0.69321474 -0.37283281 . 0.8793309 1.5450930
## [5,] -1.16005207 -1.00773800 -1.33584728 . -0.5953840 -0.5596157
## ... . . . . . .
## [96,] -0.78035182 -1.31056846 -0.79922955 . 0.7291775 -0.1704391
## [97,] 1.16045302 0.94822492 -0.38901178 . 0.8897917 0.7466136
## [98,] -0.42240566 -0.58163552 -0.06863176 . 0.1299725 -1.3957369
## [99,] -0.27234091 -0.33936337 0.67659824 . -0.2235346 0.9952467
## [100,] -0.83417685 1.34683288 2.44576320 . 0.3530397 -1.4013021
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.13821122 -0.86623927 -2.10998930 . 1.1373180 -0.3090889
## [2,] 0.31001814 -0.34958575 -0.16302431 . 0.3260926 -0.2985764
## [3,] 0.29073928 -0.09901479 0.01328709 . -1.1982264 0.7696583
## [4,] -2.74527409 -0.69321474 -0.37283281 . 0.8793309 1.5450930
## [5,] -1.16005207 -1.00773800 -1.33584728 . -0.5953840 -0.5596157
## ... . . . . . .
## [96,] -0.78035182 -1.31056846 -0.79922955 . 0.7291775 -0.1704391
## [97,] 1.16045302 0.94822492 -0.38901178 . 0.8897917 0.7466136
## [98,] -0.42240566 -0.58163552 -0.06863176 . 0.1299725 -1.3957369
## [99,] -0.27234091 -0.33936337 0.67659824 . -0.2235346 0.9952467
## [100,] -0.83417685 1.34683288 2.44576320 . 0.3530397 -1.4013021
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
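To check that sparsity survives the trip to the backend, a minimal sketch along the following lines may help. It assumes the `is_sparse()` generic that DelayedArray makes available and uses a smaller matrix than above so that it runs quickly; the variable names are illustrative only.

```r
library(TileDBArray)

# A smaller sparse matrix than in the text, purely to keep the example fast.
Y_small <- Matrix::rsparsematrix(100, 50, density=0.05)
sp <- writeTileDBArray(Y_small)

# is_sparse() reports whether the object is treated as sparse.
sparse_flag <- is_sparse(sp)

# Realizing the array in memory should give back the original values.
roundtrip_ok <- isTRUE(all.equal(as.matrix(sp), as.matrix(Y_small),
    check.attributes=FALSE))
```

Both `sparse_flag` and `roundtrip_ok` are expected to be `TRUE` if the sparse layout was written and read back correctly.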
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.13821122 -0.86623927 -2.10998930 . 1.1373180 -0.3090889
## GENE_2 0.31001814 -0.34958575 -0.16302431 . 0.3260926 -0.2985764
## GENE_3 0.29073928 -0.09901479 0.01328709 . -1.1982264 0.7696583
## GENE_4 -2.74527409 -0.69321474 -0.37283281 . 0.8793309 1.5450930
## GENE_5 -1.16005207 -1.00773800 -1.33584728 . -0.5953840 -0.5596157
## ... . . . . . .
## GENE_96 -0.78035182 -1.31056846 -0.79922955 . 0.7291775 -0.1704391
## GENE_97 1.16045302 0.94822492 -0.38901178 . 0.8897917 0.7466136
## GENE_98 -0.42240566 -0.58163552 -0.06863176 . 0.1299725 -1.3957369
## GENE_99 -0.27234091 -0.33936337 0.67659824 . -0.2235346 0.9952467
## GENE_100 -0.83417685 1.34683288 2.44576320 . 0.3530397 -1.4013021
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.13821122 0.31001814 0.29073928 -2.74527409 -1.16005207 -0.05472509
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.13821122 -0.86623927 -2.10998930 0.05871478 0.47669209
## GENE_2 0.31001814 -0.34958575 -0.16302431 0.67941651 0.52342584
## GENE_3 0.29073928 -0.09901479 0.01328709 -0.84174333 -1.48718411
## GENE_4 -2.74527409 -0.69321474 -0.37283281 -0.97171058 -1.67877413
## GENE_5 -1.16005207 -1.00773800 -1.33584728 -0.04290909 0.32632209
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.27642245 -1.73247854 -4.21997859 . 2.2746360 -0.6181778
## GENE_2 0.62003628 -0.69917150 -0.32604862 . 0.6521851 -0.5971528
## GENE_3 0.58147855 -0.19802958 0.02657418 . -2.3964528 1.5393166
## GENE_4 -5.49054818 -1.38642948 -0.74566563 . 1.7586617 3.0901860
## GENE_5 -2.32010414 -2.01547600 -2.67169456 . -1.1907680 -1.1192315
## ... . . . . . .
## GENE_96 -1.5607036 -2.6211369 -1.5984591 . 1.4583550 -0.3408781
## GENE_97 2.3209060 1.8964498 -0.7780236 . 1.7795834 1.4932272
## GENE_98 -0.8448113 -1.1632710 -0.1372635 . 0.2599450 -2.7914738
## GENE_99 -0.5446818 -0.6787267 1.3531965 . -0.4470693 1.9904935
## GENE_100 -1.6683537 2.6936658 4.8915264 . 0.7060794 -2.8026042
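The delayed behaviour described above can be inspected directly. The sketch below is illustrative rather than part of the package's documented workflow: it uses `showtree()` from DelayedArray to print the pending operations and `as.matrix()` to force evaluation, and it regenerates its own `X` so that it can be run on its own.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Arithmetic returns a DelayedMatrix; nothing is written to the backend.
doubled <- out * 2
is_delayed <- is(doubled, "DelayedMatrix")

# Print the chain of delayed operations sitting on top of the TileDB seed.
showtree(doubled)

# as.matrix() forces evaluation and returns an ordinary in-memory matrix.
realized <- as.matrix(doubled)
matches <- isTRUE(all.equal(realized, X * 2, check.attributes=FALSE))
```

Realizing with `as.matrix()` loads the whole result into memory, so for large arrays it is usually better to keep working with the delayed object for as long as possible.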
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -12.4344898 -15.6468789 -10.8595262 6.5753328 -2.0595735 13.1270446
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -9.7028770 -0.1362843 1.9506625 -22.7949964
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.17554515
## GENE_2 0.90670002
## GENE_3 -1.89096868
## GENE_4 -3.44212577
## GENE_5 -0.39819515
## GENE_6 -0.81285393
## GENE_7 -1.95404907
## GENE_8 0.32356278
## GENE_9 -0.32105075
## GENE_10 -1.84575148
## GENE_11 1.31573945
## GENE_12 0.61511910
## GENE_13 0.72644424
## GENE_14 2.39890804
## GENE_15 -0.92772227
## GENE_16 0.42920362
## GENE_17 0.26971977
## GENE_18 -0.61615589
## GENE_19 -2.29410890
## GENE_20 0.70949424
## GENE_21 0.66068810
## GENE_22 -1.34924598
## GENE_23 -1.92459397
## GENE_24 -4.12164903
## GENE_25 -0.69883032
## GENE_26 0.66448490
## GENE_27 -1.83312952
## GENE_28 -1.64811242
## GENE_29 -3.38261630
## GENE_30 0.15539161
## GENE_31 -0.70960137
## GENE_32 -1.60232391
## GENE_33 0.83112681
## GENE_34 2.40272717
## GENE_35 0.03890551
## GENE_36 0.98350158
## GENE_37 0.96735748
## GENE_38 1.69013915
## GENE_39 -1.49832344
## GENE_40 -0.03918895
## GENE_41 -0.76881884
## GENE_42 3.17009935
## GENE_43 -1.81819072
## GENE_44 3.27089771
## GENE_45 0.75438473
## GENE_46 1.78400163
## GENE_47 1.13328099
## GENE_48 -0.33029191
## GENE_49 0.07738381
## GENE_50 4.72069560
## GENE_51 -1.47972806
## GENE_52 2.03739461
## GENE_53 -0.97653481
## GENE_54 0.46698627
## GENE_55 0.41925802
## GENE_56 2.33984325
## GENE_57 -0.97967562
## GENE_58 1.12448179
## GENE_59 -1.35078473
## GENE_60 -2.04203082
## GENE_61 1.78912046
## GENE_62 -1.21799216
## GENE_63 0.10850978
## GENE_64 -0.84328175
## GENE_65 0.84373001
## GENE_66 0.65336655
## GENE_67 -0.72506319
## GENE_68 -0.08322401
## GENE_69 0.50649188
## GENE_70 0.26481236
## GENE_71 0.65386287
## GENE_72 -1.50003017
## GENE_73 0.22496895
## GENE_74 0.12302293
## GENE_75 1.01257759
## GENE_76 -1.92928711
## GENE_77 1.44792248
## GENE_78 -1.13096718
## GENE_79 0.20673924
## GENE_80 -1.73895484
## GENE_81 0.96201179
## GENE_82 -1.65877596
## GENE_83 0.04103225
## GENE_84 2.97205914
## GENE_85 -0.96725382
## GENE_86 -0.27331904
## GENE_87 -2.10682531
## GENE_88 0.96085011
## GENE_89 -0.13206093
## GENE_90 0.20069683
## GENE_91 1.08832539
## GENE_92 2.30263375
## GENE_93 -0.21949505
## GENE_94 1.05742507
## GENE_95 -4.36060238
## GENE_96 -0.13152727
## GENE_97 0.35858972
## GENE_98 -1.88307384
## GENE_99 -2.88850179
## GENE_100 0.32071995
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.8959572 -1.3618301 -1.6385210 . -0.3508163 1.3486112
## [2,] 0.9726899 -0.6289266 0.8220171 . -0.5496118 -0.2319677
## [3,] 0.8501530 -1.1006372 -0.2394401 . 0.7790787 -1.2611292
## [4,] 0.9464958 0.7233123 -0.6984727 . 0.9405294 0.5704473
## [5,] 0.6155097 -0.6127545 -0.6548375 . 1.5691703 0.7680794
## ... . . . . . .
## [96,] 0.70820208 0.04647427 -1.24979700 . -1.331159166 1.076008567
## [97,] -0.53426595 -0.89413080 0.63636157 . -0.369306897 -1.116075189
## [98,] 1.01949382 -1.09399122 -0.65649551 . 0.498320402 -0.958025586
## [99,] -1.87653276 -0.27831781 1.35493503 . -1.112743373 0.208129806
## [100,] -0.64774073 1.82946573 0.54835700 . -0.002619921 -0.050055142
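Once the path is known, the same backend can be reopened later without rewriting the data. The following is a minimal sketch, assuming the `TileDBArray()` constructor accepts the path together with the attribute name via `attr`; it writes its own array so that it is self-contained.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing backend by path; 'attr' names the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")

# The reopened array should hold the same values as the original matrix.
same_values <- isTRUE(all.equal(as.matrix(reopened), X,
    check.attributes=FALSE))
```

Here `tempfile()` is only a stand-in; a persistent directory would be the more realistic choice when the array needs to outlive the R session.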
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.8959572 -1.3618301 -1.6385210 . -0.3508163 1.3486112
## [2,] 0.9726899 -0.6289266 0.8220171 . -0.5496118 -0.2319677
## [3,] 0.8501530 -1.1006372 -0.2394401 . 0.7790787 -1.2611292
## [4,] 0.9464958 0.7233123 -0.6984727 . 0.9405294 0.5704473
## [5,] 0.6155097 -0.6127545 -0.6548375 . 1.5691703 0.7680794
## ... . . . . . .
## [96,] 0.70820208 0.04647427 -1.24979700 . -1.331159166 1.076008567
## [97,] -0.53426595 -0.89413080 0.63636157 . -0.369306897 -1.116075189
## [98,] 1.01949382 -1.09399122 -0.65649551 . 0.498320402 -0.958025586
## [99,] -1.87653276 -0.27831781 1.35493503 . -1.112743373 0.208129806
## [100,] -0.64774073 1.82946573 0.54835700 . -0.002619921 -0.050055142
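Because the global setting persists for the rest of the session, it is usually worth unsetting it once the coercion is done. The sketch below assumes that calling `setTileDBPath(NULL)` unsets the path, and `path3` is a hypothetical location used only for illustration.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Point the global path at a fresh location (hypothetical, for illustration).
path3 <- tempfile()
setTileDBPath(path3)

# Coercion picks up the global path when creating the backend.
coerced <- as(X, "TileDBArray")

# Unset the global path (assumed behaviour of a NULL argument) so that
# later coercions do not try to reuse the same location.
setTileDBPath(NULL)

# The backend should now exist as a directory at path3.
backend_exists <- dir.exists(path3)
```

Unsetting the path matters because a second coercion to the same location would otherwise attempt to create an array where one already exists.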
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-apple-darwin20
## Running under: macOS Monterey 12.7.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.1 DelayedArray_0.35.2
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.1
## [4] BiocManager_1.30.26 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.1
## [28] lifecycle_1.0.4 data.table_1.17.6 evaluate_1.0.4
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.1 htmltools_0.5.8.1