TileDBArray 1.18.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.0421274 1.0468158 -0.0700364 . 0.49453621 0.79417256
## [2,] 0.3182539 -1.7030798 0.8891478 . 0.30214164 1.24432531
## [3,] -0.1533577 -1.2295564 0.9758458 . -0.09806889 0.37846319
## [4,] -0.1460586 1.0256887 -2.4092284 . -1.90882417 1.97540590
## [5,] 1.3592640 -0.4060619 -1.2191937 . 1.20227284 0.62953378
## ... . . . . . .
## [96,] -1.380104862 0.296422444 -0.034571231 . 0.6721887 -0.2791695
## [97,] -1.311129702 -0.007575617 -0.311813940 . -0.8292955 -0.3682780
## [98,] 0.810983071 -1.156878066 0.524555342 . -0.7448223 -0.1864248
## [99,] 1.207536950 0.964030805 0.232851079 . 1.4373558 -0.6086856
## [100,] 2.227261320 0.873626053 -1.625586685 . -1.4345963 -0.3656842
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.0421274 1.0468158 -0.0700364 . 0.49453621 0.79417256
## [2,] 0.3182539 -1.7030798 0.8891478 . 0.30214164 1.24432531
## [3,] -0.1533577 -1.2295564 0.9758458 . -0.09806889 0.37846319
## [4,] -0.1460586 1.0256887 -2.4092284 . -1.90882417 1.97540590
## [5,] 1.3592640 -0.4060619 -1.2191937 . 1.20227284 0.62953378
## ... . . . . . .
## [96,] -1.380104862 0.296422444 -0.034571231 . 0.6721887 -0.2791695
## [97,] -1.311129702 -0.007575617 -0.311813940 . -0.8292955 -0.3682780
## [98,] 0.810983071 -1.156878066 0.524555342 . -0.7448223 -0.1864248
## [99,] 1.207536950 0.964030805 0.232851079 . 1.4373558 -0.6086856
## [100,] 2.227261320 0.873626053 -1.625586685 . -1.4345963 -0.3656842
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
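To confirm that sparsity survives the write, we can query the returned object. This is a minimal sketch, assuming that `is_sparse()` and `type()` (generics made available when DelayedArray is attached) behave for a TileDBMatrix as they do for other DelayedArray objects; the variable name `tdb` and the seed are only for illustration.

```r
library(TileDBArray)

# Simulate a sparse matrix with roughly 1% non-zero entries.
set.seed(100)
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)

# Write it to a temporary TileDB backend and keep the returned handle.
tdb <- writeTileDBArray(Y)

# Does the object advertise itself as sparse?
is_sparse(tdb)

# Storage type and dimensions of the values in the backend.
type(tdb)
dim(tdb)
```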
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.0421274 1.0468158 -0.0700364 . 0.49453621 0.79417256
## GENE_2 0.3182539 -1.7030798 0.8891478 . 0.30214164 1.24432531
## GENE_3 -0.1533577 -1.2295564 0.9758458 . -0.09806889 0.37846319
## GENE_4 -0.1460586 1.0256887 -2.4092284 . -1.90882417 1.97540590
## GENE_5 1.3592640 -0.4060619 -1.2191937 . 1.20227284 0.62953378
## ... . . . . . .
## GENE_96 -1.380104862 0.296422444 -0.034571231 . 0.6721887 -0.2791695
## GENE_97 -1.311129702 -0.007575617 -0.311813940 . -0.8292955 -0.3682780
## GENE_98 0.810983071 -1.156878066 0.524555342 . -0.7448223 -0.1864248
## GENE_99 1.207536950 0.964030805 0.232851079 . 1.4373558 -0.6086856
## GENE_100 2.227261320 0.873626053 -1.625586685 . -1.4345963 -0.3656842
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 1.0421274 0.3182539 -0.1533577 -0.1460586 1.3592640 -0.1898066
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 1.04212744 1.04681577 -0.07003640 0.03644090 0.01732797
## GENE_2 0.31825395 -1.70307983 0.88914782 0.10530883 1.26573068
## GENE_3 -0.15335771 -1.22955644 0.97584580 0.12905100 -1.08241093
## GENE_4 -0.14605859 1.02568868 -2.40922842 0.01962360 -1.42868167
## GENE_5 1.35926395 -0.40606192 -1.21919371 -1.85377543 0.33760995
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 2.0842549 2.0936315 -0.1400728 . 0.9890724 1.5883451
## GENE_2 0.6365079 -3.4061597 1.7782956 . 0.6042833 2.4886506
## GENE_3 -0.3067154 -2.4591129 1.9516916 . -0.1961378 0.7569264
## GENE_4 -0.2921172 2.0513774 -4.8184568 . -3.8176483 3.9508118
## GENE_5 2.7185279 -0.8121238 -2.4383874 . 2.4045457 1.2590676
## ... . . . . . .
## GENE_96 -2.76020972 0.59284489 -0.06914246 . 1.3443774 -0.5583389
## GENE_97 -2.62225940 -0.01515123 -0.62362788 . -1.6585910 -0.7365559
## GENE_98 1.62196614 -2.31375613 1.04911068 . -1.4896445 -0.3728496
## GENE_99 2.41507390 1.92806161 0.46570216 . 2.8747115 -1.2173711
## GENE_100 4.45452264 1.74725211 -3.25117337 . -2.8691927 -0.7313684
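The delayed operations are only evaluated when the values are actually requested. As a small sketch of this, the code below realizes a delayed result into an ordinary matrix with `as.matrix()` and compares it with the same computation done in memory. The names `delayed` and `realized` and the seed are illustrative choices, not part of the package.

```r
library(TileDBArray)

set.seed(42)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Subsetting and arithmetic are recorded but not yet executed.
delayed <- out[1:5, 1:5] * 2
class(delayed)

# Coercion to an ordinary matrix forces the read from the backend
# and applies the recorded operations.
realized <- as.matrix(delayed)

# Compare against the equivalent in-memory computation.
all.equal(realized, X[1:5, 1:5] * 2)
```

Realizing the full array defeats the purpose of a file-backed representation, so this is best reserved for small subsets or final results.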
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 7.1200404 -7.4189468 0.5127972 -3.3258066 7.4078817 1.3680938 -6.5896608
## SAMP_8 SAMP_9 SAMP_10
## 11.6686220 -6.4098964 -8.1621305
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.584011571
## GENE_2 0.885879281
## GENE_3 -0.417137154
## GENE_4 -2.195930848
## GENE_5 1.332320577
## GENE_6 0.840402720
## GENE_7 -1.181239147
## GENE_8 2.299593836
## GENE_9 -0.840564466
## GENE_10 -2.575719715
## GENE_11 2.053594956
## GENE_12 -0.771812147
## GENE_13 -1.387953758
## GENE_14 1.482654524
## GENE_15 -0.384170008
## GENE_16 -0.752002105
## GENE_17 1.301798673
## GENE_18 0.850167091
## GENE_19 0.594462061
## GENE_20 -3.268747174
## GENE_21 3.461469237
## GENE_22 1.062517840
## GENE_23 0.891272658
## GENE_24 -2.531667140
## GENE_25 -2.354477678
## GENE_26 -1.574684880
## GENE_27 4.303559299
## GENE_28 0.854653458
## GENE_29 4.043503192
## GENE_30 0.868960872
## GENE_31 -2.361521676
## GENE_32 1.323031078
## GENE_33 -2.762135270
## GENE_34 -0.827808221
## GENE_35 -0.273159085
## GENE_36 2.475547495
## GENE_37 -3.821344837
## GENE_38 -1.769124166
## GENE_39 -1.578580950
## GENE_40 0.004908437
## GENE_41 -2.534091163
## GENE_42 -1.539139889
## GENE_43 -3.149742659
## GENE_44 -0.380678408
## GENE_45 -1.813803637
## GENE_46 1.849697674
## GENE_47 1.766104579
## GENE_48 -1.108570223
## GENE_49 -3.102656611
## GENE_50 0.322419486
## GENE_51 0.345100258
## GENE_52 -0.974541549
## GENE_53 -0.556217455
## GENE_54 0.981192581
## GENE_55 2.039192204
## GENE_56 1.053086253
## GENE_57 1.184140208
## GENE_58 -0.633424040
## GENE_59 0.349483409
## GENE_60 -0.040194845
## GENE_61 -5.841193954
## GENE_62 0.941724738
## GENE_63 -2.096474575
## GENE_64 2.247262537
## GENE_65 -0.612168612
## GENE_66 1.186140375
## GENE_67 -1.918434396
## GENE_68 -1.021336901
## GENE_69 0.093749411
## GENE_70 -0.226988347
## GENE_71 -0.551805357
## GENE_72 -0.563297952
## GENE_73 -0.611382580
## GENE_74 -1.038515848
## GENE_75 0.627089162
## GENE_76 2.933365871
## GENE_77 -1.076796306
## GENE_78 4.726855784
## GENE_79 2.929821070
## GENE_80 -0.077876761
## GENE_81 1.896101486
## GENE_82 1.297664418
## GENE_83 -1.370285996
## GENE_84 -0.607674160
## GENE_85 0.594452436
## GENE_86 0.151886755
## GENE_87 1.042893928
## GENE_88 -1.675560118
## GENE_89 -0.478261708
## GENE_90 3.131148862
## GENE_91 3.007036462
## GENE_92 1.212823596
## GENE_93 2.515836262
## GENE_94 -0.437789694
## GENE_95 0.230020377
## GENE_96 0.221966430
## GENE_97 -1.536573646
## GENE_98 0.511727435
## GENE_99 2.135639024
## GENE_100 -1.122700985
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below lets us control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.22901968 -1.37849191 -0.09460866 . -1.7562796 0.3690051
## [2,] -2.10773738 -1.47567710 -0.39447941 . -0.4117401 0.8435158
## [3,] 3.20880458 -0.62788542 0.14499698 . -1.1173297 -0.9595698
## [4,] -0.17566209 -0.35642555 -0.45963393 . 0.3266065 1.0918240
## [5,] 0.11137647 0.43693641 -1.69947832 . 1.5989953 0.0559530
## ... . . . . . .
## [96,] 0.55798255 -0.88061777 0.26772187 . -0.3540679 -2.3281529
## [97,] 0.97122402 0.73326332 -0.49425651 . -0.8104330 -2.3596304
## [98,] -0.31448935 0.53001316 -0.51935528 . -1.0542701 0.1503223
## [99,] 0.04018307 -0.22144708 0.32877242 . -0.4777731 -1.7997643
## [100,] -2.22230279 0.28416559 -0.09274885 . 0.6117607 -1.1478358
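Choosing the path explicitly is useful because the backend can then be reopened later. The sketch below assumes that the `TileDBArray()` constructor accepts the path and the attribute name used at write time; the variable names `written` and `reopened` are only for illustration.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()

# Write the matrix to a known location under a custom attribute name.
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the same backend from its path, naming the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")

# The reopened array should hold the same values as the original matrix.
all.equal(as.matrix(reopened), X)
```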
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.22901968 -1.37849191 -0.09460866 . -1.7562796 0.3690051
## [2,] -2.10773738 -1.47567710 -0.39447941 . -0.4117401 0.8435158
## [3,] 3.20880458 -0.62788542 0.14499698 . -1.1173297 -0.9595698
## [4,] -0.17566209 -0.35642555 -0.45963393 . 0.3266065 1.0918240
## [5,] 0.11137647 0.43693641 -1.69947832 . 1.5989953 0.0559530
## ... . . . . . .
## [96,] 0.55798255 -0.88061777 0.26772187 . -0.3540679 -2.3281529
## [97,] 0.97122402 0.73326332 -0.49425651 . -0.8104330 -2.3596304
## [98,] -0.31448935 0.53001316 -0.51935528 . -1.0542701 0.1503223
## [99,] 0.04018307 -0.22144708 0.32877242 . -0.4777731 -1.7997643
## [100,] -2.22230279 0.28416559 -0.09274885 . 0.6117607 -1.1478358
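A global setting persists for the rest of the session, so it is usually worth unsetting it once it is no longer needed. The sketch below assumes that `getTileDBPath()` reports the current setting and that `setTileDBPath(NULL)` restores the default behaviour of writing to a temporary location; `current` is an illustrative variable name.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Set the global path and check what is currently registered.
path2 <- tempfile()
setTileDBPath(path2)
current <- getTileDBPath()
current

# Coercion now writes the backend to path2.
out <- as(X, "TileDBArray")

# Unset the global path so that later coercions do not try to reuse path2.
setTileDBPath(NULL)
```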
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.18.0 DelayedArray_0.34.0
## [4] SparseArray_1.8.0 S4Arrays_1.8.0 IRanges_2.42.0
## [7] abind_1.4-8 S4Vectors_0.46.0 MatrixGenerics_1.20.0
## [10] matrixStats_1.5.0 BiocGenerics_0.54.0 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.48.0 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1