Contents

1 Overview

The alabaster.matrix package implements methods to save matrix-like objects to file artifacts and load them back into R. Check out the alabaster.base for more details on the motivation and the alabaster framework.

2 Quick start

Given an array-like object, we can use saveObject() to save it inside a staging directory:

library(Matrix)
y <- rsparsematrix(1000, 100, density=0.05)

library(alabaster.matrix)
tmp <- tempfile()
saveObject(y, tmp)

list.files(tmp, recursive=TRUE)
## [1] "OBJECT"    "matrix.h5"

We then load it back into our R session with loadObject(). This creates a HDF5-backed S4 array that can be easily coerced into the desired format, e.g., a dgCMatrix.

roundtrip <- readObject(tmp)
class(roundtrip)
## [1] "ReloadedMatrix"
## attr(,"package")
## [1] "alabaster.matrix"

This process is supported for all base arrays, Matrix objects and DelayedArray objects.

3 Saving delayed operations

For DelayedArrays, we may instead choose to save the delayed operations themselves to file. This creates a HDF5 file following the chihaya format, containing the delayed operations rather than the results of their evaluation.

library(DelayedArray)
y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100) # adding some delayed ops.

tmp <- tempfile()
saveObject(y, tmp, DelayedArray.preserve.ops=TRUE)

# Inspecting the HDF5 file reveals many delayed operations:
rhdf5::h5ls(file.path(tmp, "array.h5"))
##                            group          name       otype  dclass   dim
## 0                              / delayed_array   H5I_GROUP              
## 1                 /delayed_array        method H5I_DATASET  STRING ( 0 )
## 2                 /delayed_array          seed   H5I_GROUP              
## 3            /delayed_array/seed         along H5I_DATASET INTEGER ( 0 )
## 4            /delayed_array/seed        method H5I_DATASET  STRING ( 0 )
## 5            /delayed_array/seed          seed   H5I_GROUP              
## 6       /delayed_array/seed/seed        method H5I_DATASET  STRING ( 0 )
## 7       /delayed_array/seed/seed          seed   H5I_GROUP              
## 8  /delayed_array/seed/seed/seed     by_column H5I_DATASET INTEGER ( 0 )
## 9  /delayed_array/seed/seed/seed          data H5I_DATASET   FLOAT  5000
## 10 /delayed_array/seed/seed/seed      dimnames   H5I_GROUP              
## 11 /delayed_array/seed/seed/seed       indices H5I_DATASET INTEGER  5000
## 12 /delayed_array/seed/seed/seed        indptr H5I_DATASET INTEGER   101
## 13 /delayed_array/seed/seed/seed         shape H5I_DATASET INTEGER     2
## 14           /delayed_array/seed          side H5I_DATASET  STRING ( 0 )
## 15           /delayed_array/seed         value H5I_DATASET INTEGER  1000
# And indeed, we can recover those same operations.
readObject(tmp)
## <1000 x 100> sparse ReloadedMatrix object of type "double":
##               [,1]       [,2]       [,3] ...       [,99]      [,100]
##    [1,]  0.0000000  0.0000000  0.0000000   .           0           0
##    [2,]  0.0000000  0.0000000  0.0000000   .           0           0
##    [3,]  0.2363888  0.0000000  0.0000000   .           0           0
##    [4,]  0.0000000  0.0000000  0.0000000   .           0           0
##    [5,]  0.0000000  0.0000000  0.0000000   .           0           0
##     ...          .          .          .   .           .           .
##  [996,] 0.00000000 0.00000000 0.00000000   . 0.000000000 0.000000000
##  [997,] 0.00000000 0.00000000 0.00000000   . 0.000000000 0.002368327
##  [998,] 0.00000000 0.00000000 0.00112182   . 0.000000000 0.014184635
##  [999,] 0.00000000 0.00000000 0.00000000   . 0.000000000 0.000000000
## [1000,] 0.00000000 0.00000000 0.00000000   . 0.000000000 0.000000000

This allows users to avoid evaluation of the operations when saving objects, which may improve efficiency, e.g., by avoiding loss of sparsity or casting to a larger type.

Session information

sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] DelayedArray_0.33.3    SparseArray_1.7.2      S4Arrays_1.7.1        
##  [4] abind_1.4-8            IRanges_2.41.2         S4Vectors_0.45.2      
##  [7] MatrixGenerics_1.19.0  matrixStats_1.4.1      BiocGenerics_0.53.3   
## [10] generics_0.1.3         alabaster.matrix_1.7.4 alabaster.base_1.7.2  
## [13] Matrix_1.7-1           BiocStyle_2.35.0      
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.8.9          compiler_4.5.0          BiocManager_1.30.25    
##  [4] crayon_1.5.3            Rcpp_1.0.13-1           rhdf5filters_1.19.0    
##  [7] jquerylib_0.1.4         yaml_2.3.10             fastmap_1.2.0          
## [10] lattice_0.22-6          R6_2.5.1                XVector_0.47.0         
## [13] knitr_1.49              bookdown_0.41           bslib_0.8.0            
## [16] rlang_1.1.4             HDF5Array_1.35.2        cachem_1.1.0           
## [19] xfun_0.49               sass_0.4.9              cli_3.6.3              
## [22] Rhdf5lib_1.29.0         zlibbioc_1.53.0         digest_0.6.37          
## [25] grid_4.5.0              alabaster.schemas_1.7.0 rhdf5_2.51.0           
## [28] lifecycle_1.0.4         evaluate_1.0.1          rmarkdown_2.29         
## [31] tools_4.5.0             htmltools_0.5.8.1