--- title: "Using DelayedMatrix with MultiAssayExperiment" author: "MultiAssay Special Interest Group" date: "`r format(Sys.time(), '%B %d, %Y')`" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{HDF5Array and MultiAssayExperiment} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: number_sections: yes toc: yes --- # Integrating an HDF5 backend for MultiAssayExperiment ## Dependencies ```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE} library(MultiAssayExperiment) library(HDF5Array) library(SummarizedExperiment) ``` ## HDF5Array and DelayedArray Constructor The `HDF5Array` package provides an on-disk representation of large datasets without the need to load them into memory. Convenient lazy evaluation operations allow the user to manipulate such large data files based on metadata. The `DelayedMatrix` class in the `DelayedArray` package provides a way to connect to a large matrix that is stored on disk. First, we create a small matrix for constructing the `DelayedMatrix` class. ```{r} smallMatrix <- matrix(rnorm(10e5), ncol = 20) ``` We add rownames and column names to the matrix object for compatibility with the `MultiAssayExperiment` representation. ```{r} rownames(smallMatrix) <- paste0("GENE", seq_len(nrow(smallMatrix))) colnames(smallMatrix) <- paste0("SampleID", seq_len(ncol(smallMatrix))) ``` Here we use the `DelayedArray` constructor function to create a `DelayedMatrix` object. ```{r} smallMatrix <- DelayedArray(smallMatrix) class(smallMatrix) # show method smallMatrix dim(smallMatrix) ``` ## Writing to a file with dimnames Finally, the `rhdf5` package stores `dimnames` in a standard location. In order to make use of this functionality, we would use `writeHDF5Array` with the `with.dimnames` argument: ```{r} testh5 <- tempfile(fileext = ".h5") writeHDF5Array(smallMatrix, filepath = testh5, name = "smallMatrix", with.dimnames = TRUE) ``` To see the file structure we use `h5ls`: ```{r} h5ls(testh5) ``` ## Importing HDF5 files Note that a large matrix from an HDF5 file can also be loaded using the `HDF5ArraySeed` and `DelayedArray` functions. ```{r} hdf5Data <- HDF5ArraySeed(file = testh5, name = "smallMatrix") newDelayedMatrix <- DelayedArray(hdf5Data) class(newDelayedMatrix) newDelayedMatrix ``` ## Using a `DelayedMatrix` with `MultiAssayExperiment` A `DelayedMatrix` alone conforms to the `MultiAssayExperiment` API requirements. Shown below, the `DelayedMatrix` can be put into a named `list` and passed into the `MultiAssayExperiment` constructor function. ```{r} HDF5MAE <- MultiAssayExperiment(experiments = list(smallMatrix = smallMatrix)) sampleMap(HDF5MAE) colData(HDF5MAE) ``` ### `SummarizedExperiment` with `DelayedMatrix` backend A more information rich `DelayedMatrix` can be created when used in conjunction with the `SummarizedExperiment` class and it can even include `rowRanges`. The flexibility of the `MultiAssayExperiment` API supports classes with minimal requirements. Additionally, this `SummarizedExperiment` with the `DelayedMatrix` backend can be part of a bigger `MultiAssayExperiment` object. Below is a minimal example of how this would work: ```{r} HDF5SE <- SummarizedExperiment(assays = smallMatrix) assay(HDF5SE) MultiAssayExperiment(list(HDF5SE = HDF5SE)) ``` Additional scenarios are currently in development where an `HDF5Matrix` is hosted remotely. Many opportunities exist when considering on-disk and off-disk representations of data with `MultiAssayExperiment`.