---
title: "iscream compatible data structures"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{iscream compatible data structures}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
#bibliography: refs.bib
link-citations: yes
---

The examples here show how iscream output can be converted into other data structures for further analysis.

```{r setup}
library(iscream)
data_dir <- system.file("extdata", package = "iscream")
# Match the example BED files ending in a.bed.gz, b.bed.gz, c.bed.gz and d.bed.gz
bedfiles <- list.files(data_dir, pattern = "[a-d]\\.bed\\.gz$", full.names = TRUE)
regions <- c(A = "chr1:1-6", B = "chr1:7-10", C = "chr1:11-14")
```

## `r Biocpkg("GenomicRanges")`

```{r gr, message=FALSE}
if (!require("GenomicRanges", quietly = TRUE)) {
  stop("The 'GenomicRanges' package must be installed for this functionality")
}
```

`GRanges` objects can be used as the input regions for all of iscream's functions and can be returned by `tabix_gr()` and `make_mat_gr()`.

### From `tabix` queries

`tabix_gr()` returns a `GRanges` object. The `regions` parameter can be a string vector, a data frame or a `GRanges` object. If it is given a `GRanges` object with metadata, those metadata columns are preserved in the output.

```{r tabix_gr}
gr <- GRanges(regions)
values(gr) <- DataFrame(
  gene = c("gene1", "gene2", "gene3"),
  some_metadata = c("s1", "s2", "s3")
)
gr
tabix_gr(bedfiles[1], gr)
```

The `data.table` output of `tabix()` can also be converted to `GRanges` with `makeGRangesFromDataFrame()`, but the input metadata is not preserved.

```{r tabix}
tabix(bedfiles[1], gr) |>
  makeGRangesFromDataFrame(
    starts.in.df.are.0based = TRUE,
    keep.extra.columns = TRUE
  )
```

If the input BED file is not zero-based (e.g. Bismark coverage files), set `zero_based = FALSE` in the `tabix()` call to get the correct coordinates when converting the data frame to `GRanges`.

### From `summarize_regions()`

`summarize_regions()` and `summarize_meth_regions()` return a data frame with a `feature` column identifying the genomic region of each summary row.
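When the regions are unnamed, the `feature` column holds region strings in the same `chr:start-end` form as `regions` in the setup chunk, which the `GRanges()` constructor can parse directly. The minimal sketch below uses only GenomicRanges and copies the three region strings from the setup chunk (`region_strings` and `region_gr` are illustrative names, not iscream objects). It shows that `GRanges()` reads such strings as 1-based, closed intervals, so `"chr1:1-6"` spans six bases.

```{r region_strings}
library(GenomicRanges)

# Illustrative region strings in "chr:start-end" form, copied from `regions`
region_strings <- c("chr1:1-6", "chr1:7-10", "chr1:11-14")

# GRanges() parses each string into a 1-based, closed [start, end] interval
region_gr <- GRanges(region_strings)
region_gr

# Inspect the parsed coordinates with the standard accessors
start(region_gr)
end(region_gr)
width(region_gr)
```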
If the region features are not named (see `?summarize_regions`), pass the `feature` column, which then holds the genomic region strings, as the `GRanges` input regions. Here the `regions` vector is named, so `unname()` is used to make the `feature` column contain the region strings:

```{r summarize_meth_regions}
(summary <- summarize_meth_regions(
  bedfiles,
  unname(regions),
  fun = c("sum", "mean")
))
GRanges(summary$feature, summary = summary[, -1])
```

If the input regions are named, use `set_region_rownames = TRUE` to keep the genomic region strings as row names, and pass those row names as the `GRanges` input regions.

```{r summarize_regions}
(summary <- summarize_regions(
  bedfiles,
  regions,
  column = 4,
  set_region_rownames = TRUE,
  fun = c("sum", "mean")
))
GRanges(rownames(summary), summary = summary)
```

### From `make_mat`

`make_mat_gr()` returns a `GRanges` object for dense matrices.

```{r make_mat}
make_mat_gr(bedfiles, regions, column = 4, mat_name = "beta")
```

## `r Biocpkg("SummarizedExperiment")`

`make_mat_se()` returns a `RangedSummarizedExperiment` for both sparse and dense matrices.

```{r mat_se, message=FALSE}
if (!require("SummarizedExperiment", quietly = TRUE)) {
  stop("The 'SummarizedExperiment' package must be installed for this functionality")
}
make_mat_se(bedfiles, regions, column = 4, mat_name = "beta", sparse = TRUE)
```

### Making `BSseq` objects

A `r Biocpkg("bsseq")` object is a type of `SummarizedExperiment`, but it cannot hold sparse matrices, so `make_mat_bsseq()` is called with `sparse = FALSE`:

```{r, message=FALSE}
if (!require("bsseq", quietly = TRUE)) {
  stop("The 'bsseq' package must be installed for this functionality")
}
mats <- make_mat_bsseq(bedfiles, regions, sparse = FALSE)
do.call(BSseq, mats)
```

## Session info

```{r si}
sessionInfo()
```