---
title: "iscream compatible data structures"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{iscream compatible data structures}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
#bibliography: refs.bib
link-citations: yes
---

The examples here show how iscream output can be converted into other data
structures for further analysis.

```{r setup}
library(iscream)
data_dir <- system.file("extdata", package = "iscream")
bedfiles <- list.files(data_dir, pattern = "[a|b|c|d].bed.gz$", full.names = TRUE)
regions <- c(A = "chr1:1-6", B = "chr1:7-10", C = "chr1:11-14")
```

## `r Biocpkg("GenomicRanges")`

```{r gr, message=FALSE}
if (!require("GenomicRanges", quietly = TRUE)) {
  stop("The 'GenomicRanges' package must be installed for this functionality")
}
```

`GRanges` objects can be used as the input regions to all of iscream's functions
and can be returned by `tabix_gr()` and `make_mat_gr()`.

### From `tabix` queries

`tabix_gr()` returns a `GenomicRanges` object. The `regions` parameter can be a
string vector, a data frame or a `GRanges` object. If given a `GRanges` object
with metadata, those columns will be preserved in the output.


```{r tabix_gr}
gr <- GRanges(regions)
values(gr) <- DataFrame(
    gene = c("gene1", "gene2", "gene3"),
    some_metadata = c("s1", "s2", "s3")
)
gr
tabix_gr(bedfiles[1], gr)
```

The `data.table` output of `tabix()` can also be piped into `GRanges`, but does
not preserve input metadata.

```{r tabix}
tabix(bedfiles[1], gr) |>
  makeGRangesFromDataFrame(
    starts.in.df.are.0based = TRUE,
    keep.extra.columns = TRUE
  )
```

If the input BED file is not zero-based (e.g. Bismark coverage files), set
`zero_based = FALSE` in the `tabix()` call to get the correct conversion from
data frame to GenomicRanges.

### From `summarize_regions()`

`summarize_regions()` returns a data frame with a feature column identifying
each summary row's genomic region.

If the region features are not named (see `?summarize_regions`), pass the
`feature` column with the genomic regions as the GRanges input regions. Here,
since the `regions` vector is named, using `unname` will cause the `feature`
column to populate with regions strings:

```{r summarize_meth_regions}
(summary <- summarize_meth_regions(
  bedfiles,
  unname(regions),
  fun = c("sum", "mean"))
)
GRanges(summary$feature, summary = summary[, -1])
```

If the input regions are named use `set_region_rownames = TRUE` so that the
genomic regions strings are preserved and use them as the `GRanges` input
regions.

```{r summarize_regions}
(summary <- summarize_regions(
  bedfiles,
  regions,
  column = 4,
  set_region_rownames = TRUE,
  fun = c("sum", "mean"))
)
GRanges(rownames(summary), summary = summary)
```

### From `make_mat`

`make_mat_gr()` returns a `GRanges` object for dense matrices.

```{r make_mat}
make_mat_gr(bedfiles, regions, column = 4, mat_name = "beta")
```

## `r Biocpkg("SummarizedExperiment")`

`make_mat_se()` returns a `RangedSummarizedExperiment` for both sparse and dense
matrices.

```{r mat_se, message=FALSE}
if (!require("SummarizedExperiment", quietly = TRUE)) {
  stop("The 'SummarizedExperiment' package must be installed for this functionality")
}
make_mat_se(bedfiles, regions, column = 4, mat_name = "beta", sparse = TRUE)
```

### Making `BSseq` objects

A `r Biocpkg("bsseq")` object is a type of SummarizedExperiment, but it cannot
handle sparse matrices:

```{r, message=FALSE}
if (!require("bsseq", quietly = TRUE)) {
  stop("The 'bsseq' package must be installed for this functionality")
}
mats <- make_mat_bsseq(bedfiles, regions, sparse = FALSE)
do.call(BSseq, mats)
```

## Session info

```{r si}
sessionInfo()
```