---
title: "Using schex with Seurat"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Seurat_schex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
library(ggplot2)
theme_set(theme_classic())
```

Reduced dimension plotting is one of the essential tools for the analysis of 
single cell data. However, as the number of cells/nuclei in these plots 
increases, the usefulness of these plots decreases. Many cells are plotted
on top of each other obscuring information, even when taking advantage of
transparency settings. This package provides binning strategies of cells/nuclei
into hexagon cells. Plotting summarized information of all cells/nuclei in their 
respective hexagon cells presents information without obstructions. The
package seemlessly works with the two most common object classes for the storage
of single cell data; `SingleCellExperiment` from the 
[SingleCellExperiment](https://bioconductor.org/packages/3.9/bioc/html/SingleCellExperiment.html) 
package and `Seurat` from the [Seurat](https://satijalab.org/seurat/) package. In
this vignette I will be presenting the use of `schex` for `Seurat` objects.

## Load libraries

```{r load-libraries, message=FALSE, warning=FALSE}
library(schex)
library(dplyr)
library(scater)
library(Seurat)
library(TENxPBMCData)
```

## Setup single cell data

In order to demonstrate the capabilities of the schex package, I will use the
a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 
10x Genomics. There are 2,700 single cells that were sequenced on the 
Illumina NextSeq 500. This data is handly available in the [`TENxPBMCData` package](http://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html). 
Note that we will then have to convert the `SingleCellExperiment` object to a 
`Seurat` object first.

```{r load, eval=TRUE}
tenx_pbmc3k <- TENxPBMCData(dataset = "pbmc3k")

rownames(tenx_pbmc3k) <- uniquifyFeatureNames(rowData(tenx_pbmc3k)$ENSEMBL_ID, 
    rowData(tenx_pbmc3k)$Symbol_TENx)

pbmc <- as.Seurat(tenx_pbmc3k, data = NULL)
```

In the next few sections, I will perform some simple quality control steps
outlined in the [Seurat vignette](https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html). 
I will then calculate various dimension 
reductions and cluster the data also outlined in the vignette.

## Standard pre-processing workflow

### Normalization

Next a global-scaling normalization method is employed to normalizes the 
feature expression measurements for each cell.

```{r norm, message=FALSE, warning=FALSE}
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", 
    scale.factor = 10000, verbose=FALSE)
```

### Identification of highly variable genes

Many of the downstream methods are based on only the highly variable genes, 
hence we require their identification.

```{r highly-variable, message=FALSE, warning=FALSE}
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000,
    verbose = FALSE)
```

### Scaling

Prior to dimension reduction the data is scaled.

```{r scaling, message=FALSE, warning=FALSE}
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes, verbose = FALSE)
```

### Perform dimensionality reductions

First a PCA is applied to the data. Using the PCA you will have to decide on the
dimensionality of the data. Here the dimensionality was decided to be 10. 
Please refer to the original Seurat vignette for methods on how this is 
assessed. 

```{r pca, message=FALSE, warning=FALSE}
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), 
      verbose = FALSE)
```

Next a UMAP dimensionality reduction is also run. Since there is a random 
component in the UMAP, we will set a seed.

```{r umap, message=FALSE, warning=FALSE}
set.seed(10)
pbmc <- RunUMAP(pbmc, dims = 1:10, verbose=FALSE)
```

## Plotting single cell data

At this stage in the workflow we usually would like to plot aspects of our data
in one of the reduced dimension representations. Instead of plotting this in an
ordinary fashion, I will demonstrate how schex can provide a better way of 
plotting this.

#### Calculate hexagon cell representation

First, I will calculate the hexagon cell representation for each cell for
a specified dimension reduction representation. I decide to use `nbins=40` which
specifies that I divide my x range into 40 bins. Note that this might be a 
parameter that you want to play around with depending on the number of cells/
nuclei in your dataset. Generally, for more cells/nuclei, `nbins` should be 
increased.

```{r calc-hexbin}
pbmc <- make_hexbin(pbmc, nbins = 40, 
    dimension_reduction = "UMAP")
```

#### Plot number of cells/nuclei in each hexagon cell

First I plot how many cells are in each hexagon cell. This should be
relatively even, otherwise change the `nbins` parameter in the previous 
calculation.

```{r plot-density, fig.height=7, fig.width=7}
plot_hexbin_density(pbmc)
```

#### Plot meta data in hexagon cell representation

Next I colour the hexagon cells by some meta information, such as the median 
total count in each hexagon cell.

```{r plot-meta-1, fig.height=7, fig.width=7}
pbmc$nCount_RNA <- colSums(GetAssayData(pbmc, assay="RNA", "data"))
plot_hexbin_meta(pbmc, col="nCount_RNA", action="median")
```

#### Plot gene expression in hexagon cell representation

Finally, I will visualize the gene expression of the CD19 gene in the 
hexagon cell representation.

```{r plot-gene, fig.height=7, fig.width=7}
gene_id <-"CD19"
schex::plot_hexbin_feature(pbmc, type="scale.data", feature=gene_id, 
    action="mean", xlab="UMAP1", ylab="UMAP2", 
    title=paste0("Mean of ", gene_id))
```

### Understanding `schex` output as `ggplot` objects

The `schex` packages renders ordinary `ggplot` objects and thus these can be 
treated and manipulated using the 
[`ggplot` grammar](https://ggplot2.tidyverse.org/). For example the non-data
components of the plots can be changed using the function `theme`.

```{r}
gene_id <-"CD19"
gg <- schex::plot_hexbin_feature(pbmc, type="scale.data", feature=gene_id, 
    action="mean", xlab="UMAP1", ylab="UMAP2", 
    title=paste0("Mean of ", gene_id))
gg + theme_void()
```

The fact that `schex` renders `ggplot` objects can also be used to save these 
plots. Simply use `ggsave` in order to save any created plot.

```{r, eval=FALSE}
ggsave(gg, file="schex_plot.pdf")
```