---
title: "The `waddR` package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{waddR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

`waddR` is an R package that provides a 2-Wasserstein distance based statistical test for detecting and describing differential distributions in one-dimensional data. It provides functions for Wasserstein distance calculation, differential distribution testing, and a specialized test for differential expression in single-cell RNA sequencing (scRNA-seq) data.

The package `waddR` provides three sets of utilities to cover distinct use cases, each described in a separate vignette:

* Fast and accurate [calculation of the 2-Wasserstein distance](wasserstein_metric.html)
* [Two-sample test](wasserstein_test.html) to check for differences between two distributions
* Detection of [differential gene expression distributions](wasserstein_singlecell.html) in scRNA-seq data

These utilities are bundled into the same package because they depend on one another: the procedure for detecting differential distributions in single-cell data is a refinement of the general two-sample test, which itself uses the 2-Wasserstein distance to compare two distributions.

### Wasserstein Distance functions

The 2-Wasserstein distance is a metric that describes the distance between two distributions, e.g. distributions representing two different conditions A and B. This package specifically considers the squared 2-Wasserstein distance $d := W_2^2$, which offers a decomposition into location, size, and shape terms.

The package `waddR` offers three functions to calculate the 2-Wasserstein distance, all of which are implemented in C++ and exported to R with Rcpp for better performance. The function `wasserstein_metric` is a C++ reimplementation of the function `wasserstein1d` from the package `transport` and offers the most exact results.
The functions `squared_wass_approx` and `squared_wass_decomp` compute approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp` also returning the decomposition terms for location, size, and shape. See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp`.

### Two-Sample Testing

This package provides two testing procedures that use the 2-Wasserstein distance to test whether two distributions $F_A$ and $F_B$, given in the form of samples, are different, by testing the null hypothesis $H_0: F_A = F_B$ against the alternative hypothesis $H_1: F_A \neq F_B$.

The first, semi-parametric (SP) procedure uses a permutation test combined with a generalized Pareto distribution approximation to estimate small p-values accurately. The second procedure (ASY) uses a test based on asymptotic theory, which is valid only if the samples can be assumed to come from continuous distributions.

See `?wasserstein.test` for more details.

### Single Cell Test

The `waddR` package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance that is specifically tailored to identify differential distributions in single-cell RNA sequencing (scRNA-seq) data. In particular, it implements a two-stage (TS) approach that accounts for the specific nature of scRNA-seq data by separately testing, between two conditions, for differential proportions of zero gene expression (using a logistic regression model) and for differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test).

See the documentation of the single-cell procedure `?wasserstein.sc` and of the test for zero expression levels `?testZeroes` for more details.
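To make the distance and its decomposition more concrete, the following chunk is a minimal, purely illustrative base-R sketch. It does not call `waddR` and is not how `wasserstein_metric`, `squared_wass_approx`, or `squared_wass_decomp` are implemented. It computes the exact squared 2-Wasserstein distance between two empirical distributions by integrating the squared difference of their step-shaped quantile functions, and splits it into location, size, and shape terms via the identity $d = (\mu_A - \mu_B)^2 + (\sigma_A - \sigma_B)^2 + 2\sigma_A\sigma_B(1 - \rho)$, which holds exactly for the quantities computed here ($\mu$ and $\sigma$ are means and population standard deviations, $\rho$ is the correlation of the two quantile functions). The helper names `quantile_pairs`, `squared_w2`, and `squared_w2_decomp` are hypothetical.

```{r w2-distance-sketch}
# Illustrative base-R sketch, NOT the waddR implementation: the exact
# squared 2-Wasserstein distance between two empirical distributions,
# d = integral over (0, 1) of (F_A^{-1}(u) - F_B^{-1}(u))^2 du.
quantile_pairs <- function(x, y) {
  xs <- sort(x)
  ys <- sort(y)
  m <- length(xs)
  n <- length(ys)
  # Both empirical quantile functions are step functions that only jump at
  # multiples of 1/m or 1/n, so the integrand is constant between these points
  u <- sort(unique(c(seq_len(m) / m, seq_len(n) / n)))
  w <- diff(c(0, u))   # width of each constant piece (widths sum to 1)
  mid <- u - w / 2     # a point strictly inside each piece
  list(a = xs[ceiling(mid * m)], b = ys[ceiling(mid * n)], w = w)
}

squared_w2 <- function(x, y) {
  q <- quantile_pairs(x, y)
  sum(q$w * (q$a - q$b)^2)
}

# Split d into location, size and shape terms using the means (mu), population
# standard deviations (sigma) and correlation (rho) of the two quantile functions:
# d = (mu_A - mu_B)^2 + (sigma_A - sigma_B)^2 + 2 * sigma_A * sigma_B * (1 - rho)
squared_w2_decomp <- function(x, y) {
  q <- quantile_pairs(x, y)
  mu_a <- sum(q$w * q$a)
  mu_b <- sum(q$w * q$b)
  sd_a <- sqrt(sum(q$w * (q$a - mu_a)^2))
  sd_b <- sqrt(sum(q$w * (q$b - mu_b)^2))
  rho <- sum(q$w * (q$a - mu_a) * (q$b - mu_b)) / (sd_a * sd_b)
  c(distance = sum(q$w * (q$a - q$b)^2),
    location = (mu_a - mu_b)^2,
    size     = (sd_a - sd_b)^2,
    shape    = 2 * sd_a * sd_b * (1 - rho))
}

set.seed(1)
x <- rnorm(100)
y <- rgamma(150, shape = 2)
squared_w2(x, x + 2)   # 4 up to rounding: a pure shift by 2
squared_w2_decomp(x, y)
```

Because both quantile functions are step functions, the integral reduces to a finite weighted sum, so no numerical quadrature is needed; for equal sample sizes it collapses to `mean((sort(x) - sort(y))^2)`. The decomposition is undefined when either sample is constant, since $\rho$ then involves a division by zero.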
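The permutation idea behind the SP procedure can be sketched in base R as follows. This is a hypothetical, simplified illustration, not `wasserstein.test`: it uses the squared 2-Wasserstein distance between the two samples as the test statistic, recomputes it after randomly reassigning the pooled observations to the two groups, and reports the proportion of permuted statistics at least as large as the observed one. It deliberately omits the generalized Pareto distribution approximation that the SP procedure uses to estimate small p-values accurately. The names `squared_w2` and `perm_test_w2` are illustrative only.

```{r w2-permutation-sketch}
# Illustrative base-R sketch of a plain permutation test of H0: F_A = F_B with
# the squared 2-Wasserstein distance as test statistic. It is NOT waddR's SP
# procedure: there is no generalized Pareto tail fit, so the smallest possible
# p-value is 1 / (n_perm + 1).
squared_w2 <- function(x, y) {
  # Exact distance between empirical distributions via their quantile functions
  m <- length(x)
  n <- length(y)
  u <- sort(unique(c(seq_len(m) / m, seq_len(n) / n)))
  w <- diff(c(0, u))
  mid <- u - w / 2
  sum(w * (sort(x)[ceiling(mid * m)] - sort(y)[ceiling(mid * n)])^2)
}

perm_test_w2 <- function(x, y, n_perm = 999) {
  observed <- squared_w2(x, y)
  pooled <- c(x, y)
  # Under H0 the group labels are exchangeable: reshuffle them n_perm times
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), length(x))
    squared_w2(pooled[idx], pooled[-idx])
  })
  # Add-one correction counts the observed split as one of the permutations
  p_value <- (1 + sum(perm_stats >= observed)) / (n_perm + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(2)
perm_test_w2(rnorm(50), rnorm(50, mean = 3), n_perm = 199)$p.value
perm_test_w2(rnorm(40), rnorm(40), n_perm = 199)$p.value
```

With 199 permutations, even the clearly shifted example cannot produce a p-value below 1/200 = 0.005. This resolution limit of a purely permutation-based p-value is what a tail approximation is meant to overcome.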
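The two-stage logic can likewise be sketched in base R. The chunk below is a hypothetical simplification, not `wasserstein.sc` or `testZeroes`. Stage 1 fits a logistic regression of a zero/non-zero indicator on the condition label and takes the Wald p-value of the condition coefficient. Stage 2 runs a plain permutation test (again without the generalized Pareto tail fit) on the squared 2-Wasserstein distance of the non-zero values only. The function names, the simulated data, and the choice to report the two p-values separately are assumptions made for illustration.

```{r two-stage-sketch}
# Illustrative base-R sketch of the two-stage idea, NOT waddR's wasserstein.sc:
# stage 1 tests for different proportions of zeros, stage 2 compares the
# non-zero values with a permutation test on the squared 2-Wasserstein distance.
squared_w2 <- function(x, y) {
  # Exact distance between empirical distributions via their quantile functions
  m <- length(x)
  n <- length(y)
  u <- sort(unique(c(seq_len(m) / m, seq_len(n) / n)))
  w <- diff(c(0, u))
  mid <- u - w / 2
  sum(w * (sort(x)[ceiling(mid * m)] - sort(y)[ceiling(mid * n)])^2)
}

two_stage_sketch <- function(x, y, n_perm = 999) {
  # Stage 1: logistic regression of the zero indicator on the condition label
  # (hypothetical, minimal model; waddR's testZeroes may specify it differently)
  condition <- factor(rep(c("A", "B"), c(length(x), length(y))))
  is_zero <- as.integer(c(x, y) == 0)
  fit <- glm(is_zero ~ condition, family = binomial())
  p_zero <- summary(fit)$coefficients["conditionB", "Pr(>|z|)"]

  # Stage 2: permutation test restricted to the non-zero expression values
  xn <- x[x != 0]
  yn <- y[y != 0]
  observed <- squared_w2(xn, yn)
  pooled <- c(xn, yn)
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), length(xn))
    squared_w2(pooled[idx], pooled[-idx])
  })
  p_nonzero <- (1 + sum(perm_stats >= observed)) / (n_perm + 1)

  # The two p-values are reported separately; combining them is left open here
  c(p.zero = p_zero, p.nonzero = p_nonzero)
}

set.seed(3)
# Simulated zero-inflated "expression" values for one gene in two conditions
a <- ifelse(runif(200) < 0.2, 0, rlnorm(200, meanlog = 0, sdlog = 0.5))
b <- ifelse(runif(200) < 0.6, 0, rlnorm(200, meanlog = 1.5, sdlog = 0.5))
two_stage_sketch(a, b, n_perm = 199)
```

Separating the two stages keeps a change in the fraction of non-expressing cells from being conflated with a change in the shape of the expression distribution among expressing cells, which is the motivation the text gives for the two-stage design.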

## Installation

To install `waddR` from Bioconductor, use `BiocManager` with the following commands:

```{r install, eval=FALSE, echo=TRUE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("waddR")
```

Using `BiocManager`, the package can also be installed directly from GitHub:

```{r install-github, eval=FALSE, echo=TRUE}
BiocManager::install("goncalves-lab/waddR")
```

The package `waddR` can then be loaded in R:

```{r load-package}
library("waddR")
```

## Session Info

```{r session-info}
sessionInfo()
```