mist: Methylation Inference for Single-cell along Trajectory

Introduction

mist (Methylation Inference for Single-cell along Trajectory) is an R package for differential methylation (DM) analysis of single-cell DNA methylation (scDNAm) data. The package employs a Bayesian approach to model methylation changes along pseudotime and estimates developmental-stage-specific biological variations. It supports both single-group and two-group analyses, enabling users to identify genomic features exhibiting temporal changes in methylation levels or different methylation patterns between groups.

This vignette demonstrates how to use mist for:

1. Single-group analysis.
2. Two-group analysis.

Installation

To install the latest version of mist, run the following commands:

# Install devtools if you don't have it installed already
install.packages("devtools")

# Install mist from GitHub
devtools::install_github("https://github.com/dxd429/mist")

Alternatively, install mist from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("mist")
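
As a quick check that either installation route worked, the following minimal sketch uses only base R. It assumes nothing about mist beyond its package name.

```r
# Check whether mist is available without attaching it
pkg <- "mist"
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
    # packageVersion() reports the version of the installed package
    message(pkg, " ", as.character(utils::packageVersion(pkg)), " is installed")
} else {
    message(pkg, " is not installed; run one of the commands above")
}
```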

To view the package vignette in HTML format, run the following lines in R:

library(mist)
vignette("mist")

Example Workflow for Single-Group Analysis

In this section, we will estimate parameters and perform differential methylation analysis using single-group data.

Step 1: Load Example Data

Here we load the example data, which come from the GEO series GSE121708.

library(mist)
library(SingleCellExperiment)
# Load sample scDNAm data
Dat_sce <- readRDS(system.file("extdata", "sampleData_sce.rds", package = "mist"))
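
The calls in the following steps refer to the data by name ("Methy_level_group1", "pseudotime"). The sketch below builds a toy SingleCellExperiment with that layout, so you can see what such an input might look like. It assumes that Dat_name refers to an assay, that ptime_name refers to a column of colData, and that methylation levels lie in [0, 1]; the vignette does not state these details, so check them against the package documentation. The object toy_sce and its dimensions are made up for illustration.

```r
library(SingleCellExperiment)

set.seed(1)
n_features <- 3
n_cells <- 10

# Toy methylation levels (assumed to lie in [0, 1]);
# rows are genomic features, columns are cells
methy <- matrix(
    runif(n_features * n_cells),
    nrow = n_features,
    dimnames = list(
        paste0("feature", seq_len(n_features)),
        paste0("cell", seq_len(n_cells))
    )
)

# ASSUMPTION: the assay holds methylation levels and colData holds pseudotime
toy_sce <- SingleCellExperiment(
    assays = list(Methy_level_group1 = methy),
    colData = DataFrame(pseudotime = seq(0, 1, length.out = n_cells))
)

# Names that the analysis functions would be pointed at
assayNames(toy_sce)
colnames(colData(toy_sce))
```

Running the same two accessor calls on Dat_sce shows which names are available in the example data.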

Step 2: Estimate Parameters Using estiParamSingle

# Estimate parameters for single-group
beta_sigma_list <- estiParamSingle(
    Dat_sce = Dat_sce,
    Dat_name = "Methy_level_group1",
    ptime_name = "pseudotime"
)

# Check the output
head(beta_sigma_list)
## $ENSMUSG00000000001
##      Beta_0      Beta_1      Beta_2      Beta_3      Beta_4    Sigma2_1 
##  1.24065094 -0.60122344  0.53793840  0.32636368  0.01947859  5.64780093 
##    Sigma2_2    Sigma2_3    Sigma2_4 
## 11.92614012  3.81190530  2.06809942 
## 
## $ENSMUSG00000000003
##     Beta_0     Beta_1     Beta_2     Beta_3     Beta_4   Sigma2_1   Sigma2_2 
##  1.5375190  0.4479362  8.6940476 -9.5927661  0.1342782 23.5911931  5.5250254 
##   Sigma2_3   Sigma2_4 
##  6.4150751  9.2355445 
## 
## $ENSMUSG00000000028
##       Beta_0       Beta_1       Beta_2       Beta_3       Beta_4     Sigma2_1 
##  1.296064866 -0.001949552  0.058216969  0.045935974  0.009727932  8.339542830 
##     Sigma2_2     Sigma2_3     Sigma2_4 
##  7.321258910  3.423641306  2.193556354 
## 
## $ENSMUSG00000000037
##    Beta_0    Beta_1    Beta_2    Beta_3    Beta_4  Sigma2_1  Sigma2_2  Sigma2_3 
##  1.033106 -4.315972 11.693381 -4.172590 -3.217091  8.652779 14.874852  8.460898 
##  Sigma2_4 
##  2.384070 
## 
## $ENSMUSG00000000049
##      Beta_0      Beta_1      Beta_2      Beta_3      Beta_4    Sigma2_1 
##  1.02050932 -0.10267896  0.13920771  0.08535147  0.04308801  5.84699324 
##    Sigma2_2    Sigma2_3    Sigma2_4 
##  7.95875841  2.93852958  1.20037150 
## 
## $ENSMUSG00000000056
##      Beta_0      Beta_1      Beta_2      Beta_3      Beta_4    Sigma2_1 
##  1.28615339 -0.01421717  0.09391707  0.05109912  0.03787798  9.18374844 
##    Sigma2_2    Sigma2_3    Sigma2_4 
## 12.01570920  5.26560686  3.23271594
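
The output is a named list with one parameter vector per genomic feature. For filtering or export, a matrix is often more convenient. The sketch below stacks two of the vectors printed above into a features-by-parameters matrix; example_list is a hand-copied stand-in for beta_sigma_list so that the code runs on its own.

```r
# Two entries copied from the output above; in practice use beta_sigma_list
param_names <- c(paste0("Beta_", 0:4), paste0("Sigma2_", 1:4))
example_list <- list(
    ENSMUSG00000000001 = setNames(
        c(1.24065094, -0.60122344, 0.53793840, 0.32636368, 0.01947859,
          5.64780093, 11.92614012, 3.81190530, 2.06809942),
        param_names
    ),
    ENSMUSG00000000037 = setNames(
        c(1.033106, -4.315972, 11.693381, -4.172590, -3.217091,
          8.652779, 14.874852, 8.460898, 2.384070),
        param_names
    )
)

# Stack the per-feature vectors: one row per feature, one column per parameter
param_mat <- do.call(rbind, example_list)
dim(param_mat)

# Inspect a subset of the columns
round(param_mat[, c("Beta_0", "Sigma2_1")], 2)
```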

Step 3: Perform Differential Methylation Analysis Using dmSingle

# Perform differential methylation analysis for the single-group
dm_results <- dmSingle(beta_sigma_list)

# View the top genomic features with drastic methylation changes
head(dm_results)
## ENSMUSG00000000568 ENSMUSG00000000486 ENSMUSG00000000282 ENSMUSG00000000223 
##         0.11442048         0.07927437         0.07071541         0.06252971 
## ENSMUSG00000000037 ENSMUSG00000000359 
##         0.06159561         0.05612552
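
The result is a named numeric vector, so base R is enough to select features from it. The sketch below uses the six values printed above as a stand-in for dm_results. The cutoff of 0.07 is a hypothetical value chosen only for illustration, not a threshold recommended by the package.

```r
# Scores copied from the output above; in practice use dm_results
example_scores <- c(
    ENSMUSG00000000568 = 0.11442048,
    ENSMUSG00000000486 = 0.07927437,
    ENSMUSG00000000282 = 0.07071541,
    ENSMUSG00000000223 = 0.06252971,
    ENSMUSG00000000037 = 0.06159561,
    ENSMUSG00000000359 = 0.05612552
)

# Sort explicitly instead of relying on the order of the returned vector
ranked <- sort(example_scores, decreasing = TRUE)

# Names of the three highest-scoring features
top3 <- names(ranked)[1:3]
top3

# Features above a hypothetical cutoff
cutoff <- 0.07
above_cutoff <- names(ranked)[ranked > cutoff]
above_cutoff
```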

Step 4: Visualize the Fitted Methylation Curve Using plotGene

# Produce scatterplot with fitted curve of a specific gene
plotGene(Dat_sce = Dat_sce,
         Dat_name = "Methy_level_group1",
         ptime_name = "pseudotime", 
         beta_sigma_list, 
         gene_name = "ENSMUSG00000000037")
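
To build intuition for how five Beta values could describe a curve, the sketch below evaluates a degree-4 polynomial in pseudotime using the Beta_0 to Beta_4 values printed for ENSMUSG00000000037 in Step 2. This is an illustration under stated assumptions, not a reproduction of plotGene: the vignette does not spell out the functional form, the scale of the fitted values, or the pseudotime range, so the polynomial form, the [0, 1] grid, and the helper poly_curve are all assumptions.

```r
# Beta_0..Beta_4 for ENSMUSG00000000037, copied from the Step 2 output
beta <- c(1.033106, -4.315972, 11.693381, -4.172590, -3.217091)

# ASSUMPTION: the fitted mean is a degree-4 polynomial in pseudotime.
# poly_curve is a hypothetical helper, not part of mist.
poly_curve <- function(t, beta) {
    # outer() builds the design columns 1, t, t^2, t^3, t^4
    design <- outer(t, seq_along(beta) - 1, `^`)
    as.vector(design %*% beta)
}

# ASSUMPTION: pseudotime is scaled to [0, 1]
t_grid <- seq(0, 1, length.out = 101)
fitted_vals <- poly_curve(t_grid, beta)

plot(t_grid, fitted_vals, type = "l",
     xlab = "pseudotime (assumed range)",
     ylab = "fitted value (model scale)")
```

Under these assumptions the curve starts at Beta_0 when pseudotime is 0 and ends at the sum of all five coefficients when pseudotime is 1.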

Example Workflow for Two-Group Analysis

In this section, we will estimate parameters and perform DM analysis using data from two phenotypic groups.

Step 1: Load Two-Group Data

# Load two-group scDNAm data
Dat_sce <- readRDS(system.file("extdata", "sampleData_sce.rds", package = "mist"))

Step 2: Estimate Parameters Using estiParamTwo

# Estimate parameters for both groups
beta_sigma_list_group <- estiParamTwo(
    Dat_sce = Dat_sce,
    Dat_name_g1 = "Methy_level_group1",
    Dat_name_g2 = "Methy_level_group2",
    ptime_name_g1 = "pseudotime",
    ptime_name_g2 = "pseudotime_g2"
)

# Check the output
names(beta_sigma_list_group)
## [1] "Group1" "Group2"
head(beta_sigma_list_group[[1]], n = 3)
## $ENSMUSG00000000001
##      Beta_0      Beta_1      Beta_2      Beta_3      Beta_4    Sigma2_1 
##  1.25809847 -0.36104585  0.33367033  0.21237691  0.03162836  5.29707820 
##    Sigma2_2    Sigma2_3    Sigma2_4 
## 12.89618178  4.60961402  1.73212107 
## 
## $ENSMUSG00000000003
##    Beta_0    Beta_1    Beta_2    Beta_3    Beta_4  Sigma2_1  Sigma2_2  Sigma2_3 
##  1.596588  1.609934  2.711167 -1.647963 -2.964354 24.419095  2.403587  7.182017 
##  Sigma2_4 
##  9.128952 
## 
## $ENSMUSG00000000028
##      Beta_0      Beta_1      Beta_2      Beta_3      Beta_4    Sigma2_1 
##  1.30065755 -0.01271698  0.07336106  0.05571895  0.01570911  8.09229936 
##    Sigma2_2    Sigma2_3    Sigma2_4 
##  7.15913952  3.22609158  2.31954314
head(beta_sigma_list_group[[2]], n = 3)
## $ENSMUSG00000000001
##    Beta_0    Beta_1    Beta_2    Beta_3    Beta_4  Sigma2_1  Sigma2_2  Sigma2_3 
##  1.915505 -1.088066  5.828407 -3.343740 -1.579720  5.416206  6.306331  3.308247 
##  Sigma2_4 
##  1.524147 
## 
## $ENSMUSG00000000003
##     Beta_0     Beta_1     Beta_2     Beta_3     Beta_4   Sigma2_1   Sigma2_2 
## -0.8196619 -0.6818815  2.1500895 -0.9999998 -0.4106212  6.8252274 10.1149136 
##   Sigma2_3   Sigma2_4 
##  4.6254361  2.7659230 
## 
## $ENSMUSG00000000028
##     Beta_0     Beta_1     Beta_2     Beta_3     Beta_4   Sigma2_1   Sigma2_2 
##  2.2936810 -0.6681344  2.7488722 -0.9902494 -1.0090488 10.8988700  6.4611817 
##   Sigma2_3   Sigma2_4 
##  3.8892465  3.2592253
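
Because both groups report the same parameter names, the estimates for one feature can be set side by side. The sketch below does this for the Beta values of ENSMUSG00000000001, copied from the two outputs above so that the code runs on its own. With the real object, the same vectors could presumably be extracted from the "Group1" and "Group2" elements of beta_sigma_list_group.

```r
beta_names <- paste0("Beta_", 0:4)

# Beta values for ENSMUSG00000000001, copied from the output above
g1 <- setNames(
    c(1.25809847, -0.36104585, 0.33367033, 0.21237691, 0.03162836),
    beta_names
)
g2 <- setNames(
    c(1.915505, -1.088066, 5.828407, -3.343740, -1.579720),
    beta_names
)

# One row per group, plus the coefficient-wise difference
comparison <- rbind(Group1 = g1, Group2 = g2, Difference = g2 - g1)
round(comparison, 3)
```

This raw difference is only a descriptive summary; the group comparison itself is done by dmTwoGroups in the next step.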

Step 3: Perform Differential Methylation Analysis for Two-Group Comparison Using dmTwoGroups

# Perform DM analysis to compare the two groups
dm_results_two <- dmTwoGroups(beta_sigma_list_group)

# View the top genomic features with different temporal patterns between groups
head(dm_results_two)
## ENSMUSG00000000568 ENSMUSG00000000392 ENSMUSG00000000326 ENSMUSG00000000295 
##         0.14312295         0.11152751         0.09498556         0.09017207 
## ENSMUSG00000000216 ENSMUSG00000000555 
##         0.07645545         0.07339488
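
It can be useful to see which features rank highly in both the single-group and the two-group analysis. The sketch below compares the six feature names printed by head(dm_results) and head(dm_results_two), hard-coded here so the code is self-contained; with the real objects you would use names() on each result.

```r
# Top features from the single-group analysis (Step 3 of that workflow)
top_single <- c(
    "ENSMUSG00000000568", "ENSMUSG00000000486", "ENSMUSG00000000282",
    "ENSMUSG00000000223", "ENSMUSG00000000037", "ENSMUSG00000000359"
)

# Top features from the two-group comparison above
top_two <- c(
    "ENSMUSG00000000568", "ENSMUSG00000000392", "ENSMUSG00000000326",
    "ENSMUSG00000000295", "ENSMUSG00000000216", "ENSMUSG00000000555"
)

# Features that appear in both lists
shared <- intersect(top_single, top_two)
shared

# Features that appear only in the two-group list
only_two <- setdiff(top_two, top_single)
length(only_two)
```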

Conclusion

mist provides tools for analyzing scDNAm data along pseudotime, either within a single group or as a comparison between two phenotypic groups. It combines Bayesian parameter estimation (estiParamSingle, estiParamTwo) with differential methylation analysis (dmSingle, dmTwoGroups) to identify genomic features whose methylation changes over pseudotime or follows different temporal patterns between groups, and plotGene lets you inspect the fitted curve for an individual feature.

Session info

## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
##  [3] Biobase_2.67.0              GenomicRanges_1.59.1       
##  [5] GenomeInfoDb_1.43.2         IRanges_2.41.1             
##  [7] S4Vectors_0.45.2            BiocGenerics_0.53.3        
##  [9] generics_0.1.3              MatrixGenerics_1.19.0      
## [11] matrixStats_1.4.1           mist_0.99.3                
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1         farver_2.1.2             dplyr_1.1.4             
##  [4] Biostrings_2.75.1        bitops_1.0-9             fastmap_1.2.0           
##  [7] RCurl_1.98-1.16          GenomicAlignments_1.43.0 XML_3.99-0.17           
## [10] digest_0.6.37            lifecycle_1.0.4          survival_3.7-0          
## [13] magrittr_2.0.3           compiler_4.5.0           rlang_1.1.4             
## [16] sass_0.4.9               tools_4.5.0              utf8_1.2.4              
## [19] yaml_2.3.10              rtracklayer_1.67.0       knitr_1.49              
## [22] labeling_0.4.3           S4Arrays_1.7.1           curl_6.0.1              
## [25] DelayedArray_0.33.3      abind_1.4-8              BiocParallel_1.41.0     
## [28] withr_3.0.2              grid_4.5.0               fansi_1.0.6             
## [31] colorspace_2.1-1         ggplot2_3.5.1            scales_1.3.0            
## [34] MASS_7.3-61              mcmc_0.9-8               cli_3.6.3               
## [37] mvtnorm_1.3-2            rmarkdown_2.29           crayon_1.5.3            
## [40] httr_1.4.7               rjson_0.2.23             cachem_1.1.0            
## [43] zlibbioc_1.53.0          splines_4.5.0            parallel_4.5.0          
## [46] XVector_0.47.0           restfulr_0.0.15          vctrs_0.6.5             
## [49] Matrix_1.7-1             jsonlite_1.8.9           SparseM_1.84-2          
## [52] carData_3.0-5            car_3.1-3                MCMCpack_1.7-1          
## [55] Formula_1.2-5            jquerylib_0.1.4          glue_1.8.0              
## [58] codetools_0.2-20         gtable_0.3.6             BiocIO_1.17.1           
## [61] UCSC.utils_1.3.0         munsell_0.5.1            tibble_3.2.1            
## [64] pillar_1.9.0             htmltools_0.5.8.1        quantreg_5.99.1         
## [67] GenomeInfoDbData_1.2.13  R6_2.5.1                 evaluate_1.0.1          
## [70] lattice_0.22-6           Rsamtools_2.23.1         bslib_0.8.0             
## [73] MatrixModels_0.5-3       coda_0.19-4.1            SparseArray_1.7.2       
## [76] xfun_0.49                pkgconfig_2.0.3