Theoretic spectra generation of SIP-labeled compound

library(Aerith)
library(ggplot2)

Overview

Stable Isotope Probing (SIP) experiments demand accurate theoretical spectra to benchmark labeling efficiency, optimize acquisition parameters, and interpret downstream peptide identifications. Aerith bundles two complementary simulators that address those needs: a Monte Carlo fine-structure engine for chemically faithful profiles and an FFT-based accelerator for rapid hypothesis testing. This vignette walks through both workflows so you can pick the right tool for your dataset, tune the key parameters, and understand how the resulting plots inform experimental design. Along the way we highlight why Aerith’s implementation aligns with the current state of the art in SIP proteomics—namely, reproducible stochastic sampling, charge-state aware protonation handling, and vectorized FFT routines refined for high-resolution instrumentation.

Monte Carlo method

The Monte Carlo simulator cal_isotope_numbers_SIP() models isotopologue populations by explicitly sampling multinomial combinations while preserving isotope fine structure. This is most valuable when you expect partially labeled populations or need confidence intervals around the mean abundance pattern.

What you’ll learn in this section

How to simulate natural glucose isotopologues and heavily labeled analogs.
Which parameters govern accuracy versus runtime.
How to interpret the abundance plots for assessing labeling efficiency.

Before we run the code, note the main arguments you can adjust:

formula: the elemental composition (e.g., "C6H11O6"). Aerith automatically balances charge states.
num_simulations: number of Monte Carlo draws (default 10,000). Increase this for smoother estimates at the cost of runtime.
C13: fraction of \(^{13}\)C enrichment (0-1). Similar arguments exist for other isotopes; supply them when probing alternative labeling strategies.

iso1 <- cal_isotope_numbers_SIP("C6H11O6")
plotMolecularIsotopes(iso1) +
    ggtitle(expression(C[6] * H[11] * O[6]^"-" ~ 1.07 * "% " *
        {}^{
            13
        } * C))

The first plot establishes the baseline: a near-natural abundance distribution for deprotonated glucose. Peaks close to \(m/z\) shifts of one neutron reflect the low natural \(^{13}\)C content. Use this as a reference when evaluating experimental control samples.

iso2 <- cal_isotope_numbers_SIP("C6H11O6", num_simulations = 50000, C13 = 0.5)
plotMolecularIsotopes(iso2) +
    ggtitle(expression(C[6] * H[11] * O[6]^"-" ~ 50 * "% " *
        {}^{
            13
        } * C))
#> Warning: ggrepel: 13 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

Here we increase the simulation depth to 50,000 for a smoother signal and set the \(^{13}\)C enrichment to 50%. The resulting spectrum shows pronounced heavier isotopologues, illustrating how increasing enrichment shifts intensity towards higher \(m/z\). When benchmarking wet-lab labeling protocols, compare observed spectra against this simulated target to estimate incorporation efficiency and identify incomplete labeling.

FFT-based algorithm

The FFT engine cal_isotope_peaks_fft() accelerates theoretical spectrum generation by assuming each isotope adds exactly one neutron mass (ignoring fine structure). This approximation is well suited for high-throughput scans where you need quick guidance on expected peak positions, especially for highly charged ions.

Key parameters to consider:

N_width: size of the FFT grid. Larger values (e.g., 200) capture more peaks but require additional compute. Start around 128-256 when modeling long peptides.
min_abundance: pruning threshold to drop negligible peaks, keeping plots readable.
C13: enrichment level; the same interpretation as above.

iso1 <- cal_isotope_peaks_fft("C6H12O6Na")
plotMolecularFFTisotopes(iso1) +
    ggtitle(expression(C[6] * H[12] * O[6] * Na^"+" ~ 1.07 * "% " *
        {}^{
            13
        } * C))

This visualization showcases the sodium-adducted glucose ion under natural abundance. Because the FFT method skips fine structure, peaks appear as discrete sticks at integer neutron offsets—ideal for fast charge-state annotation or predicting centroided spectra in DDA workflows.

iso2 <- cal_isotope_peaks_fft("C6H12O6Na", N_width = 200, min_abundance = 0.001, C13 = 0.5)
plotMolecularFFTisotopes(iso2) +
    ggtitle(expression(C[6] * H[12] * O[6] * Na^"+" ~ 50 * "% " *
        {}^{
            13
        } * C))

Adjusting N_width to 200 ensures heavier isotopologues remain within the FFT window, while a min_abundance of 0.001 filters noise-like contributions. The resulting shift towards heavier peaks mirrors the Monte Carlo outcome, confirming that both approaches agree on the macroscopic trend despite different underlying assumptions. Use this configuration when screening candidate metabolites or peptides for downstream targeted acquisition.

iso1 <- cal_isotope_peaks_fft("C6H11O6")
plotMolecularFFTisotopes(iso1) +
    ggtitle(expression(C[6] * H[11] * O[6]^"-" ~ 1.07 * "% " *
        {}^{
            13
        } * C))

The deprotonated glucose FFT spectrum closely matches the Monte Carlo baseline in peak ordering, validating the approximation for monoisotopic-focused analyses.

iso2 <- cal_isotope_peaks_fft("C6H11O6", N_width = 200, min_abundance = 0.001, C13 = 0.5)
plotMolecularFFTisotopes(iso2) +
    ggtitle(expression(C[6] * H[11] * O[6]^"-" ~ 50 * "% " *
        {}^{
            13
        } * C))

Under elevated \(^{13}\)C incorporation, heavier peaks again dominate. Comparing this output with experimental spectra quickly reveals whether the labeling strategy achieved the intended enrichment.

Together, these examples show that Aerith treats positive and negative charge states symmetrically. The package’s vectorized core keeps runtimes low even as you scale to heavily labeled species, enabling batch evaluation of hundreds of candidate compounds, which remains challenging for many state-of-the-art SIP simulators.

Summary

Use the Monte Carlo engine when you require chemically precise fine structure or want to propagate stochastic uncertainty into downstream interpretations.
Reach for the FFT workflow when you need fast, reasonably accurate projections across many charge states or compounds.
Inspect the plotted isotopologue distributions to validate labeling efficiency: the relative heights of heavier peaks directly reflect \(^{13}\)C incorporation.
Aerith’s dual-engine design and charge-aware handling deliver performance on par with, and often ahead of, contemporary SIP modeling frameworks.

Combine both approaches within a single study: first screen candidates with the FFT routine, then finalize acquisition parameters with the Monte Carlo simulator for the few molecules that merit fine-structure fidelity.

sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tidyr_1.3.1    ggplot2_4.0.1  stringr_1.6.0  dplyr_1.1.4    Aerith_0.99.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] rlang_1.1.6                 magrittr_2.0.4             
#>  [3] clue_0.3-66                 matrixStats_1.5.0          
#>  [5] compiler_4.6.0              systemfonts_1.3.1          
#>  [7] vctrs_0.6.5                 reshape2_1.4.5             
#>  [9] ProtGenerics_1.43.0         pkgconfig_2.0.3            
#> [11] MetaboCoreUtils_1.19.1      crayon_1.5.3               
#> [13] fastmap_1.2.0               XVector_0.51.0             
#> [15] labeling_0.4.3              rmarkdown_2.30             
#> [17] preprocessCore_1.73.0       ragg_1.5.0                 
#> [19] purrr_1.2.0                 xfun_0.54                  
#> [21] MultiAssayExperiment_1.37.2 cachem_1.1.0               
#> [23] jsonlite_2.0.0              DelayedArray_0.37.0        
#> [25] BiocParallel_1.45.0         parallel_4.6.0             
#> [27] cluster_2.1.8.1             R6_2.6.1                   
#> [29] bslib_0.9.0                 stringi_1.8.7              
#> [31] RColorBrewer_1.1-3          limma_3.67.0               
#> [33] GenomicRanges_1.63.0        jquerylib_0.1.4            
#> [35] Rcpp_1.1.0                  Seqinfo_1.1.0              
#> [37] SummarizedExperiment_1.41.0 iterators_1.0.14           
#> [39] knitr_1.50                  IRanges_2.45.0             
#> [41] Matrix_1.7-4                igraph_2.2.1               
#> [43] tidyselect_1.2.1            dichromat_2.0-0.1          
#> [45] abind_1.4-8                 yaml_2.3.11                
#> [47] doParallel_1.0.17           codetools_0.2-20           
#> [49] affy_1.89.0                 lattice_0.22-7             
#> [51] tibble_3.3.0                plyr_1.8.9                 
#> [53] Biobase_2.71.0              withr_3.0.2                
#> [55] S7_0.2.1                    evaluate_1.0.5             
#> [57] Spectra_1.21.0              pillar_1.11.1              
#> [59] affyio_1.81.0               BiocManager_1.30.27        
#> [61] MatrixGenerics_1.23.0       foreach_1.5.2              
#> [63] stats4_4.6.0                MSnbase_2.37.0             
#> [65] MALDIquant_1.22.3           ncdf4_1.24                 
#> [67] generics_0.1.4              S4Vectors_0.49.0           
#> [69] scales_1.4.0                glue_1.8.0                 
#> [71] lazyeval_0.2.2              tools_4.6.0                
#> [73] mzID_1.49.0                 data.table_1.17.8          
#> [75] QFeatures_1.21.0            vsn_3.79.0                 
#> [77] mzR_2.45.0                  fs_1.6.6                   
#> [79] XML_3.99-0.20               grid_4.6.0                 
#> [81] impute_1.85.0               MsCoreUtils_1.23.1         
#> [83] PSMatch_1.15.0              cli_3.6.5                  
#> [85] textshaping_1.0.4           S4Arrays_1.11.1            
#> [87] AnnotationFilter_1.35.0     pcaMethods_2.3.0           
#> [89] gtable_0.3.6                sass_0.4.10                
#> [91] digest_0.6.39               BiocGenerics_0.57.0        
#> [93] SparseArray_1.11.7          ggrepel_0.9.6              
#> [95] farver_2.1.2                htmltools_0.5.8.1          
#> [97] lifecycle_1.0.4             statmod_1.5.1              
#> [99] MASS_7.3-65