library(Aerith)
library(dplyr)
library(stringr)
library(ggplot2)
Peptide-spectrum matching (PSM) annotation and visualization are fundamental components of proteomics data analysis, particularly in stable isotope probing (SIP) experiments. This vignette demonstrates how to use the Aerith package to annotate and visualize PSMs from both unlabeled and stable isotope labeled samples.
PSM annotation involves matching theoretical peptide fragmentation patterns with observed mass spectrometry data. In SIP experiments, this process becomes more complex due to isotopic labeling, which shifts the mass-to-charge (m/z) ratios of peptide fragments. Aerith addresses this challenge by:
The Aerith package offers several advantages over existing tools:
This vignette demonstrates these capabilities using two representative examples: an unlabeled PSM at natural 13C abundance (1.07%) and a heavily labeled PSM at 50% 13C incorporation.
This section demonstrates PSM annotation for a peptide at natural 13C abundance, representing a control or unlabeled condition in SIP experiments. Natural abundance labeling provides a baseline for comparison with isotope-enriched samples.
The first step in PSM validation involves annotating observed MS2 peaks with theoretical B and Y ion fragments. This process identifies which theoretical fragments are actually observed in the spectrum and assesses the quality of the peptide identification.
demo_file <- system.file("extdata", "107728.FT2", package = "Aerith")
scan1 <- readOneScanMS2(ftFile = demo_file, 107728)
anno <- annotatePSM(
scan1$peaks$mz, scan1$peaks$intensity,
scan1$peaks$charge,
"HSQVFSTAEDNQSAVTIHVLQGER", 1:2, "C13",
0.0107, scan1$isolationWindowCenterMZ, 4.0
)
head(anno$ExpectedBYions[anno$ExpectedBYions$matchedIndices != -1, ])
#> mz intensity charge ionkind residuePositions matchedIndices
#> 6 225.0988 0.88660208 1 B 2 161
#> 7 226.1014 0.10174102 1 BisotopicPeak 2 163
#> 12 353.1573 0.82919054 1 B 3 476
#> 13 354.1600 0.14751437 1 BisotopicPeak 3 477
#> 14 355.1621 0.02092744 1 BisotopicPeak 3 480
#> 18 452.2258 0.78016665 1 B 4 771
#> SIPabundances
#> 6 1.791241
#> 7 1.791241
#> 12 2.694015
#> 13 2.694015
#> 14 2.694015
#> 18 1.640479
residuePos <- anno$ExpectedBYions$residuePositions[anno$ExpectedBYions$matchedIndices != -1]
table(residuePos)
#> residuePos
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> 3 4 8 8 8 6 11 8 16 8 13 15 17 15 8 8 5 10 9 8 4 6 12
Parameter Explanation:
scan1$peaks$mz, scan1$peaks$intensity, scan1$peaks$charge: Observed peak data from the MS2 spectrum"HSQVFSTAEDNQSAVTIHVLQGER": The peptide sequence to be annotated1:2: Charge states to consider for fragment ions (singly and doubly charged)"C13": The isotope being tracked (13C in this case)0.0107: The isotope incorporation probability (1.07% for natural 13C abundance)scan1$isolationWindowCenterMZ: The m/z center of the isolation window used for MS24.0: The isolation window width in Da for precursor selectionResult Interpretation:
The annotation results show which theoretical B and Y ions match observed peaks. The matchedIndices column indicates successful matches (non-negative values), while residuePositions shows which amino acid positions in the peptide contribute to the matched fragments. The table of residue positions reveals the fragmentation pattern and peptide coverage, which are critical for assessing PSM confidence.
The visualization of annotated PSMs provides immediate visual validation of the peptide identification. This plot overlays theoretical fragment ions onto the observed spectrum, allowing researchers to assess the quality of the match and identify potential issues.
set.seed(9527)
p <- plotPSMannotation(
observedSpect = getRealScanFromList(scan1),
pep = "HSQVFSTAEDNQSAVTIHVLQGER", Atom = "C13", Prob = 0.01,
charges = 1:2, isoCenter = 886.65, isoWidth = 4.0,
ifRemoveNotFoundIon = TRUE
)
p
Plot Features and Significance:
This annotation plot displays several key features:
Parameter Guidance:
Prob = 0.01: Set to match natural 13C abundance (approximately 1%)isoCenter = 886.65: The precursor m/z used for isolation window centeringisoWidth = 4.0: Isolation window width should match instrumental settingsifRemoveNotFoundIon = TRUE: Removes theoretical ions with no corresponding observed peaks, reducing visual clutterThe resulting visualization enables manual validation of the automated PSM annotation, which is particularly important for SIP experiments where isotopic shifts can affect fragment matching accuracy.
This analysis combines theoretical fragment ion predictions with observed spectrum data to provide a complete picture of peptide fragmentation. The approach is particularly valuable for understanding how isotopic labeling affects fragment ion patterns.
a <- getSipBYionSpectra("HSQVFSTAEDNQSAVTIHVLQGER", "C13", 0.01, 1:2)
slot(a, "spectra") <- slot(a, "spectra")[slot(a, "spectra")$MZ < 2000, ]
p <- plot(a)
p <- p + plotSipBYionLabel(a)
demo_file <- system.file("extdata", "107728.FT2", package = "Aerith")
b <- readAllScanMS2(demo_file)
c <- getRealScan(107728, b)
p <- p + plotRealScan(c)
p
Analysis Workflow:
getSipBYionSpectra() calculates expected B and Y ion m/z values considering isotopic labelingplotSipBYionLabel() adds ion annotations for easy identificationAdvantages of this Approach:
This visualization is particularly powerful for SIP experiments because it demonstrates how well the isotopic labeling model predicts the observed fragment pattern, which is crucial for accurate peptide quantification in labeled samples.
Precursor ion analysis is essential for validating peptide identification and understanding isotopic incorporation. This analysis compares the observed precursor isotopic pattern with theoretical predictions, providing confidence in both peptide identity and labeling quantification.
demo_file <- system.file("extdata", "107695.FT1", package = "Aerith")
ft1 <- readOneScanMS1(demo_file, 107695)
precursorScan1 <- getRealScanFromList(ft1)
pep <- "HSQVFSTAEDNQSAVTIHVLQGER"
precursorSP <- getSipPrecursorSpectra(pep, Prob = 0.0107, charges = 3)
slot(precursorSP, "spectra")$Kind <- "Expected"
xlimit <- slot(precursorScan1, "spectra")$MZ > 880 & slot(precursorScan1, "spectra")$MZ < 890
slot(precursorScan1, "spectra") <- slot(precursorScan1, "spectra")[xlimit, ]
slot(precursorScan1, "spectra")$Kind <- "Observed"
maxInt <- max(slot(precursorScan1, "spectra")$Prob)
slot(precursorScan1, "spectra")$Prob <- slot(precursorScan1, "spectra")$Prob / maxInt * 100
p <- plot(precursorSP, linewidth = 0.3) + plotRealScan(precursorScan1, linewidth = 0.3) +
scale_x_continuous(breaks = seq(880, 890, by = 1)) +
theme(legend.title = element_blank()) +
scale_color_manual(values = c("#E7872B", "#F3082F"))
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
p
Key Analysis Components:
getSipPrecursorSpectra() calculates expected isotopic distributionInterpretation of Results:
The precursor analysis reveals several important features:
Significance for SIP Experiments:
This type of precursor analysis is particularly valuable in SIP experiments because:
The close agreement between theoretical and observed patterns in this unlabeled sample establishes a baseline for comparison with isotope-enriched samples.
This section demonstrates the power of Aerith for analyzing heavily labeled samples, where 50% 13C incorporation represents a significant metabolic labeling experiment. High levels of isotopic incorporation create complex isotopic patterns that require sophisticated computational approaches for accurate analysis.
Analyzing fragment ions from heavily labeled peptides presents unique challenges due to substantial mass shifts and complex isotopic envelopes. Aerith’s probabilistic approach accounts for these complexities, enabling accurate annotation even at high labeling levels.
demo_file <- system.file("extdata", "X13_4068_2596_8182.FT2", package = "Aerith")
scan1 <- readAllScanMS2(ftFile = demo_file)[["2596"]]
anno <- annotatePSM(
scan1$peaks$mz, scan1$peaks$intensity,
scan1$peaks$charge,
"HYAHVDCPGHADYVK", 1:2, "C13",
0.52, scan1$isolationWindowCenterMZ, 5.0
)
head(anno$ExpectedBYions[anno$ExpectedBYions$matchedIndices != -1, ])
#> mz intensity charge ionkind residuePositions matchedIndices
#> 11 302.1334 0.0002627402 1 BisotopicPeak 2 60
#> 12 303.1368 0.0019950502 1 BisotopicPeak 2 61
#> 13 304.1401 0.0093803799 1 BisotopicPeak 2 62
#> 14 305.1435 0.0305447981 1 BisotopicPeak 2 63
#> 15 306.1468 0.0729722673 1 BisotopicPeak 2 64
#> 16 307.1501 0.1321546583 1 BisotopicPeak 2 65
#> SIPabundances
#> 11 50.70168
#> 12 50.70168
#> 13 50.70168
#> 14 50.70168
#> 15 50.70168
#> 16 50.70168
residuePos <- anno$ExpectedBYions$residuePositions[anno$ExpectedBYions$matchedIndices != -1]
table(residuePos)
#> residuePos
#> 1 2 3 4 5 6 8 9 10 11 12 13 14
#> 7 17 24 22 19 11 9 18 23 21 46 32 21
Critical Parameters for Heavy Labeling:
0.52: The 52% 13C incorporation probability reflects substantial metabolic labeling5.0: Wider isolation window (5.0 Da or 4.0 Da) accommodates the broadened isotopic envelope"HYAHVDCPGHADYVK": A peptide sequenceChallenges and Solutions in Heavy Labeling:
Biological Significance:
The successful annotation of this heavily labeled PSM demonstrates active metabolism and protein synthesis in the experimental system. The 52% incorporation level indicates:
This level of labeling is typically achieved in controlled laboratory experiments and represents an ideal scenario for quantitative SIP analysis.
The visualization of heavily labeled PSMs reveals the dramatic effects of isotopic incorporation on mass spectra. This plot demonstrates Aerith’s capability to accurately predict and annotate complex isotopic patterns that would be challenging for conventional proteomics tools.
set.seed(9527)
p <- plotPSMannotation(
observedSpect = getRealScanFromList(scan1),
pep = "HYAHVDCPGHADYVK", Atom = "C13", Prob = 0.52,
charges = 1:2, isoCenter = scan1$isolationWindowCenterMZ, isoWidth = 5.0,
ifRemoveNotFoundIon = TRUE
)
p
Visual Features of Heavy Labeling:
This plot illustrates several key differences from the unlabeled example:
Comparison with Natural Abundance:
Contrasting this heavily labeled spectrum with the natural abundance example reveals:
State-of-the-Art Advantages:
This analysis demonstrates Aerith’s advantages over existing proteomics software:
This comprehensive comparison demonstrates the exceptional accuracy of Aerith’s isotopic modeling even under extreme labeling conditions. The overlay of theoretical and observed spectra provides quantitative validation of the software’s predictive capabilities.
demo_file <- system.file("extdata", "X13_4068_2596_8182.FT2", package = "Aerith")
ft2 <- readAllScanMS2(demo_file)
a <- getSipBYionSpectra("HYAHVDCPGHADYVK", "C13", 0.52, 1:2)
p <- plot(a)
p <- p + plotSipBYionLabel(a)
c <- getRealScan(2596, ft2)
p <- p + plotRealScan(c)
p
Advanced Modeling Features:
The theoretical spectrum generation for heavily labeled peptides involves several sophisticated calculations:
Validation of Computational Accuracy:
The close agreement between theoretical predictions and observed data validates several aspects of Aerith’s approach:
Implications for Quantitative SIP:
This level of predictive accuracy has important implications for quantitative stable isotope probing:
Precursor ion analysis becomes particularly informative under heavy labeling conditions, where the isotopic envelope provides direct evidence of metabolic incorporation. This analysis demonstrates how Aerith handles complex precursor isotopic patterns that span multiple mass units.
demo_file <- system.file("extdata", "X13_2559.FT1", package = "Aerith")
ft1 <- readOneScanMS1(demo_file, 2559)
precursorScan1 <- getRealScanFromList(ft1)
pep <- "HYAHVDCPGHADYVK"
precursorSP <- getSipPrecursorSpectra(pep, Prob = 0.5, charges = 3)
slot(precursorSP, "spectra")$Kind <- "Expected"
xlimit <- slot(precursorScan1, "spectra")$MZ > 590 & slot(precursorScan1, "spectra")$MZ < 620
slot(precursorScan1, "spectra") <- slot(precursorScan1, "spectra")[xlimit, ]
slot(precursorScan1, "spectra")$Kind <- "Observed"
maxInt <- max(slot(precursorScan1, "spectra")$Prob)
slot(precursorScan1, "spectra")$Prob <- slot(precursorScan1, "spectra")$Prob / maxInt * 100
p <- plot(precursorSP, linewidth = 0.3) + plotRealScan(precursorScan1, linewidth = 0.3) +
scale_x_continuous(breaks = seq(590, 620, by = 5)) +
theme(legend.title = element_blank()) +
scale_color_manual(values = c("#E7872B", "#F3082F"))
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
p
Heavy Labeling Precursor Features:
The precursor analysis of the heavily labeled peptide reveals several distinctive characteristics:
Quantitative Information Content:
This precursor analysis provides multiple layers of quantitative information:
Comparison with Natural Abundance:
Contrasting this heavily labeled precursor with the natural abundance example illustrates:
Methodological Advantages:
This analysis showcases several key advantages of the Aerith approach:
One of Aerith’s key strengths is its ability to process large numbers of PSMs automatically while maintaining the same level of annotation accuracy demonstrated in individual examples. This batch processing capability is essential for proteome-wide SIP experiments.
The batch processing functionality demonstrates Aerith’s scalability and practical utility in real-world proteomics workflows. This example processes multiple PSMs from a single experiment, generating publication-ready plots for each identification.
element <- "C13"
demo_file <- system.file("extdata", "demo.psm.txt", package = "Aerith")
psm <- readPSMtsv(demo_file)
psm <- psm[psm$Filename == "Pan_052322_X13.FT2", ]
psm <- psm[psm$ScanNumber %in% c("4068", "2596", "8182"), ]
demo_file <- system.file("extdata", "X13_4068_2596_8182.FT2", package = "Aerith")
ft2 <- readAllScanMS2(demo_file)
ftFileNames <- psm$Filename
scanNumbers <- psm$ScanNumber
proNames <- psm$ProteinNames
charges <- psm$ParentCharge
pep <- psm$OriginalPeptide
pep <- stringr::str_sub(pep, 2, -2)
pct <- psm$SearchName
pct <- as.numeric(stringr::str_sub(
stringr::str_split(pct, "_", simplify = TRUE)[, 2], 1, -4
)) / 100 / 1000
realScans <- getRealScans(ft2, scanNumbers)
tmp <- tempdir()
plotPSMs(
realScans,
charges,
element,
pct,
BYcharge = 1:2,
ftFileNames,
scanNumbers,
pep,
proNames,
path = tmp
)
list.files(tmp, pattern = ".pdf", full.names = TRUE)
#> [1] "/tmp/RtmpxyJ0VR/Pan_052322_X13.FT2_1_2_4068_AAQYVASHPGEVCPAK__sp_P0AE08_AHPC_ECOLI__0.51_.pdf"
#> [2] "/tmp/RtmpxyJ0VR/Pan_052322_X13.FT2_2_2_2596_HYAHVDCPGHADYVK__sp_P0CE47_EFTU1_ECOLI,sp_P0CE48_EFTU2_ECOLI__0.52_.pdf"
#> [3] "/tmp/RtmpxyJ0VR/Pan_052322_X13.FT2_3_2_8182_KAEQYLLENETTK__sp_P00509_AAT_ECOLI__0.51_.pdf"
Batch Processing Components:
Workflow Advantages:
This automated approach provides several key benefits for large-scale SIP experiments:
Scalability and Performance:
This batch processing approach scales effectively to proteome-wide datasets:
Integration with Proteomics Workflows:
The batch processing functionality integrates seamlessly with standard proteomics pipelines:
This automated capability transforms Aerith from a specialized analysis tool into a practical solution for production proteomics workflows, enabling routine application of SIP analysis to large-scale experiments.
This vignette has demonstrated the comprehensive PSM annotation and visualization capabilities of the Aerith package. The examples span from natural abundance (1.07% 13C) to heavy labeling (52% 13C), illustrating the software’s versatility and accuracy across diverse experimental conditions.
Isotope-aware analysis: Aerith provides native support for stable isotope probing experiments, accurately modeling complex isotopic patterns that challenge conventional proteomics tools.
Multi-level validation: The combination of precursor (MS1) and fragment (MS2) analysis provides comprehensive validation of peptide identifications and quantification of isotopic incorporation.
Visual integration: Seamless integration of computational analysis with publication-ready visualization accelerates data interpretation and presentation.
Scalable workflows: Batch processing capabilities enable application to proteome-wide datasets while maintaining annotation accuracy.
sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.1 stringr_1.6.0 dplyr_1.1.4 Aerith_0.99.11
#>
#> loaded via a namespace (and not attached):
#> [1] rlang_1.1.6 magrittr_2.0.4
#> [3] clue_0.3-66 matrixStats_1.5.0
#> [5] compiler_4.6.0 systemfonts_1.3.1
#> [7] vctrs_0.6.5 reshape2_1.4.5
#> [9] ProtGenerics_1.43.0 pkgconfig_2.0.3
#> [11] MetaboCoreUtils_1.19.1 crayon_1.5.3
#> [13] fastmap_1.2.0 XVector_0.51.0
#> [15] labeling_0.4.3 rmarkdown_2.30
#> [17] preprocessCore_1.73.0 ragg_1.5.0
#> [19] purrr_1.2.0 xfun_0.54
#> [21] MultiAssayExperiment_1.37.2 cachem_1.1.0
#> [23] jsonlite_2.0.0 DelayedArray_0.37.0
#> [25] BiocParallel_1.45.0 parallel_4.6.0
#> [27] cluster_2.1.8.1 R6_2.6.1
#> [29] bslib_0.9.0 stringi_1.8.7
#> [31] RColorBrewer_1.1-3 limma_3.67.0
#> [33] GenomicRanges_1.63.0 jquerylib_0.1.4
#> [35] Rcpp_1.1.0 Seqinfo_1.1.0
#> [37] SummarizedExperiment_1.41.0 iterators_1.0.14
#> [39] knitr_1.50 IRanges_2.45.0
#> [41] Matrix_1.7-4 igraph_2.2.1
#> [43] tidyselect_1.2.1 dichromat_2.0-0.1
#> [45] abind_1.4-8 yaml_2.3.11
#> [47] doParallel_1.0.17 codetools_0.2-20
#> [49] affy_1.89.0 lattice_0.22-7
#> [51] tibble_3.3.0 plyr_1.8.9
#> [53] Biobase_2.71.0 withr_3.0.2
#> [55] S7_0.2.1 evaluate_1.0.5
#> [57] Spectra_1.21.0 pillar_1.11.1
#> [59] affyio_1.81.0 BiocManager_1.30.27
#> [61] MatrixGenerics_1.23.0 foreach_1.5.2
#> [63] stats4_4.6.0 MSnbase_2.37.0
#> [65] MALDIquant_1.22.3 ncdf4_1.24
#> [67] generics_0.1.4 S4Vectors_0.49.0
#> [69] scales_1.4.0 glue_1.8.0
#> [71] lazyeval_0.2.2 tools_4.6.0
#> [73] mzID_1.49.0 data.table_1.17.8
#> [75] QFeatures_1.21.0 vsn_3.79.0
#> [77] mzR_2.45.0 fs_1.6.6
#> [79] XML_3.99-0.20 grid_4.6.0
#> [81] impute_1.85.0 tidyr_1.3.1
#> [83] MsCoreUtils_1.23.1 PSMatch_1.15.0
#> [85] cli_3.6.5 textshaping_1.0.4
#> [87] S4Arrays_1.11.1 AnnotationFilter_1.35.0
#> [89] pcaMethods_2.3.0 gtable_0.3.6
#> [91] sass_0.4.10 digest_0.6.39
#> [93] BiocGenerics_0.57.0 SparseArray_1.11.7
#> [95] ggrepel_0.9.6 farver_2.1.2
#> [97] htmltools_0.5.8.1 lifecycle_1.0.4
#> [99] statmod_1.5.1 MASS_7.3-65