library(Aerith)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
This vignette provides a comprehensive guide to understanding and working with various input data formats supported by the Aerith package. Aerith represents a state-of-the-art solution for stable isotope probing (SIP) proteomics analysis, offering robust data handling capabilities that surpass traditional approaches in both performance and flexibility.
The Aerith package distinguishes itself from other proteomics analysis tools through several key advantages:
Modern proteomics experiments generate data in various formats depending on the instrument vendor and analysis software. This vignette demonstrates how Aerith handles:
Before using Aerith, Thermo or other vendors’ raw files must be converted to supported formats. This preprocessing step is crucial for ensuring optimal performance and compatibility.
Required Tools for Conversion: - Raxport: Converts Thermo raw files to FT1/FT2 formats (optimized for Aerith) - ThermoRawFileParser: Converts raw files to mzML format (open standard)
Recommended Workflow: 1. Use Raxport for FT1/FT2 conversion (preferred for Aerith workflows) 2. Use ThermoRawFileParser for mzML conversion (for broader compatibility)
Resource Links: - Aerith README - File Conversion Guide - Raxport Tool and Documentation - ThermoRawFileParser - Sipros4 Tutorial - Sipros5 Tutorial - Sipros Conda Environment
FT1 and FT2 files represent Thermo Fisher’s optimized format for MS1 and MS2 data, respectively. These formats offer several advantages including faster read times and smaller file sizes compared to mzML.
The following example demonstrates different approaches to reading FT1 (MS1) data. Each function serves a specific purpose depending on your analysis needs:
rds <- system.file("extdata", "demo.FT1.rds", package = "Aerith")
demo_file <- tempfile(fileext = ".FT1")
writeLines(readRDS(rds), demo_file)
# Read all MS1 scans into memory
# This approach is memory-intensive but provides fastest access for subsequent operations
all_scans <- readAllScanMS1(demo_file)
# Read a specific range of scans (scan numbers 1527-1550)
# Recommended when you know the specific scans of interest
scan_range <- readScansMS1(demo_file, 1527, 1550)
# Read a single scan by scan number
# Most memory-efficient for analyzing individual scans
single_scan <- readOneScanMS1(demo_file, 1555)
# Extract real scan data from the list structure
# Converts internal format to user-friendly data structure
processed_scan <- getRealScanFromList(all_scans[[88]])
plot(processed_scan)
Parameter Selection Guidelines: - Use readAllScanMS1() when analyzing the entire dataset or when memory is not a constraint - Use readScansMS1() with specific scan ranges for targeted analysis - Use readOneScanMS1() for individual scan inspection or when building custom workflows
The plot generated shows the mass spectrum with m/z values on the x-axis and intensity on the y-axis. This visualization is crucial for quality assessment and identifying potential issues with the data.
Now let’s examine MS2 data from FT2 files:
demo_file <- system.file("extdata", "demo.FT2", package = "Aerith")
# Read all MS2 scans
all_ms2_scans <- readAllScanMS2(demo_file)
# Read specific scan range for MS2 data
ms2_range <- readScansMS2(demo_file, 1399, 1500)
# Read individual MS2 scan
single_ms2_scan <- readOneScanMS2(demo_file, 1371)
# Process and visualize MS2 spectrum
processed_ms2_scan <- getRealScanFromList(all_ms2_scans[[128]])
plot(processed_ms2_scan)
MS2 Data Interpretation: The MS2 spectrum shows fragment ions resulting from peptide fragmentation. Peak patterns in MS2 spectra are essential for peptide identification and are particularly important in SIP proteomics where isotope incorporation affects fragmentation patterns.
Creating smaller subset files is invaluable for testing workflows, debugging, and sharing data samples. This approach allows you to work with manageable datasets while preserving the original file structure.
Creating MS1 Subset Files:
rds <- system.file("extdata", "demo.FT1.rds", package = "Aerith")
demo_file <- tempfile(fileext = ".FT1")
writeLines(readRDS(rds), demo_file)
# Read file header information (essential for maintaining file integrity)
header <- readFTheader(demo_file)
# Read all scans from the original file
ft1_data <- readAllScanMS1(demo_file)
# Create output directory
output_dir <- tempdir()
# Write subset containing first 10 scans
# This preserves file format while reducing file size significantly
writeAllScanMS1(header, ft1_data[1:10], file.path(output_dir, "demo10.FT1"))
#> [1] TRUE
# Verify file creation
subset_files <- list.files(output_dir, pattern = "demo10.FT1", full.names = TRUE)
print(paste("Created subset file:", subset_files))
#> [1] "Created subset file: /tmp/RtmpxyJ0VR/demo10.FT1"
Key Advantages of This Approach: - Maintains original file format and structure - Preserves metadata and header information - Creates files suitable for method development and testing - Reduces computational requirements for iterative analysis
Creating MS2 Subset Files:
demo_file <- system.file("extdata", "demo.FT2", package = "Aerith")
# Read header and scan data
header <- readFTheader(demo_file)
ft2_data <- readAllScanMS2(demo_file)
# Create subset with first 10 MS2 scans
output_dir <- tempdir()
writeAllScanMS2(header, ft2_data[1:10], file.path(output_dir, "demo10.FT2"))
#> [1] TRUE
# Confirm successful file creation
subset_files <- list.files(output_dir, pattern = "demo10.FT2", full.names = TRUE)
print(paste("Created MS2 subset file:", subset_files))
#> [1] "Created MS2 subset file: /tmp/RtmpxyJ0VR/demo10.FT2"
Best Practices for Subset Creation: - Always include representative scans from different retention time ranges - Consider including both high and low intensity scans - Document the selection criteria for reproducibility - Verify that the subset maintains the essential characteristics of the original dataset
mzML (mass spectrometry Markup Language) is an open standard format that provides excellent cross-platform compatibility. Aerith’s mzML support leverages the power of specialized libraries while providing a consistent interface.
Reading mzML Data:
# mzML support requires the mzR package
demo_file <- system.file("extdata", "demo.mzML", package = "Aerith")
# Read MS1 data from mzML file
# mzML files can contain both MS1 and MS2 data in a single file
mzml_ms1_data <- readMzmlMS1(demo_file)
# Extract and visualize a specific MS1 scan
ms1_spectrum <- getRealScan(16, mzml_ms1_data)
plot(ms1_spectrum)
# Read MS2 data from the same mzML file
mzml_ms2_data <- readMzmlMS2(demo_file)
# Extract and visualize a specific MS2 scan
ms2_spectrum <- getRealScan(18, mzml_ms2_data)
plot(ms2_spectrum)
mzML Format Advantages: - Universal Compatibility: Supported by virtually all mass spectrometry software - Rich Metadata: Contains comprehensive instrument and acquisition parameters - Standardized Structure: Facilitates data sharing and collaboration - Vendor Independence: Not tied to specific instrument manufacturers
Plot Interpretation: The MS1 plot displays the precursor ion survey scan, showing the overall complexity of the sample at a given retention time. The MS2 plot reveals the fragmentation pattern of a selected precursor, which is essential for peptide identification. In SIP experiments, these spectra may show characteristic isotope patterns that indicate successful labeling.
Parameter Selection for Scan Extraction: - Scan numbers correspond to the chronological order of acquisition - Choose representative scans from different retention time windows - For method development, select scans with different complexity levels - Consider precursor intensity when selecting MS2 scans for analysis
MGF (Mascot Generic Format) files are widely used for database searching and contain MS2 spectra in a simple, text-based format. This format is particularly popular in proteomics workflows due to its simplicity and broad software support.
Reading MGF Files:
# MGF support requires the MSnbase package
demo_file <- system.file("extdata", "demo.mgf", package = "Aerith")
# Read MGF file containing MS2 spectra
# MGF files typically contain only MS2 spectra with associated metadata
mgf_data <- readMgf(demo_file)
# Extract and visualize a specific spectrum
selected_spectrum <- getRealScan(2688, mgf_data)
plot(selected_spectrum)
MGF Format Characteristics: - Database Search Ready: Designed specifically for search engines like Mascot, SEQUEST, and others - Simplified Structure: Contains only essential information for peptide identification - Metadata Rich: Includes precursor m/z, charge state, and retention time information - Text-Based: Human-readable format facilitating troubleshooting and manual inspection
Spectrum Analysis: The plotted spectrum shows the MS2 fragmentation pattern for the selected precursor. Key features to observe include: - Base Peak: The most intense fragment ion - Ion Series: Patterns of b-ions and y-ions characteristic of peptide fragmentation - Neutral Losses: Common losses like water (-18 Da) or ammonia (-17 Da) - Precursor Peak: May be visible if not completely fragmented
Selection Criteria for Scan Numbers: When choosing scan numbers for analysis, consider: - Signal Quality: Select scans with good signal-to-noise ratio - Fragmentation Efficiency: Choose spectra with rich fragmentation patterns - Precursor Intensity: Higher intensity precursors often yield better fragmentation - Charge State: Different charge states provide complementary information
pepXML is a standardized format for storing peptide identification results from database searches. This format is crucial for downstream analysis and provides comprehensive information about peptide-spectrum matches (PSMs).
# pepXML parsing requires the mzR package
demo_file <- system.file("extdata", "demo.pepXML", package = "Aerith")
# Parse pepXML file to extract peptide identification results
# This creates a structured data frame with all identification information
pepxml_results <- readPepXMLtable(demo_file)
# Display structure of the results
str(pepxml_results)
#> 'data.frame': 3 obs. of 28 variables:
#> $ psmID : chr "controllerType=0 controllerNumber=1 scan=13043_PEP_2" "controllerType=0 controllerNumber=1 scan=13044_PEP_3" "controllerType=0 controllerNumber=1 scan=531_PEP_1"
#> $ spectrumID : chr "controllerType=0 controllerNumber=1 scan=13043" "controllerType=0 controllerNumber=1 scan=13044" "controllerType=0 controllerNumber=1 scan=531"
#> $ number.of.matched.peaks : num 10 7 9
#> $ number.of.unmatched.peaks: num 10 21 13
#> $ X.Tandem.hyperscore : num 18.3 15.3 8
#> $ X.Tandem.expect : num 0.00909 4.141 2
#> $ chargeState : int 2 4 2
#> $ rank : int 1 1 1
#> $ passThreshold : logi TRUE TRUE TRUE
#> $ experimentalMassToCharge : num 558 411 418
#> $ calculatedMassToCharge : num 558 411 418
#> $ sequence : chr "GYGFITGTDGK" "ISISVDMLGSDDFIK" "KGAVATCK"
#> $ peptideRef : chr "PEP_2" "PEP_3" "PEP_1"
#> $ modNum : int 0 0 0
#> $ isDecoy : logi FALSE FALSE FALSE
#> $ post : chr "D" "G" "V"
#> $ pre : chr "K" "K" "K"
#> $ start : int 0 0 0
#> $ end : int 0 0 0
#> $ DBseqLength : int 0 0 0
#> $ DatabaseSeq : chr "" "" ""
#> $ scan.start.time : chr "1849.854" "1849.932" "379.6314"
#> $ acquisitionNum : num 13043 13044 531
#> $ DatabaseAccess : chr "MGBC134670_02107,MGBC165238_00126" "MGBC162268_01739" "MGBC115096_01980,MGBC128936_01434,MGBC132046_01637,MGBC118143_01865"
#> $ DatabaseDescription : chr "unannotated protein,unannotated protein" "hypothetical protein" "Bacterial Ig-like domain (group 2),Bacterial Ig-like domain (group 2),Bacterial Ig-like domain (group 2),unannotated protein"
#> $ name : chr NA NA NA
#> $ mass : num NA NA NA
#> $ location : int NA NA NA
pepXML Data Structure: The parsed pepXML file contains essential columns including: - Peptide Sequence: Amino acid sequence of identified peptides - Protein Accession: Database identifiers for matched proteins - Scores: Search engine specific scoring metrics - Modifications: Post-translational modifications and their positions - Spectrum Information: Links to original MS2 spectra
Advantages of pepXML Format: - Standardization: Consistent format across different search engines - Rich Metadata: Contains detailed scoring and statistical information - Protein Grouping: Maintains relationships between peptides and proteins - Modification Support: Comprehensive handling of post-translational modifications
Sipros (Stable Isotope Probing proteomics) generates specialized output formats optimized for SIP analysis. Aerith provides native support for these formats, representing a significant advantage over generic proteomics tools.
Sipros generates several types of output files, each serving specific analytical purposes. Aerith’s unified reading functions provide consistent access to these diverse data types.
Standard PSM Results:
demo_file <- system.file("extdata", "demo.psm.txt", package = "Aerith")
# Read peptide-spectrum match results
# Contains identification scores, modifications, and SIP-specific metrics
psm_data <- readPSMtsv(demo_file)
# Display key columns and data structure
head(psm_data)
#> Filename ScanNumber ParentCharge MeasuredParentMass
#> 1 Pan_052322_X14.FT2 27210 2 1225.764
#> 2 Pan_052322_X13.FT2 23676 2 1312.752
#> 3 Pan_052322_X15.FT2 40863 3 2393.352
#> 4 Pan_052322_X14.FT2 8192 2 959.526
#> 5 Pan_052322_X14.FT2 16534 3 2596.417
#> 6 Pan_052322_X15.FT2 27005 3 1237.785
#> CalculatedParentMass MassErrorDa MassErrorPPM ScanType SearchName
#> 1 1223.758 0.001050 0.8580128 Ms2 C13_49000Pct
#> 2 1311.748 -0.000245 -0.1867737 Ms2 C13_58000Pct
#> 3 2388.338 0.002905 1.2163270 Ms2 C13_54000Pct
#> 4 957.519 -0.000330 -0.3446407 Ms2 C13_57000Pct
#> 5 2602.440 0.002770 1.0643857 Ms2 C13_51000Pct
#> 6 1238.789 0.000985 0.7951314 Ms2 C13_52000Pct
#> ScoringFunction Score DeltaZ DeltaP IdentifiedPeptide
#> 1 WeightSum 43.64 1.71 NA [NILMIGPTGVGK]
#> 2 WeightSum 50.29 2.46 NA [DGSVVVLGYTDR]
#> 3 WeightSum 64.78 1.86 NA [IETLFTNDLDHGPYISETLR]
#> 4 WeightSum 51.29 0.87 NA [VMVMVDDK]
#> 5 WeightSum 46.81 0.46 NA [VETHNHPTAISPWPGAATGSGGEIR]
#> 6 WeightSum 68.29 1.21 NA [NRDELLALIR]
#> OriginalPeptide ProteinNames ProteinCount TargetMatch
#> 1 [NILMIGPTGVGK] {sp|P0A6H5|HSLU_ECOLI} 1 TRUE
#> 2 [DGSVVVLGYTDR] {sp|P0A910|OMPA_ECOLI} 1 TRUE
#> 3 [IETLFTNDLDHGPYISETLR] {sp|P0A8V2|RPOB_ECOLI} 1 TRUE
#> 4 [VMVMVDDK] {sp|P00961|SYGB_ECOLI} 1 TRUE
#> 5 [VETHNHPTAISPWPGAATGSGGEIR] {sp|P15254|PUR4_ECOLI} 1 TRUE
#> 6 [NRDELLALIR] {sp|P0AE78|CORC_ECOLI} 1 TRUE
Protein Clustering Results:
demo_file <- system.file("extdata", "demo.pro.cluster.txt", package = "Aerith")
# Read protein clustering and grouping information
# Essential for protein-level quantification and SIP analysis
protein_clusters <- readPSMtsv(demo_file)
# Examine protein grouping structure
head(protein_clusters)
#> X ProteinID AverageEnrichmentLevel StandardDeviation
#> 1 + {sp|P0A7W1|RS5_ECOLI} 58000 0
#> 2 + {sp|P0AGD7|SRP54_ECOLI} 48000 0
#> 3 + {sp|P0AEU7|SKP_ECOLI} 48000 0
#> 4 + {sp|P40874|MTOX_ECOLI} 41000 0
#> 5 + {sp|P00393|NDH_ECOLI} 54000 0
#> 6 + {sp|P0ACJ0|LRP_ECOLI} 46000 0
#> UniqueSpectrumCounts TotalSpectrumCounts
#> 1 1 1
#> 2 1 1
#> 3 3 3
#> 4 1 1
#> 5 3 3
#> 6 8 8
#> EnrichmentLevels
#> 1 {58000}
#> 2 {48000}
#> 3 {48000,48000,48000}
#> 4 {41000}
#> 5 {54000,54000,54000}
#> 6 {46000,46000,46000,46000,46000,46000,46000,46000}
#> ProteinDescription
#> 1 {sp|P0A7W1|RS5_ECOLI 30S ribosomal protein S5 OS=Escherichia coli (strain K12) OX=83333 GN=rpsE PE=1 SV=2}
#> 2 {sp|P0AGD7|SRP54_ECOLI Signal recognition particle protein OS=Escherichia coli (strain K12) OX=83333 GN=ffh PE=1 SV=1}
#> 3 {sp|P0AEU7|SKP_ECOLI Chaperone protein Skp OS=Escherichia coli (strain K12) OX=83333 GN=skp PE=1 SV=1}
#> 4 {sp|P40874|MTOX_ECOLI N-methyl-L-tryptophan oxidase OS=Escherichia coli (strain K12) OX=83333 GN=solA PE=1 SV=1}
#> 5 {sp|P00393|NDH_ECOLI Type II NADH:quinone oxidoreductase OS=Escherichia coli (strain K12) OX=83333 GN=ndh PE=1 SV=2}
#> 6 {sp|P0ACJ0|LRP_ECOLI Leucine-responsive regulatory protein OS=Escherichia coli (strain K12) OX=83333 GN=lrp PE=1 SV=2}
SIP-Specific Analysis Results:
demo_file <- system.file("extdata", "demo.sip", package = "Aerith")
# Read SIP analysis results containing isotope incorporation metrics
# This file type is unique to SIP proteomics workflows
sip_results <- readPSMtsv(demo_file)
# Display SIP-specific columns
head(sip_results)
#> Filename ScanNumber ParentCharge MeasuredParentMass
#> 1 Pan_052322_X13.FT2 35171 2 936.6362
#> 2 Pan_052322_X13.FT2 19237 2 951.5866
#> 3 Pan_052322_X13.FT2 10433 2 1240.6850
#> 4 Pan_052322_X13.FT2 36604 2 1183.7354
#> 5 Pan_052322_X13.FT2 21229 3 2281.3253
#> 6 Pan_052322_X13.FT2 24594 2 1821.0902
#> CalculatedParentMass ScanType SearchName ScoringFunction Rank Score
#> 1 936.6366 Ms2 C13_50000Pct WeightSum 3 12.37
#> 2 951.5837 Ms2 C13_50000Pct WeightSum 4 7.23
#> 3 1243.6882 Ms2 C13_50000Pct WeightSum 4 9.94
#> 4 1189.7524 Ms2 C13_50000Pct WeightSum 4 16.22
#> 5 2292.3554 Ms2 C13_50000Pct WeightSum 5 7.15
#> 6 1819.0757 Ms2 C13_50000Pct WeightSum 2 5.00
#> IdentifiedPeptide OriginalPeptide ProteinNames
#> 1 K[AIADAVVKK]D K[AIADAVVKK]D {Rev_sp|P0AFG8|ODP1_ECOLI}
#> 2 R[GLEREISK]L R[GLEREISK]L {sp|P0A9M0|LON_ECOLI}
#> 3 R[QHTHDAPRTR]T R[QHTHDAPRTR]T {sp|P0AGK8|ISCR_ECOLI}
#> 4 R[GLLFSGIEVAR]I R[GLLFSGIEVAR]I {Rev_sp|P0A853|TNAA_ECOLI}
#> 5 K[AMNNLKDGLAKLGQTLYYAR]D K[AMNNLKDGLAKLGQTLYYAR]D {sp|P76104|RLHA_ECOLI}
#> 6 R[DLHFKERYDVVISR]L R[DLHFKERYDVVISR]L {sp|P60872|YIDE_ECOLI}
Spectrum-to-Peptide Mapping:
target_file <- system.file("extdata", "demo_target.Spe2Pep.txt", package = "Aerith")
# Read spectrum-to-peptide mapping files
# These files link MS2 spectra to peptide identifications
spe2pep_data <- readSpe2Pep(target_file)
# Extract PSM information from the parsed data
psm_from_spe2pep <- spe2pep_data$PSM
# Display mapping structure
head(psm_from_spe2pep)
#> scanNumbers precursorScanNumbers parentCharges isolationWindowCenterMZs
#> 1 12725 12728 2 832.932
#> 2 12725 12728 2 832.932
#> 3 12725 12728 2 832.932
#> 4 12725 12728 2 832.932
#> 5 12725 12728 2 832.932
#> 6 12725 12728 2 832.932
#> measuredParentMasses calculatedParentMasses retentionTimes WDPscores
#> 1 1663.849 1663.849 28.53885 16.4672
#> 2 1663.849 1668.875 28.53885 4.4907
#> 3 1666.859 1671.884 28.53885 3.4707
#> 4 1663.849 1668.870 28.53885 1.9872
#> 5 1663.849 1661.846 28.53885 1.9872
#> 6 1666.859 1669.869 28.53885 1.9872
#> XcorrScores MVHscores ranks identifiedPeptides originalPeptides
#> 1 1.2468 80.9063 1 R[TYDDDPTKYQDLR]V R[TYDDDPTKYQDLR]V
#> 2 0.3837 0.0000 2 K[EDTLSDHIESYGIR]S K[EDTLSDHIESYGIR]S
#> 3 0.2425 19.9707 3 R[IVGGYQNNDSWLDR]Y R[IVGGYQNNDSWLDR]Y
#> 4 0.1082 0.0000 4 R[SGYSSPGSPGTPGSRSR]T R[SGYSSPGSPGTPGSRSR]T
#> 5 0.1502 0.0000 5 R[FYHDYYGENLFR]T R[FYHDYYGENLFR]T
#> 6 0.0734 0.0000 6 R[CDILEPGTLQGYDR]D R[CDILEPGTLQGYDR]D
#> nakePeptides proteinNames
#> 1 TYDDDPTKYQDLR {sp|P0AEM9|TCYJ_ECOLI}
#> 2 EDTLSDHIESYGIR {sp|P75978|YMFN_ECOLI}
#> 3 IVGGYQNNDSWLDR {sp|P16869|FHUE_ECOLI}
#> 4 SGYSSPGSPGTPGSRSR {Con_TAU_HUMAN}
#> 5 FYHDYYGENLFR {sp|P28629|ADIA_ECOLI}
#> 6 CDILEPGTLQGYDR {sp|P0A9C5|GLN1B_ECOLI}
File Type Interpretation Guide: - PSM files: Core identification results with scoring metrics - Protein cluster files: Protein grouping and quantification data - SIP files: Isotope incorporation analysis and labeling efficiency - Spe2Pep files: Direct mapping between spectra and peptide identifications
Advantages of Sipros Integration: Aerith’s native support for Sipros formats offers several benefits: - Seamless Workflow: Direct reading without format conversion - SIP-Optimized: Preserves SIP-specific metadata and metrics - Performance: Optimized parsing for large Sipros datasets - Consistency: Uniform data structures across different file types
Aerith provides powerful visualization tools for quality assessment and data exploration. These functions are essential for identifying potential issues and understanding dataset characteristics.
Total Ion Current (TIC) analysis provides a comprehensive overview of instrument performance and sample complexity throughout the chromatographic separation. This analysis is crucial for identifying systematic issues and understanding data quality.
# Analyze TIC from MS1 data
rds <- system.file("extdata", "demo.FT1.rds", package = "Aerith")
demo_file <- tempfile(fileext = ".FT1")
writeLines(readRDS(rds), demo_file)
ms1_scans <- readAllScanMS1(demo_file)
ms1_tic <- getTIC(ms1_scans)
# Create TIC plot with specified retention time breaks
# The breaks parameter allows customization of the x-axis for better visualization
plotTIC(ms1_tic, seq(9, 10, by = 0.2))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the Aerith package.
#> Please report the issue at <https://github.com/xyz1396/Aerith/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
#> ℹ Please use the `linewidth` argument instead.
#> ℹ The deprecated feature was likely used in the Aerith package.
#> Please report the issue at <https://github.com/xyz1396/Aerith/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
# Analyze TIC from MS2 data
demo_file <- system.file("extdata", "demo.FT2", package = "Aerith")
ms2_scans <- readAllScanMS2(demo_file)
ms2_tic <- getTIC(ms2_scans)
# Plot MS2 TIC with the same retention time range for comparison
plotTIC(ms2_tic, seq(9, 10, by = 0.2))
TIC Plot Interpretation: - Peak Shape: Should show smooth chromatographic peaks indicating proper separation - Baseline Stability: Consistent baseline suggests stable instrument performance - Peak Intensity: Reflects sample concentration and ionization efficiency - Peak Width: Indicates chromatographic resolution and gradient steepness
Parameter Selection Guidelines: - Retention Time Breaks: Choose intervals that highlight important features - Time Range: Focus on the active elution window to avoid empty baseline regions - Resolution: Balance between detail and overview based on analysis goals
Quality Assessment Criteria: - Consistent peak shapes across the chromatographic run - Minimal baseline drift or sudden intensity changes - Appropriate peak capacity for the experimental design - Comparable TIC profiles between technical replicates
Understanding scan frequency and timing is essential for evaluating instrument performance and data acquisition strategies. This analysis reveals the temporal distribution of MS1 and MS2 events.
# Process MS2 data to extract retention time and precursor information
demo_file <- system.file("extdata", "demo.FT2", package = "Aerith")
ms2_scan_data <- readAllScanMS2(demo_file)
ms2_retention_info <- getRetentionTimeAndPrecursorInfo(ms2_scan_data)
# Process MS1 data for comparison
rds <- system.file("extdata", "demo.FT1.rds", package = "Aerith")
demo_file <- tempfile(fileext = ".FT1")
writeLines(readRDS(rds), demo_file)
ms1_scan_data <- readAllScanMS1(demo_file)
ms1_retention_info <- getRetentionTimeAndPrecursorInfo(ms1_scan_data)
# Create combined scan frequency visualization
# This plot shows the temporal distribution of MS1 and MS2 scans
combined_plot <- plotScanFrequency(ms2_retention_info,
binwidth = 0.1,
breaks = seq(9, 10, by = 0.2)
) +
plotScanFrequencyMS2(ms1_retention_info, binwidth = 0.1)
print(combined_plot)
Scan Frequency Analysis Interpretation: - MS1 Frequency: Indicates survey scan rate and data collection efficiency - MS2 Frequency: Reflects precursor selection and fragmentation efficiency - Temporal Distribution: Shows how scan events are distributed across retention time - Balance: Optimal ratio between MS1 and MS2 scans for comprehensive coverage
Parameter Optimization Guidelines: - Binwidth: Smaller values (0.05-0.1) provide higher temporal resolution - Breaks: Set to match your chromatographic gradient and peak capacity - Time Range: Focus on active elution window for meaningful analysis
Performance Indicators: - Consistent scan rates indicate stable instrument operation - Appropriate MS1/MS2 ratio ensures both survey and fragmentation coverage - Even temporal distribution suggests optimal data-dependent acquisition settings
Precursor m/z distribution analysis provides insights into sample complexity, mass range coverage, and data acquisition effectiveness. This analysis is particularly valuable for SIP experiments where isotope patterns affect precursor masses.
demo_file <- system.file("extdata", "demo.FT2", package = "Aerith")
ms2_data <- readAllScanMS2(demo_file)
precursor_info <- getRetentionTimeAndPrecursorInfo(ms2_data)
# Create precursor m/z frequency plot
# This visualization shows how precursors are distributed across m/z and retention time
plotPrecursorMzFrequency(precursor_info,
timeBinWidth = 0.1,
x_breaks = seq(8, 11, by = 0.2)
)
Precursor Distribution Analysis: This 2D visualization reveals several critical aspects of the data:
Parameter Selection for Optimal Visualization: - timeBinWidth (0.1): Provides good temporal resolution for most LC gradients - x_breaks: Customized to match your retention time window of interest - Mass Range: Automatically scaled to your data range
Interpretation Guidelines: - Hot Spots: High-density regions may indicate co-eluting compounds or abundant species - Coverage Gaps: Empty regions may suggest missed opportunities or instrumental limitations - Temporal Patterns: Changes in m/z distribution over time reflect chromatographic separation - Dynamic Range: Spread of intensities indicates sample complexity
SIP-Specific Considerations: In stable isotope probing experiments, this analysis is particularly valuable for: - Identifying isotope-labeled peptides (mass shifts) - Assessing labeling efficiency across different m/z ranges - Detecting systematic biases in heavy vs. light peptide selection - Optimizing acquisition parameters for SIP workflows
This vignette demonstrates Aerith’s comprehensive data handling capabilities, showcasing its advantages over traditional proteomics tools:
The combination of robust data handling, specialized SIP features, and comprehensive visualization makes Aerith an invaluable tool for modern proteomics research, particularly in the rapidly growing field of stable isotope probing.
sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 Aerith_0.99.11
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 farver_2.1.2
#> [3] S7_0.2.1 fastmap_1.2.0
#> [5] lazyeval_0.2.2 XML_3.99-0.20
#> [7] digest_0.6.39 lifecycle_1.0.4
#> [9] cluster_2.1.8.1 ProtGenerics_1.43.0
#> [11] statmod_1.5.1 magrittr_2.0.4
#> [13] compiler_4.6.0 rlang_1.1.6
#> [15] sass_0.4.10 tools_4.6.0
#> [17] igraph_2.2.1 yaml_2.3.11
#> [19] data.table_1.17.8 knitr_1.50
#> [21] labeling_0.4.3 S4Arrays_1.11.1
#> [23] DelayedArray_0.37.0 plyr_1.8.9
#> [25] RColorBrewer_1.1-3 abind_1.4-8
#> [27] BiocParallel_1.45.0 withr_3.0.2
#> [29] purrr_1.2.0 BiocGenerics_0.57.0
#> [31] grid_4.6.0 stats4_4.6.0
#> [33] preprocessCore_1.73.0 ggplot2_4.0.1
#> [35] iterators_1.0.14 scales_1.4.0
#> [37] MASS_7.3-65 MultiAssayExperiment_1.37.2
#> [39] dichromat_2.0-0.1 SummarizedExperiment_1.41.0
#> [41] cli_3.6.5 crayon_1.5.3
#> [43] mzR_2.45.0 rmarkdown_2.30
#> [45] generics_0.1.4 reshape2_1.4.5
#> [47] ncdf4_1.24 affy_1.89.0
#> [49] cachem_1.1.0 stringr_1.6.0
#> [51] parallel_4.6.0 impute_1.85.0
#> [53] BiocManager_1.30.27 vsn_3.79.0
#> [55] AnnotationFilter_1.35.0 XVector_0.51.0
#> [57] matrixStats_1.5.0 vctrs_0.6.5
#> [59] Matrix_1.7-4 jsonlite_2.0.0
#> [61] IRanges_2.45.0 S4Vectors_0.49.0
#> [63] MALDIquant_1.22.3 ggrepel_0.9.6
#> [65] clue_0.3-66 foreach_1.5.2
#> [67] limma_3.67.0 jquerylib_0.1.4
#> [69] tidyr_1.3.1 affyio_1.81.0
#> [71] glue_1.8.0 MSnbase_2.37.0
#> [73] codetools_0.2-20 QFeatures_1.21.0
#> [75] Spectra_1.21.0 stringi_1.8.7
#> [77] gtable_0.3.6 GenomicRanges_1.63.0
#> [79] mzID_1.49.0 tibble_3.3.0
#> [81] pillar_1.11.1 pcaMethods_2.3.0
#> [83] htmltools_0.5.8.1 Seqinfo_1.1.0
#> [85] R6_2.6.1 doParallel_1.0.17
#> [87] evaluate_1.0.5 lattice_0.22-7
#> [89] Biobase_2.71.0 bslib_0.9.0
#> [91] MetaboCoreUtils_1.19.1 Rcpp_1.1.0
#> [93] PSMatch_1.15.0 SparseArray_1.11.7
#> [95] xfun_0.54 MsCoreUtils_1.23.1
#> [97] MatrixGenerics_1.23.0 fs_1.6.6
#> [99] pkgconfig_2.0.3