---
title: "Glycan Graphs: The Network Behind Your Sugar Structures 🕸️"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Glycan Graphs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

**⚠️ Advanced Users Alert:** This vignette is tailored for those already familiar with graph theory and the `igraph` package. If you're new to these concepts, we recommend checking out the [igraph documentation](https://r.igraph.org) first!

## The Hidden Graph Universe of Glycans 🌌

Think of glycans as nature's own social networks: they're naturally represented as directed graphs, specifically as outward-directed trees where each sugar "talks" to its neighbors in a very structured way. Behind the scenes, every `glycan_structure()` object is actually powered by an `igraph` object.

The beauty of `glycoverse` is that most users can work with the intuitive concept of "glycan structures" without getting lost in the graph theory weeds 🌾. But for you power users who want to peek under the hood, this guide is your treasure map! 🗺️

```{r setup}
library(glyrepr)
```

## What's Actually Stored in Memory? 🧠

Representing a glycan in computer memory is like trying to pack for a month-long trip in a carry-on bag: you need to decide what's absolutely essential! A glycan has tons of information: linear oriented C-atoms, basetype (the stereochemical skeleton), substituents, configuration, anomeric center, ring size, linkage positions... the list goes on! 📝

Some packages (like Python's `glypy`) take the "pack everything" approach 🎒, storing every tiny detail. This comprehensive strategy is fantastic for specialized tasks like MS/MS spectra simulation, but it can be overkill for everyday omics research.

`glyrepr` takes a more minimalist approach ✨. Our philosophy: **if you can derive it from an IUPAC-condensed text representation, we'll store it**. Everything else?
We let it go. This means we skip details like configuration and ring size, and that's usually just fine, since common carbohydrates have predictable properties anyway.

> 💡 **Pro Tip:** Want to master IUPAC-condensed notation?
> Check out [this comprehensive guide](https://glycoverse.github.io/glyrepr/articles/iupac.html).

## Extracting the Graph: Show Me the Network! 🔍

You can't just throw `igraph` functions at a `glycan_structure()` object: they speak different languages! Instead, let's extract the underlying graph using `get_structure_graphs()`:

```{r}
glycan <- n_glycan_core()
graph <- get_structure_graphs(glycan)
graph
```

Let's decode what we're seeing here 🕵️:

**First line:** Directed Named ("DN") graph with 5 vertices (sugar units) and 4 edges (bonds). Think of it as a family tree with 5 people and 4 relationships.

**Graph-level attributes:**

- `anomer` 🔄: The anomeric configuration of the reducing end (the "root" of our tree)

**Vertex attributes (the sugar units themselves):**

- `name` 🏷️: Unique ID for each sugar (like social security numbers)
- `mono` 🍬: The actual sugar type ("Hex", "HexNAc", etc.)
- `sub` ⚗️: Any chemical decorations attached to the sugar

**Edge attributes (the connections):**

- `linkage` 🔗: How the sugars are connected (including bond positions and configurations)

**Connection pattern:** "1->2" means vertex 1 connects to vertex 2. We treat bonds as arrows pointing from the core toward the branches (even though real glycosidic bonds aren't actually directional; it just makes coding easier! 😅)

Want to see it visually?
`igraph` has got you covered:

```{r}
plot(graph)
```

## Deep Dive: Dissecting the Components 🔬

### Vertices: Meet Your Sugar Cast 🎭

Each vertex represents a monosaccharide with three key properties:

**🏷️ Names (Unique IDs):** These are auto-generated identifiers. They are usually simple integers, but they could be anything as long as they're unique:

```{r}
igraph::V(graph)$name
```

**🍬 Monosaccharides (The Star Players):** These are IUPAC-condensed names like "Hex", "HexNAc", "Glc", "GlcNAc". Think of them as the "job titles" of your sugars:

```{r}
igraph::V(graph)$mono
```

> 📚 **Reference:** For the complete cast of available monosaccharides,
> check [SNFG notation](https://www.ncbi.nlm.nih.gov/glycans/snfg.html) or run `available_monosaccharides()`.

**⚗️ Substituents (The Accessories):** Chemical decorations like "Me" (methyl), "Ac" (acetyl), "S" (sulfate), etc. Position matters! "3Me" = methyl at position 3, "?S" = sulfate at unknown position:

```{r}
igraph::V(graph)$sub
```

Got multiple decorations? No problem! They're comma-separated and sorted by position:

```{r}
glycan2 <- as_glycan_structure("Glc3Me6S(a1-")
graph2 <- get_structure_graphs(glycan2)
igraph::V(graph2)$sub
```

### Edges: The Relationship Status 💕

Edges represent glycosidic bonds with a simple but powerful format:

```
<child anomer><child position>-<parent position>
```

Here's a real example where "Gal" has an "a" anomeric configuration, linking from position 3 of "GalNAc" to position 1 of "Gal", which gives the linkage "a1-3":

```{r}
glycan3 <- as_glycan_structure("Gal(a1-3)GalNAc(b1-")
graph3 <- get_structure_graphs(glycan3)
igraph::E(graph3)$linkage
```

> 🤔 **Why encode anomer info in edges?** We debated this!
> It might seem more natural to store it with vertices,
> but thinking "Neu5Ac with a2-3 linkage" flows better mentally and matches IUPAC notation perfectly.

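
Because each linkage is a plain string, it can be taken apart with base R when an analysis needs the pieces separately. The sketch below is illustrative only: `parse_linkage()` is a hypothetical helper, not a `glyrepr` function, and its pattern assumes linkages look like `a1-3`, with `?` tolerated as a placeholder for unknown values.

```r
# Hypothetical helper (not part of glyrepr): split linkage strings such as
# "a1-3" into the child anomer, child position, and parent position.
parse_linkage <- function(linkage) {
  # Assumed format: one anomer character, a position, "-", another position
  pattern <- "^([ab?])([0-9?]+)-([0-9?]+)$"
  parts <- regmatches(linkage, regexec(pattern, linkage))
  # Strings that do not match the assumed pattern become rows of NA
  parts <- lapply(parts, function(p) {
    if (length(p) == 4) p[-1] else rep(NA_character_, 3)
  })
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("anomer", "child_pos", "parent_pos")
  out
}

parsed <- parse_linkage(c("a1-3", "b1-4", "a2-?"))
parsed
```

On a real graph, the same helper could be applied to `igraph::E(graph3)$linkage`, provided the stored strings follow the assumed pattern.
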
### Graph-Level Attributes: The Global Settings ⚙️

**🔄 Anomer:** The anomeric configuration of the reducing end (the "root" sugar, which has no parent in the graph):

```{r}
graph$anomer
```

## Now for the Fun Part: What Can You Do? 🎉

### Unleash the Power of `igraph` 💪

Once you understand the graph structure, the entire `igraph` universe opens up!

**Example 1:** Count branching points (sugars with more than one child):

```{r}
sum(igraph::degree(graph, mode = "out") > 1)
```

**Example 2:** Explore the structure with breadth-first search:

```{r}
bfs_result <- igraph::bfs(graph, root = 1, mode = "out")
bfs_result$order
```

### Level Up with `smap` Functions 🚀

Working with multiple glycans? You could use `purrr`:

```{r}
library(purrr)

glycans <- c(n_glycan_core(), o_glycan_core_1(), o_glycan_core_2())
graphs <- get_structure_graphs(glycans)  # Extract graphs first
map_int(graphs, ~ igraph::vcount(.x))    # Then analyze
```

But `glyrepr`'s `smap` functions are way more elegant:

```{r}
smap_int(glycans, ~ igraph::vcount(.x))  # Direct analysis - no intermediate step!
```

The real magic ✨ of `smap` functions is how they handle duplicates. Real datasets often contain many identical structures, so `smap` processes each unique structure once and then expands the results back to the length of the original input.

> 📖 **Learn More:** Dive deeper into `smap` wizardry in the [dedicated vignette](https://glycoverse.github.io/glyrepr/articles/smap.html).

### Motif Hunting with `glymotif` 🔍

One of the most exciting applications is identifying biologically meaningful motifs (functional substructures). The `glymotif` package, built on this graph foundation, specializes in exactly this task.

> 🎯 **Get Started:** Check out the [`glymotif` introduction](https://glycoverse.github.io/glymotif/articles/glymotif.html) to start your motif hunting adventure!

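
Whether you reach for `smap` or a motif search, the "compute once per unique structure, then expand" idea described for `smap` is what keeps work on large datasets cheap. The base R sketch below shows that general pattern only; `map_unique_int()` and `count_residues()` are hypothetical names, plain strings stand in for glycan structures, and none of this is `glyrepr`'s actual implementation.

```r
# Hypothetical helper (not glyrepr's implementation): apply `f` once per
# unique key, then expand the results back to the length of `keys`.
map_unique_int <- function(keys, f) {
  unique_keys <- unique(keys)
  unique_results <- vapply(unique_keys, f, integer(1), USE.NAMES = FALSE)
  # match() maps every original element to the position of its unique key
  unique_results[match(keys, unique_keys)]
}

# Stand-in data: IUPAC-like strings instead of real glycan structures
structures <- c(
  "Gal(b1-3)GalNAc(a1-", "Man(a1-",
  "Gal(b1-3)GalNAc(a1-", "Man(a1-"
)

# Stand-in "analysis": one opening parenthesis per residue in these strings
count_residues <- function(s) {
  lengths(regmatches(s, gregexpr("(", s, fixed = TRUE)))
}

residue_counts <- map_unique_int(structures, count_residues)
residue_counts
```

Here `count_residues()` runs twice rather than four times, and the gap widens as the share of repeated structures grows.
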
## Wrapping Up: Your Graph Journey Continues 🎯

You've just unlocked the graph-powered engine behind `glyrepr`! You now understand:

- 🏗️ How glycan structures map to directed graphs
- 📊 What information is stored (and what's deliberately omitted)
- 🔧 How to extract and manipulate the underlying graphs
- 🚀 How to leverage `igraph`, `smap`, and `glymotif` for powerful analyses

The graph representation might seem complex at first, but it's this solid foundation that enables all the sophisticated glycan analysis capabilities in the `glycoverse`.

Now go forth and explore your glycan networks! 🌟