A common challenge in integrative analysis of high-dimensional data sets is subsequent biological interpretation. Anansi addresses this challenge by only considering pairwise associations that are known to occur a priori. In order to achieve this, we need to provide this relational information to anansi in the form of a (bi)adjacency matrix.
This vignette can be divided into two sections:
Recall that the citric acid cycle, or Krebs cycle, is a central piece of metabolic machinery involved in oxidative phosphorylation. See figure below for a simplified representation with 8 unique enzymatic reactions and 9 unique metabolites.
Figure 1: Simplified representation of the Krebs cycle
Orange boxes mark
metabolites and blue arrows represent enyzmes.
Let’s imagine we had (high-dimensional proxy) measurements available for both types. For instance, we could have transcription data for the enzymes and metabolomics data for the metabolites. We could represent our two data sets like two sets of features.
If we were to perform an all-vs-all integrative analysis between our two data sets, comprehensively testing every metabolite-enzyme pair, we’d test \(8\times9=72\) hypotheses.
Figure 2: Graph representation of possible association sctructures in the
Krebs cycle. A) All-vs-all, with a full graph. B) Constrained, only
linking metabolites an enzymes that are known to directly interact.
Often however, only a subset of these comparisons is scientifically relevant to investigate. In this case for instance, only 17 associations between enzyme-metabolite pairs are likely to make sense. Non-selectively testing all 72 associations actively harms statistical power, as 55 of these tests likely cannot be interpreted, but their p-values will still be tallied for the purpose of FDR and FWER. Moreover, they tend to obscure any biologically interpretable findings.
In order to ‘know’ which features-pairs in your data set should be considered, anansi requires a biadjacency matrix. The biadjacency matrix corresponding to to the sparser graph on the right can be seen in the figure below.
Figure 3: Matrix representation of the constrained analysis of the Krebs
cycle, with cells marked with ‘X’ signifying tested feature-pairs.
In anansi, we work with biadjacency matrices in the AnansiWeb
S7 class, which
consists of a biadjacency matrix, in this case we call it a dictionary
, as
well as two tables of observations that should be analysed jointly.
The anansi package offers some functions that should be sufficient for basic analysis. To enable more advanced users to pursue non-standard applications, our methodology is compatible with several popular interfaces.
weaveWeb()
The recommended default interface to generate an AnansiWeb
object is
weaveWeb()
. Besides a traditional interface, it also accepts R formula
syntax. Once we have an AnansiWeb
object, we can use the $ operator to access
the biadjacency matrix in the ‘dictionary’ slot.
# Two interfaces
form.web <- weaveWeb(ec ~ ko, link = kegg_link())
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
trad.web <- weaveWeb(y = "ec", x = "ko", link = kegg_link())
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
# Same output
identical(form.web, trad.web)
## [1] TRUE
# Get the biadjacency matrix using the $ operator
head(dictionary(form.web), c(10, 10))
## 10 x 10 sparse Matrix of class "ngCMatrix"
## [[ suppressing 10 column names 'K00001', 'K00002', 'K00003' ... ]]
## ko
## ec
## 1.1.1.1 | . . . . . . . . .
## 1.1.1.2 . | . . . . . . . .
## 1.1.1.3 . . | . . . . . . .
## 1.1.1.4 . . . | . . . . . .
## 1.1.1.303 . . . | . . . . . .
## 1.1.1.6 . . . . | . . . . .
## 1.1.1.8 . . . . . | . . . .
## 1.1.1.11 . . . . . . | . . .
## 1.1.1.14 . . . . . . . | . .
## 1.1.1.17 . . . . . . . . | .
Note that the majority of this matrix consists of dots, it’s a sparse matrix, which is a format that can considerably speed up calculations involving huge, mostly empty matrices. For further reading, see the Matrix package website.
weaveWeb()
input: link
weaveWeb()
takes additional arguments, let’s focus on link
, which will take
the information necessary to link features between input data sets. The anansi
package includes such data for the KEGG database,
let’s take a look at some of it:
# Extract the data.frame from the list that kegg_link() returns
ec2ko <- kegg_link()[["ec2ko"]]
head(ec2ko)
## ec ko
## 1 1.1.1.1 K00001
## 2 1.1.1.2 K00002
## 3 1.1.1.3 K00003
## 4 1.1.1.4 K00004
## 5 1.1.1.303 K00004
## 6 1.1.1.6 K00005
link
inputThe format of the data.frame
is important: It should have two named columns.
Each column contains feature IDs of one type (in this case, ec
and ko
, which
is how the KEGG database refers to
Enzyme Commission numbers and
kegg orthologes, respectively. If two
feature IDs are on the same row of the data.frame
, it means those features
are paired (hence adjacency).
The column names correspond to the types of feature IDs that should be linked.
These names are used throughout the AnansiWeb
workflow. They can be directly
called from an AnansiWeb
object:
# Note that the formula controls the order
names(weaveWeb(ko ~ ec, link = kegg_link()))
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
## [1] "ko" "ec"
# names() is short for the above
names(weaveWeb(ec ~ ko, link = kegg_link()))
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
## [1] "ec" "ko"
We need to provide weaveWeb()
with a map of which features are linked. We can
use the pre-packaged kegg link map, which consists of two similarly structured
data.frame
s in a list, one of which we just inspected. We can call it directly
using kegg_link()
:
# Only print the first few entries
lapply(kegg_link(), head)
## $ec2ko
## ec ko
## 1 1.1.1.1 K00001
## 2 1.1.1.2 K00002
## 3 1.1.1.3 K00003
## 4 1.1.1.4 K00004
## 5 1.1.1.303 K00004
## 6 1.1.1.6 K00005
##
## $ec2cpd
## ec cpd
## 1 1.1.1.1 C00001
## 2 1.1.1.115 C00001
## 3 1.1.1.132 C00001
## 4 1.1.1.136 C00001
## 5 1.1.1.170 C00001
## 6 1.1.1.186 C00001
data.frame
Note the column names in the two data.frames
: ec
, cpd
and ko
. Two of
these correspond to the formula we used just used: weaveWeb(ec ~ ko)
. We
didn’t use the term cpd
. Further, ec
is present in both data.frame
s.
We can use weaveWeb()
to make a biadjacency matrix between any combination of
one or two similarly structured data.frame
s, presented as a list. For the pre-
packaged data set, this means we can link between any two of ec
, cpd
and
ko
, in either order.
# Formula notation
head(dictionary(weaveWeb(cpd ~ ko, link = kegg_link())), c(10, 10))
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
## 10 x 10 sparse Matrix of class "ngCMatrix"
## [[ suppressing 10 column names 'K00001', 'K00002', 'K00003' ... ]]
## ko
## cpd
## C00001 | . . . . . . . . .
## C00002 . . . . . . . . . .
## C00003 | . | | | | | | | |
## C00004 | . | | | | | | | |
## C00005 | | | . . . . . . .
## C00006 | | | . . . . . . .
## C00007 . . . . . . . . . .
## C00008 . . . . . . . . . .
## C00009 . . . . . . . . . .
## C00010 . . . . . . . . . .
# Character notation
head(dictionary(weaveWeb(y = "ko", x = "cpd", link = kegg_link())), c(10, 10))
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
## 10 x 10 sparse Matrix of class "ngCMatrix"
## [[ suppressing 10 column names 'C00001', 'C00002', 'C00003' ... ]]
## cpd
## ko
## K00001 | . | | | | . . . .
## K00002 . . . . | | . . . .
## K00003 . . | | | | . . . .
## K00004 . . | | . . . . . .
## K00005 . . | | . . . . . .
## K00006 . . | | . . . . . .
## K00007 . . | | . . . . . .
## K00008 . . | | . . . . . .
## K00009 . . | | . . . . . .
## K00010 . . | | . . . . . .
The most flexible way to make an AnansiWeb
object is through AnansiWeb()
:
# Prep some dummy input tables
tX <- matrix(rnorm(30), nrow = 6, dimnames = list(NULL, x = paste("x", seq_len(5),
sep = "_")))
tY <- matrix(rnorm(42), nrow = 6, dimnames = list(NULL, y = paste("y", seq_len(7),
sep = "_")))
# Prep biadjacency matrix base::matrix is fine too, but will get coerced to
# Matrix::Matrix.
d <- matrix(data = sample(x = c(TRUE, FALSE), size = 35, replace = TRUE), nrow = NCOL(tY),
ncol = NCOL(tX), dimnames = list(colnames(tY), colnames(tX)))
# make the AnansiWeb
w <- AnansiWeb(tableY = tY, tableX = tX, dictionary = d)
## Warning in AnansiWeb(tableY = tY, tableX = tX, dictionary = d): Dimnames of
## 'dictionary' were missing; Assigned 'y' and 'x'.
## Warning in .check_metadata_labels(metadata, tableY, tableX): Argument
## `metadata` not provided; Please validate sample ID order.
# Confirm the warning:
names(w)
## [1] "y" "x"
There are numerous ways in which we can define an adjacency matrix. Here, we demonstrate a graph-based and matrix based approach.
igraph
Importantly, (bi)adjacency matrices can be understood as graphs. Two common
packages that deal with graphs are igraph
and
graph
. To start off, we need to
have a graph object, so let’s make one using igraph
.
library(igraph)
# Prepared data.frame for the occasion
head(krebs_edge_df)
## Enzyme Metabolite
## 1 isocitrate dehydrogenase isocitrate
## 2 isocitrate dehydrogenase ketoglutarate
## 3 ketoglutarate dehydrogenase ketoglutarate
## 4 ketoglutarate dehydrogenase succinyl-CoA
## 5 succinyl-CoA synthetase succinyl-CoA
## 6 succinyl-CoA synthetase succinate
# Convert data.frame to graph
g <- graph_from_data_frame(krebs_edge_df, directed = FALSE)
Now that we have constructed a graph, we still need to identify which features,
vertices, belong to which data modality, in this case either enzymes and
metabolites. Using the igraph
package, this should be done through the type
slot. type
can be assigned a boolean vector, where TRUE
values
become columns, corresponding to features in tableX
, whereas FALSE
values
become rows, which correspond to those in tableY
.
V(g)$type <- V(g)$name %in% krebs_edge_df$Metabolite
# Now that we've defined type, we can convert to biadjacency matrix:
bi_mat1 <- as_biadjacency_matrix(g, sparse = TRUE)
head(bi_mat1, n = c(4, 5))
## 4 x 5 sparse Matrix of class "dgTMatrix"
## isocitrate ketoglutarate succinyl-CoA succinate
## isocitrate dehydrogenase 1 1 . .
## ketoglutarate dehydrogenase . 1 1 .
## succinyl-CoA synthetase . . 1 1
## succinate dehydrogenase . . . 1
## fumarate
## isocitrate dehydrogenase .
## ketoglutarate dehydrogenase .
## succinyl-CoA synthetase .
## succinate dehydrogenase 1
Though biadjacency support in graph
is currently limited, We note that igraph
and graph objects can be converted using the graph_from_graphnel()
and
as_graphnel()
functions in igraph
.
Matrix
We can also define a matrix directly. Conveniently, sparse matrices can be
defined easily from our starting data. The Matrix
library provides fantastic
support for this.
library(Matrix)
# For this approach, it's useful to prepare the input as two factors:
Enzyme <- factor(krebs_edge_df$Enzyme)
Metabolite <- factor(krebs_edge_df$Metabolite)
# We can get integers out of factors, corresponding to their level indices
bi_mat2 <- sparseMatrix(i = as.integer(Enzyme), j = as.integer(Metabolite), dimnames = list(Enzyme = levels(Enzyme),
Metabolite = levels(Metabolite)))
head(bi_mat2, n = c(4, 5))
## 4 x 5 sparse Matrix of class "ngCMatrix"
## Metabolite
## Enzyme citrate cis-aconitate isocitrate ketoglutarate
## aconitase | | | .
## isocitrate dehydrogenase . . | |
## ketoglutarate dehydrogenase . . . |
## succinyl-CoA synthetase . . . .
## Metabolite
## Enzyme succinyl-CoA
## aconitase .
## isocitrate dehydrogenase .
## ketoglutarate dehydrogenase |
## succinyl-CoA synthetase |
sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Matrix_1.7-4 dplyr_1.1.4 tidygraph_1.3.1 igraph_2.2.0
## [5] ggraph_2.2.2 patchwork_1.3.2 ggplot2_4.0.0 anansi_0.99.4
## [9] BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2
## [3] farver_2.1.2 viridis_0.6.5
## [5] Biostrings_2.77.2 S7_0.2.0
## [7] fastmap_1.2.0 SingleCellExperiment_1.31.1
## [9] lazyeval_0.2.2 tweenr_2.0.3
## [11] digest_0.6.37 lifecycle_1.0.4
## [13] tidytree_0.4.6 magrittr_2.0.4
## [15] compiler_4.5.1 rlang_1.1.6
## [17] sass_0.4.10 tools_4.5.1
## [19] yaml_2.3.10 knitr_1.50
## [21] labeling_0.4.3 S4Arrays_1.9.1
## [23] graphlayouts_1.2.2 DelayedArray_0.35.3
## [25] RColorBrewer_1.1-3 TreeSummarizedExperiment_2.17.1
## [27] abind_1.4-8 BiocParallel_1.43.4
## [29] withr_3.0.2 purrr_1.1.0
## [31] BiocGenerics_0.55.3 grid_4.5.1
## [33] polyclip_1.10-7 stats4_4.5.1
## [35] scales_1.4.0 MASS_7.3-65
## [37] tinytex_0.57 MultiAssayExperiment_1.35.9
## [39] dichromat_2.0-0.1 SummarizedExperiment_1.39.2
## [41] cli_3.6.5 rmarkdown_2.30
## [43] crayon_1.5.3 treeio_1.33.0
## [45] generics_0.1.4 BiocBaseUtils_1.11.2
## [47] ape_5.8-1 cachem_1.1.0
## [49] ggforce_0.5.0 parallel_4.5.1
## [51] formatR_1.14 BiocManager_1.30.26
## [53] XVector_0.49.1 matrixStats_1.5.0
## [55] yulab.utils_0.2.1 vctrs_0.6.5
## [57] jsonlite_2.0.0 bookdown_0.45
## [59] IRanges_2.43.5 S4Vectors_0.47.4
## [61] ggrepel_0.9.6 magick_2.9.0
## [63] tidyr_1.3.1 jquerylib_0.1.4
## [65] glue_1.8.0 codetools_0.2-20
## [67] gtable_0.3.6 GenomicRanges_1.61.5
## [69] tibble_3.3.0 pillar_1.11.1
## [71] rappdirs_0.3.3 htmltools_0.5.8.1
## [73] Seqinfo_0.99.2 R6_2.6.1
## [75] evaluate_1.0.5 lattice_0.22-7
## [77] Biobase_2.69.1 memoise_2.0.1
## [79] bslib_0.9.0 Rcpp_1.1.0
## [81] gridExtra_2.3 SparseArray_1.9.1
## [83] nlme_3.1-168 xfun_0.53
## [85] fs_1.6.6 MatrixGenerics_1.21.0
## [87] forcats_1.0.1 pkgconfig_2.0.3