Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and cell states. The main functions of this package are to construct gene regulatory networks and infer transcription factor (TF) activity at the single cell level by integrating scATAC-seq and scRNA-seq data and incorporating of public bulk TF ChIP-seq data.
There are three related packages: epiregulon,
epiregulon.extra
and epiregulon.archr, the two of which are available
through Bioconductor and the last of which is only available through github. The
basic epiregulon package takes in gene expression and
chromatin accessibility matrices in the form of
SingleCellExperiment objects, constructs gene regulatory
networks (“regulons”) and outputs the activity of transcription factors
at the single cell level. The epiregulon.extra package
provides a suite of tools for enrichment analysis of target genes,
visualization of target genes and transcription factor activity, and
network analysis which can be run on the epiregulon output.
If the users would like to start from ArchR projects
instead of SingleCellExperiment objects, they may choose to
use epiregulon.archr package, which allows for seamless
integration with the ArchR
package, and continue with the rest of the workflow offered in
epiregulon.extra.
For full documentation of the epiregulon package, please
refer to the epiregulon
book.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("epiregulon")Alternatively, you could install from github
Load package
Prior to using epiregulon, single cell preprocessing
needs to performed by user’s favorite methods. The following components
are required:
1. Peak matrix from scATAC-seq containing the
chromatin accessibility information
2. Gene expression matrix from
either paired or unpaired scRNA-seq. RNA-seq integration needs to be
performed for unpaired dataset.
3. Dimensionality reduction matrix
from either single modality dataset or joint scRNA-seq and scATAC-seq
This tutorial demonstrates the basic functions of
epiregulon, using the reprogram-seq dataset which can be
downloaded from the scMultiome
package. In this example, prostate cancer cells (LNCaP) were infected in
separate wells with viruses encoding 4 transcription factors (NKX2-1,
GATA6, FOXA1 and FOXA2) and a positive control (mNeonGreen) before
pooling. The identity of the infected transcription factors was tracked
through cell hashing (available in the field
hash_assignment of the colData) and serves as
the ground truth.
# load the MAE object
library(scMultiome)
mae <- scMultiome::reprogramSeq()
# extract peak matrix
PeakMatrix <- mae[["PeakMatrix"]]
# extract expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name
# define the order of hash_assigment
GeneExpressionMatrix$hash_assignment <-
factor(as.character(GeneExpressionMatrix$hash_assignment),
levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2",
"HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
"HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR",
"HTO5_NeonG_v2"))
# extract dimensional reduction matrix
reducedDimMatrix <- reducedDim(mae[['TileMatrix500']], "LSI_ATAC")
# transfer UMAP_combined from TileMatrix to GeneExpressionMatrix
reducedDim(GeneExpressionMatrix, "UMAP_Combined") <-
reducedDim(mae[['TileMatrix500']], "UMAP_Combined")Visualize singleCellExperiment by UMAP
scater::plotReducedDim(GeneExpressionMatrix,
dimred = "UMAP_Combined",
text_by = "Clusters",
colour_by = "Clusters")First, we retrieve a GRangesList object containing the
binding sites of all the transcription factors and co-regulators. These
binding sites are derived from bulk ChIP-seq data in the ChIP-Atlas and ENCODE databases. For the same
transcription factor, multiple ChIP-seq files from different cell lines
or tissues are merged. For further information on how these peaks are
derived, please refer to ?epiregulon::getTFMotifInfo.
Currently, human genomes hg19 and hg38 and mouse mm10 are supported.
grl <- getTFMotifInfo(genome = "hg38")
#> Retrieving chip-seq data, version 2
#> see ?scMultiome and browseVignettes('scMultiome') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
grl
#> GRangesList object of length 1377:
#> $AEBP2
#> GRanges object with 2700 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 9792-10446 *
#> [2] chr1 942105-942400 *
#> [3] chr1 984486-984781 *
#> [4] chr1 3068932-3069282 *
#> [5] chr1 3069411-3069950 *
#> ... ... ... ...
#> [2696] chrY 8465261-8465730 *
#> [2697] chrY 11721744-11722260 *
#> [2698] chrY 11747448-11747964 *
#> [2699] chrY 19302661-19303134 *
#> [2700] chrY 19985662-19985982 *
#> -------
#> seqinfo: 25 sequences from an unspecified genome; no seqlengths
#>
#> ...
#> <1376 more elements>We recommend the use of ChIP-seq data over motif for estimating TF
occupancy. However, if the user would like to start from motifs, it is
possible to switch to the motif mode. The user can provide
the peak regions as a GRanges object and the location of
the motifs will be annotated based on Cisbp from chromVARmotifs
(human_pwms_v2 and mouse_pwms_v2, version 0.2)
Next, we try to link ATAC-seq peaks to their putative target genes. We assign a peak to a gene within a size window (default ±250kb) if the chromatin accessibility of the peak and expression of the target genes are highly correlated (default p-vale threshold 0.05). To compute correlations, we first create cell aggregates by performing k-means clustering on the reduced dimensionality matrix. Then we aggregate the counts of the gene expression and peak matrix and average across the number of cells. Correlations are computed on the averaged gene expression and chromatin accessibility.
The number of metacells used to aggregate cells into meta-cells is a
sensitive parameter, as it should strike the correct balance between
averaging out biological variation and reducing signal noise caused by
sparsity in single cells. Therefore, we provide a separate function,
optimizeMetacellNumber, to automatically select the optimal
value for this parameter. The algorithm samples different possible
values and calculates the mean empirical p-value of RE–TG connections
for each value. Then, a quadratic regression is fitted to the
relationship, and the minimum of the function is determined.
cellNum <- optimizeMetacellNumber(peakMatrix = PeakMatrix,
expMatrix = GeneExpressionMatrix,
reducedDim = reducedDimMatrix,
exp_assay = "normalizedCounts",
peak_assay = "counts",
BPPARAM = BiocParallel::SerialParam(progressbar = FALSE))
#> Warning in optimizeMetacellNumber(peakMatrix = PeakMatrix, expMatrix =
#> GeneExpressionMatrix, : Coefficient of determination of the regression model is
#> lower thant 0.7
#> An issue detected during estimation optimal number of metacells.
#> Consider at least one of the following actions:
#> 1. Change of the `cellNumMin` and `cellNumMax` paramaters
#> 2. Increasing the number of evaluation points (`n_evaluation_points` argument)
#> 3. Increasing the number of iterations (`n_iter` argument)
#> 4. Increasing the number of false connections used to compute p-value
#> null distribution (`nRandConns` argument)
#> Solution not found using quadratic regression. Using cluster size with the lowest mean p-value.We can plot the results to make sure that the relationship is correctly generalized.
The output of optimizeMetacellNumber function is then
passed to calculateP2G as cellNum argument.
Alternatively, users may skip optimizeMetacellNumber and
specify their own cell number in cellNum.
set.seed(1010)
p2g <- calculateP2G(peakMatrix = PeakMatrix,
expMatrix = GeneExpressionMatrix,
reducedDim = reducedDimMatrix,
exp_assay = "normalizedCounts",
peak_assay = "counts",
cellNum = cellNum,
BPPARAM = BiocParallel::SerialParam(progressbar = FALSE))
#> Using epiregulon to compute peak to gene links...
#> Creating metacells...
#> Looking for regulatory elements near target genes...
#> Computing correlations...
p2g
#> DataFrame with 27000 rows and 10 columns
#> idxATAC chr start end idxRNA target
#> <integer> <character> <integer> <integer> <integer> <array>
#> 1 1 chr1 817121 817621 15 LINC01409
#> 2 1 chr1 817121 817621 17 LINC01128
#> 3 4 chr1 827287 827787 15 LINC01409
#> 4 10 chr1 920987 921487 30 HES4
#> 5 11 chr1 923974 924474 21 AL645608.2
#> ... ... ... ... ... ... ...
#> 26996 126573 chrX 154547004 154547504 36402 UBL4A
#> 26997 126574 chrX 154607299 154607799 36404 FAM3A
#> 26998 126579 chrX 154799320 154799820 36414 DKC1
#> 26999 126599 chrX 155897986 155898486 36436 VAMP7
#> 27000 126601 chrX 155959324 155959824 36436 VAMP7
#> Correlation p_val_peak_gene FDR_peak_gene distance
#> <matrix> <matrix> <matrix> <integer>
#> 1 -0.262028 0.00901753 0.927936 38363
#> 2 -0.202686 0.03208484 0.927936 9975
#> 3 -0.243085 0.01372893 0.927936 48529
#> 4 -0.196595 0.03693134 0.930558 78492
#> 5 -0.211672 0.02664730 0.927936 12539
#> ... ... ... ... ...
#> 26996 -0.225898 0.0199095 0.927936 60420
#> 26997 0.345985 0.0443088 0.930779 91183
#> 26998 -0.200960 0.0334358 0.927936 36578
#> 26999 0.518876 0.0105684 0.927936 16641
#> 27000 0.398017 0.0292531 0.927936 77979The next step is to add the TF binding information by overlapping regions of the peak matrix with the bulk chip-seq database. The output is a data frame object with three columns:
idxATAC - index of the peak in the peak matrixidxTF - index in the gene expression matrix
corresponding to the transcription factortf - name of the transcription factorA DataFrame, representing the inferred regulons, is then generated. The DataFrame consists of ten columns:
idxATAC - index of the peak in the peak matrixchr - chromosome numberstart - start position of the peakend - end position of the peakidxRNA - index in the gene expression matrix
corresponding to the target genetarget - name of the target genedistance - distance between the transcription start
site of the target gene and the middle of the peakidxTF - index in the gene expression matrix
corresponding to the transcription factortf - name of the transcription factorcorr - correlation between target gene expression and
the chromatin accessibility at the peak. if cluster labels are provided,
this field is a matrix with columns names corresponding to correlation
across all cells and for each of the clusters.
regulon <- getRegulon(p2g = p2g,
overlap = overlap)
regulon
#> DataFrame with 5822538 rows and 12 columns
#> idxATAC chr start end idxRNA target corr
#> <integer> <character> <integer> <integer> <integer> <array> <matrix>
#> 1 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 2 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 3 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 4 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 5 1 chr1 817121 817621 17 LINC01128 -0.202686
#> ... ... ... ... ... ... ... ...
#> 5822534 126601 chrX 155959324 155959824 36436 VAMP7 0.398017
#> 5822535 126601 chrX 155959324 155959824 36436 VAMP7 0.398017
#> 5822536 126601 chrX 155959324 155959824 36436 VAMP7 0.398017
#> 5822537 126601 chrX 155959324 155959824 36436 VAMP7 0.398017
#> 5822538 126601 chrX 155959324 155959824 36436 VAMP7 0.398017
#> p_val_peak_gene FDR_peak_gene distance idxTF tf
#> <matrix> <matrix> <integer> <integer> <character>
#> 1 0.0320848 0.927936 9975 16 ATF1
#> 2 0.0320848 0.927936 9975 17 ATF2
#> 3 0.0320848 0.927936 9975 18 ATF3
#> 4 0.0320848 0.927936 9975 21 ATF7
#> 5 0.0320848 0.927936 9975 35 BRCA2
#> ... ... ... ... ... ...
#> 5822534 0.0292531 0.927936 77979 419 YY1
#> 5822535 0.0292531 0.927936 77979 571 EOMES
#> 5822536 0.0292531 0.927936 77979 1018 POLR2A
#> 5822537 0.0292531 0.927936 77979 1023 POLR2H
#> 5822538 0.0292531 0.927936 77979 1286 ZNF592Epiregulon prunes the network by performing tests of independence on the observed number of cells jointly expressing transcription factor (TF), regulatory element (RE) and target gene (TG) vs the expected number of cells if TF/RE and TG are independently expressed. The users can choose between two tests, the binomial test and the chi-square test. In the binomial test, the expected probability is P(TF, RE) * P(TG), and the number of trials is the total number of cells, and the observed successes is the number of cells jointly expressing all three elements. In the chi-square test, the expected probability for having all 3 elements active is also P(TF, RE) * P(TG) and the probability otherwise is 1- P(TF, RE) * P(TG). The observed cell count for the category of “active TF” is the number of cells jointly expressing all three elements, and the cell count for the inactive category is n - n_triple.
We calculate cluster-specific p-values if users supply cluster labels. This is useful if we are interested in cluster-specific networks. The pruned regulons can then be used to visualize differential networks for transcription factors of interest.
pruned.regulon <- pruneRegulon(expMatrix = GeneExpressionMatrix,
exp_assay = "normalizedCounts",
peakMatrix = PeakMatrix,
peak_assay = "counts",
test = "chi.sq",
regulon = regulon[regulon$tf %in%
c("NKX2-1","GATA6","FOXA1","FOXA2", "AR"),],
clusters = GeneExpressionMatrix$Clusters,
prune_value = "pval",
regulon_cutoff = 0.05
)
pruned.regulonWhile the pruneRegulon function provides statistics on
the joint occurrence of TF-RE-TG, we would like to further estimate the
strength of regulation. Biologically, this can be interpreted as the
magnitude of gene expression changes induced by transcription factor
activity. Epiregulon estimates the regulatory potential using one of the
three measures: 1) correlation between TF and target gene expression, 2)
mutual information between the TF and target gene expression and 3)
Wilcoxon test statistics of target gene expression in cells jointly
expressing all 3 elements vs cells that do not.
Two of the measures (correlation and Wilcoxon statistics) give both the magnitude and directionality of changes whereas mutual information is always positive. The correlation and mutual information statistics are computed on pseudobulks aggregated by user-supplied cluster labels, whereas the Wilcoxon method groups cells into two categories: 1) the active category of cells jointly expressing TF, RE and TG and 2) the inactive category consisting of the remaining cells.
We calculate cluster-specific weights if users supply cluster labels.
So far the gene regulatory network was constructed from TF ChIP-seq
exclusively. Some users would prefer to further annotate regulatory
elements with the presence of motifs. We provide an option to annotate
peaks with motifs from the Cisbp database. If no motifs are present for
this particular factor (as in the case of co-factors or chromatin
modifiers), we return NA. If motifs are available for a factor and the
RE contains a motif, we return 1. If motifs are available and the RE
does not contain a motif, we return 0. The users can also provide their
own motif annotation through the pwms argument.
regulon.w.motif <- addMotifScore(regulon = regulon.w,
peaks = rowRanges(PeakMatrix),
species = "human",
genome = "hg38")
#> annotating peaks with motifs
#> see ?scMultiome and browseVignettes('scMultiome') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
#>
#> Attaching package: 'rtracklayer'
#> The following object is masked from 'package:AnnotationHub':
#>
#> hubUrl
# if desired, set weight to 0 if no motif is found
regulon.w.motif$weight[regulon.w.motif$motif == 0] <- 0
regulon.w.motif
#> DataFrame with 13011 rows and 17 columns
#> idxATAC chr start end idxRNA target corr
#> <integer> <character> <integer> <integer> <integer> <array> <matrix>
#> 1 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 2 12 chr1 924540 925040 27 PLEKHN1 0.431694
#> 3 21 chr1 959089 959589 21 AL645608.2 -0.233889
#> 4 28 chr1 999317 999817 27 PLEKHN1 0.559134
#> 5 31 chr1 1001729 1002229 31 ISG15 0.403538
#> ... ... ... ... ... ... ... ...
#> 13007 123510 chr9 130031225 130031725 35090 GPR107 -0.252214
#> 13008 124110 chr9 137223379 137223879 35261 SSNA1 -0.343389
#> 13009 124973 chrX 47233173 47233673 35590 CDK16 0.528033
#> 13010 125193 chrX 56563317 56563817 35748 UBQLN2 -0.192939
#> 13011 126545 chrX 154397431 154397931 36389 FLNA -0.196522
#> p_val_peak_gene FDR_peak_gene distance idxTF tf
#> <matrix> <matrix> <integer> <integer> <character>
#> 1 0.03208484 0.927936 9975 485 AR
#> 2 0.02260801 0.927936 41440 485 AR
#> 3 0.01644770 0.927936 47654 485 AR
#> 4 0.00718454 0.927936 32835 485 AR
#> 5 0.02802707 0.927936 591 485 AR
#> ... ... ... ... ... ...
#> 13007 0.01111149 0.927936 21699 723 NKX2-1
#> 13008 0.00123273 0.925168 34703 723 NKX2-1
#> 13009 0.00973469 0.927936 14940 723 NKX2-1
#> 13010 0.03990341 0.930558 0 723 NKX2-1
#> 13011 0.03698200 0.930558 26209 723 NKX2-1
#> pval stats
#> <matrix> <matrix>
#> 1 0.154809:0.712504431:1.000000:... 2.024211: 0.1357883:0.00000:...
#> 2 0.110060:0.000116038:1.000000:... 2.553361:14.8560048:0.00000:...
#> 3 0.123672:0.620989666:1.000000:... 2.370185: 0.2444790:0.00000:...
#> 4 0.017158:0.015354703:1.000000:... 5.680140: 5.8753061:0.00000:...
#> 5 0.450921:0.812253402:0.798292:... 0.568335: 0.0564158:0.06531:...
#> ... ... ...
#> 13007 4.76496e-02:1:1:... 3.92230:0:0:...
#> 13008 5.75895e-02:1:1:... 3.60548:0:0:...
#> 13009 8.37380e-06:1:1:... 19.85056:0:0:...
#> 13010 3.27196e-04:1:1:... 12.90791:0:0:...
#> 13011 1.91636e-02:1:1:... 5.48652:0:0:...
#> qval weight motif
#> <matrix> <matrix> <numeric>
#> 1 1:1:1:... 0:0:0:... 0
#> 2 1:1:1:... 0:0:0:... 0
#> 3 1:1:1:... 0:0:0:... 0
#> 4 1:1:1:... 0:0:0:... 0
#> 5 1:1:1:... 0:0:0:... 0
#> ... ... ... ...
#> 13007 1.000000:1:1:... 0.0000000:0:0:... 0
#> 13008 1.000000:1:1:... 0.0312427:0:0:... 1
#> 13009 0.616956:1:1:... 0.0000000:0:0:... 0
#> 13010 1.000000:1:1:... 0.0000000:0:0:... 0
#> 13011 1.000000:1:1:... 0.0000000:0:0:... 0It is sometimes helpful to filter the regulons based on gene
expression changes between two conditions. The addLogFC
function is a wrapper around scran::findMarkers and adds
extra columns of log changes to the regulon DataFrame. The users can
specify the reference condition in logFC_ref and conditions
for comparison in logFC_condition. If these are not
provided, log fold changes are calculated for every condition in the
cluster labels against the rest of the conditions.
# create logcounts
GeneExpressionMatrix <- scuttle::logNormCounts(GeneExpressionMatrix)
# add log fold changes which are calculated by taking the difference of the log counts
regulon.w <- addLogFC(regulon = regulon.w,
clusters = GeneExpressionMatrix$hash_assignment,
expMatrix = GeneExpressionMatrix,
assay.type = "logcounts",
pval.type = "any",
logFC_condition = c("HTO10_GATA6_UTR",
"HTO2_GATA6_v2",
"HTO8_NKX2.1_UTR"),
logFC_ref = "HTO5_NeonG_v2")
regulon.w
#> DataFrame with 13011 rows and 25 columns
#> idxATAC chr start end idxRNA target corr
#> <integer> <character> <integer> <integer> <integer> <array> <matrix>
#> 1 1 chr1 817121 817621 17 LINC01128 -0.202686
#> 2 12 chr1 924540 925040 27 PLEKHN1 0.431694
#> 3 21 chr1 959089 959589 21 AL645608.2 -0.233889
#> 4 28 chr1 999317 999817 27 PLEKHN1 0.559134
#> 5 31 chr1 1001729 1002229 31 ISG15 0.403538
#> ... ... ... ... ... ... ... ...
#> 13007 123510 chr9 130031225 130031725 35090 GPR107 -0.252214
#> 13008 124110 chr9 137223379 137223879 35261 SSNA1 -0.343389
#> 13009 124973 chrX 47233173 47233673 35590 CDK16 0.528033
#> 13010 125193 chrX 56563317 56563817 35748 UBQLN2 -0.192939
#> 13011 126545 chrX 154397431 154397931 36389 FLNA -0.196522
#> p_val_peak_gene FDR_peak_gene distance idxTF tf
#> <matrix> <matrix> <integer> <integer> <character>
#> 1 0.03208484 0.927936 9975 485 AR
#> 2 0.02260801 0.927936 41440 485 AR
#> 3 0.01644770 0.927936 47654 485 AR
#> 4 0.00718454 0.927936 32835 485 AR
#> 5 0.02802707 0.927936 591 485 AR
#> ... ... ... ... ... ...
#> 13007 0.01111149 0.927936 21699 723 NKX2-1
#> 13008 0.00123273 0.925168 34703 723 NKX2-1
#> 13009 0.00973469 0.927936 14940 723 NKX2-1
#> 13010 0.03990341 0.930558 0 723 NKX2-1
#> 13011 0.03698200 0.930558 26209 723 NKX2-1
#> pval stats
#> <matrix> <matrix>
#> 1 0.154809:0.712504431:1.000000:... 2.024211: 0.1357883:0.00000:...
#> 2 0.110060:0.000116038:1.000000:... 2.553361:14.8560048:0.00000:...
#> 3 0.123672:0.620989666:1.000000:... 2.370185: 0.2444790:0.00000:...
#> 4 0.017158:0.015354703:1.000000:... 5.680140: 5.8753061:0.00000:...
#> 5 0.450921:0.812253402:0.798292:... 0.568335: 0.0564158:0.06531:...
#> ... ... ...
#> 13007 4.76496e-02:1:1:... 3.92230:0:0:...
#> 13008 5.75895e-02:1:1:... 3.60548:0:0:...
#> 13009 8.37380e-06:1:1:... 19.85056:0:0:...
#> 13010 3.27196e-04:1:1:... 12.90791:0:0:...
#> 13011 1.91636e-02:1:1:... 5.48652:0:0:...
#> qval weight
#> <matrix> <matrix>
#> 1 1:1:1:... 0.0228717:-0.0185965: 0.0000000:...
#> 2 1:1:1:... 0.0251689: 0.1903207: 0.0000000:...
#> 3 1:1:1:... 0.0296089:-0.0245879: 0.0000000:...
#> 4 1:1:1:... 0.0375626: 0.1220942: 0.0000000:...
#> 5 1:1:1:... 0.0110831:-0.0118111:-0.0389184:...
#> ... ... ...
#> 13007 1.000000:1:1:... 0.0290874:0:0:...
#> 13008 1.000000:1:1:... 0.0312427:0:0:...
#> 13009 0.616956:1:1:... 0.0695293:0:0:...
#> 13010 1.000000:1:1:... 0.0545797:0:0:...
#> 13011 1.000000:1:1:... 0.0367516:0:0:...
#> HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.FDR
#> <numeric>
#> 1 0.0175565
#> 2 0.7687435
#> 3 0.7803240
#> 4 0.7687435
#> 5 1.0000000
#> ... ...
#> 13007 9.38624e-01
#> 13008 6.73284e-01
#> 13009 1.00000e+00
#> 13010 9.38624e-01
#> 13011 3.33965e-08
#> HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.p.value
#> <numeric>
#> 1 -7.675643
#> 2 -2.406967
#> 3 -2.358197
#> 4 -2.406967
#> 5 -0.960831
#> ... ...
#> 13007 -1.905708
#> 13008 -2.683678
#> 13009 -0.722856
#> 13010 -1.581068
#> 13011 -22.192730
#> HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.logFC HTO2_GATA6_v2.vs.HTO5_NeonG_v2.FDR
#> <numeric> <numeric>
#> 1 -0.0754470 0.0126545
#> 2 -0.0224284 1.0000000
#> 3 -0.0172101 1.0000000
#> 4 -0.0224284 1.0000000
#> 5 -0.0222188 0.9505118
#> ... ... ...
#> 13007 -0.0535483 6.01298e-01
#> 13008 -0.0360566 9.50512e-01
#> 13009 -0.0115076 1.00000e+00
#> 13010 -0.0259159 1.00000e+00
#> 13011 0.2771912 1.95552e-05
#> HTO2_GATA6_v2.vs.HTO5_NeonG_v2.p.value
#> <numeric>
#> 1 -8.212536
#> 2 -0.498011
#> 3 -0.916768
#> 4 -0.498011
#> 5 -1.471853
#> ... ...
#> 13007 -3.030377
#> 13008 -1.259896
#> 13009 -0.252483
#> 13010 -0.125431
#> 13011 -15.462314
#> HTO2_GATA6_v2.vs.HTO5_NeonG_v2.logFC HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.FDR
#> <numeric> <numeric>
#> 1 -0.07615662 0.985398
#> 2 0.00796239 1.000000
#> 3 -0.00918756 1.000000
#> 4 0.00796239 1.000000
#> 5 -0.02663603 1.000000
#> ... ... ...
#> 13007 -0.06609460 0.391721
#> 13008 -0.02188740 1.000000
#> 13009 -0.00458811 1.000000
#> 13010 0.00308709 1.000000
#> 13011 0.19017108 1.000000
#> HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.p.value
#> <numeric>
#> 1 -1.133582
#> 2 -0.128231
#> 3 -0.582022
#> 4 -0.128231
#> 5 -0.163465
#> ... ...
#> 13007 -4.308875
#> 13008 -0.677769
#> 13009 -0.170643
#> 13010 -0.690995
#> 13011 -0.696100
#> HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.logFC
#> <numeric>
#> 1 -0.02293881
#> 2 -0.00218254
#> 3 -0.00642750
#> 4 -0.00218254
#> 5 0.00444303
#> ... ...
#> 13007 -0.07934275
#> 13008 0.01432091
#> 13009 -0.00311527
#> 13010 -0.01281210
#> 13011 -0.01675158Finally, the activities for a specific TF in each cell are computed by averaging expressions of target genes linked to the TF weighted by the test statistics of choice, chosen from either correlation, mutual information or the Wilcoxon test statistics. \[y=\frac{1}{n}\sum_{i=1}^{n} x_i * weights_i\] where \(y\) is the activity of a TF for a cell, \(n\) is the total number of targets for a TF, \(x_i\) is the log count expression of target \(i\) where \(i\) in {1,2,…,n} and \(weights_i\) is the weight of TF - target \(i\)
score.combine <- calculateActivity(expMatrix = GeneExpressionMatrix,
regulon = regulon.w,
mode = "weight",
method = "weightedMean",
exp_assay = "normalizedCounts",
normalize = FALSE)
#> Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
#> regulon.w, : Argument 'method' to calculateActivity was deprecated as of
#> epiregulon version 2.0.0
#> calculating TF activity from regulon using weightedMean
#> Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon = regulon.w, : The weight column contains multiple subcolumns but no cluster information was provided.
#> Using first column to compute activity...
#> aggregating regulons...
#> creating weight matrix...
#> calculating activity scores...
#> normalize by the number of targets...
score.combine[1:5,1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#> reprogram#TTAGGAACAAGGTACG-1 reprogram#GAGCGGTCAACCTGGT-1
#> AR 0.01798157 0.02027795
#> FOXA1 0.01648708 0.01790651
#> FOXA2 0.01721087 0.02088793
#> GATA6 0.01547510 0.04891365
#> NKX2-1 0.02852576 0.02059250
#> reprogram#TTATAGCCACCCTCAC-1 reprogram#TGGTGATTCCTGTTCA-1
#> AR 0.011354193 0.011450435
#> FOXA1 0.011696228 0.012328340
#> FOXA2 0.013253208 0.017337749
#> GATA6 0.009573275 0.008097493
#> NKX2-1 0.012773808 0.012261809
#> reprogram#TCGGTTCTCACTAGGT-1
#> AR 0.01819436
#> FOXA1 0.01631431
#> FOXA2 0.01483648
#> GATA6 0.01198980
#> NKX2-1 0.02642440sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.79.1
#> [3] rtracklayer_1.69.1 BiocIO_1.21.0
#> [5] Biostrings_2.79.2 XVector_0.51.0
#> [7] GenomeInfoDb_1.47.0 scMultiome_1.11.0
#> [9] MultiAssayExperiment_1.37.1 ExperimentHub_3.1.0
#> [11] AnnotationHub_4.1.0 BiocFileCache_3.1.0
#> [13] dbplyr_2.5.1 epiregulon_2.1.2
#> [15] SingleCellExperiment_1.33.0 SummarizedExperiment_1.41.0
#> [17] Biobase_2.71.0 GenomicRanges_1.63.0
#> [19] Seqinfo_1.1.0 IRanges_2.45.0
#> [21] S4Vectors_0.49.0 BiocGenerics_0.57.0
#> [23] generics_0.1.4 MatrixGenerics_1.23.0
#> [25] matrixStats_1.5.0 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3
#> [3] jsonlite_2.0.0 magrittr_2.0.4
#> [5] ggbeeswarm_0.7.2 farver_2.1.2
#> [7] rmarkdown_2.30 vctrs_0.6.5
#> [9] memoise_2.0.1 Rsamtools_2.27.0
#> [11] RCurl_1.98-1.17 htmltools_0.5.8.1
#> [13] S4Arrays_1.11.0 curl_7.0.0
#> [15] BiocNeighbors_2.5.0 Rhdf5lib_1.33.0
#> [17] SparseArray_1.11.1 rhdf5_2.55.7
#> [19] sass_0.4.10 bslib_0.9.0
#> [21] httr2_1.2.1 cachem_1.1.0
#> [23] buildtools_1.0.0 GenomicAlignments_1.47.0
#> [25] igraph_2.2.1 lifecycle_1.0.4
#> [27] pkgconfig_2.0.3 rsvd_1.0.5
#> [29] Matrix_1.7-4 R6_2.6.1
#> [31] fastmap_1.2.0 digest_0.6.38
#> [33] TFMPvalue_0.0.9 AnnotationDbi_1.73.0
#> [35] scater_1.39.0 dqrng_0.4.1
#> [37] irlba_2.3.5.1 RSQLite_2.4.4
#> [39] beachmat_2.27.0 seqLogo_1.77.0
#> [41] filelock_1.0.3 labeling_0.4.3
#> [43] httr_1.4.7 abind_1.4-8
#> [45] compiler_4.5.2 bit64_4.6.0-1
#> [47] withr_3.0.2 S7_0.2.0
#> [49] backports_1.5.0 BiocParallel_1.45.0
#> [51] viridis_0.6.5 DBI_1.2.3
#> [53] HDF5Array_1.39.0 rappdirs_0.3.3
#> [55] DelayedArray_0.37.0 bluster_1.21.0
#> [57] rjson_0.2.23 gtools_3.9.5
#> [59] caTools_1.18.3 tools_4.5.2
#> [61] vipor_0.4.7 beeswarm_0.4.0
#> [63] glue_1.8.0 h5mread_1.3.0
#> [65] restfulr_0.0.16 rhdf5filters_1.23.0
#> [67] grid_4.5.2 checkmate_2.3.3
#> [69] cluster_2.1.8.1 TFBSTools_1.49.0
#> [71] gtable_0.3.6 metapod_1.19.0
#> [73] BiocSingular_1.27.0 ScaledMatrix_1.19.0
#> [75] ggrepel_0.9.6 BiocVersion_3.23.1
#> [77] pillar_1.11.1 motifmatchr_1.31.1
#> [79] limma_3.67.0 dplyr_1.1.4
#> [81] lattice_0.22-7 bit_4.6.0
#> [83] tidyselect_1.2.1 DirichletMultinomial_1.53.0
#> [85] locfit_1.5-9.12 maketools_1.3.2
#> [87] scuttle_1.21.0 knitr_1.50
#> [89] gridExtra_2.3 scrapper_1.5.2
#> [91] edgeR_4.9.0 xfun_0.54
#> [93] statmod_1.5.1 UCSC.utils_1.7.0
#> [95] yaml_2.3.10 evaluate_1.0.5
#> [97] codetools_0.2-20 cigarillo_1.1.0
#> [99] tibble_3.3.0 BiocManager_1.30.26
#> [101] cli_3.6.5 jquerylib_0.1.4
#> [103] Rcpp_1.1.0 png_0.1-8
#> [105] XML_3.99-0.20 parallel_4.5.2
#> [107] ggplot2_4.0.0 blob_1.2.4
#> [109] scran_1.39.0 bitops_1.0-9
#> [111] pwalign_1.7.0 viridisLite_0.4.2
#> [113] scales_1.4.0 purrr_1.2.0
#> [115] crayon_1.5.3 rlang_1.1.6
#> [117] KEGGREST_1.51.0