--- title: "Hi-C Workflow with linkSet" author: - name: Gilbert Han affiliation: Your Affiliation email: GilbertHan1011@gmail.com date: "`r Sys.Date()`" package: linkSet output: BiocStyle::html_document: toc: true toc_float: collapsed: false smooth_scroll: true toc_depth: 2 number_sections: true theme: flatly highlight: tango fig_width: 7 fig_height: 5 vignette: > %\VignetteIndexEntry{Hi-C Workflow with linkSet} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.wide = TRUE, fig.align = "center", fig.retina = 2, warning = FALSE, message = FALSE, cache = FALSE, echo = TRUE ) suppressPackageStartupMessages({ library(BiocStyle) }) ``` `r BiocStyle::doc_date()` `r BiocStyle::pkg_ver('linkSet')` # Introduction This vignette provides a step-by-step guide to using the `r Biocpkg("linkSet")` package to analyze Hi-C/HiChIP data. We will use a toy example to illustrate the main functions and workflows. Our goal is to identify the enhancer-gene links in this example. We will use the following datasets as input: 1. validPairs produced by [HiC-Pro](https://github.com/nservant/HiC-Pro). 2. Mouse embryo body enhancer data from enhancer atlas [website](http://www.enhanceratlas.org/). 3. Gene annotation data from `r Biocpkg("TxDb.Mmusculus.UCSC.mm10.knownGene")` package. We highly recommend you to use custom data instead of the example data provided in this vignette. # Setup ```{r installation, eval = FALSE} if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } ``` ```{r load-packages} suppressPackageStartupMessages({ library(linkSet) # Load additional packages only if available if (requireNamespace("TxDb.Mmusculus.UCSC.mm10.knownGene", quietly = TRUE)) { library(TxDb.Mmusculus.UCSC.mm10.knownGene) } if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) { library(org.Mm.eg.db) } if (requireNamespace("Organism.dplyr", quietly = TRUE)) { library(Organism.dplyr) } if (requireNamespace("InteractionSet", quietly = TRUE)) { library(InteractionSet) } if (requireNamespace("rtracklayer", quietly = TRUE)) { library(rtracklayer) } }) ``` We use our custom function `readvalidPairs` to load the [example data](https://data.4dnucleome.org/files-processed/4DNFI9DUBGCW/). Firstly, we need to load into [GInteractions object](https://bioconductor.org/packages/release/bioc/vignettes/InteractionSet/inst/doc/GInteractions.html). ```{r setup-data} hic_file <- system.file("extdata", "toyExample.pair.gz", package = "linkSet" ) gi <- readvalidPairs(hic_file, format = "pair") promoterGr <- tryCatch( { withTxDb("mm10", function(src) { genes <- Organism.dplyr::genes(src, columns = "symbol") IRanges::promoters(genes, upstream = 10000) }) }, error = function(e) { # Fallback approach using TxDb directly if (requireNamespace("TxDb.Mmusculus.UCSC.mm10.knownGene", quietly = TRUE) && requireNamespace("org.Mm.eg.db", quietly = TRUE)) { txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene::TxDb.Mmusculus.UCSC.mm10.knownGene gene_ranges <- GenomicFeatures::genes(txdb) # Add gene symbols gene_ids <- names(gene_ranges) symbols <- AnnotationDbi::select(org.Mm.eg.db::org.Mm.eg.db, keys = gene_ids, columns = "SYMBOL", keytype = "ENTREZID" ) symbol_map <- setNames(symbols$SYMBOL, symbols$ENTREZID) mcols(gene_ranges)$symbol <- symbol_map[names(gene_ranges)] genes <- gene_ranges[!is.na(mcols(gene_ranges)$symbol)] IRanges::promoters(genes, upstream = 10000) } else { stop("Required packages not available for mm10 annotation") } } ) file_path <- system.file("extdata", "Embryo_body.bed.gz", package = "linkSet") enhancer <- rtracklayer::import(file_path, format = "BED") ``` Because the hic data only contains digist end, so we resize the region to upstream 5kb and downstream 5kb. After that, we use `baitGInteractions` to generate the `linkSet` object. ```{r create-linkset} gi <- resize(gi, 10000, fix = "center") ls <- baitGInteractions(gi, promoterGr, enhancer, geneSymbol = "symbol") ls ``` When we print the `linkSet` object, we can see the basic information of the `linkSet` object. By default, we don't show the bait region. But you are interested in the bait region, you can set `showBaitRegion = TRUE`. ```{r show-linkset} showLinkSet(ls, baitRegion = TRUE) ``` # Diagnose and filter links Now, we can run diagnoseLinkSet to check the distance distribution and inter/intra interaction percentage. First, let's annotate the promoter regions with genome information: ```{r annotate-and-diagnose} ls <- suppressWarnings(annotatePromoter(ls, genome = "mm10", upstream = 1000)) diagnoseLinkSet(ls) ``` Intrachromosomal interaction and long distance interaction are likely be noise, so we filter them. ```{r filter-links} ls <- countInteractibility(ls) ls <- filterLinks(ls, filter_intra = TRUE, filter_unannotate = TRUE, distance = 50000000) ``` Duplicated links are associated with contact frequency, so it's a good idea to count duplicated links. ```{r count-and-order} ls <- countInteractions(ls) orderLinks(ls, by = "count", decreasing = TRUE) ``` We can notice that there is a significant link strength between `Sulf1` and `chr1:12785091-12785750`. # Cross gene links and visualization Enhancers that regulate multiple genes are biologically meaningful. ```{r cross-gene-analysis} ls <- crossGeneEnhancer(ls) ls <- orderLinks(ls, by = "crossFreq", decreasing = TRUE) ls ``` We can use `plot_genomic_ranges` to visualize the cross gene links. ```{r plot-genomic-ranges} plot_genomic_ranges(ls, showOE = oe(ls)[1]) ``` We can also choose to visualze the bait center region. ```{r plot-bait-region} # Check if regionsBait exists and has names if (!is.null(regionsBait(ls)) && !is.null(names(regionsBait(ls)))) { # Get available bait names from regionsBait available_baits <- unique(names(regionsBait(ls))) if (length(available_baits) > 0) { # Use the first available bait for demonstration plot_genomic_ranges(ls, showBait = available_baits[1]) } else { cat("No named bait regions available for plotting") } } else { # If no regionsBait, just plot without showBait plot_genomic_ranges(ls) } ``` # Session Information ```{r session-info} sessionInfo() ```