Seqinfo 1.0.0
Seqinfo is an R/Bioconductor infrastructure package that implements a simple S4 class, the Seqinfo class, for storing the following information about a collection of genomic sequences:
the sequence names a.k.a. the seqnames;
the sequence lengths a.k.a. the seqlengths;
the sequence circularity flags, that is, whether the sequences are circular or not;
the sequence genomes, that is, the genome assembly that each sequence is coming from, if any.
The sequences described in a Seqinfo object are typically, but not necessarily, the chromosomes and/or scaffolds of a specific genome assembly of a given organism.
Note that Seqinfo objects are rarely used as standalone objects. Instead,
they are used as part of higher-level objects to represent their seqinfo()
component. Examples of such higher-level objects are GRanges,
RangedSummarizedExperiment, VCF, GAlignments, TxDb, etc… defined
in other Bioconductor infrastructure packages.
Like any Bioconductor package, Seqinfo should be installed
with BiocManager::install():
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("Seqinfo")BiocManager::install() will take care of installing the package dependencies
that are missing.
Load the package:
library(Seqinfo)Seqinfo objects can be created with the Seqinfo() constructor function:
## Note that all the arguments (except 'genome') must have the
## same length. 'genome' can be of length 1, whatever the lengths
## of the other arguments are.
si1 <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
               seqlengths=c(100, 200, NA, 15),
               isCircular=c(NA, FALSE, FALSE, TRUE),
               genome="toy")
si1## Seqinfo object with 4 sequences (1 circular) from toy genome:
##   seqnames seqlengths isCircular genome
##   chr1            100         NA    toy
##   chr2            200      FALSE    toy
##   chr3             NA      FALSE    toy
##   chrM             15       TRUE    toyOne special form of calling the Seqinfo() constructor function is to
specify only the genome argument and set it to the name of an NCBI
assembly (e.g. Seqinfo(genome="GRCh38.p14")) or UCSC genome (e.g.
Seqinfo(genome="hg38")), in which case the sequence information is
fetched from NCBI or UCSC. This requires the GenomeInfoDb
package (the package only needs to be installed):
library(GenomeInfoDb)  # just making sure that the package is installed
Seqinfo(genome="GRCh38.p14")## Seqinfo object with 709 sequences (1 circular) from GRCh38.p14 genome:
##   seqnames       seqlengths isCircular     genome
##   1               248956422      FALSE GRCh38.p14
##   2               242193529      FALSE GRCh38.p14
##   3               198295559      FALSE GRCh38.p14
##   4               190214555      FALSE GRCh38.p14
##   5               181538259      FALSE GRCh38.p14
##   ...                   ...        ...        ...
##   HSCHR22_8_CTG1     145162      FALSE GRCh38.p14
##   HSCHRX_3_CTG7      188004      FALSE GRCh38.p14
##   HSCHRX_1_CTG14     619716      FALSE GRCh38.p14
##   HSCHRX_2_CTG14     294119      FALSE GRCh38.p14
##   HSCHRX_3_CTG3      330493      FALSE GRCh38.p14Seqinfo(genome="hg38")## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
##   seqnames             seqlengths isCircular genome
##   chr1                  248956422      FALSE   hg38
##   chr2                  242193529      FALSE   hg38
##   chr3                  198295559      FALSE   hg38
##   chr4                  190214555      FALSE   hg38
##   chr5                  181538259      FALSE   hg38
##   ...                         ...        ...    ...
##   chr22_KQ759761v1_alt     145162      FALSE   hg38
##   chrX_KV766199v1_alt      188004      FALSE   hg38
##   chrX_MU273395v1_alt      619716      FALSE   hg38
##   chrX_MU273396v1_alt      294119      FALSE   hg38
##   chrX_MU273397v1_alt      330493      FALSE   hg38See ?Seqinfo for more information.
Various accessor functions are provided:
length(si1)## [1] 4seqnames(si1)## [1] "chr1" "chr2" "chr3" "chrM"names(si1)## [1] "chr1" "chr2" "chr3" "chrM"seqlevels(si1)## [1] "chr1" "chr2" "chr3" "chrM"seqlengths(si1)## chr1 chr2 chr3 chrM 
##  100  200   NA   15isCircular(si1)##  chr1  chr2  chr3  chrM 
##    NA FALSE FALSE  TRUEgenome(si1)##  chr1  chr2  chr3  chrM 
## "toy" "toy" "toy" "toy"See ?Seqinfo for more information.
A Seqinfo object can be subsetted by seqnames:
si1[c("chrY", "chr3", "chr1")]## Seqinfo object with 3 sequences from 2 genomes (NA, toy):
##   seqnames seqlengths isCircular genome
##   chrY             NA         NA   <NA>
##   chr3             NA      FALSE    toy
##   chr1            100         NA    toyRename:
si <- si1
seqlevels(si) <- sub("chr", "ch", seqlevels(si))
si## Seqinfo object with 4 sequences (1 circular) from toy genome:
##   seqnames seqlengths isCircular genome
##   ch1             100         NA    toy
##   ch2             200      FALSE    toy
##   ch3              NA      FALSE    toy
##   chM              15       TRUE    toyReorder:
seqlevels(si) <- rev(seqlevels(si))
si## Seqinfo object with 4 sequences (1 circular) from toy genome:
##   seqnames seqlengths isCircular genome
##   chM              15       TRUE    toy
##   ch3              NA      FALSE    toy
##   ch2             200      FALSE    toy
##   ch1             100         NA    toyDrop/add/reorder:
seqlevels(si) <- c("ch1", "ch2", "chY")
si## Seqinfo object with 3 sequences from 2 genomes (toy, NA):
##   seqnames seqlengths isCircular genome
##   ch1             100         NA    toy
##   ch2             200      FALSE    toy
##   chY              NA         NA   <NA>Rename/reorder/drop/add:
seqlevels(si) <- c(chY="Y", ch1="1", "22")
si## Seqinfo object with 3 sequences from 2 genomes (NA, toy):
##   seqnames seqlengths isCircular genome
##   Y                NA         NA   <NA>
##   1               100         NA    toy
##   22               NA         NA   <NA>See ?Seqinfo for more information.
Two Seqinfo objects can be merged if they are compatible:
si2 <- Seqinfo(seqnames=c("chr3", "chr4", "chrM"),
               seqlengths=c(300, NA, 15))
si2## Seqinfo object with 3 sequences from an unspecified genome:
##   seqnames seqlengths isCircular genome
##   chr3            300         NA   <NA>
##   chr4             NA         NA   <NA>
##   chrM             15         NA   <NA>merge(si1, si2)  # rows for chr3 and chrM are merged## Warning in .merge_two_Seqinfo_objects(x, y): Each of the 2 combined objects has sequence levels not in the other:
##   - in 'x': chr1, chr2
##   - in 'y': chr4
##   Make sure to always combine/compare objects based on the same reference
##   genome (use suppressWarnings() to suppress this warning).## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
##   seqnames seqlengths isCircular genome
##   chr1            100         NA    toy
##   chr2            200      FALSE    toy
##   chr3            300      FALSE    toy
##   chrM             15       TRUE    toy
##   chr4             NA         NA   <NA>suppressWarnings(merge(si1, si2))## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
##   seqnames seqlengths isCircular genome
##   chr1            100         NA    toy
##   chr2            200      FALSE    toy
##   chr3            300      FALSE    toy
##   chrM             15       TRUE    toy
##   chr4             NA         NA   <NA>Note that, strictly speaking, merging two Seqinfo objects is not a
commutative operation, i.e., in general z1 <- merge(x, y) is not
identical to z2 <- merge(y, x). However z1 and z2 are guaranteed
to contain the same information (i.e. the same rows, but typically not
in the same order):
suppressWarnings(merge(si2, si1))## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
##   seqnames seqlengths isCircular genome
##   chr3            300      FALSE    toy
##   chr4             NA         NA   <NA>
##   chrM             15       TRUE    toy
##   chr1            100         NA    toy
##   chr2            200      FALSE    toyTrying to merge Seqinfo objects that are not compatible will raise an error:
## This contradicts what 'x' says about circularity of chr3 and chrM:
isCircular(si2)[c("chr3", "chrM")] <- c(TRUE, FALSE)
si2## Seqinfo object with 3 sequences (1 circular) from an unspecified genome:
##   seqnames seqlengths isCircular genome
##   chr3            300       TRUE   <NA>
##   chr4             NA         NA   <NA>
##   chrM             15      FALSE   <NA>merge(si1, si2)  # ERROR!
## Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence",  :
##   sequences chr3, chrM have incompatible circularity flags:
##   - in 'x': FALSE, TRUE
##   - in 'y': TRUE, FALSESee ?Seqinfo for more information.
Seqinfo objects are typically found as part of higher-level objects to
represent their seqinfo() component. For example, in TxDb objects:
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
class(txdb)## [1] "TxDb"
## attr(,"package")
## [1] "GenomicFeatures"seqinfo(txdb)## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
##   seqnames             seqlengths isCircular genome
##   chr1                  248956422      FALSE   hg38
##   chr2                  242193529      FALSE   hg38
##   chr3                  198295559      FALSE   hg38
##   chr4                  190214555      FALSE   hg38
##   chr5                  181538259      FALSE   hg38
##   ...                         ...        ...    ...
##   chr22_KQ759761v1_alt     145162      FALSE   hg38
##   chrX_KV766199v1_alt      188004      FALSE   hg38
##   chrX_MU273395v1_alt      619716      FALSE   hg38
##   chrX_MU273396v1_alt      294119      FALSE   hg38
##   chrX_MU273397v1_alt      330493      FALSE   hg38and in BSgenome objects:
library(BSgenome.Hsapiens.UCSC.hg38)
bsg <- BSgenome.Hsapiens.UCSC.hg38
class(bsg)## [1] "BSgenome"
## attr(,"package")
## [1] "BSgenome"seqinfo(bsg)## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
##   seqnames             seqlengths isCircular genome
##   chr1                  248956422      FALSE   hg38
##   chr2                  242193529      FALSE   hg38
##   chr3                  198295559      FALSE   hg38
##   chr4                  190214555      FALSE   hg38
##   chr5                  181538259      FALSE   hg38
##   ...                         ...        ...    ...
##   chr22_KQ759761v1_alt     145162      FALSE   hg38
##   chrX_KV766199v1_alt      188004      FALSE   hg38
##   chrX_MU273395v1_alt      619716      FALSE   hg38
##   chrX_MU273396v1_alt      294119      FALSE   hg38
##   chrX_MU273397v1_alt      330493      FALSE   hg38Sanity checks:
stopifnot(identical(seqinfo(txdb), Seqinfo(genome="hg38")))
stopifnot(identical(seqinfo(bsg), Seqinfo(genome="hg38")))Here is the output of sessionInfo() on the system on which this document
was compiled:
sessionInfo()## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5       
##  [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.22.0
##  [3] BSgenome_1.78.0                         
##  [4] rtracklayer_1.70.0                      
##  [5] BiocIO_1.20.0                           
##  [6] Biostrings_2.78.0                       
##  [7] XVector_0.50.0                          
##  [8] GenomicFeatures_1.62.0                  
##  [9] AnnotationDbi_1.72.0                    
## [10] Biobase_2.70.0                          
## [11] GenomicRanges_1.62.0                    
## [12] GenomeInfoDb_1.46.0                     
## [13] IRanges_2.44.0                          
## [14] S4Vectors_0.48.0                        
## [15] Seqinfo_1.0.0                           
## [16] BiocGenerics_0.56.0                     
## [17] generics_0.1.4                          
## [18] BiocStyle_2.38.0                        
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.50.0             SummarizedExperiment_1.40.0
##  [3] rjson_0.2.23                xfun_0.53                  
##  [5] bslib_0.9.0                 lattice_0.22-7             
##  [7] vctrs_0.6.5                 tools_4.5.1                
##  [9] bitops_1.0-9                curl_7.0.0                 
## [11] parallel_4.5.1              RSQLite_2.4.3              
## [13] blob_1.2.4                  pkgconfig_2.0.3            
## [15] Matrix_1.7-4                cigarillo_1.0.0            
## [17] lifecycle_1.0.4             compiler_4.5.1             
## [19] Rsamtools_2.26.0            codetools_0.2-20           
## [21] htmltools_0.5.8.1           sass_0.4.10                
## [23] RCurl_1.98-1.17             yaml_2.3.10                
## [25] crayon_1.5.3                jquerylib_0.1.4            
## [27] BiocParallel_1.44.0         cachem_1.1.0               
## [29] DelayedArray_0.36.0         abind_1.4-8                
## [31] digest_0.6.37               restfulr_0.0.16            
## [33] bookdown_0.45               fastmap_1.2.0              
## [35] grid_4.5.1                  SparseArray_1.10.0         
## [37] cli_3.6.5                   S4Arrays_1.10.0            
## [39] XML_3.99-0.19               UCSC.utils_1.6.0           
## [41] bit64_4.6.0-1               rmarkdown_2.30             
## [43] httr_1.4.7                  matrixStats_1.5.0          
## [45] bit_4.6.0                   png_0.1-8                  
## [47] memoise_2.0.1               evaluate_1.0.5             
## [49] knitr_1.50                  rlang_1.1.6                
## [51] DBI_1.2.3                   BiocManager_1.30.26        
## [53] jsonlite_2.0.0              R6_2.6.1                   
## [55] MatrixGenerics_1.22.0       GenomicAlignments_1.46.0