biomaRt 2.48.3
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis are needed for comprehensive bioinformatics data analysis. The biomaRt package provides an interface to a growing collection of databases implementing the BioMart software suite. The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. Examples of BioMart databases are Ensembl, UniProt and HapMap. These major databases give biomaRt users direct access to a diverse set of data and enable a wide range of powerful online queries from R.
There are a small number of non-Ensembl databases that offer a BioMart interface to their data. The biomaRt package can be used to access these in a very similar fashion to Ensembl. The majority of biomaRt functions will work in the same manner, but the construction of the initial Mart object requires slightly more setup. In this section we demonstrate the settings required to query WormBase ParaSite and Phytozome. First we need to load biomaRt.
library(biomaRt)
To demonstrate the use of the biomaRt package with non-Ensembl databases, the next query is performed using the WormBase ParaSite BioMart. In this example, we use the listMarts() function to find the names of the available marts, given the URL of WormBase ParaSite. We then use this to connect to the WormBase ParaSite BioMart using the useMart() function. Note that we use the https address and must provide the port as 443. Queries to WormBase ParaSite will fail without these options.
listMarts(host = "parasite.wormbase.org")
##         biomart      version
## 1 parasite_mart WBPS 15 Mart
wormbase <- useMart(biomart = "parasite_mart", 
                    host = "https://parasite.wormbase.org", 
                    port = 443)
We can then use functions described earlier in this vignette to find and select the gene dataset, and print the first six available filters and attributes. Then we use a list of gene names as a filter and retrieve the associated transcript IDs and transcript biotypes.
listDatasets(wormbase)
##     dataset          description version
## 1 wbps_gene All Species (WBPS15)      15
wormbase <- useDataset(mart = wormbase, dataset = "wbps_gene")
head(listFilters(wormbase))
##                  name     description
## 1     species_id_1010          Genome
## 2 nematode_clade_1010  Nematode Clade
## 3     chromosome_name Chromosome name
## 4               start           Start
## 5                 end             End
## 6              strand          Strand
head(listAttributes(wormbase))
##                      name        description         page
## 1          species_id_key      Internal Name feature_page
## 2    production_name_1010     Genome project feature_page
## 3       display_name_1010        Genome name feature_page
## 4        taxonomy_id_1010        Taxonomy ID feature_page
## 5 assembly_accession_1010 Assembly accession feature_page
## 6     nematode_clade_1010     Nematode clade feature_page
getBM(attributes = c("external_gene_id", "wbps_transcript_id", "transcript_biotype"), 
      filters = "gene_name", 
      values = c("unc-26","his-33"), 
      mart = wormbase)
##   external_gene_id wbps_transcript_id transcript_biotype
## 1           his-33         F17E9.13.1     protein_coding
## 2           unc-26          JC8.10a.1     protein_coding
## 3           unc-26          JC8.10a.2     protein_coding
## 4           unc-26          JC8.10b.1     protein_coding
## 5           unc-26          JC8.10c.1     protein_coding
## 6           unc-26          JC8.10c.2     protein_coding
## 7           unc-26          JC8.10d.1     protein_coding
The Phytozome BioMart can be accessed with the following settings. Note that we use the https address. Queries to Phytozome will fail without specifying this.
listMarts(host = "https://phytozome.jgi.doe.gov")
##                               biomart                            version
## 1                      phytozome_mart           V13 Genomes and Families
## 2            phytozome_diversity_mart               V13 Genome Diversity
## 3              phytozome_mart_archive          Genome Archive - All Data
## 4 phytozome_mart_archive_unrestricted Genome Archive - Unrestricted Data
phytozome <- useMart(biomart = "phytozome_mart", 
                host = "https://phytozome.jgi.doe.gov")
listDatasets(phytozome)
##              dataset            description version
## 1 brachypan_clusters     BrachyPan Families        
## 2          phytozome  Phytozome V13 Genomes        
## 3 phytozome_clusters Phytozome V13 Families
phytozome <- useDataset(mart = phytozome, dataset = "phytozome")
Once this is set up, the usual biomaRt functions can be used to interrogate the database options and run queries.
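Interrogating the options usually means scanning the tables returned by listFilters() and listAttributes() for a keyword. The sketch below shows one way this might be done in base R. Note that `match_fields()` is a hypothetical helper, not a biomaRt function, and it assumes only that the table has the `name` and `description` columns seen in the listings earlier in this section. The sample rows are copied from the WormBase ParaSite filter listing so that the example runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): keep the rows of a
# listFilters()/listAttributes() table whose name or description
# matches a case-insensitive pattern.
match_fields <- function(fields, pattern) {
  hit <- grepl(pattern, fields$name, ignore.case = TRUE) |
    grepl(pattern, fields$description, ignore.case = TRUE)
  fields[hit, , drop = FALSE]
}

# Sample rows copied from the head(listFilters(wormbase)) output shown earlier.
example_filters <- data.frame(
  name = c("species_id_1010", "nematode_clade_1010", "chromosome_name",
           "start", "end", "strand"),
  description = c("Genome", "Nematode Clade", "Chromosome name",
                  "Start", "End", "Strand"),
  stringsAsFactors = FALSE
)

# Find every filter whose name or description mentions "clade".
clade_filters <- match_fields(example_filters, "clade")
print(clade_filters)
```

With a live connection the same helper could be applied to the Phytozome mart, for example `match_fields(listFilters(phytozome), "gene_name")`. biomaRt also provides searchFilters() and searchAttributes(), which perform a keyword search directly against a Mart object.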
getBM(attributes = c("organism_name", "gene_name1"), 
      filters = "gene_name_filter", 
      values = "82092", 
      mart = phytozome)
##     organism_name gene_name1
## 1 Smoellendorffii      82092
Version 13 of Phytozome can be found at https://phytozome-next.jgi.doe.gov/. If you wish to query that version, the URL used to create the Mart object must reflect that.
phytozome_v13 <- useMart(biomart = "phytozome_mart", 
                dataset = "phytozome", 
                host = "https://phytozome-next.jgi.doe.gov")
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] biomaRt_2.48.3   BiocStyle_2.20.2
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.7             prettyunits_1.1.1      png_0.1-7              Biostrings_2.60.2     
##  [5] assertthat_0.2.1       digest_0.6.27          utf8_1.2.2             BiocFileCache_2.0.0   
##  [9] R6_2.5.0               GenomeInfoDb_1.28.1    stats4_4.1.0           RSQLite_2.2.7         
## [13] evaluate_0.14          highr_0.9              httr_1.4.2             pillar_1.6.2          
## [17] zlibbioc_1.38.0        rlang_0.4.11           progress_1.2.2         curl_4.3.2            
## [21] jquerylib_0.1.4        blob_1.2.2             S4Vectors_0.30.0       rmarkdown_2.10        
## [25] stringr_1.4.0          RCurl_1.98-1.3         bit_4.0.4              compiler_4.1.0        
## [29] xfun_0.25              pkgconfig_2.0.3        BiocGenerics_0.38.0    htmltools_0.5.1.1     
## [33] tidyselect_1.1.1       KEGGREST_1.32.0        tibble_3.1.3           GenomeInfoDbData_1.2.6
## [37] bookdown_0.23          codetools_0.2-18       IRanges_2.26.0         XML_3.99-0.6          
## [41] fansi_0.5.0            withr_2.4.2            crayon_1.4.1           dplyr_1.0.7           
## [45] dbplyr_2.1.1           rappdirs_0.3.3         bitops_1.0-7           jsonlite_1.7.2        
## [49] lifecycle_1.0.0        DBI_1.1.1              magrittr_2.0.1         stringi_1.7.3         
## [53] cachem_1.0.5           XVector_0.32.0         xml2_1.3.2             bslib_0.2.5.1         
## [57] filelock_1.0.2         ellipsis_0.3.2         vctrs_0.3.8            generics_0.1.0        
## [61] tools_4.1.0            bit64_4.0.5            Biobase_2.52.0         glue_1.4.2            
## [65] purrr_0.3.4            hms_1.1.0              parallel_4.1.0         fastmap_1.1.0         
## [69] yaml_2.2.1             AnnotationDbi_1.54.1   BiocManager_1.30.16    memoise_2.0.0         
## [73] knitr_1.33             sass_0.4.0
warnings()