---
title: "An introduction to Rbowtie"
date: "`r format(Sys.time(), '%d %B, %Y')`"
bibliography: Rbowtie-refs.bib
author:
  - Michael Stadler
  - Dimos Gaidatzis
  - Anita Lerch
package: Rbowtie
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{An introduction to Rbowtie}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

The `r Biocpkg("Rbowtie")` package provides an **R** wrapper around the popular *bowtie* [@bowtie] short read aligner and around *SpliceMap* [@SpliceMap], a *de novo* splice junction discovery and alignment tool that makes use of the *bowtie* software package.

The package is used by the `r Biocpkg("QuasR")` [@QuasR] Bioconductor package to _qu_antify and _a_nnotate _s_hort _r_eads. We recommend using the `r Biocpkg("QuasR")` package instead of using `r Biocpkg("Rbowtie")` directly. The `r Biocpkg("QuasR")` package provides a simpler interface than `r Biocpkg("Rbowtie")` and covers the whole analysis workflow of typical ultra-high throughput sequencing experiments, starting from the raw sequence reads, through pre-processing and alignment, up to quantification.

# Preliminaries

## Citing *Rbowtie*

If you use `r Biocpkg("Rbowtie")` [@Rbowtie] in your work, you can cite it as follows:

```{r cite, eval=TRUE}
citation("Rbowtie")
```

## Installation

`r Biocpkg("Rbowtie")` is a package for the **R** computing environment and it is assumed that you have already installed **R**. See the **R** project at (http://www.r-project.org). To install the latest version of `r Biocpkg("Rbowtie")`, you will need to be using the latest version of **R**. `r Biocpkg("Rbowtie")` is part of the Bioconductor project at (http://www.bioconductor.org).
To get `r Biocpkg("Rbowtie")` together with its dependencies, you can use

```{r install, eval=FALSE}
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("Rbowtie")
```

## Loading of *Rbowtie*

In order to run the code examples in this vignette, the `r Biocpkg("Rbowtie")` library needs to be loaded.

```{r loadLibraries, eval=TRUE}
library(Rbowtie)
```

## How to get help

Most questions about `r Biocpkg("Rbowtie")` will hopefully be answered by the documentation or references. If you've run into a question that isn't addressed by the documentation, or you've found a conflict between the documentation and the software itself, there is an active support community that can offer help.

The authors of the package (maintainer: `r maintainer("Rbowtie")`) always appreciate receiving reports of bugs in the package functions or in the documentation. The same goes for well-considered suggestions for improvements.

Any other questions or problems concerning `r Biocpkg("Rbowtie")` should be posted to the Bioconductor support site (https://support.bioconductor.org). Users posting to the support site for the first time should read the helpful posting guide at (https://support.bioconductor.org/info/faq/). Note that each function in `r Biocpkg("Rbowtie")` has its own help page, e.g. `help("bowtie")`. Posting etiquette requires that you read the relevant help page carefully before posting a problem to the site.

# Example usage for individual Rbowtie functions

Please refer to the `r Biocpkg("Rbowtie")` reference manual or the function documentation (e.g. using `?bowtie`) for a complete description of `r Biocpkg("Rbowtie")` functions. The descriptions provided below are meant to give an overview of all functions and summarize the purpose of each one.

## Build the reference index with `bowtie_build` {#bowtieBuild}

To be able to align short reads to a genome, an index has to be built first using the function `bowtie_build`.
Information about arguments can be found with the help of the `bowtie_build_usage` function or in the manual page `?bowtie_build`.

```{r bowtieBuildUsage, eval=TRUE}
bowtie_build_usage()
```

`refFiles` below is a vector with filenames of the reference sequence in `FASTA` format, and `indexDir` specifies an output directory for the index files that will be generated when calling `bowtie_build`:

```{r bowtieBuild, eval=TRUE}
refFiles <- dir(system.file(package="Rbowtie", "samples", "refs"), full.names=TRUE)
indexDir <- file.path(tempdir(), "refsIndex")
tmp <- bowtie_build(references=refFiles, outdir=indexDir, prefix="index", force=TRUE)
head(tmp)
```

## Create alignment with `bowtie`

Information about the arguments supported by the `bowtie` function can be obtained with the help of the `bowtie_usage` function or in the manual page `?bowtie`.

```{r bowtieUsage, eval=TRUE}
bowtie_usage()
```

In the example below, `readsFiles` is the name of a file containing short reads to be aligned with `bowtie`, and `samFiles` specifies the name of the output file with the generated alignments.

```{r bowtie, eval=TRUE}
readsFiles <- system.file(package="Rbowtie", "samples", "reads", "reads.fastq")
samFiles <- file.path(tempdir(), "alignments.sam")
bowtie(sequences=readsFiles,
       index=file.path(indexDir, "index"),
       outfile=samFiles, sam=TRUE,
       best=TRUE, force=TRUE)
strtrim(readLines(samFiles), 65)
```

## Create spliced alignment with `SpliceMap`

While `bowtie` only generates ungapped alignments, the `SpliceMap` function can be used to generate spliced alignments. `SpliceMap` itself uses `bowtie`. To use it, it is necessary to create an index of the reference sequence as described in \@ref(bowtieBuild). `SpliceMap` parameters are specified in the form of a named list, which closely follows the configuration file format of the original `SpliceMap` program [@SpliceMap]. Be aware that `SpliceMap` can only be used for reads that are at least 50bp long.
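
Because of this minimum read length, it can be worth checking the reads before running `SpliceMap`. The following is a minimal sketch in base **R**, not part of `r Biocpkg("Rbowtie")`: `fastq_read_lengths` is a hypothetical helper, the toy reads are made up for illustration, and the code assumes that every FASTQ record spans exactly four lines.

```r
# Hypothetical helper (not part of Rbowtie): return the length of every read
# in a FASTQ file, assuming each record spans exactly four lines.
fastq_read_lengths <- function(path) {
    lines <- readLines(path)
    if (length(lines) < 2)
        return(integer(0))
    # the sequence is the second line of each four-line record
    seqs <- lines[seq(2, length(lines), by = 4)]
    nchar(seqs)
}

# Toy FASTQ file with one 60bp read and one 30bp read (made-up data)
toyFastq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", strrep("A", 60), "+", strrep("I", 60),
             "@read2", strrep("C", 30), "+", strrep("I", 30)),
           toyFastq)

readLengths <- fastq_read_lengths(toyFastq)
readLengths
# TRUE only if every read meets the 50bp minimum;
# FALSE here, because read2 is only 30bp long
all(readLengths >= 50)
```

The same helper could be pointed at a real reads file; compressed input or FASTQ files with wrapped sequence lines would need a dedicated parser instead.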
```{r SpliceMap, eval=TRUE}
# input reads, reference sequence, bowtie index directory and output file
readsFiles <- system.file(package="Rbowtie", "samples", "reads", "reads.fastq")
refDir <- system.file(package="Rbowtie", "samples", "refs", "chr1.fa")
indexDir <- file.path(tempdir(), "refsIndex")
samFiles <- file.path(tempdir(), "splicedAlignments.sam")

# SpliceMap parameters, given as a named list
cfg <- list(genome_dir=refDir,
            reads_list1=readsFiles,
            read_format="FASTQ",
            quality_format="phred-33",
            outfile=samFiles,
            temp_path=tempdir(),
            max_intron=400000,
            min_intron=20000,
            max_multi_hit=10,
            seed_mismatch=1,
            read_mismatch=2,
            num_chromosome_together=2,
            bowtie_base_dir=file.path(indexDir, "index"),
            num_threads=4,
            try_hard="yes",
            selectSingleHit=TRUE)
res <- SpliceMap(cfg)
res
strtrim(readLines(samFiles), 65)
```

# Session information

The output in this vignette was produced under:

```{r sessionInfo}
sessionInfo()
```

# References