% \VignetteIndexEntry{Reporting on microarray differential expression} % \VignetteDepends{} % \VignetteKeywords{reprise} % \VignettePackage{ReportingTools} \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage{Sweave} \textwidth=6.5in \textheight=8.5in \oddsidemargin=-.1in \evensidemargin=-.1in \headheight=-.3in \title{Using ReportingTools in an Analysis of Microarray Data} \author{Jason A. Hackney and Jessica L. Larson} \date{\today} \begin{document} \maketitle \tableofcontents \newpage \section{Introduction} The \tt ReportingTools \rm package is particularly useful in displaying results from a microarray experiment. In this vignette we show how to display results from differential gene expression, Gene Ontology (GO), protein families (PFAM) and gene set enrichment analyses. In the final section, we create an index page where the user can easily access any of these results. \section{Differential expression analysis using limma} For this vignette we will examine the \tt ALL \rm dataset. First we load our \tt ReportingTools \rm package and the data. This dataset is from a clinical trial in acute lymphoblastic leukemia (ALL) and is available from Bioconductor. <>= library(ReportingTools) library(ALL) library(hgu95av2.db) library(genefilter) data(ALL) @ We will compare the gene expression between the \tt BCR/ABL \rm and \tt NEG \rm samples. We use \tt featureFilter \rm to remove most of the unexpressed genes. <>= ALL <- ALL[, ALL$mol.biol %in% c('NEG','BCR/ABL') & !is.na(ALL$sex)] ALL$mol.biol <- factor(ALL$mol.biol, levels = c('NEG', 'BCR/ABL')) ALL <- featureFilter(ALL) @ Next we use \tt limma \rm to find statistical evidence of differentially expressed genes. <>= library(limma) model <- model.matrix(~mol.biol+sex, ALL) fit <- eBayes(lmFit(ALL, model)) @ With the \tt limma \rm output we can make our differential analysis report. To publish \tt MArrayLM \rm objects, we supply the \tt eSet \rm and \tt factor \rm used in our analysis. <>= library(lattice) rep.theme <- reporting.theme() lattice.options(default.theme = rep.theme) deReport <- HTMLReport(shortName = 'de_analysis', title = 'Analysis of BCR/ABL translocation differential expression', reportDirectory = "./reports") publish(fit, deReport, eSet=ALL, factor=ALL$mol.biol, coef=2, n=100) finish(deReport) @ The resulting output is displayed on an .html page and includes several statistics of interest as well as an image of the data. \begin{figure} \centering \includegraphics[scale = .5]{microarray1.png} \caption{Resulting page created by \tt publish \rm for \tt fit \rm.} \end{figure} If we want to change our images, we can do so with \tt .modifyDF \rm (see the basic vignette for more examples of how to use this feature). In this example, we make lattice plots of the expression of each gene in our table stratified by \tt mol.biol \rm and \tt sex \rm. Note that \tt .modifyDF \rm uses the basic data frame (output from \tt topTable \rm ) as its default object and then modifies it with the corresponding function. To modify the decorated \tt ReportingTools\rm \tt data.frame \rm, let \tt .modifyDF=list(modifyReportDF, makeNewImages) \rm. <>= library(hwriter) makeNewImages <- function(df,...){ imagename <- c() for (i in 1:nrow(df)){ probeId <- df$ProbeId[i] y_at <- pretty(exprs(ALL)[probeId,]) y_labels <- formatC(y_at, digits = 1, format = 'f') imagename[i] <- paste0("plot", probeId, ".png") png(filename = paste0("./reports/figuresde_analysis/", imagename[i])) print(stripplot(exprs(ALL)[probeId,]~ALL$mol.biol|ALL$sex)) dev.off() } df$Image <- hwriteImage(paste0("figuresde_analysis/", imagename), link=paste0("figuresde_analysis/", imagename), table=FALSE, width=100) return(df) } deReport2 <- HTMLReport(shortName='de_analysis2', title = 'Analysis of BCR/ABL translocation differential expression with new plots', reportDirectory = "./reports") publish(fit, deReport2, eSet = ALL, factor = ALL$mol.biol, coef=2, n=100, ##.modifyDF=list(modifyReportDF, makeNewImages) ) ##to add new images to default RT output .modifyDF=makeNewImages) finish(deReport2) @ \begin{figure} \centering \includegraphics[scale = .5]{microarray2.png} \caption{Resulting page created by \tt makeNewImages \rm} \end{figure} \section{GO analysis using GOstats} In this section, we show how to use \tt ReportingTools \rm to \tt publish \rm a GO analysis to an html file. First we select the top 100 differential genes and then run the \tt hyperGTest \rm from the \tt GOstats \rm package. <>= library(GOstats) tt <- topTable(fit, coef = 2, n = 100) selectedIDs <- unlist(mget(rownames(tt), hgu95av2ENTREZID)) universeIDs <- unlist(mget(featureNames(ALL), hgu95av2ENTREZID)) goParams <- new("GOHyperGParams", geneIds = selectedIDs, universeGeneIds = universeIDs, annotation = annotation(ALL), ontology = "BP", pvalueCutoff = 0.01, conditional = TRUE, testDirection = "over") goResults <- hyperGTest(goParams) @ With these results, we can then make the GO report. We must supply \tt publish \rm with the genes of interest and the species annotation for this dataset. The default p-value cutoff is 0.01 and the minimum category size is 10 genes. <>= goReport <- HTMLReport(shortName = 'go_analysis', title = 'GO analysis of BCR/ABL translocation', reportDirectory = "./reports") publish(goResults, goReport) finish(goReport) @ The resulting output is displayed on an .html page and includes several statistics of interest as well as an image of the overlap for each category. \begin{figure} \centering \includegraphics[scale = .5]{microarray3.png} \caption{Resulting page created by \tt publish \rm for \tt goResults\rm.} \end{figure} \section{PFAM analysis} In this section, we show how to use \tt ReportingTools \rm to write a table of PFAM analysis results to an html file. First we run the \tt hyperGTest \rm from the \tt Category \rm package. <>= library(Category) pfamParams <- new("PFAMHyperGParams", geneIds = selectedIDs, universeGeneIds = universeIDs, annotation = annotation(ALL), pvalueCutoff = 0.01, testDirection = "over") PFAMResults <- hyperGTest(pfamParams) @ Then we make the PFAM report. Again we supply \tt publish \rm with the genes of interest and the species annotation for this dataset. We set the minimum category size to 3 genes. <>= PFAMReport <- HTMLReport(shortName = 'pfam_analysis', title = 'PFAM analysis of BCR/ABL translocation', reportDirectory = "./reports") publish(PFAMResults, PFAMReport, categorySize = 3) finish(PFAMReport) @ The resulting output is displayed on an .html page and includes several statistics of interest as well as an image of the overlap for each category. \section{GSEA analysis} In this section we show how to use \tt publish \rm to display \tt GeneSetCollection \rm objects and their corresponding gene set enrichment statistics. For this example, we will randomly select our gene sets and create our collection. <>= library(GSEAlm) library(GSEABase) mapped_genes <- mappedkeys(org.Hs.egSYMBOL) eidsAndSymbols <- as.list(org.Hs.egSYMBOL[mapped_genes]) geneEids <- names(eidsAndSymbols) set.seed(123) set1 <- GeneSet(geneIds=sample(geneEids, 100, replace=FALSE), setName="set1", shortDescription = "This is set1") set2 <- GeneSet(geneIds=sample(geneEids, 10, replace=FALSE), setName="set2", shortDescription = "This is set2") set3 <- GeneSet(geneIds=sample(geneEids, 37, replace=FALSE), setName="set3", shortDescription = "This is set3") set4 <- GeneSet(geneIds=sample(geneEids, 300, replace=FALSE), setName="set4", shortDescription = "This is set4") geneSets <- GeneSetCollection(c(set1, set2, set3, set4)) @ We can now make a very simple \tt GeneSetCollection \rm html table with \tt ReportingTools \rm. <>= geneSetsReport <- HTMLReport(shortName = "gene_sets", title = "Gene Sets", reportDirectory = "./reports") publish(geneSets, geneSetsReport, annotation.db = "org.Hs.eg") finish(geneSetsReport) @ The resulting output is displayed on an .html page and includes the gene sets and links to pages listing the genes within the corresponding set. Often, investigators would like more information about the enrichment of certain gene sets. Thus, we will proceed with gene set enrichment analysis (GSEA). To begin, we determine the overlap between our sets and our genes of interest by creating an incidence matrix. <>= mat <- matrix(data=0, ncol=length(universeIDs),nrow=length(geneSets)) for(i in 1:length(geneSets)){ geneIdEntrez <- unlist(geneIds(geneSets[[i]])) mat[i,match(geneIdEntrez, universeIDs)] <- 1 } colnames(mat) <- universeIDs rownames(mat) <- sapply(geneSets, function(x) x@setName) @ Now we can run the GSEA and obtain set-specific statistics and p-values. <>= lm <- lmPerGene(ALL, ~mol.biol+sex, na.rm=TRUE) GSNorm <- GSNormalize(lm$tstat[2,], mat) #one-sided p-values pVals <- gsealmPerm(ALL,~mol.biol+sex, mat, nperm=100) bestPval <- apply(pVals, 1, min) @ We can add these statistics to our report page. <>= gseaReport <- HTMLReport(shortName = "gsea_analysis", title = "GSEA analysis", reportDirectory = "./reports") publish(geneSets, gseaReport, annotation.db = "org.Hs.eg", setStats = GSNorm, setPValues = 2*bestPval) finish(gseaReport) @ The resulting output is displayed on an .html page and includes our set statistics and p-values. Links to set-specific pages are also created. \begin{figure} \centering \includegraphics[scale = .5]{microarray4.png} \caption{Resulting updated page created by \tt publish \rm for \tt geneSets \rm after we include the set statistics and p-values.} \end{figure} \begin{figure} \centering \includegraphics[scale = .5]{microarray5.png} \caption{The set-specific page.} \end{figure} We can also add the same statistics via \tt .modifyDF \rm. As demonstrated in the basic vignette, .modifyDF allows us to manipulate the output published to our html pages. <>= runGSEA <- function(df,...){ mat <- matrix(data = 0, ncol = length(universeIDs), nrow = length(geneSets)) for(i in 1:length(geneSets)){ geneIdEntrez <- unlist(geneIds(geneSets[[i]])) mat[i,match(geneIdEntrez, universeIDs)] <- 1 } colnames(mat) <- universeIDs rownames(mat) <- sapply(geneSets, function(x) x@setName) lm <- lmPerGene(ALL, ~mol.biol+sex, na.rm=TRUE) GSNorm <- GSNormalize(lm$tstat[2,], mat) pVals <- gsealmPerm(ALL,~mol.biol+sex, mat, nperm = 100) bestPval <- apply(pVals,1, min) df <- cbind(df, GSNorm, bestPval) return(df) } gseaReport2 <- HTMLReport(shortName = "gsea_analysis2", title = "GSEA analysis", reportDirectory = "./reports") publish(geneSets, gseaReport2, annotation.db = "org.Hs.eg", .modifyDF = runGSEA) finish(gseaReport2) @ \section{Putting it all together} We now make an index page to put all the output together. <>= indexPage <- HTMLReport(shortName = "index", title = "Analysis of ALL Gene Expression", reportDirectory = "./reports") publish(Link(list(deReport, goReport), report = indexPage), indexPage) publish(Link(PFAMReport, report = indexPage), indexPage) publish(Link("GSEA report has a new title", "gsea_analysis.html"), indexPage) finish(indexPage) @ \begin{figure} \centering \includegraphics[scale = .5]{microarray6.png} \caption{The page created from calling \tt publish \rm on all of our previous pages.} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{References} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Huntley, M.A., Larson, J.L., Chaivorapol, C., Becker, G., Lawrence, M., Hackney, J.A., and J.S. Kaminker. (2013). ReportingTools: an automated results processing and presentation toolkit for high throughput genomic analyses. {\it Bioinformatics}. {\bf 29}(24): 3220-3221. \end{document}