%\VignetteIndexEntry{Simulating molecular regulatory networks using qpgraph} %\VignetteKeywords{graphical Markov model, qp-graph, microarray, network, eQTL} %\VignettePackage{qpgraphSimulate} \documentclass{article} <>= BiocStyle::latex(use.unsrturl=FALSE) @ \usepackage{natbib} \bioctitle[Simulating networks using \Biocpkg{qpgraph}]% {Simulating molecular regulatory networks using {\tt qpgraph}} \author{Inma Tur$^{1,3}$, Alberto Roverato$^2$ and Robert Castelo$^1$} \begin{document} \SweaveOpts{eps=FALSE} \DefineVerbatimEnvironment{Sinput}{Verbatim} {formatcom = {\color{Sinput}}} \DefineVerbatimEnvironment{Soutput}{Verbatim} {formatcom = {\color{Soutput}}} \DefineVerbatimEnvironment{Scode}{Verbatim} {formatcom = {\color{Scode}}} \definecolor{Sinput}{rgb}{0.21,0.49,0.72} \definecolor{Soutput}{rgb}{0.32,0.32,0.32} \definecolor{Scode}{rgb}{0.75,0.19,0.19} <>= pdf.options(useDingbats=FALSE) options(width=80) rm(list=ls()) try(detach("package:mvtnorm"), silent=TRUE) try(detach("package:qtl"), silent=TRUE) @ \maketitle \begin{quote} {\scriptsize 1. Universitat Pompeu Fabra, Barcelona, Spain. \\ 2. Universit\`a di Bologna, Bologna, Italy. \\ 3. Now at Kernel Analytics, Barcelona, Spain. } \end{quote} \section{Introduction} The theoretical substrate used by \Biocpkg{qpgraph} to estimate network models of molecular regulatory interactions is that of graphical Markov models (GMMs). A useful way to understand the underpinnings of these models is to simulate them and simulate data from them. More importantly, these simulated data may serve the purpose of verifying properties of GMM estimation procedures, such as correctness or asymptotic behavior. Here we illustrate the functionalities of \Biocpkg{qpgraph} to perform these simulations. If you use them in your own research, please cite the following article: \begin{quote} Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. \textit{Genetics}, 198(4):1377-1393, 2014. \end{quote} The interface provided by \Biocpkg{qpgraph} tries to comply with the available functions in the base R \CRANpkg{stats} package for simulating data from probability distributions and the names of functions described below for the purpose of simulating graphs, models and data follow the convention: \begin{quote} r$<\!\!\!objectclass\!\!\!>$($n$, ...) \end{quote} where $<\!\!\!objectclass\!\!\!>$ refers to the class of object (in a broad sense, not just a formal S3 or S4 class) being simulated and $n$ is the number of observations to simulate. Except for the case of data, since the simulated observations are other than R atomic types of objects, when $n>1$, these functions return simulated observations in the form of a list with $n$ elements. \section{Simulation of graphs} An undirected graph $G$ is a mathematical object defined by a pair of sets $G=(V, E)$ where $V=\{1,\dots,p\}$ is the vertex set and $E\subseteq (V\times V)$ is the edge set. In the context of GMMs {\it labeled} undirected graphs are employed to represent conditional (in)dependences among random variables (r.v.'s) $X=\{X_1, \dots, X_p\}$ indexed by the vertices in $V$ whose values occur on equal footing. Stepwise data generating processes can be represented by directed graphs. In the context of GMMs one may also consider so-called {\it marked} graphs, which are graphs with {\it marked} vertices grouped into two subsets, one associated to discrete variables and another to continuous ones. A graph with a single type of vertices, i.e., that is not marked, it is also called {\it pure}. Diferent types of graphs determine different GMM classes. For a comprehensive description of different GMM classes and more elaborate descriptions of the terminology and notation used in this vignette the reader may consult the book of \citet{Lauritzen:1996fj}. The first step to simulate a GMM consists of simulating its associated graph. While there are many R packages that provide procedures to simulate graphs, \Biocpkg{qpgraph} provides its own minimal functionality for this purpose tailored to ease the downstream simulation of GMMs. This functionality allows the user to simulate undirected graphs according to two main criteria, the type of graph (pure or marked) and the type of model to simulate the random graph. The simplest type of model to simulate random undirected graphs is the so-called Erd\H{o}s-R\'enyi which is generated by either including an edge between every pair of vertices with a pre-specified probability or drawing a graph uniformly at random among those with a pre-specified number of edges. In the context of exploring the performance of GMM structure estimation procedures under different degrees of sparseness of the underlying graph, it is handy to work with the so-called $d$-regular graphs \citep{Harary:1969}. These are graphs with a constant degree vertex $d$ which, in one hand, make the graph density a linear function of $d$ and, on the other hand, bound the smallest minimal separator between any two vertices \citep[see pg.~2646]{Castelo:2006yu}. To specify the parameters that define one specific type of graph we want to simulate \Biocpkg{qpgraph} provides the following functions that build parameter objects which can be used afterwards to simulate graphs through a single call to the function \Rfunction{rgraphBAM()}: <<>>= library(qpgraph) args(erGraphParam) args(erMarkedGraphParam) args(dRegularGraphParam) args(dRegularMarkedGraphParam) @ As we can see from their default values, a single call without arguments already define some small graphs on 5 vertices: <<>>= erGraphParam() erMarkedGraphParam() dRegularGraphParam() dRegularMarkedGraphParam() @ The objects returned by these functions belong to different S4 classes derived from the following two main ones, \Rclass{graphParam} and \Rclass{markedGraphParam}: <<>>= showClass("graphParam") showClass("markedGraphParam") @ While this level of detail is not crucial for the end-user, knowing the distinction between these two main types of graph parameter objects, \Rclass{graphParam} and \Rclass{markedGraphParam}, may help to get more quickly acquainted with the type of arguments we need in calls described below to simulate GMMs. Finally, the function \Rfunction{rgraphBAM()} simulates one or more random graphs as objects of the class \Rclass{graphBAM} defined in the \Biocpkg{graph} package. Its arguments are: <<>>= args(rgraphBAM) @ where \Robject{n} is the number of graphs to simulate (default {\tt n=1}) and \Robject{param} is an object generated by one of the previous parameter functions. A couple of minimal examples are: <<>>= rgraphBAM(erGraphParam()) rgraphBAM(n=2, dRegularGraphParam()) @ In a slightly more ellaborate example, if we would like to simulate a $d$-regular graph on 2 discrete vertices and 5 continuous ones with a constant degree $d=3$, we would make the following call to \Rfunction{rgraphBAM()}, which we previously seed to enable the reader reproducing the same graph shown here: <<3regulargraph, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=5, width=5>>= set.seed(1234) g <- rgraphBAM(dRegularMarkedGraphParam(pI=2, pY=10, d=3)) plot(g) @ where the last \Rfunction{plot} function call is defined (overloaded) in the \Biocpkg{qpgraph} package to ease plotting the graph which is displayed in Figure~\ref{fig:3regulargraph}. This function uses the plotting capabilities from the \Biocpkg{Rgraphviz} package and further arguments, such as \Robject{layoutType}, can be passed downstream to the \Biocpkg{Rgraphviz} plotting function to fine tune the display of the graph. \begin{figure} \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-3regulargraph}} \caption{A random 3-regular marked undirected graph.} \label{fig:3regulargraph} \end{figure} \section{Simulation of undirected Gaussian GMMs} Undirected Gaussian GMMs are multivariate normal models on continuous r.v.'s $X_V=\{X_1, \dots, X_p\}$ determined by an undirected graph $G=(V, E)$ with $V=\{1, \dots, p\}$ and $E\subseteq (V\times V)$. In particular, the zero-pattern of the inverse covariance matrix corresponds to the missing edges in $G$ \citep{Lauritzen:1996fj}. Therefore, simulating this type of GMM amounts to simulate a covariance matrix whose inverse matches the missing edges of a given, or simulated, undirected graph in its zero pattern and whose scaled covariance matches a given marginal correlation on the cells corresponding to present edges. This can be easily accomplished with \Biocpkg{qpgraph} using the function \Rfunction{rUGgmm}: <<>>= args(rUGgmm) @ where \Robject{n} corresponds to the number of undirected Gaussian GMMs we want to simulate (default \Robject{n=1} and \Robject{g} corresponds to either a \Rclass{graphParam} object, a \Rclass{graphBAM} object or a matrix. This depends on whether we want to simulate both the graph and the covariance matrix underlying the GMM, by providing a \Rclass{graphParam} object, or we just want to simulate the covariance matrix given a graph specified as either a \Rclass{graphBAM} object, an squared and symmetric adjacency matrix or a two-column matrix describing an edge set. Examples of these are the following: <<>>= rUGgmm(dRegularGraphParam(p=4, d=2)) rUGgmm(matrix(c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0), nrow=4, byrow=TRUE)) rUGgmm(matrix(c(1, 2, 2, 3, 3, 4, 4, 1), ncol=2, byrow=TRUE)) @ These three calls to \Rfunction{rUGgmm()} return objects of class \Rclass{UGgmm} containing undirected Gaussian GMMs with an underlying graph structure formed by a single undirected cycle on four vertices. The elements of an \Rclass{UGgmm} object can be quickly explored with the \Rfunction{summary()} function call: <<>>= set.seed(12345) gmm <- rUGgmm(dRegularGraphParam(p=4, d=2)) summary(gmm) @ and the individual elements that are available to the user can be accessed as if it were a \Rclass{list} object: <<>>= class(gmm) names(gmm) gmm$X gmm$p gmm$g gmm$mean gmm$sigma @ We can also directly plot the \Rclass{UGgmm} object to see the underlying undirected graph and, in this particular example, note how the zeroes of the inverse covariance match the missing edges shown in Figure~\ref{fig:4cyclegraph}. <<4cyclegraph, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=5, width=5>>= plot(gmm) round(solve(gmm$sigma), digits=1) @ \begin{figure} \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-4cyclegraph}} \caption{A 4-cycle undirected graph.} \label{fig:4cyclegraph} \end{figure} Further arguments to \Rfunction{rUGgmm()} can be the desired mean marginal correlation derived from the cells of the covariance matrix that match the present edges in the underlying graph (\Robject{rho=0.5}), the minimum tolerance at which the iterative matrix completion algorithm that builds the covariance matrix stops (\Robject{tol=0.001}) and whether the function should report progress on the calculations (\Robject{verbose=FALSE}). It is important to set the latter argument \Robject{verbose=TRUE} when we want to simulate an undirected Gaussian GMM with more than, let's say, 200 vertices, since around that number of vertices and beyond the simulation of the covariance matrix becomes computationally demanding, specially when the underlying graph is not very sparse. Further technical information on the algorithms employed to simulate the covariance matrix can be found in the help pages of the \Biocpkg{qpgraph} functions \Rfunction{qpG2Sigma()}, \Rfunction{qpRndWishart()}, \Rfunction{qpIPF()} and \Rfunction{qpHTF()} which are called by the procedures described here. Finally, to simulate multivariate normal observations from the undirected Gaussian GMM we just need to use the \Rfunction{rmvnorm()} function from the \Biocpkg{mvtnorm} package which is overloaded in the \Biocpkg{qpgraph} package to take directly an \Rclass{UGgmm} object, as follows: <<>>= rmvnorm(10, gmm) @ Note that with sufficient data we can directly recover the zero-pattern of the inverse covariance matrix: <<>>= set.seed(123) X <- rmvnorm(10000, gmm) round(solve(cov(X)), digits=1) round(solve(gmm$sigma), digits=1) @ However, such a sample size would be exceptional and for more limited sample size but still with $p < n$ the user may use the \Biocpkg{qpgraph} function \Rfunction{qpPAC()} which performs zero-partial correlation tests and when $p\gg n$, then the user may estimate non-rejection rates \cite{Castelo:2006yu} with the \Rfunction{qpNrr()} function and simplify the saturated model such that it may become possible to apply \Rfunction{qpPAC()}. Obviously, gene expression data, either from microarrays or log-transformed count data, are far from being multivariate normal. However, many available methods for estimating molecular regulatory networks from expression data, such as \Biocpkg{qpgraph}, make such an assumption and simulating data from undirected Gaussian GMMs can help us to test these methods under a controlled experiment, learning their basic properties and obvious pitfalls with such a clean data. \section{Simulation of homogeneous mixed GMMs} Mixed GMMs are GMMs for multivariate data defined by mixed discrete and continuous r.v.'s, $X=\{I, Y\}$ where $I=\{I_1, \dots, I_{p_I}\}$ denote discrete r.v.'s and $Y=\{Y_1,\dots, Y_{p_Y}\}$ denote continuous ones. This class of GMMs are determined by marked graphs $G=(V, E)$ with $p$ marked vertices $V=\Delta\cup\Gamma$ where $\Delta=\{1, \dots, p_I\}$ are plotted with dots and index the discrete r.v.'s in $I$ and $\Gamma=\{1, \dots, p_Y\}$ are denoted by circles and index the continuous r.v.'s in $Y$. In the context of molecular regulatory networks and, particularly, of genetical genomics data where we associate discrete r.v.'s to genotypes and continuous ones to expression profiles, we make the assumption that discrete genotypes affect gene expression and not the other way around. Under this assumption, we will consider the underlying graph $G$ not only with mixed vertices but also with mixed edges, where some are directed and represented by arrows and some are undirected. More concretely $G$ will be a partially-directed graph with arrows pointing from discrete vertices to continuous ones and undirected edges between continuous vertices. From this restricted definition of a partially-directed graph it follows inmediately that there are no semi-directed (direction preserving) cycles and allows one to interpret these GMMs also as {\it chain graphs}, which are graphs formed by undirected subgraphs connected by directed edges \citep{Lauritzen:1996fj}. Currently, the \Biocpkg{graph} and \Biocpkg{Rgraphviz} packages, in which \Biocpkg{qpgraph} relies to handle and plot graph objects, do not directly allow one to define and work with partially-directed graphs. However, in the functionality described below \Biocpkg{qpgraph} tries to hide to the user complications derived from this fact. A second important assumption \Biocpkg{qpgraph} makes is that the joint distribution of the r.v.'s in $X$ is a conditional Gaussian distribution (also known as CG-distribution) by which continuous r.v.'s follow a multivariate normal distribution $\mathcal{N}_{|\Gamma|}(\mu(i), \Sigma(i))$ conditioned on the joint levels $i\in\mathcal{I}$ from the discrete variables in $I$. A third and final assumption is that the conditional covariance matrix is constant across $i\in\mathcal{I}$, the joint levels of $I$, i.e., $\Sigma(i)\equiv\Sigma$. This implies that we are actually simulating the so-called {\it homogeneous} mixed GMMs. In the context of genetical genomics data, this assumption implies that genotype alleles affect only the mean expression level of genes and not the correlations between them. Two restrictions currently constain further the type of mixed GMMs we can simulate with \Biocpkg{qpgraph}. The first one is that discrete variables are simulated under marginal independence between them and the second one is that every continuous variable cannot be associated to more than one discrete variable. As we shall see below, the first restriction does not apply when simulating expression quantitative trait loci data in experimental crosses, as genotype marker data is simulated by another package, the \CRANpkg{qtl} ackage \citep{Broman:2003fk}. We are working to remove the second restriction in the near future and enable multiple discrete variables having linear additive effects and interaction effects, on a common continuous variable. Similarly to how we did it with undirected Gaussian GMMs, simulating mixed GMMs is done with a call to the function \Rfunction{rHMgmm()}: <<>>= args(rHMgmm) @ where \Robject{n} corresponds to the number of mixed GMMs we want to simulate (default \Robject{n=1} and \Robject{g} corresponds to either a \Rclass{markedGraphParam} object, a \Rclass{graphBAM} object or a matrix. Note that the first assumption made before enables specifying the underlying partially-directed graph just as we did for an undirected one, since directed edges are completely determined by the vertex type at their endpoints (discrete to continuous). Examples of these are the following: <<>>= rHMgmm(dRegularMarkedGraphParam(pI=1, pY=3, d=2)) rHMgmm(matrix(c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0), nrow=4, byrow=TRUE), I=1) rHMgmm(matrix(c(1, 2, 2, 3, 3, 4, 4, 1), ncol=2, byrow=TRUE), I=1) @ These three calls to \Rfunction{rHMgmm()} return objects of class \Rclass{HMgmm} containing homogenous mixed GMMs with an underlying graph structured formed by one discrete vertex pointing to two continous ones which are themselves forming an undirected connected component with a fourth vertex, all together forming an undirected cycle on four vertices. Just as with \Rclass{UGgmm} objects, the elements of an \Rclass{HMgmm} object can be quickly explored with the \Rfunction{summary()} function call: <<>>= set.seed(12345) gmm <- rHMgmm(dRegularMarkedGraphParam(pI=1, pY=3, d=2)) summary(gmm) @ and again the individual elements that are available to the user can be accessed as if it were a \Rclass{list} object: <<>>= class(gmm) names(gmm) gmm$X gmm$I gmm$Y gmm$p gmm$pI gmm$pY gmm$g gmm$mean() gmm$sigma gmm$a gmm$eta2 @ We can also directly plot the \Rclass{HMgmm} object and \Biocpkg{qpgraph} will use the necessary instructions from the \Biocpkg{graph} and \Biocpkg{Rgraphviz} libraries to display a partially-directed graph as the one shown in Figure~\ref{fig:hmgmm}. <>= plot(gmm) @ \begin{figure} \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-hmgmm}} \caption{Homogeneous mixed graphical (chain) model with one discrete variable and three continuous ones forming an undirected cycle on four vertices.} \label{fig:hmgmm} \end{figure} Further arguments to \Rfunction{rHMgmm()} are all we described previously for the \Rfunction{rUGgmm()} function plus the desired additive linear effect (\Robject{a=1}) of the discrete levels (alleles in the genetics context) on the continuous variables (genes in the genetics context). To simulate conditional multivariate normal observations from the homogeneous mixed GMM we use the \Rfunction{rcmvnorm()} function, which uses its pure continuous counterpart \Rfunction{rmvnorm()} from the \CRANpkg{mvtnorm} package, but which is defined in the \Biocpkg{qpgraph} package as it needs to calculate the corresponding conditional mean vectors $\mu(i)$: <<>>= rcmvnorm(10, gmm) @ Note that with sufficient data we can directly recover the zero-pattern of the inverse {\it conditional} covariance matrix: <<>>= set.seed(123) X <- rcmvnorm(10000, gmm) csigma <- (1/10000)*sum(X[, gmm$I] == 1)*cov(X[X[, gmm$I]==1, gmm$Y]) + (1/10000)*sum(X[, gmm$I] == 2)*cov(X[X[, gmm$I]==2, gmm$Y]) round(solve(csigma), digits=1) round(solve(gmm$sigma), digits=1) @ and that the sample sample mean vectors and additive effects approach the ones specified in the model according to the mixed associations between the discrete and continuous variables: <<>>= smean <- apply(X[, gmm$Y], 2, function(x, i) tapply(x, i, mean), X[, gmm$I]) smean gmm$mean() abs(smean[1, ] - smean[2, ]) gmm$a @ \section{Simulation of eQTL models of experimental crosses} We illustrate in this section how we can use \Biocpkg{qpgraph} in conjunction with the \CRANpkg{qtl} package \citep{Broman:2003fk} to simulate expression quantatitve trait loci (eQTL) models of experimental crosses and data from them. This functionality employs the previously described procedures to simulate an homogeneous mixed GMM that represents the underlying model of regulatory {\it cis}-eQTL, {\it trans}-eQTL and gene-gene associations, although this fact appears hidden to the user. More concretely, the \Biocpkg{qpgraph} package defines an object class called \Rclass{eQTLcross} which basically holds a genetic map of the genotype markers (as defined by the \Rclass{map} class in the \CRANpkg{qtl} package from \citealp{Broman:2003fk}) and an homogeneous mixed GMM defining the underlying molecular regulatory network that connects genotypes with genes and genes themselves, where we use the term {\it gene} to refer to a gene expression profile. In a similar vein to the way we simulated before graphs and GMMs, we need to create a parameter object that defines the main features of the eQTL model we want to simulate. This is done through the function \Rfunction{eQTLcrossParam()} which by default defines some minimal eQTL model: <<>>= eQTLcrossParam() args(eQTLcrossParam) @ To simulate an \Rclass{eQTLcross} object \Biocpkg{qpgraph} provides the function \Rfunction{reQTLcross()} giving as first argument the number of \Rclass{eQTLcross} objects we want to simulate and a \Rclass{eQTLcrossParam} object: <<>>= reQTLcross(n=2, eQTLcrossParam()) @ When the first argument \Robject{n} is omitted, then \Robject{n=1} is assumed by default. Other arguments to \Rfunction{reQTLcross()} are the mean marginal correlation between genes (\Robject{rho=0.5}), the magnitude of the linear additive effect of the simulated eQTL associations (\Robject{a=1}), the minimum tolerance of the matrix completion algorithm that is involved in the construction of the conditional covariance matrix (\Robject{tol=0.001}) and whether progress on the calculations should be shown \Robject{verbose=FALSE}. To simulate a larger \Rclass{eQTLcross} object we need to simulate a genetic map using the \Rfunction{sim.map()} function from the \CRANpkg{qtl} package \citep{Broman:2003fk}, which should be loaded first. Since \Biocpkg{qpgraph} overloads the \CRANpkg{qtl} function \Rfunction{sim.cross()}, which will be used later to simulate data from an \Rclass{eQTLcross} object, we will detach \Biocpkg{qpgraph} before loading \CRANpkg{qtl}, and load \Biocpkg{qpgraph} again. <<>>= detach("package:qpgraph") library(qtl) library(qpgraph) map <- sim.map(len=rep(100, times=20), n.mar=rep(10, times=20), anchor.tel=FALSE, eq.spacing=FALSE, include.x=TRUE) @ and using it in combination with a larger number of genes (50) we can easily simulate this larger \Rclass{eQTLcross} object: <<>>= eqtl <- reQTLcross(eQTLcrossParam(map=map, genes=50)) class(eqtl) eqtl @ which, as we can see, it corresponds a backcross model with 50 genes each of them associated to a {\it cis}-eQTL, and with a certain underlying gene network embedded into an homogeneous mixed GMM that forms part of this object and which can be accessed as follows: <<>>= eqtl$model @ A dot plot describing the eQTL associations along the given genetic map can be obtained by calling the \Rfunction{plot} function with the \Rclass{eQTLcross} object as argument. In Figure~\ref{fig:geneticmap} we see on the right panel such a plot, and on the left panel the genetic map plotted by the plot function defined in the \CRANpkg{qtl} package \citep{Broman:2003fk} to plot genetic maps. <>= par(mfrow=c(1,2)) plot(map) plot(eqtl, main="eQTL model with cis-associations only") @ \begin{figure} \centerline{\includegraphics[width=\textwidth]{qpgraphSimulate-geneticmap}} \caption{A genetic map simulated with the \CRANpkg{qtl} package \citep{Broman:2003fk} on the left, and on the right, an eQTL model simulated using the genetic map with the \Biocpkg{qpgraph} package.} \label{fig:geneticmap} \end{figure} A somewhat more complex eQTL model with {\it trans} associations can be simulated by using the \Robject{trans} argument as follows: <<>>= set.seed(123) eqtl <- reQTLcross(eQTLcrossParam(map=map, genes=50, cis=0.5, trans=c(5, 5)), a=5) @ In this call, \Robject{cis=0.5} indicates that 50\% of the genes should have {\it cis}-eQTL associations and among the remaining ones, 10 will be associated to two {\it trans}-eQTL affecting 5 genes each of the two. We have also increased the default additive linear effect from the default value \Robject{a=1} to \Robject{a=5} which corresponds to a very strong linear additive effect from genotype markers on gene expression. We can examine the {\it cis} and {\it trans} associations of the \Rclass{eQTLcross} object with the \Rfunction{ciseQTL()} and \Rfunction{transeQTL()} functions: <<>>= head(ciseQTL(eqtl)) transeQTL(eqtl) @ and examine where the eQTL associations occur and what genes map to {\it trans}-eQTL, as shown in Figure~\ref{fig:transeqtl}. <>= plot(eqtl, main="eQTL model with trans-eQTL") @ \begin{figure} \centerline{\includegraphics[width=0.5\textwidth]{qpgraphSimulate-transeqtl}} \caption{An eQTL model including trans-acting associations simulated using the genetic map from Fig.~\ref{fig:geneticmap}.} \label{fig:transeqtl} \end{figure} Using this \Rclass{eQTLcross} object we can simulate data from the corresponding experimental cross with the function \Rfunction{sim.cross()} from the \CRANpkg{qtl} package \citep{Broman:2003fk} but which is overloaded in \Biocpkg{qpgraph} to plug the eQTL associations into the corresponding genetic loci: <<>>= set.seed(123) cross <- sim.cross(map, eqtl) cross @ Note that here, the genotype data is simulated by the procedures implemented in the \CRANpkg{qtl} package \citep{Broman:2003fk} and \Biocpkg{qpgraph} adds to that simulation the eQTL and gene network associations. Thus, while the \Rfunction{rHMgmm()} function described in the previous section, does not simulate correlated discrete variables, here the genotypes will be correlated according to the input arguments of the \Rfunction{sim.cross()} function in \CRANpkg{qtl} (mainly, the \Robject{map.function} argument that converts genetic distances into recombination fractions) and which can be passed through the previous call to \Rfunction{sim.cross()}. Let's focus on a specific simulated eQTL, concretely on the second one of the following list: <<>>= allcis <- ciseQTL(eqtl) allcis[allcis$chrom==1, ] gene <- allcis[2, "gene"] chrom <- allcis[2, "chrom"] location <- allcis[2, "location"] @ Find out the genes connected to gene \Sexpr{gene} in the underlying regulatory network: <<>>= connectedGenes <- graph::inEdges(gene, eqtl$model$g)[[1]] connectedGenes <- connectedGenes[connectedGenes %in% eqtl$model$Y] connectedGenes @ Now, using the \CRANpkg{qtl} package \citep{Broman:2003fk} and its \Rfunction{scanone()} function, we perform a simple QTL analysis by single marker regression using the expression profiles from genes \Sexpr{paste(c(gene, connectedGenes), collapse=", ")} as phenotypes: <<>>= out.mr <- scanone(cross, method="mr", pheno.col=c(gene, connectedGenes)) @ By using the plotting functionalities of the \CRANpkg{qtl} package \citep{Broman:2003fk} we can examine the LOD score profile of these three genes on chromosome \Sexpr{chrom} where the eQTL of gene \Sexpr{gene} is located: <>= plot(out.mr, chr=chrom, ylab="LOD score", lodcolumn=1:3, col=c("black", "blue", "red"), lwd=2) abline(v=allcis[allcis$gene == gene, "location"]) legend("topleft", names(out.mr)[3:5], col=c("black", "blue", "red"), lwd=2, inset=0.05) @ \begin{figure} \centerline{\includegraphics[width=0.6\textwidth]{qpgraphSimulate-lodprofiles}} \caption{Profile of LOD scores for three genes with direct and indirect eQTL.} \label{fig:lodprofiles} \end{figure} We can observe in Figure~\ref{fig:lodprofiles} that not only the directly associated gene \Sexpr{gene} seems to have an eQTL at position \Sexpr{round(location, digits=1)} cM, but also the genes \Sexpr{paste(connectedGenes, collapse=", ")} connected to \Sexpr{gene} in the underlying gene network. Using a permutation procedure implemented in \CRANpkg{qtl}, we calculate LOD score threholds that yield genome-wide statistical significant eQTL associations: <<>>= out.perm <- scanone(cross, method="mr", pheno.col=c(gene, connectedGenes), n.perm=100, verbose=FALSE) summary(out.perm, alpha=c(0.05, 0.10)) @ and examine which genotype markers yield such significant associations to these the expression profiles of these genes: <<>>= summary(out.mr, perms=out.perm, alpha=0.05) @ Notice that the directly associated gene \Sexpr{gene} as well as the indirectly associated ones \Sexpr{paste(connectedGenes, collapse=", ")} have genome-wide significant LOD scores on the same eQTL located at \Sexpr{round(location, digits=1)} cM in chromosome \Sexpr{chrom}. \section{Session information} <>= toLatex(sessionInfo()) @ \bibliography{qpgraph} \bibliographystyle{apalike} \end{document}