%\VignetteIndexEntry{Simulating molecular regulatory networks using qpgraph}
%\VignetteKeywords{graphical Markov model, qp-graph, microarray, network, eQTL}
%\VignettePackage{qpgraphSimulate}

\documentclass{article}

<<style, eval=TRUE, echo=FALSE, results=tex>>=
BiocStyle::latex(use.unsrturl=FALSE)
@

\usepackage{natbib}

\bioctitle[Simulating networks using \Biocpkg{qpgraph}]%
    {Simulating molecular regulatory networks using {\tt qpgraph}}
\author{Inma Tur$^{1,3}$, Alberto Roverato$^2$ and Robert Castelo$^1$}

\begin{document}

\SweaveOpts{eps=FALSE}

\DefineVerbatimEnvironment{Sinput}{Verbatim}
{formatcom = {\color{Sinput}}}
\DefineVerbatimEnvironment{Soutput}{Verbatim}
{formatcom = {\color{Soutput}}}
\DefineVerbatimEnvironment{Scode}{Verbatim}
{formatcom = {\color{Scode}}}

\definecolor{Sinput}{rgb}{0.21,0.49,0.72}
\definecolor{Soutput}{rgb}{0.32,0.32,0.32}
\definecolor{Scode}{rgb}{0.75,0.19,0.19}

<<setup, echo=FALSE>>=
pdf.options(useDingbats=FALSE)
options(width=80)
rm(list=ls())
try(detach("package:mvtnorm"), silent=TRUE)
try(detach("package:qtl"), silent=TRUE)
@

\maketitle

\begin{quote}
{\scriptsize
1. Universitat Pompeu Fabra, Barcelona, Spain. \\
2. Universit\`a di Bologna, Bologna, Italy. \\
3. Now at Kernel Analytics, Barcelona, Spain.
}
\end{quote}

\section{Introduction}

The theoretical substrate used by \Biocpkg{qpgraph} to estimate network models
of molecular regulatory interactions is that of graphical Markov models (GMMs). A useful
way to understand the underpinnings of these models is to simulate them and simulate
data from them. More importantly, these simulated data may serve the purpose of verifying
properties of GMM estimation procedures, such as correctness or asymptotic behavior.
Here we illustrate the functionalities of \Biocpkg{qpgraph} to perform these simulations.
If you use them in your own research, please cite the following article:

\begin{quote}
  Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical
  Markov models. \textit{Genetics}, 198(4):1377-1393, 2014.
\end{quote}

The interface provided by \Biocpkg{qpgraph} tries to comply with the available functions
in the base R \CRANpkg{stats} package for simulating data from probability distributions
and the names of functions described below for the purpose of simulating graphs, models
and data follow the convention:

\begin{quote}
  r$<\!\!\!objectclass\!\!\!>$($n$, ...)
\end{quote}
where $<\!\!\!objectclass\!\!\!>$ refers to the class of object (in a broad sense, not
just a formal S3 or S4 class) being simulated and $n$ is the number of observations to
simulate. Except for the case of data, since the simulated observations are other than
R atomic types of objects, when $n>1$, these functions return simulated observations in
the form of a list with $n$ elements.

\section{Simulation of graphs}

An undirected graph $G$ is a mathematical object defined by a pair of sets $G=(V, E)$
where $V=\{1,\dots,p\}$ is the vertex set and $E\subseteq (V\times V)$ is the edge set.
In the context of GMMs {\it labeled} undirected graphs are employed to represent
conditional (in)dependences among random variables (r.v.'s) $X=\{X_1, \dots, X_p\}$
indexed by the vertices in $V$ whose values occur on equal footing. Stepwise data
generating processes can be represented by directed graphs. In the context of GMMs
one may also consider so-called {\it marked} graphs, which are graphs with {\it marked}
vertices grouped into two subsets, one associated to discrete variables and
another to continuous ones. A graph with a single type of vertices, i.e., that is not
marked, it is also called {\it pure}. Diferent types of graphs determine different GMM classes.
For a comprehensive description of different GMM classes and more elaborate descriptions
of the terminology and notation used in this vignette the reader may consult the
book of \citet{Lauritzen:1996fj}.

The first step to simulate a GMM consists of simulating its associated graph. While
there are many R packages that provide procedures to simulate graphs, \Biocpkg{qpgraph}
provides its own minimal functionality for this purpose tailored to ease the downstream
simulation of GMMs. This functionality allows the user to simulate undirected graphs
according to two main criteria, the type of graph (pure or marked) and the type of model
to simulate the random graph.

The simplest type of model to simulate random undirected graphs is the so-called
Erd\H{o}s-R\'enyi which is generated by either including an edge between every pair
of vertices with a pre-specified probability or drawing a graph uniformly at random
among those with a pre-specified number of edges. In the context of exploring the
performance of GMM structure estimation procedures under different degrees of
sparseness of the underlying graph, it is handy to work with the so-called $d$-regular
graphs \citep{Harary:1969}. These are graphs with a constant degree vertex $d$
which, in one hand, make the graph density a linear function of $d$ and, on the
other hand, bound the smallest minimal separator between any two vertices
\citep[see pg.~2646]{Castelo:2006yu}.

To specify the parameters that define one specific type of graph we want to simulate
\Biocpkg{qpgraph} provides the following functions that build parameter
objects which can be used afterwards to simulate graphs through a single call to the
function \Rfunction{rgraphBAM()}:

<<>>=
library(qpgraph)

args(erGraphParam)
args(erMarkedGraphParam)
args(dRegularGraphParam)
args(dRegularMarkedGraphParam)
@
As we can see from their default values, a single call without arguments already define
some small graphs on 5 vertices:

<<>>=
erGraphParam()
erMarkedGraphParam()
dRegularGraphParam()
dRegularMarkedGraphParam()
@
The objects returned by these functions belong to different S4 classes derived from the
following two main ones, \Rclass{graphParam} and \Rclass{markedGraphParam}:

<<>>=
showClass("graphParam")
showClass("markedGraphParam")
@
While this level of detail is not crucial for the end-user, knowing the distinction
between these two main types of graph parameter objects, \Rclass{graphParam} and
\Rclass{markedGraphParam}, may help to get more quickly acquainted with the type
of arguments we need in calls described below to simulate GMMs.

Finally, the function \Rfunction{rgraphBAM()} simulates one or more random graphs
as objects of the class \Rclass{graphBAM} defined in the \Biocpkg{graph} package.
Its arguments are:

<<>>=
args(rgraphBAM)
@
where \Robject{n} is the number of graphs to simulate (default {\tt n=1}) and
\Robject{param} is an object generated by one of the previous parameter functions.
A couple of minimal examples are:

<<>>=
rgraphBAM(erGraphParam())
rgraphBAM(n=2, dRegularGraphParam())
@
In a slightly more ellaborate example, if we would like to simulate a $d$-regular
graph on 2 discrete vertices and 5 continuous ones with a constant degree $d=3$,
we would make the following call to \Rfunction{rgraphBAM()}, which we previously
seed to enable the reader reproducing the same graph shown here:

<<3regulargraph, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=5, width=5>>=
set.seed(1234)
g <- rgraphBAM(dRegularMarkedGraphParam(pI=2, pY=10, d=3))
plot(g)
@
where the last \Rfunction{plot} function call is defined (overloaded) in the
\Biocpkg{qpgraph} package to ease plotting the graph which is displayed in
Figure~\ref{fig:3regulargraph}. This function uses the plotting capabilities
from the \Biocpkg{Rgraphviz} package and further arguments, such as
\Robject{layoutType}, can be passed downstream to the \Biocpkg{Rgraphviz}
plotting function to fine tune the display of the graph.

\begin{figure}
  \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-3regulargraph}}
  \caption{A random 3-regular marked undirected graph.}
  \label{fig:3regulargraph}
\end{figure}

\section{Simulation of undirected Gaussian GMMs}

Undirected Gaussian GMMs are multivariate normal models on continuous r.v.'s
$X_V=\{X_1, \dots, X_p\}$ determined by an undirected graph $G=(V, E)$ with
$V=\{1, \dots, p\}$ and $E\subseteq (V\times V)$. In particular, the
zero-pattern of the inverse covariance matrix corresponds to the missing edges
in $G$ \citep{Lauritzen:1996fj}. Therefore, simulating this type of GMM amounts
to simulate a covariance matrix whose inverse matches the missing edges of a
given, or simulated, undirected graph in its zero pattern and whose scaled
covariance matches a given marginal correlation on the cells corresponding to
present edges. This can be easily accomplished with \Biocpkg{qpgraph} using
the function \Rfunction{rUGgmm}:

<<>>=
args(rUGgmm)
@
where \Robject{n}  corresponds to the number of undirected Gaussian GMMs we want to
simulate (default \Robject{n=1} and \Robject{g} corresponds to either a
\Rclass{graphParam} object, a \Rclass{graphBAM} object or a matrix. This depends on
whether we want to simulate both the graph and the covariance matrix underlying the
GMM, by providing a \Rclass{graphParam} object, or we just want to simulate the
covariance matrix given a graph specified as either a \Rclass{graphBAM} object,
an squared and symmetric adjacency matrix or a two-column matrix describing an edge
set. Examples of these are the following:

<<>>=
rUGgmm(dRegularGraphParam(p=4, d=2))
rUGgmm(matrix(c(0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0), nrow=4, byrow=TRUE))
rUGgmm(matrix(c(1, 2,
                2, 3,
                3, 4,
                4, 1), ncol=2, byrow=TRUE))
@
These three calls to \Rfunction{rUGgmm()} return objects of class \Rclass{UGgmm}
containing undirected Gaussian GMMs with an underlying graph structure formed by
a single undirected cycle on four vertices. The elements of an \Rclass{UGgmm}
object can be quickly explored with the \Rfunction{summary()} function call:

<<>>=
set.seed(12345)
gmm <- rUGgmm(dRegularGraphParam(p=4, d=2))
summary(gmm)
@
and the individual elements that are available to the user can be
accessed as if it were a \Rclass{list} object:
<<>>=
class(gmm)
names(gmm)
gmm$X
gmm$p
gmm$g
gmm$mean
gmm$sigma
@
We can also directly plot the \Rclass{UGgmm} object to see the underlying
undirected graph and, in this particular example, note how the zeroes of the
inverse covariance match the missing edges shown in Figure~\ref{fig:4cyclegraph}.

<<4cyclegraph, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=5, width=5>>=
plot(gmm)
round(solve(gmm$sigma), digits=1)
@
\begin{figure}
  \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-4cyclegraph}}
  \caption{A 4-cycle undirected graph.}
  \label{fig:4cyclegraph}
\end{figure}

Further arguments to \Rfunction{rUGgmm()} can be the desired mean marginal correlation
derived from the cells of the covariance matrix that match the present edges in the
underlying graph (\Robject{rho=0.5}), the minimum tolerance at which the iterative
matrix completion algorithm that builds the covariance matrix stops (\Robject{tol=0.001})
and whether the function should report progress on the calculations (\Robject{verbose=FALSE}).
It is important to set the latter argument \Robject{verbose=TRUE} when we want to simulate
an undirected Gaussian GMM with more than, let's say, 200 vertices, since around that number
of vertices and beyond the simulation of the covariance matrix becomes computationally
demanding, specially when the underlying graph is not very sparse. Further technical
information on the algorithms employed to simulate the covariance matrix can be found
in the help pages of the \Biocpkg{qpgraph} functions \Rfunction{qpG2Sigma()},
\Rfunction{qpRndWishart()}, \Rfunction{qpIPF()} and \Rfunction{qpHTF()} which are called
by the procedures described here.

Finally, to simulate multivariate normal observations from the undirected Gaussian GMM
we just need to use the \Rfunction{rmvnorm()} function from the \Biocpkg{mvtnorm} package
which is overloaded in the \Biocpkg{qpgraph} package to take directly an \Rclass{UGgmm}
object, as follows:

<<>>=
rmvnorm(10, gmm)
@
Note that with sufficient data we can directly recover the zero-pattern of the inverse
covariance matrix:

<<>>=
set.seed(123)
X <- rmvnorm(10000, gmm)
round(solve(cov(X)), digits=1)
round(solve(gmm$sigma), digits=1)
@
However, such a sample size would be exceptional and for more limited sample size but
still with $p < n$ the user may use the \Biocpkg{qpgraph} function \Rfunction{qpPAC()}
which performs zero-partial correlation tests and when $p\gg n$, then the user may 
estimate non-rejection rates \cite{Castelo:2006yu} with the \Rfunction{qpNrr()} function
and simplify the saturated model such that it may become possible to apply \Rfunction{qpPAC()}.

Obviously, gene expression data, either from microarrays or log-transformed count data, are
far from being multivariate normal. However, many available methods for estimating
molecular regulatory networks from expression data, such as \Biocpkg{qpgraph}, make such
an assumption and simulating data from undirected Gaussian GMMs can help us to test
these methods under a controlled experiment, learning their basic properties and obvious
pitfalls with such a clean data.

\section{Simulation of homogeneous mixed GMMs}

Mixed GMMs are GMMs for multivariate data defined by mixed discrete and continuous r.v.'s,
$X=\{I, Y\}$ where $I=\{I_1, \dots, I_{p_I}\}$ denote discrete r.v.'s and
$Y=\{Y_1,\dots, Y_{p_Y}\}$ denote continuous ones. This class of GMMs are determined
by marked graphs $G=(V, E)$ with $p$ marked vertices $V=\Delta\cup\Gamma$ where
$\Delta=\{1, \dots, p_I\}$ are plotted with dots and index the discrete r.v.'s in $I$ and
$\Gamma=\{1, \dots, p_Y\}$ are denoted by circles and index the continuous r.v.'s in $Y$.

In the context of molecular regulatory networks and, particularly, of genetical genomics
data where we associate discrete r.v.'s to genotypes and continuous ones to expression
profiles, we make the assumption that discrete genotypes affect gene expression and not
the other way around. Under this assumption, we will consider the underlying graph $G$
not only with mixed vertices but also with mixed edges, where some are directed and
represented by arrows and some are undirected. More concretely $G$ will be a partially-directed
graph with arrows pointing from discrete vertices to continuous ones and undirected edges
between continuous vertices. From this restricted definition of a partially-directed graph
it follows inmediately that there are no semi-directed (direction preserving) cycles and
allows one to interpret these GMMs also as {\it chain graphs}, which are graphs formed by
undirected subgraphs connected by directed edges \citep{Lauritzen:1996fj}. Currently, the
\Biocpkg{graph} and \Biocpkg{Rgraphviz} packages, in which \Biocpkg{qpgraph} relies to
handle and plot graph objects, do not directly allow one to define and work with
partially-directed graphs. However, in the functionality described below \Biocpkg{qpgraph}
tries to hide to the user complications derived from this fact.

A second important assumption \Biocpkg{qpgraph} makes is that the joint distribution of
the r.v.'s in $X$ is a conditional Gaussian distribution (also known as CG-distribution) by which
continuous r.v.'s follow a multivariate normal distribution $\mathcal{N}_{|\Gamma|}(\mu(i), \Sigma(i))$
conditioned on the joint levels $i\in\mathcal{I}$ from the discrete variables in $I$.

A third and final assumption is that the conditional covariance matrix is constant across
$i\in\mathcal{I}$, the joint levels of $I$, i.e., $\Sigma(i)\equiv\Sigma$. This implies
that we are actually simulating the so-called {\it homogeneous} mixed GMMs. In the context
of genetical genomics data, this assumption implies that genotype alleles affect only the
mean expression level of genes and not the correlations between them.

Two restrictions currently constain further the type of mixed GMMs we can simulate with
\Biocpkg{qpgraph}. The first one is that discrete variables are simulated under marginal
independence between them and the second one is that every continuous variable cannot be
associated to more than one discrete variable. As we shall see below, the first restriction
does not apply when simulating expression quantitative trait loci data in experimental
crosses, as genotype marker data is simulated by another package, the \CRANpkg{qtl}
ackage \citep{Broman:2003fk}. We are working to remove the second restriction in the near
future and enable multiple discrete variables having linear additive effects and interaction
effects, on a common continuous variable.

Similarly to how we did it with undirected Gaussian GMMs, simulating mixed GMMs is done
with a call to the function \Rfunction{rHMgmm()}:

<<>>=
args(rHMgmm)
@
where \Robject{n} corresponds to the number of mixed GMMs we want to simulate (default
\Robject{n=1} and \Robject{g} corresponds to either a \Rclass{markedGraphParam} object,
a \Rclass{graphBAM} object or a matrix. Note that the first assumption made before enables
specifying the underlying partially-directed graph just as we did for an undirected one,
since directed edges are completely determined by the vertex type at their endpoints
(discrete to continuous). Examples of these are the following:

<<>>=
rHMgmm(dRegularMarkedGraphParam(pI=1, pY=3, d=2))
rHMgmm(matrix(c(0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0), nrow=4, byrow=TRUE), I=1)
rHMgmm(matrix(c(1, 2,
                2, 3,
                3, 4,
                4, 1), ncol=2, byrow=TRUE), I=1)
@
These three calls to \Rfunction{rHMgmm()} return objects of class \Rclass{HMgmm}
containing homogenous mixed GMMs with an underlying graph structured formed by one
discrete vertex pointing to two continous ones which are themselves forming an
undirected connected component with a fourth vertex, all together forming an
undirected cycle on four vertices. Just as with \Rclass{UGgmm} objects, the
elements of an \Rclass{HMgmm} object can be quickly explored with the
\Rfunction{summary()} function call:

<<>>=
set.seed(12345)
gmm <- rHMgmm(dRegularMarkedGraphParam(pI=1, pY=3, d=2))
summary(gmm)
@
and again the individual elements that are available to the user can be
accessed as if it were a \Rclass{list} object:

<<>>=
class(gmm)
names(gmm)
gmm$X
gmm$I
gmm$Y
gmm$p
gmm$pI
gmm$pY
gmm$g
gmm$mean()
gmm$sigma
gmm$a
gmm$eta2
@
We can also directly plot the \Rclass{HMgmm} object and \Biocpkg{qpgraph} will use
the necessary instructions from the \Biocpkg{graph} and \Biocpkg{Rgraphviz} libraries
to display a partially-directed graph as the one shown in Figure~\ref{fig:hmgmm}.

<<hmgmm, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=5, width=5>>=
plot(gmm)
@

\begin{figure}
  \centerline{\includegraphics[width=0.3\textwidth]{qpgraphSimulate-hmgmm}}
  \caption{Homogeneous mixed graphical (chain) model with one discrete variable and
           three continuous ones forming an undirected cycle on four vertices.}
  \label{fig:hmgmm}
\end{figure}

Further arguments to \Rfunction{rHMgmm()} are all we described previously for the
\Rfunction{rUGgmm()} function plus the desired additive linear effect (\Robject{a=1}) of the
discrete levels (alleles in the genetics context) on the continuous variables (genes in the
genetics context). To simulate conditional multivariate normal observations from the homogeneous
mixed GMM we use the \Rfunction{rcmvnorm()} function, which uses its pure continuous
counterpart \Rfunction{rmvnorm()} from the \CRANpkg{mvtnorm} package, but which is defined
in the \Biocpkg{qpgraph} package as it needs to calculate the corresponding conditional mean
vectors $\mu(i)$:

<<>>=
rcmvnorm(10, gmm)
@
Note that with sufficient data we can directly recover the zero-pattern of the inverse
{\it conditional} covariance matrix:

<<>>=
set.seed(123)
X <- rcmvnorm(10000, gmm)
csigma <- (1/10000)*sum(X[, gmm$I] == 1)*cov(X[X[, gmm$I]==1, gmm$Y]) +
          (1/10000)*sum(X[, gmm$I] == 2)*cov(X[X[, gmm$I]==2, gmm$Y])
round(solve(csigma), digits=1)
round(solve(gmm$sigma), digits=1)
@
and that the sample sample mean vectors and additive effects approach the ones specified
in the model according to the mixed associations between the discrete and continuous variables:

<<>>=
smean <- apply(X[, gmm$Y], 2, function(x, i) tapply(x, i, mean), X[, gmm$I])
smean
gmm$mean()
abs(smean[1, ] - smean[2, ])
gmm$a
@

\section{Simulation of eQTL models of experimental crosses}

We illustrate in this section how we can use \Biocpkg{qpgraph} in conjunction with the
\CRANpkg{qtl} package \citep{Broman:2003fk} to simulate expression quantatitve trait
loci (eQTL) models of experimental crosses and data from them. This functionality employs
the previously described procedures to simulate an homogeneous mixed GMM that represents
the underlying model of regulatory {\it cis}-eQTL, {\it trans}-eQTL and gene-gene
associations, although this fact appears hidden to the user.

More concretely, the \Biocpkg{qpgraph} package defines an object class called
\Rclass{eQTLcross} which basically holds a genetic map of the genotype markers
(as defined by the \Rclass{map} class in the \CRANpkg{qtl} package from \citealp{Broman:2003fk})
and an homogeneous mixed GMM defining the underlying molecular regulatory network
that connects genotypes with genes and genes themselves, where we use the term
{\it gene} to refer to a gene expression profile.

In a similar vein to the way we simulated before graphs and GMMs, we need to create
a parameter object that defines the main features of the eQTL model we want to
simulate. This is done through the function \Rfunction{eQTLcrossParam()} which by
default defines some minimal eQTL model:

<<>>=
eQTLcrossParam()
args(eQTLcrossParam)
@
To simulate an \Rclass{eQTLcross} object \Biocpkg{qpgraph} provides the function
\Rfunction{reQTLcross()} giving as first argument the number of \Rclass{eQTLcross}
objects we want to simulate and a \Rclass{eQTLcrossParam} object:

<<>>=
reQTLcross(n=2, eQTLcrossParam())
@
When the first argument \Robject{n} is omitted, then \Robject{n=1} is assumed by
default. Other arguments to \Rfunction{reQTLcross()} are the mean marginal correlation
between genes (\Robject{rho=0.5}), the magnitude of the linear additive effect of
the simulated eQTL associations (\Robject{a=1}), the minimum tolerance of the
matrix completion algorithm that is involved in the construction of the conditional
covariance matrix (\Robject{tol=0.001}) and whether progress on the calculations
should be shown \Robject{verbose=FALSE}.

To simulate a larger \Rclass{eQTLcross} object we need to simulate a genetic map
using the \Rfunction{sim.map()} function from the \CRANpkg{qtl} package \citep{Broman:2003fk},
which should be loaded first. Since \Biocpkg{qpgraph} overloads the \CRANpkg{qtl}
function \Rfunction{sim.cross()}, which will be used later to simulate data from
an \Rclass{eQTLcross} object, we will detach \Biocpkg{qpgraph} before loading
\CRANpkg{qtl}, and load \Biocpkg{qpgraph} again.

<<>>=
detach("package:qpgraph")
library(qtl)
library(qpgraph)

map <- sim.map(len=rep(100, times=20),
               n.mar=rep(10, times=20),
               anchor.tel=FALSE,
               eq.spacing=FALSE,
               include.x=TRUE)
@
and using it in combination with a larger number of genes (50) we can easily
simulate this larger \Rclass{eQTLcross} object:

<<>>=
eqtl <- reQTLcross(eQTLcrossParam(map=map, genes=50))
class(eqtl)
eqtl
@
which, as we can see, it corresponds a backcross model with 50 genes each of them associated
to a {\it cis}-eQTL, and with a certain underlying gene network embedded into an
homogeneous mixed GMM that forms part of this object and which can be accessed as
follows:

<<>>=
eqtl$model
@
A dot plot describing the eQTL associations along the given genetic map can be obtained
by calling the \Rfunction{plot} function with the \Rclass{eQTLcross} object as
argument. In Figure~\ref{fig:geneticmap} we see on the right panel such a plot,
and on the left panel the genetic map plotted by the plot function defined in the
\CRANpkg{qtl} package \citep{Broman:2003fk} to plot genetic maps.

<<geneticmap, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=6, width=12>>=
par(mfrow=c(1,2))
plot(map)
plot(eqtl, main="eQTL model with cis-associations only")
@
\begin{figure}
  \centerline{\includegraphics[width=\textwidth]{qpgraphSimulate-geneticmap}}
  \caption{A genetic map simulated with the \CRANpkg{qtl} package \citep{Broman:2003fk} on
           the left, and on the right, an eQTL model simulated using the genetic map with
           the \Biocpkg{qpgraph} package.}
  \label{fig:geneticmap}
\end{figure}

A somewhat more complex eQTL model with {\it trans} associations can be simulated by using
the \Robject{trans} argument as follows:

<<>>=
set.seed(123)
eqtl <- reQTLcross(eQTLcrossParam(map=map, genes=50,
                                  cis=0.5, trans=c(5, 5)), a=5)
@
In this call, \Robject{cis=0.5} indicates that 50\% of the genes should have {\it cis}-eQTL
associations and among the remaining ones, 10 will be associated to two {\it trans}-eQTL
affecting 5 genes each of the two. We have also increased the default additive linear
effect from the default value \Robject{a=1} to \Robject{a=5} which corresponds to a very strong
linear additive effect from genotype markers on gene expression. We can examine the
{\it cis} and {\it trans} associations of the \Rclass{eQTLcross} object with the
\Rfunction{ciseQTL()} and \Rfunction{transeQTL()} functions:

<<>>=
head(ciseQTL(eqtl))
transeQTL(eqtl)
@
and examine where the eQTL associations occur and what genes map to {\it trans}-eQTL,
as shown in Figure~\ref{fig:transeqtl}.

<<transeqtl, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=6, width=6>>=
plot(eqtl, main="eQTL model with trans-eQTL")
@
\begin{figure}
  \centerline{\includegraphics[width=0.5\textwidth]{qpgraphSimulate-transeqtl}}
  \caption{An eQTL model including trans-acting associations simulated using the
           genetic map from Fig.~\ref{fig:geneticmap}.}
  \label{fig:transeqtl}
\end{figure}

Using this \Rclass{eQTLcross} object we can simulate data from the corresponding
experimental cross with the function \Rfunction{sim.cross()} from the \CRANpkg{qtl}
package \citep{Broman:2003fk}  but which is overloaded in \Biocpkg{qpgraph} to plug
the eQTL associations into the corresponding genetic loci:

<<>>=
set.seed(123)
cross <- sim.cross(map, eqtl)
cross
@
Note that here, the genotype data is simulated by the procedures implemented in
the \CRANpkg{qtl} package \citep{Broman:2003fk}  and \Biocpkg{qpgraph} adds to
that simulation the eQTL and gene network associations. Thus, while the
\Rfunction{rHMgmm()} function described in the previous section, does not simulate
correlated discrete variables, here the genotypes will be correlated according to
the input arguments of the \Rfunction{sim.cross()} function in \CRANpkg{qtl} (mainly,
the \Robject{map.function} argument that converts genetic distances into recombination
fractions) and which can be passed through the previous call to \Rfunction{sim.cross()}.

Let's focus on a specific simulated eQTL, concretely on the second one of the following
list:

<<>>=
allcis <- ciseQTL(eqtl)
allcis[allcis$chrom==1, ]
gene <- allcis[2, "gene"]
chrom <- allcis[2, "chrom"]
location <- allcis[2, "location"]
@
Find out the genes connected to gene \Sexpr{gene} in the underlying regulatory network:
<<>>=
connectedGenes <- graph::inEdges(gene, eqtl$model$g)[[1]]
connectedGenes <- connectedGenes[connectedGenes %in% eqtl$model$Y]
connectedGenes
@
Now, using the \CRANpkg{qtl} package \citep{Broman:2003fk} and its \Rfunction{scanone()}
function, we perform a simple QTL analysis by single marker regression using the expression
profiles from genes \Sexpr{paste(c(gene, connectedGenes), collapse=", ")} as
phenotypes:

<<>>=
out.mr <- scanone(cross, method="mr", pheno.col=c(gene, connectedGenes))
@
By using the plotting functionalities of the \CRANpkg{qtl} package \citep{Broman:2003fk} we
can examine the LOD score profile of these three genes on chromosome \Sexpr{chrom} where the
eQTL of gene \Sexpr{gene} is located:

<<lodprofiles, fig=TRUE, include=FALSE, echo=TRUE, results=verbatim, height=6, width=6>>=
plot(out.mr, chr=chrom, ylab="LOD score", lodcolumn=1:3, col=c("black", "blue", "red"), lwd=2)
abline(v=allcis[allcis$gene == gene, "location"])
legend("topleft", names(out.mr)[3:5], col=c("black", "blue", "red"), lwd=2, inset=0.05)
@
\begin{figure}
  \centerline{\includegraphics[width=0.6\textwidth]{qpgraphSimulate-lodprofiles}}
  \caption{Profile of LOD scores for three genes with direct and indirect eQTL.}
  \label{fig:lodprofiles}
\end{figure}

We can observe in Figure~\ref{fig:lodprofiles} that not only the directly associated gene
\Sexpr{gene} seems to have an eQTL at position \Sexpr{round(location, digits=1)} cM, but
also the genes \Sexpr{paste(connectedGenes, collapse=", ")} connected to \Sexpr{gene} in
the underlying gene network. Using a permutation procedure implemented in \CRANpkg{qtl},
we calculate LOD score threholds that yield genome-wide statistical significant eQTL associations:
<<>>=
out.perm <- scanone(cross, method="mr", pheno.col=c(gene, connectedGenes),
                    n.perm=100, verbose=FALSE)
summary(out.perm, alpha=c(0.05, 0.10))
@
and examine which genotype markers yield such significant associations to these the expression
profiles of these genes:

<<>>=
summary(out.mr, perms=out.perm, alpha=0.05)
@
Notice that the directly associated gene \Sexpr{gene} as well as the indirectly associated
ones \Sexpr{paste(connectedGenes, collapse=", ")} have genome-wide significant LOD scores
on the same eQTL located at \Sexpr{round(location, digits=1)} cM in chromosome \Sexpr{chrom}.

\section{Session information}

<<info, results=tex>>=
toLatex(sessionInfo())
@

\bibliography{qpgraph}
\bibliographystyle{apalike}

\end{document}