%\VignetteIndexEntry{Using multiple time points to train logic models to data}
%\VignetteKeywords{}
%\VignettePackage{CNORdt}

\documentclass{article}
\usepackage{Sweave, fullpage}
\usepackage{url}

\title{Training of boolean logic models of signalling networks to time course data with \emph{CNORdt}}
\author{Aidan MacNamara}

\begin{document}
\maketitle

\tableofcontents

\section{Background}

This software is written in the R language, so in order to use it you will need to have R installed on your computer. For more information, please refer to \url{http://www.r-project.org/}. For more information about how to install R packages, please refer to \url{http://cran.r-project.org/doc/manuals/R-admin.html#Installing-packages}.

\emph{CNORdt} is an add-on to \emph{CellNOptR}~\cite{CellNOptR}, a software package that trains logic models to data~\cite{julio2009}. More details can be found using the following command to install:

<<installPackage, eval=FALSE>>=
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("CNORdt")
@

\section{CNORdt}

\emph{CNORdt} introduces a scaling parameter that defines the time scale of the Boolean synchronous simulation. Where each `tick' ($t$) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at $t-1$, the scaling parameter defines the tick frequency relative to the time scale of the real data. Although this is a crude approach (i.e.~it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations.

\section{Load Data}

The first step of the analysis with \emph{CNORdt} is to load the necessary libraries and the data:

<<Ropts, echo=FALSE, results=hide>>=
options(width=70)
@ 

<<loadLib, eval=TRUE>>=
library(CellNOptR)
library(CNORdt)
@

The model and data are then loaded. These are taken from MacNamara et al.~\cite{macnamara2012} and consist of a biologically realistic toy model based on the EGFR signaling pathway and \emph{in silico}-generated data:

<<getData, eval=TRUE>>=
data(CNOlistPB, package="CNORdt")
data(modelPB, package="CNORdt")
@

\section{Preprocessing}

The full details of preprocessing the model can be found in the \emph{CellNOptR} package (the vignette gives a comprehensive explanation). The following steps are taken:

<<preProcess, eval=TRUE>>=
# pre-process model
model = preprocessing(CNOlistPB, modelPB)
initBstring <- rep(1, length(model$reacID))
@

\section{Optimization}

This is where the difference between \emph{CNORdt} and its parent package \emph{CellNOptR} becomes visible. \emph{CellNOptR} fits the model to data at steady state~i.e. the simulation runs until all species are static. This gives a robust overview of the model behaviour but this formalism cannot model more complex dynamics such as oscillations. \emph{CNORdt} uses full time course data by scaling the boolean simulation. Hence the difference betweeen model and data can be calculated and optimized across all data points at not just a single one or two.

The other data that currently needs to be supplied in addition to the optimization parameters described are `boolUpdates' and the upper and lower bounds for the optimization of the scaling factor (`upperB' and `lowerB'). The variable `boolUpdates' controls the number of simulation steps (`ticks'). This will be set to an intelligent default value in future releases (it can be considered a function of model size and experimental time) but it is currently a manual entry. The upper and lower bounds control the range of values for the optimization of the scaling factor.

<<optimise, eval=TRUE>>=
opt1 <- gaBinaryDT(CNOlist=CNOlistPB, model=model, initBstring=initBstring,
verbose=FALSE, boolUpdates=10, maxTime=30, lowerB=0.8, upperB=10)
@

Visualize the result (figure~\ref{fig:fullPlot}):

<<plotData, eval=TRUE>>=
cutAndPlotResultsDT(
	model=model,
	CNOlist=CNOlistPB,
	bString=opt1$bString,
	plotPDF=FALSE,
	boolUpdates=10,
	lowerB=0.8,
	upperB=10
)
@

\begin{figure}[ht]
\begin{center}
\includegraphics[width=\textwidth]{CNORdt-vignette-plot}
\caption{The fit of the model at multiple time points. The training took 30 seconds.
}
\label{fig:fullPlot}
\end{center}
\end{figure}

\section{Post-Optimization}

As mentioned above, the difference between \emph{CNORdt} and \emph{CellNOptR} is in the simulation of the data. Hence, it is possible to use the range of post-optimization functions available in \emph{CellNOptR} to view, for example, the evolution of fit during optimization (\emph{plotFit}), or map the optimized network onto the input network or PKN (\emph{writeNetwork}). Please see the \emph{CellNOptR} package for more details about the following:

<<writeRes, eval=TRUE>>=
writeScaffold(
	modelComprExpanded=model,
	optimResT1=opt1,
	optimResT2=NA,
	modelOriginal=modelPB,
	CNOlist=CNOlistPB
)
	
writeNetwork(
	modelOriginal=modelPB,
	modelComprExpanded=model,
	optimResT1=opt1,
	CNOlist=CNOlistPB
)
@

\begin{thebibliography}{1}

\bibitem{CellNOptR}
C.~Terfve.
\newblock CellNOptR: R version of CellNOpt, boolean features only.
\newblock {em R package version 0.99.14, (2012)
http://www.bioconductor.org/packages/release/bioc/html/CellNOptR.html}

\bibitem{julio2009}
J.~Saez-Rodriguez, L.~Alexopoulos, J.~Epperlein, R.~Samaga, D.~Lauffenburger, S.~Klamt and P.K.~Sorger.
\newblock Discrete logic modeling as a means to link protein signaling networks with functional analysis of mammalian signal transduction.
\newblock {\em Molecular Systems Biology}, 5:331, 2009.

\bibitem{macnamara2012}
A.~MacNamara, C.~Terfve, D.~Henriques, B.~Pe\~nalver Bernab\'e and J.~Saez-Rodriguez.
\newblock State-time spectrum of signal transduction logic models.
\newblock {\em Physical Biology}, 2012, 9(4), p.045003.

\end{thebibliography}

\end{document}