%\VignetteIndexEntry{bbl: Boltzmann Bayes Learner for High-Dimensional Inference with Discrete Predictors in R}
%\VignetteDepends{BiocManager,Biostrings}
%\documentclass[article]{jss}
\documentclass[nojss]{jss}

%% -- LaTeX packages and custom commands ---------------------------------------

%% recommended packages
\usepackage{thumbpdf,lmodern,amsmath,bbm}
\DeclareMathOperator*{\argmax}{arg\:max}
\DeclareMathOperator{\Tr}{Tr}

%% new custom commands
\newcommand{\class}[1]{`\code{#1}'}
\newcommand{\fct}[1]{\code{#1()}}

%% For Sweave-based articles about R packages:
%% need no \usepackage{Sweave}
\usepackage[noae]{Sweave}
\SweaveOpts{engine=R, eps=FALSE, keep.source = TRUE}

<<preliminaries, echo=FALSE, results=hide>>=
options(prompt = "R> ", continue = "+ ", width = 70, useFancyQuotes = FALSE)
@

%% -- Article metainformation (author, title, ...) -----------------------------
\author{Jun Woo\\University of Minnesota, Minneapolis \And
        Jinhua Wang\\University of Minnesota, Minneapolis}
\Plainauthor{Jun Woo, Jinhua Wang}

\title{\pkg{bbl}: Boltzmann Bayes Learner for High-Dimensional Inference with Discrete Predictors in \proglang{R}}
\Plaintitle{bbl: Boltzmann Bayes Learner for High-Dimensional Inference with Discrete Predictors in R}
\Shorttitle{\pkg{bbl}: Boltzmann Bayes Learner in \proglang{R}}

\Abstract{
  Non-regression-based inference, such as discriminant analysis, can account for the effect of predictor distributions, which may be significant in big data modeling. We describe \pkg{bbl}, an \proglang{R} package for Boltzmann Bayes learning, which enables comprehensive supervised learning of the associations between a large number of categorical predictors and multi-level response variables. Its underlying statistical model is a collection of (fully visible) Boltzmann machines, inferred separately for each distinct response level. The algorithm reduces to the naive Bayes learner when interactions are ignored. We illustrate example use cases for various scenarios, ranging from modeling of a relatively small set of factors with heterogeneous levels to those with hundreds or more predictors with uniform levels, such as image or genomic data. We show how \pkg{bbl} explicitly quantifies the extra power provided by interactions via the higher predictive performance of models that include them. In contrast to deep learning-based methods such as restricted Boltzmann machines, \pkg{bbl}-trained models can be interpreted directly via their bias and interaction parameters.
}

\Keywords{supervised learning, Boltzmann machine, naive Bayes, discriminant analysis, \proglang{R}}
\Plainkeywords{supervised learning, Boltzmann machine, naive Bayes, discriminant analysis, R}
\Address{
  Jun Woo\footnote{Current address: Memorial Sloan Kettering Cancer Center, New York, New York, USA} ({\it corresponding author}), Jinhua Wang\\
  Institute for Health Informatics\\
  \emph{and}\\
  Masonic Cancer Center\\
  University of Minnesota\\
  Minneapolis, Minnesota, USA\\
  E-mail: \email{wooh@mskcc.org}
}

\begin{document}
\SweaveOpts{concordance=TRUE}

\section{Introduction}\label{sec:intro}

Many supervised learning tasks involve modeling discrete response variables $y$ using predictors ${\bf x}$ that can occupy categorical factor levels \citep{hastie_etal}. Ideally, one would model the joint distribution $P({\bf x},y)$ via maximum likelihood,
\begin{equation}
{\hat \Theta} = \argmax_\Theta \left[\ln P({\bf x},y|\Theta)\right],
\end{equation}
to find the parameters $\Theta$. Regression-based methods use $P({\bf x},y)=P(y|{\bf x})P({\bf x})\approx P(y|{\bf x})$, and the many rigorous formal results known for regression coefficients facilitate the interpretation of their significance.

An alternative is to use $P({\bf x},y)=P({\bf x}|y)P(y)$ and fit $P({\bf x}|y)$ instead. Since $y$ is low-dimensional, this approach can capture extra information, not accessible from regression, when there are many covarying predictors. To make predictions for $y$ from $P({\bf x}|y)$, one uses Bayes' formula. Examples include linear and quadratic discriminant analyses \citep[pp.~106--119]{hastie_etal} for continuous ${\bf x}$. For discrete ${\bf x}$, naive Bayes is the simplest approach, where the covariance among the components of ${\bf x}=(x_1,\cdots,x_m)$ is ignored via
\begin{equation}
P({\bf x}|y)\approx \prod_i P(x_i|y).
\label{eq:nbayes}
\end{equation}

In this paper, we focus on supervised learners that take into account the high-dimensional nature of $P({\bf x}|y)$ beyond the naive Bayes-level description of Eq.~(\ref{eq:nbayes}). A suitable parametrization is provided by the (fully visible) Boltzmann machine \citep{ackley_etal}, which for simple binary predictors $x_i=0,1$ reads
\begin{equation}
P({\bf x}|y)=\frac{1}{Z_y}\exp\left(\sum_i h_i^{(y)}x_i + \sum_{i<j} J_{ij}^{(y)}x_ix_j\right),
\label{eq:bm}
\end{equation}
where $h_i^{(y)}$ and $J_{ij}^{(y)}$ are the bias and interaction parameters for response level $y$, and $Z_y$ is the normalization constant. Setting all $J_{ij}^{(y)}=0$ recovers the factorized naive Bayes form of Eq.~(\ref{eq:nbayes}).
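For prediction, the fitted class-conditional distributions are combined with the prior $P(y)$ via the Bayes' formula mentioned above; explicitly,
\begin{equation*}
P(y|{\bf x}) = \frac{P({\bf x}|y)P(y)}{\sum_{y'} P({\bf x}|y')P(y')},
\end{equation*}
and the predicted response is the level that maximizes this posterior.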
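As a preview of the workflow developed in the following sections, the sketch below outlines how such a model might be fit with \pkg{bbl}. It is a minimal, unevaluated illustration using the \code{Titanic} contingency table from base \proglang{R}; it assumes the formula-plus-\code{weights} interface of \fct{bbl}, with interactions specified through standard \proglang{R} formula syntax.

<<preview, eval=FALSE>>=
library("bbl")
titanic <- as.data.frame(Titanic)  # Class, Sex, Age, Survived, Freq
freq <- titanic$Freq               # frequency of each configuration
titanic <- titanic[, 1:4]          # categorical predictors and response

## naive Bayes limit: bias parameters h only
fit0 <- bbl(Survived ~ Class + Sex + Age, data = titanic, weights = freq)

## add pairwise interactions J among all predictors
fit2 <- bbl(Survived ~ .^2, data = titanic, weights = freq)

## posterior class probabilities via Bayes' formula
predict(fit2, newdata = titanic)
@

Here the main-effects formula corresponds to the naive Bayes limit of Eq.~(\ref{eq:nbayes}), while \code{.^2} adds the pairwise couplings $J_{ij}^{(y)}$ of Eq.~(\ref{eq:bm}).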