\documentclass[article,nojss]{jss}

%% -- LaTeX packages and custom commands ---------------------------------------

%% recommended packages
\usepackage{thumbpdf,lmodern}

%% other packages
\usepackage{amsmath}
\usepackage[T1]{fontenc}
\usepackage{makecell}

%% define colors
\definecolor{graySPSS}{gray}{0.87}
\definecolor{lightgraySPSS}{RGB}{250,250,252}
\definecolor{darkgraySPSS}{gray}{0.66}
\definecolor{blueSPSS}{RGB}{39,73,96}


%% -- Article metainformation (author, title, ...) -----------------------------

%% - \author{} with primary affiliation
%% - \Plainauthor{} without affiliations
%% - Separate authors by \And or \AND (in \author) or by comma (in \Plainauthor).
%% - \AND starts a new line, \And does not.
\author{Andreas Alfons \\ Erasmus University Rotterdam}
\Plainauthor{Andreas Alfons}

%% - \title{} in title case
%% - \Plaintitle{} without LaTeX markup (if any)
%% - \Shorttitle{} with LaTeX markup (if any), used as running title
\title{\pkg{r2spss}: Format \proglang{R} Output to Look Like \proglang{SPSS}}
\Plaintitle{r2spss: Format R Output to Look Like SPSS}
\Shorttitle{\pkg{r2spss}: Format \proglang{R} Output to Look Like \proglang{SPSS}}

%% - \Abstract{} almost as usual
\Abstract{
The \proglang{R} package \pkg{r2spss} allows to create plots and
\proglang{LaTeX} tables that look like \proglang{SPSS} output for use in
teaching materials.  Rather than copying-and-pasting \proglang{SPSS} output
into documents, \proglang{R} code that mocks up \proglang{SPSS} output can be
integrated directly into dynamic \proglang{LaTeX} documents with tools such
as the \proglang{R} package \pkg{knitr}.  Package \pkg{r2spss} provides
functionality for statistical techniques that are typically covered in
introductory statistics courses: descriptive statistics, common hypothesis
tests, ANOVA, and linear regression, as well as box plots, histograms,
scatter plots, and line plots (including profile plots).
}

%% - \Keywords{} with LaTeX markup, at least one required
%% - \Plainkeywords{} without LaTeX markup (if necessary)
%% - Should be comma-separated and in sentence case.
\Keywords{\proglang{R}, \proglang{SPSS}, statistics, teaching}
\Plainkeywords{R, SPSS, statistics, teaching}

%% - \Address{} of at least one author
%% - May contain multiple affiliations for each author
%%   (in extra lines, separated by \emph{and}\\).
%% - May contain multiple authors for the same affiliation
%%   (in the same first line, separated by comma).
\Address{
  Andreas Alfons\\
  Econometric Institute\\
  Erasmus School of Economics\\
  Erasmus University Rotterdam\\
  PO Box 1738\\
  3000DR Rotterdam, The Netherlands\\
  E-mail: \email{alfons@ese.eur.nl}\\
  URL: \url{https://personal.eur.nl/alfons/}
}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{r2spss: Format R Output to Look Like SPSS}
%\VignetteDepends{r2spss}
%\VignetteKeywords{R, SPSS, statistics, teaching}
%\VignettePackage{r2spss}


\begin{document}


% -------------
% knitr options
% -------------

<<include=FALSE>>=
library("knitr")
options(prompt="R> ", continue = "+  ", width = 75, useFancyQuotes = FALSE)
opts_chunk$set(fig.path = "figures/figure-", fig.width = 4.5, fig.height = 4.5,
               out.width = "0.5\\textwidth", fig.align = "center",
               fig.lp = "fig:", fig.pos = "h!", tidy = FALSE)
render_sweave()             # use Sweave environments
set_header(highlight = "")  # do not use the Sweave.sty package
@


%% -- Introduction -------------------------------------------------------------

%% - In principle "as usual".
%% - But should typically have some discussion of both _software_ and _methods_.
%% - Use \proglang{}, \pkg{}, and \code{} markup throughout the manuscript.
%% - If such markup is in (sub)section titles, a plain text version has to be
%%   added as well.
%% - All software mentioned should be properly \cite-d.
%% - All abbreviations should be introduced.
%% - Unless the expansions of abbreviations are proper names (like "Journal
%%   of Statistical Software" above) they should be in sentence case (like
%%   "generalized linear models" below).

\section{Introduction} \label{sec:intro}

Many academic programs in the behavioral and social sciences require to teach
statistics with \proglang{SPSS} \citep{SPSS}.  Preparing teaching materials in
this case typically involves copying-and-pasting \proglang{SPSS} output into
documents or slides, which is cumbersome and prone to errors.  Moreover, this
approach is not scalable for regular updates of the materials, or for
individualizing assignments and exams in order to combat fraud.  On the other
hand, tools such as package \pkg{knitr} \citep{xie15, knitr} for integrating
the statistical computing environment \proglang{R} \citep{R} and the document
preparation system \proglang{LaTeX} \citep[e.g.,][]{LaTeX} make preparing
teaching materials easier, less error-prone, and more scalable.  There are even
specialized tools such as package \pkg{exams} \citep{gruen09, zeileis14, exams}
that allow assignments and exams to be individualized in a scalable manner.
Package \pkg{r2spss} \citep{r2spss} makes it possible to leverage those
developments for creating teaching materials with \proglang{SPSS} output by
mocking up such output with \proglang{R}.


%% -- Manuscript ---------------------------------------------------------------

%% - In principle "as usual" again.
%% - When using equations (e.g., {equation}, {eqnarray}, {align}, etc.
%%   avoid empty lines before and after the equation (which would signal a new
%%   paragraph.
%% - When describing longer chunks of code that are _not_ meant for execution
%%   (e.g., a function synopsis or list of arguments), the environment {Code}
%%   is recommended. Alternatively, a plain {verbatim} can also be used.
%%   (For executed code see the next section.)


% ------------------------------------
% LaTeX requirements and knitr options
% ------------------------------------

\section[LaTeX documents containing output from r2spss]%
{\proglang{LaTeX} documents containing output from \pkg{r2spss}}

We first load the package to discuss its main functionality to generate
\proglang{LaTeX} tables.
%
<<results="hide", message=FALSE, warning=FALSE>>=
library("r2spss")
@


\subsection[LaTeX requirements]{\proglang{LaTeX} requirements}

\proglang{LaTeX} tables created with package \pkg{r2spss} build upon several
\proglang{LaTeX} packages.  A \proglang{LaTeX} style file that includes all
requirements can be produced with function \code{r2spss.sty()}.  By default,
it prints the content of the style file on the \proglang{R} console, but its
only argument \code{path} can be used to specify the path to a folder in
which to put the file \emph{r2spss.sty}.  For instance, the following command
can be used to put the style file in the current working directory.
%
<<eval=FALSE>>=
r2spss.sty(path = ".")
@

After putting the style file in the folder that contains your \proglang{LaTeX}
document, the following command should be included in the preamble of your
\proglang{LaTeX} document, i.e., somewhere in between \verb+\documentclass{}+
and \verb+\begin{document}+.
\begin{Code}
\usepackage{r2spss}
\end{Code}


\subsection[Workhorse functions to create LaTeX tables with r2spss]%
{Workhorse functions to create \proglang{LaTeX} tables with \pkg{r2spss}}

Functions in package \pkg{r2spss} create certain \proglang{R} objects, whose
\code{print()} method prints the \proglang{LaTeX} tables that mimic the
corresponding \proglang{SPSS} output.  Essentially, such a \code{print()}
method first calls function \code{to_SPSS()}, which produces an object of class
\code{"SPSS_table"}.  Its component \code{table} contains a data frame of the
results in SPSS format.  Other components of the object contain any necessary
additional information of the SPSS table, such as the main title, the header
layout, or footnotes.  Afterwards, the \code{print()} method calls function
\code{to_latex()} with the \code{"SPSS_table"} object to print the
\proglang{LaTeX} table.

These two function can also be called separately by the user, which allows
for further customization of the \proglang{LaTeX} tables.  Some examples
can be found in the help file of \code{to_SPSS()} or \code{to_latex()},
which can be accessed from the \proglang{R} console with \code{?to_SPSS} and
\code{?to_latex}, respectively.  In addition, the \code{"data.frame"} method
of \code{to_latex()} allows to extend the functionality of \pkg{r2spss} with
additional \proglang{LaTeX} tables that mimic the look of \proglang{SPSS}
output.

Package \pkg{r2spss} can create output that mimics the look of current
\proglang{SPSS} versions, as well as the look of older versions.  The above
mentioned functions contain the argument \code{version} for specifying which
type of output to create.  Possible values are \code{"modern"} to mimic recent
versions and \code{"legacy"} to mimic older versions.  \proglang{LaTeX} tables
that mimic the look of recent \proglang{SPSS} version thereby build upon the
\proglang{LaTeX} package \pkg{nicematrix} \citep{nicematrix} and its
\code{NiceTabular} environment, which is preferred for its seamless display of
background colors in the table.

However, \pkg{r2spss} requires \pkg{nicematrix} version 6.5 (2022-01-23) or
later.  It is also important to note that tables using the \code{NiceTabular}
environment may require several \proglang{LaTeX} compilations to be displayed
correctly.


\subsection{Global package options}

Package \pkg{r2spss} allows to set global options within the current
\proglang{R} session, which can be read and modified with the accessor
functions \code{r2spss_options$get()} and \code{r2spss_options$set()},
respectively.  Most importantly, the option \code{version} controls the
default for whether tables and plots should mimic the content and look
of recent \proglang{SPSS} versions (\code{"modern"}) or older versions
(\code{"legacy"}).

\proglang{SPSS} tables by default include horizontal grid lines in between all
rows, which in particular in the look of older \proglang{SPSS} versions can be
distracting from the content of the tables.  Package \pkg{r2spss} therefore
distinguishes between major and minor grid lines in tables.  Minor grid lines
can easily be suppressed by setting the global option \code{minor} to
\code{FALSE}, which increases the readability of the tables while still
closely mimicking the look of \proglang{SPSS}.

For portability reasons, this vignette only displays \proglang{LaTeX} tables
that mimic the simpler look of older \proglang{SPSS} versions, but with minor
grid lines removed.  This is realized by setting global options with the
following command.
%
<<>>=
r2spss_options$set(version = "legacy", minor = FALSE)
@


\subsection[Dynamic documents and knitr options]%
{Dynamic documents and \pkg{knitr} options}

Package \pkg{r2spss} is the most useful when writing dynamic \proglang{LaTeX}
documents with tools such as the \proglang{R} package \pkg{knitr}
\citep{xie15, knitr}.  When creating \proglang{LaTeX} tables in \proglang{R}
code chunks with \pkg{knitr}, the output of the chunk should be written
directly into the output document by setting the chunk option
\verb+results='asis'+.  For more information on \pkg{knitr} chunk options,
in particular various options for figures, please see
\url{https://yihui.org/knitr/options/}.


%% -- Illustrations ------------------------------------------------------------

%% - Virtually all JSS manuscripts list source code along with the generated
%%   output. The style files provide dedicated environments for this.
%% - In R, the environments {Sinput} and {Soutput} - as produced by Sweave() or
%%   or knitr using the render_sweave() hook - are used (without the need to
%%   load Sweave.sty).
%% - Equivalently, {CodeInput} and {CodeOutput} can be used.
%% - The code input should use "the usual" command prompt in the respective
%%   software system.
%% - For R code, the prompt "R> " should be used with "+  " as the
%%   continuation prompt.
%% - Comments within the code chunks should be avoided - these should be made
%%   within the regular LaTeX text.


% -------------
% illustrations
% -------------

\section[Illustrations: Using package r2spss]%
{Illustrations: Using package \pkg{r2spss}} \label{sec:illustrations}

Several examples showcase the functionality of \pkg{r2spss} to mock up
\proglang{SPSS} tables and graphics.


\subsection{Example data sets}

The following two data sets from package \pkg{r2spss} will be used to
illustrate its functionality: \code{Eredivisie} and \code{Exams}.  The former
contains information on all football players in the Dutch Eredivisie, the
highest men's football league in the Netherlands, who played at least one match
in the 2013-14 season.  The latter contains grades for an applied statistics
course at Erasmus University Rotterdam for students who took both the regular
exam and the resit.
%
<<results="hide", message=FALSE, warning=FALSE>>=
data("Eredivisie")
data("Exams")
@

Among other information, the \code{Eredivisie} data contain the market values
of the football players.  In many examples, we will use the logarithm of the
market values rather that the market values themselves, so we add those to the
data set.
%
<<>>=
Eredivisie$logMarketValue <- log(Eredivisie$MarketValue)
@


\subsection{Descriptive statistics and plots}

Descriptive statistics can be produced with function \code{descriptives()},
for example of the age, minutes played, and logarithm of market value of
football players in the \code{Eredivisie} data.
%
\begin{center}
<<results="asis">>=
descriptives(Eredivisie, c("Age", "Minutes", "logMarketValue"))
@
\end{center}

Functions \code{histogram()} and \code{box_plot()} can be used to create a
histogram or box plot, respectively, of a specified variable.
<<>>=
histogram(Eredivisie, "logMarketValue")
@
%
<<>>=
box_plot(Eredivisie, "logMarketValue")
@

A scatter plot or scatter plot matrix can be produced with function
\code{scatter_plot()} by specifying the corresponding variables.
<<>>=
scatter_plot(Eredivisie, c("Age", "logMarketValue"))
@
%
<<>>=
scatter_plot(Eredivisie, c("Age", "Minutes", "logMarketValue"))
@

\subsection{Analyzing one sample}

With the \code{Exams} data, we can perform a one-sample $t$ test on whether the
average grade on the resit exam differs from 5.5, which is the minimum passing
grade in the Netherlands.  For this purpose, we can use function \code{t_test()}
with a single variable as well as the value under the null-hypothesis.
%
\begin{center}
<<results="asis">>=
t_test(Exams, "Resit", mu = 5.5)
@
\end{center}


\subsection{Analyzing paired observations}

Similarly, we can perform a paired-sample $t$ test on whether the average
grades differ between the regular exam and the resit by supplying the two
corresponding variables to function \code{t_test()}.
%
\begin{center}
<<results="asis">>=
t_test(Exams, c("Resit", "Regular"))
@
\end{center}

As nonparametric alternatives, we can perform a Wilcoxon signed rank test
with function \code{wilcoxon_test()} or a sign test with function
\code{sign_test()}.
%
\begin{center}
<<results="asis">>=
wilcoxon_test(Exams, c("Regular", "Resit"))
@
\end{center}
%
\begin{center}
<<results="asis">>=
sign_test(Exams, c("Regular", "Resit"))
@
\end{center}

Note that the order of the variables in the nonparametric test is reversed
compared to the paired-sample $t$ test, but all three tests compute the
differences in the form \code{Resit - Regular}.  This behavior is carried over
from \proglang{SPSS}.

To check which of these tests are suitable for the given data, we can for
example use a box plot.  Function \code{box_plot()} allows to specify
multiple variables to be plotted.
%
<<>>=
box_plot(Exams, c("Regular", "Resit"))
@


\subsection{Comparing two groups}

An independent-samples $t$ test can be performed with function \code{t_test()}
by specifying the numeric variable of interest as well as a grouping variable.
As an example, we test whether the average log market values differ between
Dutch and foreign football players.
%
\begin{center}
<<eval=FALSE>>=
t_test(Eredivisie, "logMarketValue", group = "Foreign")
@
\footnotesize
<<echo=FALSE, results="asis">>=
t_test(Eredivisie, "logMarketValue", group = "Foreign")
@
\end{center}

As a nonparametric alternative, we can perform a Wilcoxon rank sum test
with function \code{wilcoxon_test()} in a similar manner.  Note that it is
not necessary to use the logarithms of the market values here, as this test
works with ranks instead of the observed values.
%
\begin{center}
<<results="asis">>=
wilcoxon_test(Eredivisie, "MarketValue", group = "Foreign")
@
\end{center}

We can again use a box plot to check whether the $t$ test is  suitable for the
given data, as function \code{box_plot()} allows to specify a grouping
variable as well.
%
<<>>=
box_plot(Eredivisie, "logMarketValue", group = "Foreign")
@


\subsection{Comparing multiple groups}

For comparing the means of multiple groups, one-way ANOVA can be performed with
function \code{ANOVA()}.  Here we test whether there are differences among
the average log market values for players on different positions.
%
\begin{center}
<<results="asis">>=
oneway <- ANOVA(Eredivisie, "logMarketValue", group = "Position")
oneway
@
\end{center}

The \code{plot()} method for the resulting object produces a profile plot.
%
<<>>=
plot(oneway)
@

A nonparametric alternative based on ranks is the Kruskal-Wallis test, which
can be applied with function \code{kruskal_test()}.  It is again not necessary
to use the logarithms of the market values for this test.
%
\begin{center}
<<results="asis">>=
kruskal_test(Eredivisie, "MarketValue", group = "Position")
@
\end{center}

Similarly, two-way ANOVA can be performed by supplying two grouping variables
to function \code{ANOVA()}.
%
\begin{center}
<<results="asis">>=
twoway <- ANOVA(Eredivisie, "logMarketValue",
                group = c("Position", "Foreign"))
twoway
@
\end{center}

We can again produce a profile plot with the \code{plot()} method for the
resulting object.  Argument \code{which} can be used to specify which of the
two grouping variables should be used on the $x$-axis of the profile plot, with
the default being the first grouping variable.
%
<<fig.width=6, out.width = "0.7\\textwidth">>=
plot(twoway)
@

The \code{plot()} method illustrated works similarly to function
\code{line_plot()}.  The latter is more generally applicable and
can also be used, e.g., for plotting time series.


\subsection[Chi-squared tests]{$\chi^{2}$ tests}

Function \code{chisq_test()} implements $\chi^{2}$ goodness-of-fit tests and
$\chi^{2}$ tests on independence.  With the \code{Eredivisie} data, we can
first perform a goodness-of-fit test to see whether the traditional Dutch 4-3-3
system of total football is still reflected in player composition of Dutch
football teams.  In other words, we test for a multinomial distribution of
variable \code{Position} with the probabilities $1/11$, $4/11$, $3/11$, and
$3/11$ for goalkeepers, defenders, midfielders, and forwards, respectively.
%
\begin{center}
<<results="asis">>=
chisq_test(Eredivisie, "Position", p = c(1, 4, 3, 3)/11)
@
\end{center}

Furthermore, we can test whether the categorical variables \code{Position} and
\code{Foreign} are independent, i.e., whether the proportions of Dutch and
foreign players are the same for all playing positions.
\begin{center}
<<results="asis">>=
chisq_test(Eredivisie, c("Position", "Foreign"))
@
\end{center}


\subsection{Linear regression}

In this section, we compare two regression models to explain the log market
values of football players.  The first model uses only the player's age as a
linear and a squared effect, while the second model adds the remaining contract
length and a dummy variable for foreign players.

We first add the squared values of age to the data set.
%
<<>>=
Eredivisie$AgeSq <- Eredivisie$Age^2
@

We then estimate the regression models with function \code{regression()}.  As
usual in \proglang{R}, we specify the regression models with formulas.
%
\begin{center}
<<results="asis">>=
fit <- regression(logMarketValue ~ Age + AgeSq,
                  logMarketValue ~ Age + AgeSq + Contract + Foreign,
                  data = Eredivisie)
fit
@
\end{center}

If we only want to print the table containing the model summaries, we can use
the argument \code{statistics} of the \code{print()} method.  In addition,
argument \code{change} can be set to \code{TRUE} in order to include a test
on the change in $R^{2}$ from one model to the next.

\begin{center}
<<results="asis">>=
print(fit, statistics = "summary", change = TRUE)
@
\end{center}

Of course, all \code{print()} methods for objects returned by functions from
package \pkg{r2spss} allow to select which tables to print.  See the respective
help files for details.

The \code{plot()} method of the regression results can be used to create a
histogram of the residuals or a scatter plot of the standardized residuals
against the standardized fitted values.  Argument \code{which} can be used
to select between those two plots.  Mimicking \proglang{SPSS} functionality,
the plot is created for the \emph{last} specified model in the call to
\code{regression()}.
%
<<>>=
plot(fit, which = "histogram")
@
%
<<>>=
plot(fit, which = "scatter")
@


%% -- Summary/conclusions/discussion -------------------------------------------

% \section{Conclusions} \label{sec:summary}


%% -- Optional special unnumbered sections -------------------------------------

% \section*{Computational details}

% \begin{leftbar}
% If necessary or useful, information about certain computational details
% such as version numbers, operating systems, or compilers could be included
% in an unnumbered section. Also, auxiliary packages (say, for visualizations,
% maps, tables, \dots) that are not cited in the main text can be credited here.
% \end{leftbar}


% \section*{Acknowledgments}

% \begin{leftbar}
% All acknowledgments (note the AE spelling) should be collected in this
% unnumbered section before the references. It may contain the usual information
% about funding and feedback from colleagues/reviewers/etc. Furthermore,
% information such as relative contributions of the authors may be added here
% (if any).
% \end{leftbar}

% Andreas Alfons is supported by a grant of the Dutch Research Council (NWO),
% research program Vidi, project number \mbox{VI.Vidi.195.141}.


%% -- Bibliography -------------------------------------------------------------
%% - References need to be provided in a .bib BibTeX database.
%% - All references should be made with \cite, \citet, \citep, \citealp etc.
%%   (and never hard-coded). See the FAQ for details.
%% - JSS-specific markup (\proglang, \pkg, \code) should be used in the .bib.
%% - Titles in the .bib should be in title case.
%% - DOIs should be included where available.

\bibliography{r2spss}


%% -- Appendix (if any) --------------------------------------------------------
%% - After the bibliography with page break.
%% - With proper section titles and _not_ just "Appendix".

% \newpage
%
% \begin{appendix}
%
% \section{}
% \label{app:}
%
% \end{appendix}

%% -----------------------------------------------------------------------------


\end{document}