\documentclass[article,nojss]{jss}

%% -- LaTeX packages and custom commands ---------------------------------------

%% recommended packages
\usepackage{thumbpdf,lmodern}

%% other packages
\usepackage{amsmath}
\usepackage[T1]{fontenc}
\usepackage{makecell}

%% define colors
\definecolor{graySPSS}{gray}{0.87}
\definecolor{lightgraySPSS}{RGB}{250,250,252}
\definecolor{darkgraySPSS}{gray}{0.66}
\definecolor{blueSPSS}{RGB}{39,73,96}


%% -- Article metainformation (author, title, ...) -----------------------------

%% - \author{} with primary affiliation
%% - \Plainauthor{} without affiliations
%% - Separate authors by \And or \AND (in \author) or by comma (in \Plainauthor).
%% - \AND starts a new line, \And does not.
\author{Andreas Alfons \\ Erasmus University Rotterdam}
\Plainauthor{Andreas Alfons}

%% - \title{} in title case
%% - \Plaintitle{} without LaTeX markup (if any)
%% - \Shorttitle{} with LaTeX markup (if any), used as running title
\title{\pkg{r2spss}: Format \proglang{R} Output to Look Like \proglang{SPSS}}
\Plaintitle{r2spss: Format R Output to Look Like SPSS}
\Shorttitle{\pkg{r2spss}: Format \proglang{R} Output to Look Like \proglang{SPSS}}

%% - \Abstract{} almost as usual
\Abstract{
  The \proglang{R} package \pkg{r2spss} allows users to create plots and
  \proglang{LaTeX} tables that look like \proglang{SPSS} output for use in
  teaching materials. Rather than copying and pasting \proglang{SPSS} output
  into documents, \proglang{R} code that mocks up \proglang{SPSS} output can
  be integrated directly into dynamic \proglang{LaTeX} documents with tools
  such as the \proglang{R} package \pkg{knitr}. Package \pkg{r2spss} provides
  functionality for statistical techniques that are typically covered in
  introductory statistics courses: descriptive statistics, common hypothesis
  tests, ANOVA, and linear regression, as well as box plots, histograms,
  scatter plots, and line plots (including profile plots).
}

%% - \Keywords{} with LaTeX markup, at least one required
%% - \Plainkeywords{} without LaTeX markup (if necessary)
%% - Should be comma-separated and in sentence case.
\Keywords{\proglang{R}, \proglang{SPSS}, statistics, teaching}
\Plainkeywords{R, SPSS, statistics, teaching}

%% - \Address{} of at least one author
%% - May contain multiple affiliations for each author
%%   (in extra lines, separated by \emph{and}\\).
%% - May contain multiple authors for the same affiliation
%%   (in the same first line, separated by comma).
\Address{
  Andreas Alfons\\
  Econometric Institute\\
  Erasmus School of Economics\\
  Erasmus University Rotterdam\\
  PO Box 1738\\
  3000DR Rotterdam, The Netherlands\\
  E-mail: \email{alfons@ese.eur.nl}\\
  URL: \url{https://personal.eur.nl/alfons/}
}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{r2spss: Format R Output to Look Like SPSS}
%\VignetteDepends{r2spss}
%\VignetteKeywords{R, SPSS, statistics, teaching}
%\VignettePackage{r2spss}

\begin{document}

% -------------
% knitr options
% -------------

<>=
library("knitr")
options(prompt="R> ", continue = "+ ", width = 75, useFancyQuotes = FALSE)
opts_chunk$set(fig.path = "figures/figure-", fig.width = 4.5, fig.height = 4.5,
               out.width = "0.5\\textwidth", fig.align = "center",
               fig.lp = "fig:", fig.pos = "h!", tidy = FALSE)
render_sweave()             # use Sweave environments
set_header(highlight = "")  # do not use the Sweave.sty package
@


%% -- Introduction -------------------------------------------------------------

%% - In principle "as usual".
%% - But should typically have some discussion of both _software_ and _methods_.
%% - Use \proglang{}, \pkg{}, and \code{} markup throughout the manuscript.
%% - If such markup is in (sub)section titles, a plain text version has to be
%%   added as well.
%% - All software mentioned should be properly \cite-d.
%% - All abbreviations should be introduced.
%% - Unless the expansions of abbreviations are proper names (like "Journal
%%   of Statistical Software" above) they should be in sentence case (like
%%   "generalized linear models" below).

\section{Introduction} \label{sec:intro}

Many academic programs in the behavioral and social sciences require
statistics to be taught with \proglang{SPSS} \citep{SPSS}. Preparing teaching
materials in this case typically involves copying and pasting \proglang{SPSS}
output into documents or slides, which is cumbersome and prone to errors.
Moreover, this approach does not scale to regular updates of the materials,
or to individualizing assignments and exams in order to combat fraud.

On the other hand, tools such as package \pkg{knitr} \citep{xie15, knitr} for
integrating the statistical computing environment \proglang{R} \citep{R} and
the document preparation system \proglang{LaTeX} \citep[e.g.,][]{LaTeX} make
preparing teaching materials easier, less error-prone, and more scalable.
There are even specialized tools such as package \pkg{exams}
\citep{gruen09, zeileis14, exams} that allow assignments and exams to be
individualized in a scalable manner. Package \pkg{r2spss} \citep{r2spss}
makes it possible to leverage those developments for creating teaching
materials with \proglang{SPSS} output by mocking up such output with
\proglang{R}.


%% -- Manuscript ---------------------------------------------------------------

%% - In principle "as usual" again.
%% - When using equations (e.g., {equation}, {eqnarray}, {align}, etc.
%%   avoid empty lines before and after the equation (which would signal a new
%%   paragraph.
%% - When describing longer chunks of code that are _not_ meant for execution
%%   (e.g., a function synopsis or list of arguments), the environment {Code}
%%   is recommended. Alternatively, a plain {verbatim} can also be used.
%%   (For executed code see the next section.)

% ------------------------------------
% LaTeX requirements and knitr options
% ------------------------------------

\section[LaTeX documents containing output from r2spss]%
{\proglang{LaTeX} documents containing output from \pkg{r2spss}}

We first load the package before discussing its main functionality for
generating \proglang{LaTeX} tables.
%
<>=
library("r2spss")
@

\subsection[LaTeX requirements]{\proglang{LaTeX} requirements}

\proglang{LaTeX} tables created with package \pkg{r2spss} build upon several
\proglang{LaTeX} packages. A \proglang{LaTeX} style file that includes all
requirements can be produced with function \code{r2spss.sty()}. By default,
it prints the content of the style file on the \proglang{R} console, but its
only argument \code{path} can be used to specify the path to a folder in
which to put the file \emph{r2spss.sty}. For instance, the following command
can be used to put the style file in the current working directory.
%
<>=
r2spss.sty(path = ".")
@

After putting the style file in the folder that contains your
\proglang{LaTeX} document, the following command should be included in the
preamble of your \proglang{LaTeX} document, i.e., somewhere in between
\verb+\documentclass{}+ and \verb+\begin{document}+.
\begin{Code}
\usepackage{r2spss}
\end{Code}

\subsection[Workhorse functions to create LaTeX tables with r2spss]%
{Workhorse functions to create \proglang{LaTeX} tables with \pkg{r2spss}}

Functions in package \pkg{r2spss} create \proglang{R} objects whose
\code{print()} methods print the \proglang{LaTeX} tables that mimic the
corresponding \proglang{SPSS} output. Essentially, such a \code{print()}
method first calls function \code{to_SPSS()}, which produces an object of
class \code{"SPSS_table"}. Its component \code{table} contains a data frame
of the results in \proglang{SPSS} format. Other components of the object
contain any necessary additional information on the \proglang{SPSS} table,
such as the main title, the header layout, or footnotes.
Afterwards, the \code{print()} method calls function \code{to_latex()} with
the \code{"SPSS_table"} object to print the \proglang{LaTeX} table. These two
functions can also be called separately by the user, which allows for further
customization of the \proglang{LaTeX} tables. Some examples can be found in
the help files of \code{to_SPSS()} and \code{to_latex()}, which can be
accessed from the \proglang{R} console with \code{?to_SPSS} and
\code{?to_latex}, respectively. In addition, the \code{"data.frame"} method
of \code{to_latex()} makes it possible to extend the functionality of
\pkg{r2spss} with additional \proglang{LaTeX} tables that mimic the look of
\proglang{SPSS} output.

Package \pkg{r2spss} can create output that mimics the look of current
\proglang{SPSS} versions, as well as the look of older versions. The
above-mentioned functions contain the argument \code{version} for specifying
which type of output to create. Possible values are \code{"modern"} to mimic
recent versions and \code{"legacy"} to mimic older versions. \proglang{LaTeX}
tables that mimic the look of recent \proglang{SPSS} versions build upon the
\proglang{LaTeX} package \pkg{nicematrix} \citep{nicematrix} and its
\code{NiceTabular} environment, which is preferred for its seamless display
of background colors in the table. Note that \pkg{r2spss} requires
\pkg{nicematrix} version 6.5 (2022-01-23) or later. It is also important to
note that tables using the \code{NiceTabular} environment may require several
\proglang{LaTeX} compilations to be displayed correctly.

\subsection{Global package options}

Package \pkg{r2spss} allows users to set global options within the current
\proglang{R} session, which can be read and modified with the accessor
functions \code{r2spss_options$get()} and \code{r2spss_options$set()},
respectively.
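As a minimal sketch, assuming that \code{r2spss_options$get()} accepts option
names as character strings (in the same way as the options objects in
\pkg{knitr}), the current default for the \proglang{SPSS} version to mimic
could be inspected as follows before changing it.
\begin{Code}
# assumption: option names are passed as character strings
r2spss_options$get("version")
\end{Code}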
Most importantly, the option \code{version} controls the default for whether
tables and plots should mimic the content and look of recent \proglang{SPSS}
versions (\code{"modern"}) or older versions (\code{"legacy"}).

\proglang{SPSS} tables by default include horizontal grid lines in between
all rows, which can distract from the content of the tables, in particular in
the look of older \proglang{SPSS} versions. Package \pkg{r2spss} therefore
distinguishes between major and minor grid lines in tables. Minor grid lines
can easily be suppressed by setting the global option \code{minor} to
\code{FALSE}, which increases the readability of the tables while still
closely mimicking the look of \proglang{SPSS}.

For portability reasons, this vignette only displays \proglang{LaTeX} tables
that mimic the simpler look of older \proglang{SPSS} versions, but with minor
grid lines removed. This is realized by setting global options with the
following command.
%
<<>>=
r2spss_options$set(version = "legacy", minor = FALSE)
@

\subsection[Dynamic documents and knitr options]%
{Dynamic documents and \pkg{knitr} options}

Package \pkg{r2spss} is most useful when writing dynamic \proglang{LaTeX}
documents with tools such as the \proglang{R} package \pkg{knitr}
\citep{xie15, knitr}. When creating \proglang{LaTeX} tables in \proglang{R}
code chunks with \pkg{knitr}, the output of the chunk should be written
directly into the output document by setting the chunk option
\verb+results='asis'+. For more information on \pkg{knitr} chunk options, in
particular various options for figures, please see
\url{https://yihui.org/knitr/options/}.


%% -- Illustrations ------------------------------------------------------------

%% - Virtually all JSS manuscripts list source code along with the generated
%%   output. The style files provide dedicated environments for this.
%% - In R, the environments {Sinput} and {Soutput} - as produced by Sweave()
%%   or knitr using the render_sweave() hook - are used (without the need to
%%   load Sweave.sty).
%% - Equivalently, {CodeInput} and {CodeOutput} can be used.
%% - The code input should use "the usual" command prompt in the respective
%%   software system.
%% - For R code, the prompt "R> " should be used with "+ " as the
%%   continuation prompt.
%% - Comments within the code chunks should be avoided - these should be made
%%   within the regular LaTeX text.

% -------------
% illustrations
% -------------

\section[Illustrations: Using package r2spss]%
{Illustrations: Using package \pkg{r2spss}}
\label{sec:illustrations}

Several examples showcase the functionality of \pkg{r2spss} to mock up
\proglang{SPSS} tables and graphics.

\subsection{Example data sets}

The following two data sets from package \pkg{r2spss} will be used to
illustrate its functionality: \code{Eredivisie} and \code{Exams}. The former
contains information on all football players in the Dutch Eredivisie, the
highest men's football league in the Netherlands, who played at least one
match in the 2013-14 season. The latter contains grades for an applied
statistics course at Erasmus University Rotterdam for students who took both
the regular exam and the resit.
%
<>=
data("Eredivisie")
data("Exams")
@

Among other information, the \code{Eredivisie} data contain the market values
of the football players. In many examples, we will use the logarithm of the
market values rather than the market values themselves, so we add those to
the data set.
%
<<>>=
Eredivisie$logMarketValue <- log(Eredivisie$MarketValue)
@

\subsection{Descriptive statistics and plots}

Descriptive statistics can be produced with function \code{descriptives()},
for example of the age, minutes played, and logarithm of market value of
football players in the \code{Eredivisie} data.
%
\begin{center}
<>=
descriptives(Eredivisie, c("Age", "Minutes", "logMarketValue"))
@
\end{center}

Functions \code{histogram()} and \code{box_plot()} can be used to create a
histogram or box plot, respectively, of a specified variable.
<<>>=
histogram(Eredivisie, "logMarketValue")
@
%
<<>>=
box_plot(Eredivisie, "logMarketValue")
@

A scatter plot or scatter plot matrix can be produced with function
\code{scatter_plot()} by specifying the corresponding variables.
<<>>=
scatter_plot(Eredivisie, c("Age", "logMarketValue"))
@
%
<<>>=
scatter_plot(Eredivisie, c("Age", "Minutes", "logMarketValue"))
@

\subsection{Analyzing one sample}

With the \code{Exams} data, we can perform a one-sample $t$ test on whether
the average grade on the resit exam differs from 5.5, which is the minimum
passing grade in the Netherlands. For this purpose, we can use function
\code{t_test()} with a single variable as well as the value under the null
hypothesis.
%
\begin{center}
<>=
t_test(Exams, "Resit", mu = 5.5)
@
\end{center}

\subsection{Analyzing paired observations}

Similarly, we can perform a paired-sample $t$ test on whether the average
grades differ between the regular exam and the resit by supplying the two
corresponding variables to function \code{t_test()}.
%
\begin{center}
<>=
t_test(Exams, c("Resit", "Regular"))
@
\end{center}

As nonparametric alternatives, we can perform a Wilcoxon signed rank test
with function \code{wilcoxon_test()} or a sign test with function
\code{sign_test()}.
%
\begin{center}
<>=
wilcoxon_test(Exams, c("Regular", "Resit"))
@
\end{center}
%
\begin{center}
<>=
sign_test(Exams, c("Regular", "Resit"))
@
\end{center}

Note that the order of the variables in the nonparametric tests is reversed
compared to the paired-sample $t$ test, but all three tests compute the
differences in the form \code{Resit - Regular}. This behavior is carried over
from \proglang{SPSS}.

To check which of these tests are suitable for the given data, we can for
example use a box plot.
Function \code{box_plot()} allows multiple variables to be plotted.
%
<<>>=
box_plot(Exams, c("Regular", "Resit"))
@

\subsection{Comparing two groups}

An independent-samples $t$ test can be performed with function
\code{t_test()} by specifying the numeric variable of interest as well as a
grouping variable. As an example, we test whether the average log market
values differ between Dutch and foreign football players.
%
\begin{center}
<>=
t_test(Eredivisie, "logMarketValue", group = "Foreign")
@
\footnotesize
<>=
t_test(Eredivisie, "logMarketValue", group = "Foreign")
@
\end{center}

As a nonparametric alternative, we can perform a Wilcoxon rank sum test with
function \code{wilcoxon_test()} in a similar manner. Note that it is not
necessary to use the logarithms of the market values here, as this test works
with ranks instead of the observed values.
%
\begin{center}
<>=
wilcoxon_test(Eredivisie, "MarketValue", group = "Foreign")
@
\end{center}

We can again use a box plot to check whether the $t$ test is suitable for the
given data, as function \code{box_plot()} also accepts a grouping variable.
%
<<>>=
box_plot(Eredivisie, "logMarketValue", group = "Foreign")
@

\subsection{Comparing multiple groups}

For comparing the means of multiple groups, one-way ANOVA can be performed
with function \code{ANOVA()}. Here we test whether there are differences
among the average log market values for players in different positions.
%
\begin{center}
<>=
oneway <- ANOVA(Eredivisie, "logMarketValue", group = "Position")
oneway
@
\end{center}

The \code{plot()} method for the resulting object produces a profile plot.
%
<<>>=
plot(oneway)
@

A nonparametric alternative based on ranks is the Kruskal-Wallis test, which
can be applied with function \code{kruskal_test()}. It is again not necessary
to use the logarithms of the market values for this test.
%
\begin{center}
<>=
kruskal_test(Eredivisie, "MarketValue", group = "Position")
@
\end{center}

Similarly, two-way ANOVA can be performed by supplying two grouping variables
to function \code{ANOVA()}.
%
\begin{center}
<>=
twoway <- ANOVA(Eredivisie, "logMarketValue",
                group = c("Position", "Foreign"))
twoway
@
\end{center}

We can again produce a profile plot with the \code{plot()} method for the
resulting object. Argument \code{which} can be used to specify which of the
two grouping variables should be used on the $x$-axis of the profile plot,
with the default being the first grouping variable.
%
<>=
plot(twoway)
@

The \code{plot()} method illustrated above works similarly to function
\code{line_plot()}. The latter is more generally applicable and can also be
used, e.g., for plotting time series.

\subsection[Chi-squared tests]{$\chi^{2}$ tests}

Function \code{chisq_test()} implements $\chi^{2}$ goodness-of-fit tests and
$\chi^{2}$ tests of independence. With the \code{Eredivisie} data, we can
first perform a goodness-of-fit test to see whether the traditional Dutch
4-3-3 system of total football is still reflected in the player composition
of Dutch football teams. In other words, we test for a multinomial
distribution of variable \code{Position} with the probabilities $1/11$,
$4/11$, $3/11$, and $3/11$ for goalkeepers, defenders, midfielders, and
forwards, respectively.
%
\begin{center}
<>=
chisq_test(Eredivisie, "Position", p = c(1, 4, 3, 3)/11)
@
\end{center}

Furthermore, we can test whether the categorical variables \code{Position}
and \code{Foreign} are independent, i.e., whether the proportions of Dutch
and foreign players are the same for all playing positions.
\begin{center}
<>=
chisq_test(Eredivisie, c("Position", "Foreign"))
@
\end{center}

\subsection{Linear regression}

In this section, we compare two regression models to explain the log market
values of football players.
The first model uses only linear and squared effects of the player's age,
while the second model adds the remaining contract length and a dummy
variable for foreign players. We first add the squared values of age to the
data set.
%
<<>>=
Eredivisie$AgeSq <- Eredivisie$Age^2
@

We then estimate the regression models with function \code{regression()}. As
usual in \proglang{R}, we specify the regression models with formulas.
%
\begin{center}
<>=
fit <- regression(logMarketValue ~ Age + AgeSq,
                  logMarketValue ~ Age + AgeSq + Contract + Foreign,
                  data = Eredivisie)
fit
@
\end{center}

If we only want to print the table containing the model summaries, we can use
the argument \code{statistics} of the \code{print()} method. In addition,
argument \code{change} can be set to \code{TRUE} in order to include a test
on the change in $R^{2}$ from one model to the next.
\begin{center}
<>=
print(fit, statistics = "summary", change = TRUE)
@
\end{center}

Of course, all \code{print()} methods for objects returned by functions from
package \pkg{r2spss} allow the user to select which tables to print. See the
respective help files for details.

The \code{plot()} method of the regression results can be used to create a
histogram of the residuals or a scatter plot of the standardized residuals
against the standardized fitted values. Argument \code{which} can be used to
select between those two plots. Mimicking \proglang{SPSS} functionality, the
plot is created for the \emph{last} specified model in the call to
\code{regression()}.
%
<<>>=
plot(fit, which = "histogram")
@
%
<<>>=
plot(fit, which = "scatter")
@


%% -- Summary/conclusions/discussion -------------------------------------------

% \section{Conclusions} \label{sec:summary}


%% -- Optional special unnumbered sections -------------------------------------

% \section*{Computational details}

% \begin{leftbar}
% If necessary or useful, information about certain computational details
% such as version numbers, operating systems, or compilers could be included
% in an unnumbered section. Also, auxiliary packages (say, for visualizations,
% maps, tables, \dots) that are not cited in the main text can be credited here.
% \end{leftbar}

% \section*{Acknowledgments}

% \begin{leftbar}
% All acknowledgments (note the AE spelling) should be collected in this
% unnumbered section before the references. It may contain the usual information
% about funding and feedback from colleagues/reviewers/etc. Furthermore,
% information such as relative contributions of the authors may be added here
% (if any).
% \end{leftbar}

% Andreas Alfons is supported by a grant of the Dutch Research Council (NWO),
% research program Vidi, project number \mbox{VI.Vidi.195.141}.


%% -- Bibliography -------------------------------------------------------------
%% - References need to be provided in a .bib BibTeX database.
%% - All references should be made with \cite, \citet, \citep, \citealp etc.
%%   (and never hard-coded). See the FAQ for details.
%% - JSS-specific markup (\proglang, \pkg, \code) should be used in the .bib.
%% - Titles in the .bib should be in title case.
%% - DOIs should be included where available.

\bibliography{r2spss}


%% -- Appendix (if any) --------------------------------------------------------
%% - After the bibliography with page break.
%% - With proper section titles and _not_ just "Appendix".

% \newpage
%
% \begin{appendix}
%
% \section{}
% \label{app:}
%
% \end{appendix}

%% -----------------------------------------------------------------------------


\end{document}