| Type: | Package | 
| Title: | Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models | 
| Version: | 2.8.0 | 
| Date: | 2024-07-03 | 
| Maintainer: | Jonah Gabry <jsg2201@columbia.edu> | 
| URL: | https://mc-stan.org/loo/, https://discourse.mc-stan.org | 
| BugReports: | https://github.com/stan-dev/loo/issues | 
| Description: | Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions. | 
| License: | GPL (≥ 3) | 
| LazyData: | TRUE | 
| Depends: | R (≥ 3.1.2) | 
| Imports: | checkmate, matrixStats (≥ 0.52), parallel, posterior (≥ 1.5.0), stats | 
| Suggests: | bayesplot (≥ 1.7.0), brms (≥ 2.10.0), ggplot2, graphics, knitr, rmarkdown, rstan, rstanarm (≥ 2.19.0), rstantools, spdep, testthat (≥ 2.1.0) | 
| VignetteBuilder: | knitr | 
| Encoding: | UTF-8 | 
| SystemRequirements: | pandoc (>= 1.12.3), pandoc-citeproc | 
| RoxygenNote: | 7.3.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2024-07-03 16:26:26 UTC; jgabry | 
| Author: | Aki Vehtari [aut], Jonah Gabry [cre, aut], Måns Magnusson [aut], Yuling Yao [aut], Paul-Christian Bürkner [aut], Topi Paananen [aut], Andrew Gelman [aut], Ben Goodrich [ctb], Juho Piironen [ctb], Bruno Nicenboim [ctb], Leevi Lindgren [ctb] | 
| Repository: | CRAN | 
| Date/Publication: | 2024-07-03 18:50:02 UTC | 
Efficient LOO-CV and WAIC for Bayesian models
Description
 Stan Development Team
Stan Development Team
This package implements the methods described in Vehtari, Gelman, and
Gabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2024), and
Yao et al. (2018). To get started see the loo package
vignettes, the
loo() function for efficient approximate leave-one-out
cross-validation (LOO-CV), the psis() function for the Pareto
smoothed importance sampling (PSIS) algorithm, or
loo_model_weights() for an implementation of Bayesian stacking of
predictive distributions from multiple models.
Details
Leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO-CV and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. This package implements the fast and stable computations for approximate LOO-CV laid out in Vehtari, Gelman, and Gabry (2017). From existing posterior simulation draws, we compute LOO-CV using Pareto smoothed importance sampling (PSIS; Vehtari, Simpson, Gelman, Yao, and Gabry, 2024), a new procedure for stabilizing and diagnosing importance weights. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparing of predictive errors between two models.
We recommend PSIS-LOO-CV instead of WAIC, because PSIS provides useful diagnostics and effective sample size and Monte Carlo standard error estimates.
Author(s)
Maintainer: Jonah Gabry jsg2201@columbia.edu
Authors:
- Aki Vehtari Aki.Vehtari@aalto.fi 
- Måns Magnusson 
- Yuling Yao 
- Paul-Christian Bürkner 
- Topi Paananen 
- Andrew Gelman 
Other contributors:
- Ben Goodrich [contributor] 
- Juho Piironen [contributor] 
- Bruno Nicenboim [contributor] 
- Leevi Lindgren [contributor] 
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. (online).
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
Epifani, I., MacEachern, S. N., and Peruggia, M. (2008). Case-deletion importance sampling estimators: Central limit theorems and related results. Electronic Journal of Statistics 2, 774-806.
Gelfand, A. E. (1996). Model determination using sampling-based methods. In Markov Chain Monte Carlo in Practice, ed. W. R. Gilks, S. Richardson, D. J. Spiegelhalter, 145-162. London: Chapman and Hall.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods. In Bayesian Statistics 4, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 147-167. Oxford University Press.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing 24, 997-1016.
Ionides, E. L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17, 295-311.
Koopman, S. J., Shephard, N., and Creal, D. (2009). Testing the assumptions behind importance sampling. Journal of Econometrics 149, 2-11.
Peruggia, M. (1997). On the variability of case-deletion importance sampling weights in the Bayesian linear model. Journal of the American Statistical Association 92, 199-207.
Stan Development Team (2017). The Stan C++ Library, Version 2.17.0. https://mc-stan.org.
Stan Development Team (2018). RStan: the R interface to Stan, Version 2.17.3. https://mc-stan.org.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571-3594.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316-325.
See Also
Useful links:
- Report bugs at https://github.com/stan-dev/loo/issues 
Compute a point estimate from a draws object
Description
Compute a point estimate from a draws object
Usage
.compute_point_estimate(draws)
## S3 method for class 'matrix'
.compute_point_estimate(draws)
## Default S3 method:
.compute_point_estimate(draws)
Arguments
| draws | A draws object with draws from the posterior. | 
Details
This is a generic function to compute point estimates from draws
objects. The function is internal and should only be used by developers to
enable loo_subsample() for arbitrary draws objects.
Value
A 1 by P matrix with point estimates from a draws object.
The number of posterior draws in a draws object.
Description
The number of posterior draws in a draws object.
Usage
.ndraws(x)
## S3 method for class 'matrix'
.ndraws(x)
## Default S3 method:
.ndraws(x)
Arguments
| x | A draws object with posterior draws. | 
Details
This is a generic function to return the total number of draws from
an arbitrary draws objects. The function is internal and should only be
used by developers to enable loo_subsample() for arbitrary draws objects.
Value
An integer with the number of draws.
Thin a draws object
Description
Thin a draws object
Usage
.thin_draws(draws, loo_approximation_draws)
## S3 method for class 'matrix'
.thin_draws(draws, loo_approximation_draws)
## S3 method for class 'numeric'
.thin_draws(draws, loo_approximation_draws)
## Default S3 method:
.thin_draws(draws, loo_approximation_draws)
Arguments
| draws | A draws object with posterior draws. | 
| loo_approximation_draws | The number of posterior draws to return (ie after thinning). | 
Details
This is a generic function to thin draws from arbitrary draws
objects. The function is internal and should only be used by developers to
enable loo_subsample() for arbitrary draws objects.
Value
A thinned draws object.
Compute weighted expectations
Description
The E_loo() function computes weighted expectations (means, variances,
quantiles) using the importance weights obtained from the
PSIS smoothing procedure. The expectations estimated by the
E_loo() function assume that the PSIS approximation is working well.
A small Pareto k estimate is necessary,
but not sufficient, for E_loo() to give reliable estimates. Additional
diagnostic checks for gauging the reliability of the estimates are in
development and will be added in a future release.
Usage
E_loo(x, psis_object, ...)
## Default S3 method:
E_loo(
  x,
  psis_object,
  ...,
  type = c("mean", "variance", "sd", "quantile"),
  probs = NULL,
  log_ratios = NULL
)
## S3 method for class 'matrix'
E_loo(
  x,
  psis_object,
  ...,
  type = c("mean", "variance", "sd", "quantile"),
  probs = NULL,
  log_ratios = NULL
)
Arguments
| x | A numeric vector or matrix. | 
| psis_object | An object returned by  | 
| ... | Arguments passed to individual methods. | 
| type | The type of expectation to compute. The options are
 | 
| probs | For computing quantiles, a vector of probabilities. | 
| log_ratios | Optionally, a vector or matrix (the same dimensions as  | 
Value
A named list with the following components:
- value
- 
The result of the computation. For the matrix method, valueis a vector withncol(x)elements, with one exception: whentype="quantile"and multiple values are specified inprobsthevaluecomponent of the returned object is alength(probs)byncol(x)matrix.For the default/vector method the valuecomponent is scalar, with one exception: whentype="quantile"and multiple values are specified inprobsthevaluecomponent is a vector withlength(probs)elements.
- pareto_k
- 
Function-specific diagnostic. For the matrix method it will be a vector of length ncol(x)containing estimates of the shape parameterkof the generalized Pareto distribution. For the default/vector method, the estimate is a scalar. Iflog_ratiosis not specified when callingE_loo(), the smoothed log-weights are used to estimate Pareto-k's, which may produce optimistic estimates.For type="mean",type="var", andtype="sd", the returned Pareto-k is usually the maximum of the Pareto-k's for the left and right tail ofhrand the right tail ofr, whereris the importance ratio andh=xfortype="mean"andh=x^2fortype="var"andtype="sd". Ifhis binary, constant, or not finite, or iftype="quantile", the returned Pareto-k is the Pareto-k for the right tail ofr.
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {
# Use rstanarm package to quickly fit a model and get both a log-likelihood
# matrix and draws from the posterior predictive distribution
library("rstanarm")
# data from help("lm")
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
d <- data.frame(
  weight = c(ctl, trt),
  group = gl(2, 10, 20, labels = c("Ctl","Trt"))
)
fit <- stan_glm(weight ~ group, data = d, refresh = 0)
yrep <- posterior_predict(fit)
dim(yrep)
log_ratios <- -1 * log_lik(fit)
dim(log_ratios)
r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000))
psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2)
E_loo(yrep, psis_object, type = "mean")
E_loo(yrep, psis_object, type = "var")
E_loo(yrep, psis_object, type = "sd")
E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median
E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9))
# We can get more accurate Pareto k diagnostic if we also provide
# the log_ratios argument
E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios)
}
Pareto smoothed importance sampling (PSIS) using approximate posteriors
Description
Pareto smoothed importance sampling (PSIS) using approximate posteriors
Usage
ap_psis(log_ratios, log_p, log_g, ...)
## S3 method for class 'array'
ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1))
## S3 method for class 'matrix'
ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1))
## Default S3 method:
ap_psis(log_ratios, log_p, log_g, ...)
Arguments
| log_ratios | The log-likelihood ratios (ie -log_liks) | 
| log_p | The log-posterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| log_g | The log-density (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| ... | Currently not in use. | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
Methods (by class)
-  ap_psis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  ap_psis(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  ap_psis(default): A vector of lengthS(posterior sample size).
Model comparison (deprecated, old version)
Description
This function is deprecated. Please use the new loo_compare() function
instead.
Usage
compare(..., x = list())
Arguments
| ... | |
| x | A list of at least two objects returned by  | 
Details
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in elpd_loo or
elpd_waic (or multiplied by -2, if desired, to be on the
deviance scale).
When that difference, elpd_diff, is positive then the expected
predictive accuracy for the second model is higher. A negative
elpd_diff favors the first model.
When using compare() with more than two models, the values in the
elpd_diff and se_diff columns of the returned matrix are
computed by making pairwise comparisons between each model and the model
with the best ELPD (i.e., the model in the first row).
Although the elpd_diff column is equal to the difference in
elpd_loo, do not expect the se_diff column to be equal to the
the difference in se_elpd_loo.
To compute the standard error of the difference in ELPD we use a paired estimate to take advantage of the fact that the same set of N data points was used to fit both models. These calculations should be most useful when N is large, because then non-normality of the distribution is not such an issue when estimating the uncertainty in these sums. These standard errors, for all their flaws, should give a better sense of uncertainty than what is obtained using the current standard approach of comparing differences of deviances to a Chi-squared distribution, a practice derived for Gaussian linear models or asymptotically, and which only applies to nested models in any case.
Value
A vector or matrix with class 'compare.loo' that has its own
print method. If exactly two objects are provided in ... or
x, then the difference in expected predictive accuracy and the
standard error of the difference are returned. If more than two objects are
provided then a matrix of summary information is returned (see Details).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Examples
## Not run: 
loo1 <- loo(log_lik1)
loo2 <- loo(log_lik2)
print(compare(loo1, loo2), digits = 3)
print(compare(x = list(loo1, loo2)))
waic1 <- waic(log_lik1)
waic2 <- waic(log_lik2)
compare(waic1, waic2)
## End(Not run)
Continuously ranked probability score
Description
The crps() and scrps() functions and their loo_*() counterparts can be
used to compute the continuously ranked probability score (CRPS) and scaled
CRPS (SCRPS) (see Bolin and Wallin, 2022). CRPS is a proper scoring rule, and
strictly proper when the first moment of the predictive distribution is
finite. Both can be expressed in terms of samples form the predictive
distribution. See e.g. Gneiting and Raftery (2007) for a comprehensive
discussion on CRPS.
Usage
crps(x, ...)
scrps(x, ...)
loo_crps(x, ...)
loo_scrps(x, ...)
## S3 method for class 'matrix'
crps(x, x2, y, ..., permutations = 1)
## S3 method for class 'numeric'
crps(x, x2, y, ..., permutations = 1)
## S3 method for class 'matrix'
loo_crps(
  x,
  x2,
  y,
  log_lik,
  ...,
  permutations = 1,
  r_eff = 1,
  cores = getOption("mc.cores", 1)
)
## S3 method for class 'matrix'
scrps(x, x2, y, ..., permutations = 1)
## S3 method for class 'numeric'
scrps(x, x2, y, ..., permutations = 1)
## S3 method for class 'matrix'
loo_scrps(
  x,
  x2,
  y,
  log_lik,
  ...,
  permutations = 1,
  r_eff = 1,
  cores = getOption("mc.cores", 1)
)
Arguments
| x | A  | 
| ... | Passed on to  | 
| x2 | Independent draws from the same distribution as draws in  | 
| y | A vector of observations or a single value. | 
| permutations | An integer, with default value of 1,  specifying how many
times the expected value of  |X - X'| ( | 
| log_lik | A log-likelihood matrix the same size as  | 
| r_eff | An optional vector of relative effective sample size estimates
containing one element per observation. See  | 
| cores | The number of cores to use for parallelization of  | 
Details
To compute (S)CRPS, the user needs to provide two sets of draws, x and
x2, from the predictive distribution. This is due to the fact that formulas
used to compute CRPS involve an expectation of the absolute difference of x
and x2, both having the same distribution. See the permutations argument,
as well as Gneiting and Raftery (2007) for details.
Value
A list containing two elements: estimates and pointwise.
The former reports estimator and standard error and latter the pointwise
values.
References
Bolin, D., & Wallin, J. (2022). Local scale invariance and robustness of proper scoring rules. arXiv. doi:10.48550/arXiv.1912.05642
Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359–378.
Examples
## Not run: 
# An example using rstanarm
library(rstanarm)
data("kidiq")
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq)
ypred1 <- posterior_predict(fit)
ypred2 <- posterior_predict(fit)
crps(ypred1, ypred2, y = fit$y)
loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit))
## End(Not run)
Generic (expected) log-predictive density
Description
The elpd() methods for arrays and matrices can compute the expected log
pointwise predictive density for a new dataset or the log pointwise
predictive density of the observed data (an overestimate of the elpd).
Usage
elpd(x, ...)
## S3 method for class 'array'
elpd(x, ...)
## S3 method for class 'matrix'
elpd(x, ...)
Arguments
| x | A log-likelihood array or matrix. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. | 
| ... | Currently ignored. | 
Details
The elpd() function is an S3 generic and methods are provided for
3-D pointwise log-likelihood arrays and matrices.
Methods (by class)
-  elpd(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  elpd(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
See Also
The vignette Holdout validation and K-fold cross-validation of Stan
programs with the loo package for demonstrations of using the elpd()
methods.
Examples
# Calculate the lpd of the observed data
LLarr <- example_loglik_array()
elpd(LLarr)
Objects to use in examples and tests
Description
Example pointwise log-likelihood objects to use in demonstrations and tests. See the Value and Examples sections below.
Usage
example_loglik_array()
example_loglik_matrix()
Value
example_loglik_array() returns a 500 (draws) x 2 (chains) x 32
(observations) pointwise log-likelihood array.
example_loglik_matrix() returns the same pointwise log-likelihood values
as example_loglik_array() but reshaped into a 1000 (draws*chains) x 32
(observations) matrix.
Examples
LLarr <- example_loglik_array()
(dim_arr <- dim(LLarr))
LLmat <- example_loglik_matrix()
(dim_mat <- dim(LLmat))
all.equal(dim_mat[1], dim_arr[1] * dim_arr[2])
all.equal(dim_mat[2], dim_arr[3])
all.equal(LLarr[, 1, ], LLmat[1:500, ])
all.equal(LLarr[, 2, ], LLmat[501:1000, ])
Extract pointwise log-likelihood from a Stan model
Description
Convenience function for extracting the pointwise log-likelihood
matrix or array from a stanfit object from the rstan package.
Note: recent versions of rstan now include a loo() method for
stanfit objects that handles this internally.
Usage
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)
Arguments
| stanfit | A  | 
| parameter_name | A character string naming the parameter (or generated quantity) in the Stan model corresponding to the log-likelihood. | 
| merge_chains | If  | 
Details
Stan does not automatically compute and store the log-likelihood. It is up to the user to incorporate it into the Stan program if it is to be extracted after fitting the model. In a Stan model, the pointwise log likelihood can be coded as a vector in the transformed parameters block (and then summed up in the model block) or it can be coded entirely in the generated quantities block. We recommend using the generated quantities block so that the computations are carried out only once per iteration rather than once per HMC leapfrog step.
For example, the following is the generated quantities block for
computing and saving the log-likelihood for a linear regression model with
N data points, outcome y, predictor matrix X,
coefficients beta, and standard deviation sigma:
vector[N] log_lik;
for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | X[n, ] * beta, sigma);
Value
If merge_chains=TRUE, an S by N matrix of
(post-warmup) extracted draws, where S is the size of the posterior
sample and N is the number of data points. If
merge_chains=FALSE, an I by C by N array, where
I \times C = S.
References
Stan Development Team (2017). The Stan C++ Library, Version 2.16.0. https://mc-stan.org/
Stan Development Team (2017). RStan: the R interface to Stan, Version 2.16.1. https://mc-stan.org/
Find the model names associated with "loo" objects
Description
Find the model names associated with "loo" objects
Usage
find_model_names(x)
Arguments
| x | List of  | 
Value
Character vector of model names the same length as x.
Estimate parameters of the Generalized Pareto distribution
Description
Given a sample x, Estimate the parameters k and \sigma of
the generalized Pareto distribution (GPD), assuming the location parameter is
0. By default the fit uses a prior for k, which will stabilize
estimates for very small sample sizes (and low effective sample sizes in the
case of MCMC samples). The weakly informative prior is a Gaussian prior
centered at 0.5.
Usage
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)
Arguments
| x | A numeric vector. The sample from which to estimate the parameters. | 
| wip | Logical indicating whether to adjust  | 
| min_grid_pts | The minimum number of grid points used in the fitting
algorithm. The actual number used is  | 
| sort_x | If  | 
Details
Here the parameter k is the negative of k in Zhang &
Stephens (2009).
Value
A named list with components k and sigma.
References
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316-325.
See Also
A parent class for different importance sampling methods.
Description
A parent class for different importance sampling methods.
Usage
importance_sampling(log_ratios, method, ...)
## S3 method for class 'array'
importance_sampling(
  log_ratios,
  method,
  ...,
  r_eff = 1,
  cores = getOption("mc.cores", 1)
)
## S3 method for class 'matrix'
importance_sampling(
  log_ratios,
  method,
  ...,
  r_eff = 1,
  cores = getOption("mc.cores", 1)
)
## Default S3 method:
importance_sampling(log_ratios, method, ..., r_eff = 1)
Arguments
| log_ratios | An array, matrix, or vector of importance ratios on the log scale (for PSIS-LOO these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. | 
| method | The importance sampling method to use. The following methods are implemented: | 
| ... | Arguments passed on to the various methods. | 
| r_eff | Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
Generic function for K-fold cross-validation for developers
Description
For developers of Bayesian modeling packages, loo includes
a generic function kfold() so that methods may be defined for K-fold
CV without name conflicts between packages. See, for example, the
kfold() methods in the rstanarm and brms packages.
The Value section below describes the objects that kfold()
methods should return in order to be compatible with
loo_compare() and the loo package print methods.
Usage
kfold(x, ...)
is.kfold(x)
Arguments
| x | A fitted model object. | 
| ... | Arguments to pass to specific methods. | 
Value
For developers defining a kfold() method for a class
"foo", the kfold.foo() function should return a list with class
c("kfold", "loo") with at least the following named elements:
-  "estimates": A1x2matrix containing the ELPD estimate and its standard error. The matrix must have row name "elpd_kfold" and column names"Estimate"and"SE".
-  "pointwise": ANx1matrix with column name"elpd_kfold"containing the pointwise contributions for each data point.
It is important for the object to have at least these classes and
components so that it is compatible with other functions like
loo_compare() and print() methods.
Helper functions for K-fold cross-validation
Description
These functions can be used to generate indexes for use with K-fold cross-validation. See the Details section for explanations.
Usage
kfold_split_random(K = 10, N = NULL)
kfold_split_stratified(K = 10, x = NULL)
kfold_split_grouped(K = 10, x = NULL)
Arguments
| K | The number of folds to use. | 
| N | The number of observations in the data. | 
| x | A discrete variable of length  | 
Details
kfold_split_random() splits the data into K groups
of equal size (or roughly equal size).
For a categorical variable x kfold_split_stratified()
splits the observations into K groups ensuring that relative
category frequencies are approximately preserved.
For a grouping variable x, kfold_split_grouped() places
all observations in x from the same group/level together in
the same fold. The selection of which groups/levels go into which
fold (relevant when when there are more groups than folds) is
randomized.
Value
An integer vector of length N where each element is an index in 1:K.
Examples
ids <- kfold_split_random(K = 5, N = 20)
print(ids)
table(ids)
x <- sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95))
table(x)
ids <- kfold_split_stratified(K = 5, x = x)
print(ids)
table(ids, x)
grp <- gl(n = 50, k = 15, labels = state.name)
length(grp)
head(table(grp))
ids_10 <- kfold_split_grouped(K = 10, x = grp)
(tab_10 <- table(grp, ids_10))
colSums(tab_10)
ids_9 <- kfold_split_grouped(K = 9, x = grp)
(tab_9 <- table(grp, ids_9))
colSums(tab_9)
Efficient approximate leave-one-out cross-validation (LOO)
Description
The loo() methods for arrays, matrices, and functions compute PSIS-LOO
CV, efficient approximate leave-one-out (LOO) cross-validation for Bayesian
models using Pareto smoothed importance sampling (PSIS). This is
an implementation of the methods described in Vehtari, Gelman, and Gabry
(2017) and Vehtari, Simpson, Gelman, Yao, and Gabry (2024).
The loo_i() function enables testing log-likelihood
functions for use with the loo.function() method.
Usage
loo(x, ...)
## S3 method for class 'array'
loo(
  x,
  ...,
  r_eff = 1,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1),
  is_method = c("psis", "tis", "sis")
)
## S3 method for class 'matrix'
loo(
  x,
  ...,
  r_eff = 1,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1),
  is_method = c("psis", "tis", "sis")
)
## S3 method for class ''function''
loo(
  x,
  ...,
  data = NULL,
  draws = NULL,
  r_eff = 1,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1),
  is_method = c("psis", "tis", "sis")
)
loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis")
is.loo(x)
is.psis_loo(x)
Arguments
| x | A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. | 
| r_eff | Vector of relative effective sample size estimates for the
likelihood ( | 
| save_psis | Should the  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| is_method | The importance sampling method to use. The following methods are implemented: | 
| data,draws,... | For the  | 
| i | For  | 
| llfun | For  | 
Details
The loo() function is an S3 generic and methods are provided for
3-D pointwise log-likelihood arrays, pointwise log-likelihood matrices, and
log-likelihood functions. The array and matrix methods are the most
convenient, but for models fit to very large datasets the loo.function()
method is more memory efficient and may be preferable.
Value
The loo() methods return a named list with class
c("psis_loo", "loo") and components:
- estimates
- 
A matrix with two columns ( Estimate,SE) and three rows (elpd_loo,p_loo,looic). This contains point estimates and standard errors of the expected log pointwise predictive density (elpd_loo), the effective number of parameters (p_loo) and the LOO information criterionlooic(which is just-2 * elpd_loo, i.e., converted to deviance scale).
- pointwise
- 
A matrix with five columns (and number of rows equal to the number of observations) containing the pointwise contributions of the measures ( elpd_loo,mcse_elpd_loo,p_loo,looic,influence_pareto_k). in addition to the three measures inestimates, we also report pointwise values of the Monte Carlo standard error ofelpd_loo(mcse_elpd_loo), and statistics describing the influence of each observation on the posterior distribution (influence_pareto_k). These are the estimates of the shape parameterkof the generalized Pareto fit to the importance ratios for each leave-one-out distribution (see the pareto-k-diagnostic page for details).
- diagnostics
- 
A named list containing two vectors: -  pareto_k: Importance sampling reliability diagnostics. By default, these are equal to theinfluence_pareto_kinpointwise. Some algorithms can improve importance sampling reliability and modify these diagnostics. See the pareto-k-diagnostic page for details.
-  n_eff: PSIS effective sample size estimates.
 
-  
- psis_object
- 
This component will be NULLunless thesave_psisargument is set toTRUEwhen callingloo(). In that casepsis_objectwill be the object of class"psis"that is created when theloo()function callspsis()internally to do the PSIS procedure.
The loo_i() function returns a named list with components
pointwise and diagnostics. These components have the same
structure as the pointwise and diagnostics components of the
object returned by loo() except they contain results for only a single
observation.
Methods (by class)
-  loo(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  loo(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  loo(`function`): A functionf()that takes argumentsdata_ianddrawsand returns a vector containing the log-likelihood for a single observationievaluated at each posterior draw. The function should be written such that, for each observationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws) results in a vector of length S(size of posterior sample). The log-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments dataanddrawsmust also be specified in the call toloo():-  data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of the log-likelihood function.
-  draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation the entire objectdrawswill be passed to thedrawsargument of the log-likelihood function.
- The - ...can be used if your log-likelihood function takes additional arguments. These arguments are used like the- drawsargument in that they are recycled for each observation.
 
-  
Defining loo() methods in a package
Package developers can define
loo() methods for fitted models objects. See the example loo.stanfit()
method in the Examples section below for an example of defining a
method that calls loo.array(). The loo.stanreg() method in the
rstanarm package is an example of defining a method that calls
loo.function().
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
- The loo package vignettes for demonstrations. 
- The FAQ page on the loo website for answers to frequently asked questions. 
-  psis()for the underlying Pareto Smoothed Importance Sampling (PSIS) procedure used in the LOO-CV approximation.
-  pareto-k-diagnostic for convenience functions for looking at diagnostics. 
-  loo_compare()for model comparison.
Examples
### Array and matrix methods (using example objects included with loo package)
# Array method
LLarr <- example_loglik_array()
rel_n_eff <- relative_eff(exp(LLarr))
loo(LLarr, r_eff = rel_n_eff, cores = 2)
# Matrix method
LLmat <- example_loglik_matrix()
rel_n_eff <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500))
loo(LLmat, r_eff = rel_n_eff, cores = 2)
### Using log-likelihood function instead of array or matrix
set.seed(124)
# Simulate data and draw from posterior
N <- 50; K <- 10; S <- 100; a0 <- 3; b0 <- 2
p <- rbeta(1, a0, b0)
y <- rbinom(N, size = K, prob = p)
a <- a0 + sum(y); b <- b0 + N * K - sum(y)
fake_posterior <- as.matrix(rbeta(S, a, b))
dim(fake_posterior) # S x 1
fake_data <- data.frame(y,K)
dim(fake_data) # N x 2
llfun <- function(data_i, draws) {
  # each time called internally within loo the arguments will be equal to:
  # data_i: ith row of fake_data (fake_data[i,, drop=FALSE])
  # draws: entire fake_posterior matrix
  dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE)
}
# Use the loo_i function to check that llfun works on a single observation
# before running on all obs. For example, using the 3rd obs in the data:
loo_3 <- loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior)
print(loo_3$pointwise[, "elpd_loo"])
# Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC)
loo_with_fn <- loo(llfun, draws = fake_posterior, data = fake_data)
# If we look at the elpd_loo contribution from the 3rd obs it should be the
# same as what we got above with the loo_i function and i=3:
print(loo_with_fn$pointwise[3, "elpd_loo"])
print(loo_3$pointwise[, "elpd_loo"])
# Check that the loo.matrix method gives same answer as loo.function method
log_lik_matrix <- sapply(1:N, function(i) {
  llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior)
})
loo_with_mat <- loo(log_lik_matrix)
all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE!
## Not run: 
### For package developers: defining loo methods
# An example of a possible loo method for 'stanfit' objects (rstan package).
# A similar method is included in the rstan package.
# In order for users to be able to call loo(stanfit) instead of
# loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately
# (roxygen2 and devtools packages are good for that).
#
loo.stanfit <-
 function(x,
         pars = "log_lik",
         ...,
         save_psis = FALSE,
         cores = getOption("mc.cores", 1)) {
  stopifnot(length(pars) == 1L)
  LLarray <- loo::extract_log_lik(stanfit = x,
                                  parameter_name = pars,
                                  merge_chains = FALSE)
  r_eff <- loo::relative_eff(x = exp(LLarray), cores = cores)
  loo::loo.array(LLarray,
                 r_eff = r_eff,
                 cores = cores,
                 save_psis = save_psis)
}
## End(Not run)
Datasets for loo examples and vignettes
Description
Small datasets for use in loo examples and vignettes. The Kline
and milk datasets are also included in the rethinking package
(McElreath, 2016a), but we include them here as rethinking is not
on CRAN.
Details
Currently the data sets included are:
-  Kline: Small dataset from Kline and Boyd (2010) on tool complexity and demography in Oceanic islands societies. This data is discussed in detail in McElreath (2016a,2016b). (Link to variable descriptions)
-  milk: Small dataset from Hinde and Milligan (2011) on primate milk composition.This data is discussed in detail in McElreath (2016a,2016b). (Link to variable descriptions)
-  voice: Voice rehabilitation data from Tsanas et al. (2014).
References
Hinde and Milligan. 2011. Evolutionary Anthropology 20:9-23.
Kline, M.A. and R. Boyd. 2010. Proc R Soc B 277:2559-2564.
McElreath, R. (2016a). rethinking: Statistical Rethinking book package. R package version 1.59.
McElreath, R. (2016b). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC.
A. Tsanas, M.A. Little, C. Fox, L.O. Ramig: Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease, IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 22, pp. 181-190, January 2014
Examples
str(Kline)
str(milk)
LOO package glossary
Description
The pages provides definitions to key terms. Also see the FAQ page on the loo website for answers to frequently asked questions.
Note: VGG2017 refers to Vehtari, Gelman, and Gabry (2017). See References, below.
ELPD and elpd_loo
The ELPD is the theoretical expected log pointwise predictive density for a new
dataset (Eq 1 in VGG2017), which can be estimated, e.g., using
cross-validation. elpd_loo is the Bayesian LOO estimate of the
expected log pointwise predictive density (Eq 4 in VGG2017) and
is a sum of N individual pointwise log predictive densities. Probability
densities can be smaller or larger than 1, and thus log predictive densities
can be negative or positive. For simplicity the ELPD acronym is used also for
expected log pointwise predictive probabilities for discrete models.
Probabilities are always equal or less than 1, and thus log predictive
probabilities are 0 or negative.
Standard error of elpd_loo
As elpd_loo is defined as the sum of N independent components (Eq 4 in
VGG2017), we can compute the standard error by using the standard deviation
of the N components and multiplying by sqrt(N) (Eq 23 in VGG2017).
This standard error is a coarse description of our uncertainty about the
predictive performance for unknown future data. When N is small or there is
severe model misspecification, the current SE estimate is overoptimistic and
the actual SE can even be twice as large. Even for moderate N, when the SE
estimate is an accurate estimate for the scale, it ignores the skewness. When
making model comparisons, the SE of the component-wise (pairwise) differences
should be used instead (see the se_diff section below and Eq 24 in
VGG2017). Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and se_diff is good.
Monte Carlo SE of elpd_loo
The Monte Carlo standard error is the estimate for the computational accuracy
of MCMC and importance sampling used to compute elpd_loo. Usually this
is negligible compared to the standard describing the uncertainty due to
finite number of observations (Eq 23 in VGG2017).
p_loo (effective number of parameters)
p_loo is the difference between elpd_loo and the non-cross-validated
log posterior predictive density. It describes how much more difficult it
is to predict future data than the observed data. Asymptotically under
certain regularity conditions, p_loo can be interpreted as the
effective number of parameters. In well behaving cases p_loo < N and
p_loo < p, where p is the total number of parameters in the
model. p_loo > N  or p_loo > p indicates that the model has very
weak predictive capability and may indicate a severe model misspecification.
See below for more on interpreting p_loo when there are warnings
about high Pareto k diagnostic values.
Pareto k estimates
The Pareto k estimate is a diagnostic for Pareto smoothed importance
sampling (PSIS), which is used to compute components of elpd_loo. In
importance-sampling LOO the full posterior distribution is used as the
proposal distribution. The Pareto k diagnostic estimates how far an
individual leave-one-out distribution is from the full distribution. If
leaving out an observation changes the posterior too much then importance
sampling is not able to give a reliable estimate. Pareto smoothing stabilizes
importance sampling and guarantees a finite variance estimate at the
cost of some bias.
The diagnostic threshold for Pareto k depends on sample size
S (sample size dependent threshold was introduced by Vehtari
et al., 2024, and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity, loo package uses the nominal sample
size S  when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but even then if ESS/S > 1/2 the difference
is usually negligible. Thinning of MCMC draws can be used to improve
the ratio ESS/S.
- If - k < \min(1 - 1 / \log_{10}(S), 0.7), where- Sis the sample size, the PSIS estimate and the corresponding Monte Carlo standard error estimate are reliable.
- If - 1 - 1 / \log_{10}(S) <= k < 0.7, the PSIS estimate and the corresponding Monte Carlo standard error estimate are not reliable, but increasing the (effective) sample size- Sabove 2200 may help (this will increase the sample size specific threshold- (1 - 1 / \log_{10}(2200) > 0.7and then the bias specific threshold 0.7 dominates).
- If - 0.7 <= k < 1, the PSIS estimate and the corresponding Monte Carlo standard error have large bias and are not reliable. Increasing the sample size may reduce the variability in the- kestimate, which may also result in a lower- kestimate.
- If - k \geq 1, the target distribution is estimated to have non-finite mean. The PSIS estimate and the corresponding Monte Carlo standard error are not well defined. Increasing the sample size may reduce the variability in- kestimate, which may also result in a lower- kestimate.
Pareto k is also useful as a measure of influence of an
observation.  Highly influential observations have high k
values. Very high k values often indicate model
misspecification, outliers or mistakes in data processing. See
Section 6 of Gabry et al. (2019) for an example.
Interpreting p_loo when Pareto k is large
If k > 0.7 then we can also look at
the p_loo estimate for some additional information about the problem:
- If - p_loo << p(the total number of parameters in the model), then the model is likely to be misspecified. Posterior predictive checks (PPCs) are then likely to also detect the problem. Try using an overdispersed model, or add more structural information (nonlinearity, mixture model, etc.).
- If - p_loo < pand the number of parameters- pis relatively large compared to the number of observations (e.g.,- p>N/5), it is likely that the model is so flexible or the population prior so weak that it’s difficult to predict the left out observation (even for the true model). This happens, for example, in the simulated 8 schools (in VGG2017), random effect models with a few observations per random effect, and Gaussian processes and spatial models with short correlation lengths.
- If - p_loo > p, then the model is likely to be badly misspecified. If the number of parameters- p<<N, then PPCs are also likely to detect the problem. See the case study at https://avehtari.github.io/modelselection/roaches.html for an example. If- pis relatively large compared to the number of observations, say- p>N/5(more accurately we should count number of observations influencing each parameter as in hierarchical models some groups may have few observations and other groups many), it is possible that PPCs won't detect the problem.
elpd_diff
elpd_diff is the difference in elpd_loo for two models. If more
than two models are compared, the difference is computed relative to the
model with highest elpd_loo.
se_diff
The standard error of component-wise differences of elpd_loo (Eq 24 in VGG2017) between two models. This SE is smaller than the SE for individual models due to correlation (i.e., if some observations are easier and some more difficult to predict for all models).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. preprint arXiv:2008.10296v3..
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and Gelman, A. (2019), Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378 (journal version, preprint arXiv:1709.01449, code on GitHub)
Efficient approximate leave-one-out cross-validation (LOO) for posterior approximations
Description
Efficient approximate leave-one-out cross-validation (LOO) for posterior approximations
Usage
loo_approximate_posterior(x, log_p, log_g, ...)
## S3 method for class 'array'
loo_approximate_posterior(
  x,
  log_p,
  log_g,
  ...,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1)
)
## S3 method for class 'matrix'
loo_approximate_posterior(
  x,
  log_p,
  log_g,
  ...,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1)
)
## S3 method for class ''function''
loo_approximate_posterior(
  x,
  ...,
  data = NULL,
  draws = NULL,
  log_p = NULL,
  log_g = NULL,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1)
)
Arguments
| x | A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. | 
| log_p | The log-posterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| log_g | The log-density (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| save_psis | Should the  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| data,draws,... | For the  | 
Details
The loo_approximate_posterior() function is an S3 generic and
methods are provided for 3-D pointwise log-likelihood arrays, pointwise
log-likelihood matrices, and log-likelihood functions. The implementation
works for posterior approximations where it is possible to compute the log
density for the posterior approximation.
Value
The loo_approximate_posterior() methods return a named list with
class c("psis_loo_ap", "psis_loo", "loo"). It has the same structure
as the objects returned by loo() but with the additional slot:
- posterior_approximation
- 
A list with two vectors, log_pandlog_gof the same length containing the posterior density and the approximation density for the individual draws.
Methods (by class)
-  loo_approximate_posterior(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  loo_approximate_posterior(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  loo_approximate_posterior(`function`): A functionf()that takes argumentsdata_ianddrawsand returns a vector containing the log-likelihood for a single observationievaluated at each posterior draw. The function should be written such that, for each observationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws) results in a vector of length S(size of posterior sample). The log-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments dataanddrawsmust also be specified in the call toloo():-  data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of the log-likelihood function.
-  draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation the entire objectdrawswill be passed to thedrawsargument of the log-likelihood function.
- The - ...can be used if your log-likelihood function takes additional arguments. These arguments are used like the- drawsargument in that they are recycled for each observation.
 
-  
References
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
See Also
Model comparison
Description
Compare fitted models based on ELPD.
By default the print method shows only the most important information. Use
print(..., simplify=FALSE) to print a more detailed summary.
Usage
loo_compare(x, ...)
## Default S3 method:
loo_compare(x, ...)
## S3 method for class 'compare.loo'
print(x, ..., digits = 1, simplify = TRUE)
## S3 method for class 'compare.loo_ss'
print(x, ..., digits = 1, simplify = TRUE)
Arguments
| x | An object of class  | 
| ... | Additional objects of class  | 
| digits | For the print method only, the number of digits to use when printing. | 
| simplify | For the print method only, should only the essential columns of the summary matrix be printed? The entire matrix is always returned, but by default only the most important columns are printed. | 
Details
When comparing two fitted models, we can estimate the difference in their
expected predictive accuracy by the difference in
elpd_loo or elpd_waic (or multiplied by -2, if
desired, to be on the deviance scale).
When using loo_compare(), the returned matrix will have one row per model
and several columns of estimates. The values in the
elpd_diff and se_diff columns of the
returned matrix are computed by making pairwise comparisons between each
model and the model with the largest ELPD (the model in the first row). For
this reason the elpd_diff column will always have the value 0 in the
first row (i.e., the difference between the preferred model and itself) and
negative values in subsequent rows for the remaining models.
To compute the standard error of the difference in ELPD —
which should not be expected to equal the difference of the standard errors
— we use a paired estimate to take advantage of the fact that the same
set of N data points was used to fit both models. These calculations
should be most useful when N is large, because then non-normality of
the distribution is not such an issue when estimating the uncertainty in
these sums. These standard errors, for all their flaws, should give a
better sense of uncertainty than what is obtained using the current
standard approach of comparing differences of deviances to a Chi-squared
distribution, a practice derived for Gaussian linear models or
asymptotically, and which only applies to nested models in any case.
Sivula et al. (2022) discuss the conditions when the normal
approximation used for SE and se_diff is good.
If more than 11 models are compared, we internally recompute the model
differences using the median model by ELPD as the baseline model. We then
estimate whether the differences in predictive performance are potentially
due to chance as described by McLatchie and Vehtari (2023). This will flag
a warning if it is deemed that there is a risk of over-fitting due to the
selection process. In that case users are recommended to avoid model
selection based on LOO-CV, and instead to favor model averaging/stacking or
projection predictive inference.
Value
A matrix with class "compare.loo" that has its own
print method. See the Details section.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022). Uncertainty in Bayesian leave-one-out cross-validation based model comparison. preprint arXiv:2008.10296v3..
McLatchie, Y., and Vehtari, A. (2023). Efficient estimation and correction of selection-induced bias with order statistics. preprint arXiv:2309.03742
See Also
- The FAQ page on the loo website for answers to frequently asked questions. 
Examples
# very artificial example, just for demonstration!
LL <- example_loglik_array()
loo1 <- loo(LL)     # should be worst model when compared
loo2 <- loo(LL + 1) # should be second best model when compared
loo3 <- loo(LL + 2) # should be best model when compared
comp <- loo_compare(loo1, loo2, loo3)
print(comp, digits = 2)
# show more details with simplify=FALSE
# (will be the same for all models in this artificial example)
print(comp, simplify = FALSE, digits = 3)
# can use a list of objects with custom names
# will use apple, banana, and cherry, as the names in the output
loo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3))
## Not run: 
# works for waic (and kfold) too
loo_compare(waic(LL), waic(LL - 10))
## End(Not run)
Model averaging/weighting via stacking or pseudo-BMA weighting
Description
Model averaging via stacking of predictive distributions, pseudo-BMA weighting or pseudo-BMA+ weighting with the Bayesian bootstrap. See Yao et al. (2018), Vehtari, Gelman, and Gabry (2017), and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for background.
Usage
loo_model_weights(x, ...)
## Default S3 method:
loo_model_weights(
  x,
  ...,
  method = c("stacking", "pseudobma"),
  optim_method = "BFGS",
  optim_control = list(),
  BB = TRUE,
  BB_n = 1000,
  alpha = 1,
  r_eff_list = NULL,
  cores = getOption("mc.cores", 1)
)
stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list())
pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)
Arguments
| x | A list of  | 
| ... | Unused, except for the generic to pass arguments to individual methods. | 
| method | Either  | 
| optim_method | If  | 
| optim_control | If  | 
| BB | Logical used when  | 
| BB_n | For pseudo-BMA+ weighting only, the number of samples to use for
the Bayesian bootstrap. The default is  | 
| alpha | Positive scalar shape parameter in the Dirichlet distribution
used for the Bayesian bootstrap. The default is  | 
| r_eff_list | Optionally, a list of relative effective sample size
estimates for the likelihood  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| lpd_point | If calling  | 
Details
loo_model_weights() is a wrapper around the stacking_weights() and
pseudobma_weights() functions that implements stacking, pseudo-BMA, and
pseudo-BMA+ weighting for combining multiple predictive distributions. We can
use approximate or exact leave-one-out cross-validation (LOO-CV) or K-fold CV
to estimate the expected log predictive density (ELPD).
The stacking method (method="stacking"), which is the default for
loo_model_weights(), combines all models by maximizing the leave-one-out
predictive density of the combination distribution. That is, it finds the
optimal linear combining weights for maximizing the leave-one-out log score.
The pseudo-BMA method (method="pseudobma") finds the relative weights
proportional to the ELPD of each model. However, when
method="pseudobma", the default is to also use the Bayesian bootstrap
(BB=TRUE), which corresponds to the pseudo-BMA+ method. The Bayesian
bootstrap  takes into account the uncertainty of finite data points and
regularizes the weights away from the extremes of 0 and 1.
In general, we recommend stacking for averaging predictive distributions, while pseudo-BMA+ can serve as a computationally easier alternative.
Value
A numeric vector containing one weight for each model.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. (online).
See Also
- The loo package vignettes, particularly Bayesian Stacking and Pseudo-BMA weights using the loo package. 
-  loo()for details on leave-one-out ELPD estimation.
-  constrOptim()for the choice of optimization methods and control-parameters.
-  relative_eff()for computingr_eff.
Examples
## Not run: 
### Demonstrating usage after fitting models with RStan
library(rstan)
# generate fake data from N(0,1).
N <- 100
y <- rnorm(N, 0, 1)
# Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma).
stan_code <- "
  data {
    int N;
    vector[N] y;
    real mu_fixed;
  }
  parameters {
    real<lower=0> sigma;
  }
  model {
    sigma ~ exponential(1);
    y ~ normal(mu_fixed, sigma);
  }
  generated quantities {
    vector[N] log_lik;
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma);
  }"
mod <- stan_model(model_code = stan_code)
fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1))
fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5))
fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6))
model_list <- list(fit1, fit2, fit3)
log_lik_list <- lapply(model_list, extract_log_lik)
# optional but recommended
r_eff_list <- lapply(model_list, function(x) {
  ll_array <- extract_log_lik(x, merge_chains = FALSE)
  relative_eff(exp(ll_array))
})
# stacking method:
wts1 <- loo_model_weights(
  log_lik_list,
  method = "stacking",
  r_eff_list = r_eff_list,
  optim_control = list(reltol=1e-10)
)
print(wts1)
# can also pass a list of psis_loo objects to avoid recomputing loo
loo_list <- lapply(1:length(log_lik_list), function(j) {
  loo(log_lik_list[[j]], r_eff = r_eff_list[[j]])
})
wts2 <- loo_model_weights(
  loo_list,
  method = "stacking",
  optim_control = list(reltol=1e-10)
)
all.equal(wts1, wts2)
# can provide names to be used in the results
loo_model_weights(setNames(loo_list, c("A", "B", "C")))
# pseudo-BMA+ method:
set.seed(1414)
loo_model_weights(loo_list, method = "pseudobma")
# pseudo-BMA method (set BB = FALSE):
loo_model_weights(loo_list, method = "pseudobma", BB = FALSE)
# calling stacking_weights or pseudobma_weights directly
lpd1 <- loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1]
lpd2 <- loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1]
lpd3 <- loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1]
stacking_weights(cbind(lpd1, lpd2, lpd3))
pseudobma_weights(cbind(lpd1, lpd2, lpd3))
pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE)
## End(Not run)
Moment matching for efficient approximate leave-one-out cross-validation (LOO)
Description
Moment matching algorithm for updating a loo object when Pareto k estimates are large.
Usage
loo_moment_match(x, ...)
## Default S3 method:
loo_moment_match(
  x,
  loo,
  post_draws,
  log_lik_i,
  unconstrain_pars,
  log_prob_upars,
  log_lik_i_upars,
  max_iters = 30L,
  k_threshold = NULL,
  split = TRUE,
  cov = TRUE,
  cores = getOption("mc.cores", 1),
  ...
)
Arguments
| x | A fitted model object. | 
| ... | Further arguments passed to the custom functions documented above. | 
| loo | A loo object to be modified. | 
| post_draws | A function the takes  | 
| log_lik_i | A function that takes  | 
| unconstrain_pars | A function that takes arguments  | 
| log_prob_upars | A function that takes arguments  | 
| log_lik_i_upars | A function that takes arguments  | 
| max_iters | Maximum number of moment matching iterations. Usually this
does not need to be modified. If the maximum number of iterations is
reached, there will be a warning, and increasing  | 
| k_threshold | Threshold value for Pareto k values above which the moment
matching algorithm is used. The default value is  | 
| split | Logical; Indicate whether to do the split transformation or not at the end of moment matching for each LOO fold. | 
| cov | Logical; Indicate whether to match the covariance matrix of the
samples or not. If  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
Details
The loo_moment_match() function is an S3 generic and we provide a
default method that takes as arguments user-specified functions
post_draws, log_lik_i, unconstrain_pars, log_prob_upars, and
log_lik_i_upars. All of these functions should take .... as an argument
in addition to those specified for each function.
Value
The loo_moment_match() methods return an updated loo object. The
structure of the updated loo object is similar, but the method also
stores the original Pareto k diagnostic values in the diagnostics field.
Methods (by class)
-  loo_moment_match(default): A default method that takes as arguments a user-specified model objectx, alooobject and user-specified functionspost_draws,log_lik_i,unconstrain_pars,log_prob_upars, andlog_lik_i_upars.
References
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
See Also
loo(), loo_moment_match_split()
Examples
# See the vignette for loo_moment_match()
Split moment matching for efficient approximate leave-one-out cross-validation (LOO)
Description
A function that computes the split moment matching importance sampling loo. Takes in the moment matching total transformation, transforms only half of the draws, and computes a single elpd using multiple importance sampling.
Usage
loo_moment_match_split(
  x,
  upars,
  cov,
  total_shift,
  total_scaling,
  total_mapping,
  i,
  log_prob_upars,
  log_lik_i_upars,
  r_eff_i,
  cores,
  is_method,
  ...
)
Arguments
| x | A fitted model object. | 
| upars | A matrix containing the model parameters in unconstrained space where they can have any real value. | 
| cov | Logical; Indicate whether to match the covariance matrix of the
samples or not. If  | 
| total_shift | A vector representing the total shift made by the moment matching algorithm. | 
| total_scaling | A vector representing the total scaling of marginal variance made by the moment matching algorithm. | 
| total_mapping | A vector representing the total covariance transformation made by the moment matching algorithm. | 
| i | Observation index. | 
| log_prob_upars | A function that takes arguments  | 
| log_lik_i_upars | A function that takes arguments  | 
| r_eff_i | MCMC relative effective sample size of the  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| is_method | The importance sampling method to use. The following methods are implemented: | 
| ... | Further arguments passed to the custom functions documented above. | 
Value
A list containing the updated log-importance weights and log-likelihood values. Also returns the updated MCMC effective sample size and the integrand-specific log-importance weights.
References
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021). Implicitly adaptive importance sampling. Statistics and Computing, 31, 16. doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
See Also
Estimate leave-one-out predictive performance..
Description
The loo_predictive_metric() function computes estimates of leave-one-out
predictive metrics given a set of predictions and observations. Currently
supported metrics are mean absolute error, mean squared error and root mean
squared error for continuous predictions and accuracy and balanced accuracy
for binary classification. Predictions are passed on to the E_loo()
function, so this function assumes that the PSIS approximation is working
well.
Usage
loo_predictive_metric(x, ...)
## S3 method for class 'matrix'
loo_predictive_metric(
  x,
  y,
  log_lik,
  ...,
  metric = c("mae", "rmse", "mse", "acc", "balanced_acc"),
  r_eff = 1,
  cores = getOption("mc.cores", 1)
)
Arguments
| x | A numeric matrix of predictions. | 
| ... | Additional arguments passed on to  | 
| y | A numeric vector of observations. Length should be equal to the
number of rows in  | 
| log_lik | A matrix of pointwise log-likelihoods. Should be of same
dimension as  | 
| metric | The type of predictive metric to be used. Currently
supported options are  
 | 
| r_eff | A Vector of relative effective sample size estimates containing
one element per observation. See  | 
| cores | The number of cores to use for parallelization of  | 
Value
A list with the following components:
- estimate
- 
Estimate of the given metric. 
- se
- 
Standard error of the estimate. 
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {
# Use rstanarm package to quickly fit a model and get both a log-likelihood
# matrix and draws from the posterior predictive distribution
library("rstanarm")
# data from help("lm")
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
d <- data.frame(
  weight = c(ctl, trt),
  group = gl(2, 10, 20, labels = c("Ctl","Trt"))
)
fit <- stan_glm(weight ~ group, data = d, refresh = 0)
ll <- log_lik(fit)
r_eff <- relative_eff(exp(-ll), chain_id = rep(1:4, each = 1000))
mu_pred <- posterior_epred(fit)
# Leave-one-out mean absolute error of predictions
mae <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll,
                            pred_error = 'mae', r_eff = r_eff)
# Leave-one-out 90%-quantile of mean absolute error
mae_90q <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll,
                                pred_error = 'mae', r_eff = r_eff,
                                type = 'quantile', probs = 0.9)
}
Efficient approximate leave-one-out cross-validation (LOO) using subsampling, so that less costly and more approximate computation is made for all LOO-fold, and more costly and accurate computations are made only for m<N LOO-folds.
Description
Efficient approximate leave-one-out cross-validation (LOO) using subsampling, so that less costly and more approximate computation is made for all LOO-fold, and more costly and accurate computations are made only for m<N LOO-folds.
Usage
loo_subsample(x, ...)
## S3 method for class ''function''
loo_subsample(
  x,
  ...,
  data = NULL,
  draws = NULL,
  observations = 400,
  log_p = NULL,
  log_g = NULL,
  r_eff = 1,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1),
  loo_approximation = "plpd",
  loo_approximation_draws = NULL,
  estimator = "diff_srs",
  llgrad = NULL,
  llhess = NULL
)
Arguments
| x | A function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs. | 
| data,draws,... | For  | 
| observations | The subsample observations to use. The argument can take four (4) types of arguments: 
 | 
| log_p,log_g | Should be supplied only if approximate posterior draws are
used. The default ( | 
| r_eff | Vector of relative effective sample size estimates for the
likelihood ( | 
| save_psis | Should the  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| loo_approximation | What type of approximation of the loo_i's should be used?
The default is  
 As point estimates of  | 
| loo_approximation_draws | The number of posterior draws used when
integrating over the posterior. This is used if  | 
| estimator | How should  
 | 
| llgrad | The gradient of the log-likelihood. This
is only used when  | 
| llhess | The Hessian of the log-likelihood. This is only used
with  | 
Details
The loo_subsample() function is an S3 generic and a methods is
currently provided for log-likelihood functions. The implementation works
for both MCMC and for posterior approximations where it is possible to
compute the log density for the approximation.
Value
loo_subsample() returns a named list with class c("psis_loo_ss", "psis_loo", "loo"). This has the same structure as objects returned by
loo() but with the additional slot:
-  loo_subsampling: A list with two vectors,log_pandlog_g, of the same length containing the posterior density and the approximation density for the individual draws.
Methods (by class)
-  loo_subsample(`function`): A functionf()that takes argumentsdata_ianddrawsand returns a vector containing the log-likelihood for a single observationievaluated at each posterior draw. The function should be written such that, for each observationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws) results in a vector of length S(size of posterior sample). The log-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments dataanddrawsmust also be specified in the call toloo():-  data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of the log-likelihood function.
-  draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation the entire objectdrawswill be passed to thedrawsargument of the log-likelihood function.
- The - ...can be used if your log-likelihood function takes additional arguments. These arguments are used like the- drawsargument in that they are recycled for each observation.
 
-  
References
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.
See Also
Named lists
Description
Create a named list using specified names or, if names are omitted, using the
names of the objects in the list. The code list(a = a, b = b) becomes
nlist(a,b) and list(a = a, b = 2) becomes nlist(a, b = 2), etc.
Usage
nlist(...)
Arguments
| ... | Objects to include in the list. | 
Value
A named list.
Examples
# All variables already defined
a <- rnorm(100)
b <- mat.or.vec(10, 3)
nlist(a,b)
# Define some variables in the call and take the rest from the environment
nlist(a, b, veggies = c("lettuce", "spinach"), fruits = c("banana", "papaya"))
The number of observations in a psis_loo_ss object.
Description
The number of observations in a psis_loo_ss object.
Usage
## S3 method for class 'psis_loo_ss'
nobs(object, ...)
Arguments
| object | a  | 
| ... | Currently unused. | 
Get observation indices used in subsampling
Description
Get observation indices used in subsampling
Usage
obs_idx(x, rep = TRUE)
Arguments
| x | A  | 
| rep | If sampling with replacement is used, an observation can have
multiple samples and these are then repeated in the returned object if
 | 
Value
An integer vector.
Extractor methods
Description
These are only defined in order to deprecate with a warning (rather than
remove and break backwards compatibility) the old way of accessing the point
estimates in a "psis_loo" or "psis" object. The new way as of
v2.0.0 is to get them from the "estimates" component of the object.
Usage
## S3 method for class 'loo'
x[i]
## S3 method for class 'loo'
x[[i, exact = TRUE]]
## S3 method for class 'loo'
x$name
Arguments
| x,i,exact,name | See Extract. | 
Parallel psis list computations
Description
Parallel psis list computations
Usage
parallel_psis_list(
  N,
  .loo_i,
  .llfun,
  data,
  draws,
  r_eff,
  save_psis,
  cores,
  ...
)
parallel_importance_sampling_list(
  N,
  .loo_i,
  .llfun,
  data,
  draws,
  r_eff,
  save_psis,
  cores,
  method,
  ...
)
Arguments
| N | The total number of observations (i.e.  | 
| .loo_i | The function used to compute individual loo contributions. | 
| .llfun | See  | 
| data,draws,... | For the  | 
| r_eff | Vector of relative effective sample size estimates for the
likelihood ( | 
| save_psis | Should the  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| method | See  | 
Details
Refactored function to handle parallel computations for psis_list
Diagnostics for Pareto smoothed importance sampling (PSIS)
Description
Print a diagnostic table summarizing the estimated Pareto shape parameters
and PSIS effective sample sizes, find the indexes of observations for which
the estimated Pareto shape parameter k is larger than some
threshold value, or plot observation indexes vs. diagnostic estimates.
The Details section below provides a brief overview of the
diagnostics, but we recommend consulting Vehtari, Gelman, and Gabry (2017)
and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for full details.
Usage
pareto_k_table(x)
pareto_k_ids(x, threshold = NULL)
pareto_k_values(x)
pareto_k_influence_values(x)
psis_n_eff_values(x)
mcse_loo(x, threshold = NULL)
## S3 method for class 'psis_loo'
plot(
  x,
  diagnostic = c("k", "ESS", "n_eff"),
  ...,
  label_points = FALSE,
  main = "PSIS diagnostic plot"
)
## S3 method for class 'psis'
plot(
  x,
  diagnostic = c("k", "ESS", "n_eff"),
  ...,
  label_points = FALSE,
  main = "PSIS diagnostic plot"
)
Arguments
| x | |
| threshold | For  | 
| diagnostic | For the  | 
| label_points,... | For the  | 
| main | For the  | 
Details
The reliability and approximate convergence rate of the PSIS-based
estimates can be assessed using the estimates for the shape
parameter k of the generalized Pareto distribution. The
diagnostic threshold for Pareto k depends on sample size
S (sample size dependent threshold was introduced by Vehtari
et al. (2024), and before that fixed thresholds of 0.5 and 0.7 were
recommended). For simplicity, loo package uses the nominal sample
size S when computing the sample size specific
threshold. This provides an optimistic threshold if the effective
sample size is less than 2200, but if MCMC-ESS > S/2 the difference
is usually negligible. Thinning of MCMC draws can be used to
improve the ratio ESS/S.
- If - k < min(1 - 1 / log10(S), 0.7), where- Sis the sample size, the PSIS estimate and the corresponding Monte Carlo standard error estimate are reliable.
- If - 1 - 1 / log10(S) <= k < 0.7, the PSIS estimate and the corresponding Monte Carlo standard error estimate are not reliable, but increasing the (effective) sample size- Sabove 2200 may help (this will increase the sample size specific threshold- (1-1/log10(2200)>0.7and then the bias specific threshold 0.7 dominates).
- If - 0.7 <= k < 1, the PSIS estimate and the corresponding Monte Carlo standard error have large bias and are not reliable. Increasing the sample size may reduce the variability in- kestimate, which may result in lower- kestimate, too.
- If - k \geq 1, the target distribution is estimated to have a non-finite mean. The PSIS estimate and the corresponding Monte Carlo standard error are not well defined. Increasing the sample size may reduce the variability in the- kestimate, which may also result in a lower- kestimate.
What if the estimated tail shape parameter k
exceeds the diagnostic threshold?
 Importance sampling is likely to
work less well if the marginal posterior p(\theta^s | y) and
LOO posterior p(\theta^s | y_{-i}) are very different, which
is more likely to happen with a non-robust model and highly
influential observations.  If the estimated tail shape parameter
k exceeds the diagnostic threshold, the user should be
warned. (Note: If k is greater than the diagnostic threshold
then WAIC is also likely to fail, but WAIC lacks as accurate
diagnostic.)  When using PSIS in the context of approximate LOO-CV,
we recommend one of the following actions:
- With some additional computations, it is possible to transform the MCMC draws from the posterior distribution to obtain more reliable importance sampling estimates. This results in a smaller shape parameter - k. See- loo_moment_match()and the vignette Avoiding model refits in leave-one-out cross-validation with moment matching for an example of this.
- Sampling from a leave-one-out mixture distribution (see the vignette Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models), directly from - p(\theta^s | y_{-i})for the problematic observations- i, or using- K-fold cross-validation (see the vignette Holdout validation and K-fold cross-validation of Stan programs with the loo package) will generally be more stable.
- Using a model that is more robust to anomalous observations will generally make approximate LOO-CV more stable. 
Observation influence statistics
 The estimated shape parameter
k for each observation can be used as a measure of the observation's
influence on posterior distribution of the model. These can be obtained with
pareto_k_influence_values().
Effective sample size and error estimates
 In the case that we
obtain the samples from the proposal distribution via MCMC the loo
package also computes estimates for the Monte Carlo error and the effective
sample size for importance sampling, which are more accurate for PSIS than
for IS and TIS (see Vehtari et al (2024) for details). However, the PSIS
effective sample size estimate will be
over-optimistic when the estimate of k is greater than
min(1-1/log10(S), 0.7), where S is the sample size.
Value
pareto_k_table() returns an object of class
"pareto_k_table", which is a matrix with columns "Count",
"Proportion", and "Min. n_eff", and has its own print method.
pareto_k_ids() returns an integer vector indicating which
observations have Pareto k estimates above threshold.
pareto_k_values() returns a vector of the estimated Pareto
k parameters. These represent the reliability of sampling.
pareto_k_influence_values() returns a vector of the estimated Pareto
k parameters. These represent influence of the observations on the
model posterior distribution.
psis_n_eff_values() returns a vector of the estimated PSIS
effective sample sizes.
mcse_loo() returns the Monte Carlo standard error (MCSE)
estimate for PSIS-LOO. MCSE will be NA if any Pareto k values are
above threshold.
The plot() method is called for its side effect and does not
return anything. If x is the result of a call to loo()
or psis() then plot(x, diagnostic) produces a plot of
the estimates of the Pareto shape parameters (diagnostic = "k") or
estimates of the PSIS effective sample sizes (diagnostic = "ESS").
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
-  psis()for the implementation of the PSIS algorithm.
- The FAQ page on the loo website for answers to frequently asked questions. 
Convenience function for extracting pointwise estimates
Description
Convenience function for extracting pointwise estimates
Usage
pointwise(x, estimate, ...)
## S3 method for class 'loo'
pointwise(x, estimate, ...)
Arguments
| x | A  | 
| estimate | Which pointwise estimate to return. By default all are
returned. The objects returned by the different functions ( | 
| ... | Currently ignored. | 
Value
A vector of length equal to the number of observations.
Examples
x <- loo(example_loglik_array())
pointwise(x, "elpd_loo")
Print methods
Description
Print methods
Usage
## S3 method for class 'loo'
print(x, digits = 1, ...)
## S3 method for class 'waic'
print(x, digits = 1, ...)
## S3 method for class 'psis_loo'
print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'importance_sampling_loo'
print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'psis_loo_ap'
print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'psis'
print(x, digits = 1, plot_k = FALSE, ...)
## S3 method for class 'importance_sampling'
print(x, digits = 1, plot_k = FALSE, ...)
Arguments
| x | |
| digits | An integer passed to  | 
| ... | Arguments passed to  | 
| plot_k | Logical. If  | 
Value
x, invisibly.
See Also
Print dimensions of log-likelihood or log-weights matrix
Description
Print dimensions of log-likelihood or log-weights matrix
Usage
print_dims(x, ...)
## S3 method for class 'importance_sampling'
print_dims(x, ...)
## S3 method for class 'psis_loo'
print_dims(x, ...)
## S3 method for class 'importance_sampling_loo'
print_dims(x, ...)
## S3 method for class 'waic'
print_dims(x, ...)
## S3 method for class 'kfold'
print_dims(x, ...)
## S3 method for class 'psis_loo_ss'
print_dims(x, ...)
Arguments
| x | |
| ... | Ignored. | 
Pareto smoothed importance sampling (PSIS)
Description
Implementation of Pareto smoothed importance sampling (PSIS), a method for stabilizing importance ratios. The version of PSIS implemented here corresponds to the algorithm presented in Vehtari, Simpson, Gelman, Yao, and Gabry (2024). For PSIS diagnostics see the pareto-k-diagnostic page.
Usage
psis(log_ratios, ...)
## S3 method for class 'array'
psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))
## S3 method for class 'matrix'
psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))
## Default S3 method:
psis(log_ratios, ..., r_eff = 1)
is.psis(x)
is.sis(x)
is.tis(x)
Arguments
| log_ratios | An array, matrix, or vector of importance ratios on the log scale (for PSIS-LOO these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. | 
| ... | Arguments passed on to the various methods. | 
| r_eff | Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| x | For  | 
Value
The psis() methods return an object of class "psis",
which is a named list with the following components:
- log_weights
- 
Vector or matrix of smoothed (and truncated) but unnormalized log weights. To get normalized weights use the weights()method provided for objects of class"psis".
- diagnostics
- 
A named list containing two vectors: -  pareto_k: Estimates of the shape parameterkof the generalized Pareto distribution. See the pareto-k-diagnostic page for details.
-  n_eff: PSIS effective sample size estimates.
 
-  
Objects of class "psis" also have the following attributes:
- norm_const_log
- 
Vector of precomputed values of colLogSumExps(log_weights)that are used internally by theweightsmethod to normalize the log weights.
- tail_len
- 
Vector of tail lengths used for fitting the generalized Pareto distribution. 
- r_eff
- 
If specified, the user's r_effargument.
- dims
- 
Integer vector of length 2 containing S(posterior sample size) andN(number of observations).
- method
- 
Method used for importance sampling, here psis.
Methods (by class)
-  psis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  psis(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  psis(default): A vector of lengthS(posterior sample size).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
-  loo()for approximate LOO-CV using PSIS.
-  pareto-k-diagnostic for PSIS diagnostics. 
- The loo package vignettes for demonstrations. 
- The FAQ page on the loo website for answers to frequently asked questions. 
Examples
log_ratios <- -1 * example_loglik_array()
r_eff <- relative_eff(exp(-log_ratios))
psis_result <- psis(log_ratios, r_eff = r_eff)
str(psis_result)
plot(psis_result)
# extract smoothed weights
lw <- weights(psis_result) # default args are log=TRUE, normalize=TRUE
ulw <- weights(psis_result, normalize=FALSE) # unnormalized log-weights
w <- weights(psis_result, log=FALSE) # normalized weights (not log-weights)
uw <- weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weights
Diagnostics for Laplace and ADVI approximations and Laplace-loo and ADVI-loo
Description
Diagnostics for Laplace and ADVI approximations and Laplace-loo and ADVI-loo
Usage
psis_approximate_posterior(
  log_p = NULL,
  log_g = NULL,
  log_liks = NULL,
  cores,
  save_psis,
  ...,
  log_q = NULL
)
Arguments
| log_p | The log-posterior (target) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| log_g | The log-density (proposal) evaluated at S samples from the proposal distribution (g). A vector of length S. | 
| log_liks | A log-likelihood matrix of size S * N, where N is the number
of observations and S is the number of samples from q. See
 | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| save_psis | Should the  | 
| log_q | Deprecated argument name (the same as log_g). | 
Value
If log likelihoods are supplied, the function returns a "loo" object,
otherwise the function returns a "psis" object.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
Pareto smoothed importance sampling (deprecated, old version)
Description
As of version 2.0.0 this function is deprecated. Please use the
psis() function for the new PSIS algorithm.
Usage
psislw(
  lw,
  wcp = 0.2,
  wtrunc = 3/4,
  cores = getOption("mc.cores", 1),
  llfun = NULL,
  llargs = NULL,
  ...
)
Arguments
| lw | A matrix or vector of log weights. For computing LOO,  | 
| wcp | The proportion of importance weights to use for the generalized
Pareto fit. The  | 
| wtrunc | For truncating very large weights to  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  | 
| llfun,llargs | See  | 
| ... | Ignored when  | 
Value
A named list with components lw_smooth (modified log weights) and
pareto_k (estimated generalized Pareto shape parameter(s) k).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
pareto-k-diagnostic for PSIS diagnostics.
Convenience function for computing relative efficiencies
Description
relative_eff() computes the the MCMC effective sample size divided by
the total sample size.
Usage
relative_eff(x, ...)
## Default S3 method:
relative_eff(x, chain_id, ...)
## S3 method for class 'matrix'
relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1))
## S3 method for class 'array'
relative_eff(x, ..., cores = getOption("mc.cores", 1))
## S3 method for class ''function''
relative_eff(
  x,
  chain_id,
  ...,
  cores = getOption("mc.cores", 1),
  data = NULL,
  draws = NULL
)
## S3 method for class 'importance_sampling'
relative_eff(x, ...)
Arguments
| x | A vector, matrix, 3-D array, or function. See the Methods (by
class) section below for details on specifying  | 
| chain_id | A vector of length  | 
| cores | The number of cores to use for parallelization. | 
| data,draws,... | Same as for the  | 
Value
A vector of relative effective sample sizes.
Methods (by class)
-  relative_eff(default): A vector of lengthS(posterior sample size).
-  relative_eff(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  relative_eff(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  relative_eff(`function`): A functionf()that takes argumentsdata_ianddrawsand returns a vector containing the log-likelihood for a single observationievaluated at each posterior draw. The function should be written such that, for each observationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws) results in a vector of length S(size of posterior sample). The log-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments dataanddrawsmust also be specified in the call toloo():-  data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of the log-likelihood function.
-  draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation the entire objectdrawswill be passed to thedrawsargument of the log-likelihood function.
- The - ...can be used if your log-likelihood function takes additional arguments. These arguments are used like the- drawsargument in that they are recycled for each observation.
 
-  
-  relative_eff(importance_sampling): Ifxis an object of class"psis",relative_eff()simply returns ther_effattribute ofx.
Examples
LLarr <- example_loglik_array()
LLmat <- example_loglik_matrix()
dim(LLarr)
dim(LLmat)
rel_n_eff_1 <- relative_eff(exp(LLarr))
rel_n_eff_2 <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500))
all.equal(rel_n_eff_1, rel_n_eff_2)
Standard importance sampling (SIS)
Description
Implementation of standard importance sampling (SIS).
Usage
sis(log_ratios, ...)
## S3 method for class 'array'
sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1))
## S3 method for class 'matrix'
sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1))
## Default S3 method:
sis(log_ratios, ..., r_eff = NULL)
Arguments
| log_ratios | An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. | 
| ... | Arguments passed on to the various methods. | 
| r_eff | Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
Value
The sis() methods return an object of class "sis",
which is a named list with the following components:
- log_weights
- 
Vector or matrix of smoothed but unnormalized log weights. To get normalized weights use the weights()method provided for objects of classsis.
- diagnostics
- 
A named list containing one vector: -  pareto_k: Not used insis, all set to 0.
-  n_eff: effective sample size estimates.
 
-  
Objects of class "sis" also have the following attributes:
- norm_const_log
- 
Vector of precomputed values of colLogSumExps(log_weights)that are used internally by theweightsmethod to normalize the log weights.
- r_eff
- 
If specified, the user's r_effargument.
- tail_len
- 
Not used for sis.
- dims
- 
Integer vector of length 2 containing S(posterior sample size) andN(number of observations).
- method
- 
Method used for importance sampling, here sis.
Methods (by class)
-  sis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  sis(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  sis(default): A vector of lengthS(posterior sample size).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
-  psis()for approximate LOO-CV using PSIS.
-  loo()for approximate LOO-CV.
-  pareto-k-diagnostic for PSIS diagnostics. 
Examples
log_ratios <- -1 * example_loglik_array()
r_eff <- relative_eff(exp(-log_ratios))
sis_result <- sis(log_ratios, r_eff = r_eff)
str(sis_result)
# extract smoothed weights
lw <- weights(sis_result) # default args are log=TRUE, normalize=TRUE
ulw <- weights(sis_result, normalize=FALSE) # unnormalized log-weights
w <- weights(sis_result, log=FALSE) # normalized weights (not log-weights)
uw <- weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weights
Truncated importance sampling (TIS)
Description
Implementation of truncated (self-normalized) importance sampling (TIS), truncated at S^(1/2) as recommended by Ionides (2008).
Usage
tis(log_ratios, ...)
## S3 method for class 'array'
tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))
## S3 method for class 'matrix'
tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))
## Default S3 method:
tis(log_ratios, ..., r_eff = 1)
Arguments
| log_ratios | An array, matrix, or vector of importance ratios on the log scale (for Importance sampling LOO, these are negative log-likelihood values). See the Methods (by class) section below for a detailed description of how to specify the inputs for each method. | 
| ... | Arguments passed on to the various methods. | 
| r_eff | Vector of relative effective sample size estimates containing
one element per observation. The values provided should be the relative
effective sample sizes of  | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
Value
The tis() methods return an object of class "tis",
which is a named list with the following components:
- log_weights
- 
Vector or matrix of smoothed (and truncated) but unnormalized log weights. To get normalized weights use the weights()method provided for objects of classtis.
- diagnostics
- 
A named list containing one vector: -  pareto_k: Not used intis, all set to 0.
-  n_eff: Effective sample size estimates.
 
-  
Objects of class "tis" also have the following attributes:
- norm_const_log
- 
Vector of precomputed values of colLogSumExps(log_weights)that are used internally by theweights()method to normalize the log weights.
- r_eff
- 
If specified, the user's r_effargument.
- tail_len
- 
Not used for tis.
- dims
- 
Integer vector of length 2 containing S(posterior sample size) andN(number of observations).
- method
- 
Method used for importance sampling, here tis.
Methods (by class)
-  tis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  tis(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  tis(default): A vector of lengthS(posterior sample size).
References
Ionides, Edward L. (2008). Truncated importance sampling. Journal of Computational and Graphical Statistics 17(2): 295–311.
See Also
-  psis()for approximate LOO-CV using PSIS.
-  loo()for approximate LOO-CV.
-  pareto-k-diagnostic for PSIS diagnostics. 
Examples
log_ratios <- -1 * example_loglik_array()
r_eff <- relative_eff(exp(-log_ratios))
tis_result <- tis(log_ratios, r_eff = r_eff)
str(tis_result)
# extract smoothed weights
lw <- weights(tis_result) # default args are log=TRUE, normalize=TRUE
ulw <- weights(tis_result, normalize=FALSE) # unnormalized log-weights
w <- weights(tis_result, log=FALSE) # normalized weights (not log-weights)
uw <- weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weights
Update psis_loo_ss objects
Description
Update psis_loo_ss objects
Usage
## S3 method for class 'psis_loo_ss'
update(
  object,
  ...,
  data = NULL,
  draws = NULL,
  observations = NULL,
  r_eff = 1,
  cores = getOption("mc.cores", 1),
  loo_approximation = NULL,
  loo_approximation_draws = NULL,
  llgrad = NULL,
  llhess = NULL
)
Arguments
| object | A  | 
| ... | Currently not used. | 
| data,draws | |
| observations | The subsample observations to use. The argument can take four (4) types of arguments: 
 | 
| r_eff | Vector of relative effective sample size estimates for the
likelihood ( | 
| cores | The number of cores to use for parallelization. This defaults to
the option  
 | 
| loo_approximation | What type of approximation of the loo_i's should be used?
The default is  
 As point estimates of  | 
| loo_approximation_draws | The number of posterior draws used when
integrating over the posterior. This is used if  | 
| llgrad | The gradient of the log-likelihood. This
is only used when  | 
| llhess | The Hessian of the log-likelihood. This is only used
with  | 
Details
If observations is updated then if a vector of indices or a psis_loo_ss
object is supplied the updated object will have exactly the observations
indicated by the vector or psis_loo_ss object. If a single integer is
supplied, new observations will be sampled to reach the supplied sample size.
Value
A psis_loo_ss object.
Widely applicable information criterion (WAIC)
Description
The waic() methods can be used to compute WAIC from the pointwise
log-likelihood. However, we recommend LOO-CV using PSIS (as implemented by
the loo() function) because PSIS provides useful diagnostics as well as
effective sample size and Monte Carlo estimates.
Usage
waic(x, ...)
## S3 method for class 'array'
waic(x, ...)
## S3 method for class 'matrix'
waic(x, ...)
## S3 method for class ''function''
waic(x, ..., data = NULL, draws = NULL)
is.waic(x)
Arguments
| x | A log-likelihood array, matrix, or function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs for each method. | 
| draws,data,... | For the function method only. See the Methods (by class) section below for details on these arguments. | 
Value
A named list (of class c("waic", "loo")) with components:
- estimates
- 
A matrix with two columns ( "Estimate","SE") and three rows ("elpd_waic","p_waic","waic"). This contains point estimates and standard errors of the expected log pointwise predictive density (elpd_waic), the effective number of parameters (p_waic) and the information criterionwaic(which is just-2 * elpd_waic, i.e., converted to deviance scale).
- pointwise
- 
A matrix with three columns (and number of rows equal to the number of observations) containing the pointwise contributions of each of the above measures ( elpd_waic,p_waic,waic).
Methods (by class)
-  waic(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number of chains, andNis the number of data points.
-  waic(matrix): AnSbyNmatrix, whereSis the size of the posterior sample (with all chains merged) andNis the number of data points.
-  waic(`function`): A functionf()that takes argumentsdata_ianddrawsand returns a vector containing the log-likelihood for a single observationievaluated at each posterior draw. The function should be written such that, for each observationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws) results in a vector of length S(size of posterior sample). The log-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments dataanddrawsmust also be specified in the call toloo():-  data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of the log-likelihood function.
-  draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation the entire objectdrawswill be passed to thedrawsargument of the log-likelihood function.
- The - ...can be used if your log-likelihood function takes additional arguments. These arguments are used like the- drawsargument in that they are recycled for each observation.
 
-  
References
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely application information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571-3594.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. PDF
See Also
- The loo package vignettes and Vehtari, Gelman, and Gabry (2017) and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for more details on why we prefer - loo()to- waic().
-  loo_compare()for comparing models on approximate LOO-CV or WAIC.
Examples
### Array and matrix methods
LLarr <- example_loglik_array()
dim(LLarr)
LLmat <- example_loglik_matrix()
dim(LLmat)
waic_arr <- waic(LLarr)
waic_mat <- waic(LLmat)
identical(waic_arr, waic_mat)
## Not run: 
log_lik1 <- extract_log_lik(stanfit1)
log_lik2 <- extract_log_lik(stanfit2)
(waic1 <- waic(log_lik1))
(waic2 <- waic(log_lik2))
print(compare(waic1, waic2), digits = 2)
## End(Not run)
Extract importance sampling weights
Description
Extract importance sampling weights
Usage
## S3 method for class 'importance_sampling'
weights(object, ..., log = TRUE, normalize = TRUE)
Arguments
| object | |
| ... | Ignored. | 
| log | Should the weights be returned on the log scale? Defaults to
 | 
| normalize | Should the weights be normalized? Defaults to  | 
Value
The weights() method returns an object with the same dimensions as
the log_weights component of object. The normalize and log
arguments control whether the returned weights are normalized and whether
or not to return them on the log scale.
Examples
# See the examples at help("psis")