The neodistr package provides neo-normal
distribution functions for Bayesian modeling. This vignette explains how
to use it with bnrm
, brms
, and
Stan
code, and demonstrates how to fit Bayesian models
using the neo-normal distribution.
Note: This vignette contains computationally intensive examples.
To keep the vignette lightweight, we use pre-computed results for the examples # tambahkan penjelasan bahwa iterasi hanay 2000 supaya tdak besar dalam bahasa inggris
. The initial examples demonstrate the workflow using pre-computed data and results, while subsequent sections show the code needed to run the models interactively.
Install neodistr from CRAN or GitHub:
# From CRAN
install.packages("neodistr")
# From GitHub
#devtools::install_github("madsyair/neodistr")
We simulate a right‐skewed dataset of 100 observations from the
FOSSEP distribution (\(\mu=0\), \(\sigma=1\), \(\alpha=2\), and \(\beta=2\)) and perform basic exploratory
analysis. In this vignette, pre-computed data are loaded to streamline
the presentation. Pre-computed data are generated from the FOSSEP
distribution using the rfossep
function, which is part of
the neodistr package. The simulated data is stored in
an RDS file for easy access. The simulated data can be generated using
the following code. The exploratory analysis includes summary statistics
and a histogram of the simulated data.
#install.packages("R.utils")
#library(R.utils)
set.seed(400)
precomp_dir <- system.file("data_precomputed", package = "neodistr")
# Simulate data from FOSSEP distribution
y <- rfossep(50, 0, 1, 2, 2) # Simulated data from fossep distribution
df <- data.frame(y)
#saveRDS(y, file.path(precomp_dir, "fossep_data.rds")) # Save pre-computed data
#gzip(file.path(precomp_dir, "fossep_data.rds"))
# Basic exploration
summary(y)
hist(y, main = "Simulated FOSSEP Data", xlab = "Y", col = "skyblue", breaks = 20)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.0851 0.2071 0.9635 1.1622 1.9002 4.3928
We model y
with an intercept-only formula. Priors are
informed by the simulation parameters:
The prior predictive check is performed to assess the plausibility of
the priors by simulating data from the specified priors. This step helps
to ensure that the priors are reasonable and that they can generate data
similar to the observed data. To perform the prior predictive check, the
bnrm
function is used with the
sample_prior = "only"
argument. This will generate
simulated datasets from the priors without fitting the model to the
data.
# Run prior predictive check (computationally intensive)
ppc_fit_fossep <- bnrm(
formula = formula,
data = df,
family = fossep(),
prior = prior,
sample_prior = "only",
chains = 2, # Reduced for demo
cores = 1, # Single core for consistency
iter = 2000, # Reduced iterations
warmup = 1000,
refresh = 0 # Suppress output
)
# Show prior predictive check plot
ppc_plot_fossep <- pp_check(ppc_fit_fossep, type = "dens_overlay")
#saveRDS(ppc_plot_fossep, file.path(precomp_dir, "ppc_plot_fossep.rds")) # Save pre-computed plot file.path(precomp_dir, "fossep_data.rds")
print(ppc_plot_fossep)
The plot of the prior predictive check shows the distribution of the simulated data from the priors overlaid the observed data. This allows us to visually assess whether the priors are reasonable and can generate data similar to the observed data.
We fit the model using the bnrm
function, specifying the
formula, data, family, and priors defined earlier. The fitting process
uses MCMC sampling to estimate the posterior distributions of the
parameters.
# Fit the model (computationally intensive)
fit_fossep_bnrm <- bnrm(
formula = formula,
data = df,
family = fossep(),
prior = prior,
chains = 2,
cores = 1,
iter = 2000,
warmup = 1000,
seed = 123,
refresh = 0
)
#saveRDS(fit_fossep_bnrm, file.path(precomp_dir, "fit_fossep_bnrm.rds")) # Save pre-computed fit
After fitting the model, we can summarize the results to see the estimated parameters and their posterior distributions. The summary will include the mean, standard deviation, and credible intervals for each parameter. The convergence diagnostics will also be checked to ensure that the MCMC chains have mixed well and converged to the posterior distribution. Posterior predictive checks will be performed to assess the model fit and the ability of the model to predict new data.
summary(fit_fossep_bnrm)
# Convergence diagnostics
trace_plot <- mcmc_trace(fit_fossep_bnrm)
print(trace_plot)
# Posterior predictive check
pp_plot <- pp_check(fit_fossep_bnrm)
#saveRDS(pp_plot, file.path(precomp_dir, "pp_plot_fossep.rds")) # Save pre-computed plot
print(pp_plot)
#> Family: fossep
#> Links: mu = identity; sigma = identity; alpha = identity; beta = identity
#> Formula: y ~ 1
#> Data: data (Number of observations: 50)
#> Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 2000
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 0.35 0.33 -0.25 1.05 1.00 891 817
#>
#> Further Distributional Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma 1.07 0.27 0.55 1.62 1.01 640 692
#> alpha 1.66 0.36 1.10 2.40 1.00 887 697
#> beta 2.14 0.59 1.25 3.34 1.00 663 740
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
In the following example, we demonstrate how to perform regression
modeling using the JSEP distribution. The code provided below shows how
to set up the regression model, specify the priors, and fit the model
using the bnrm
function.
set.seed(123)
x <- runif(50)
e <- rjsep(50, 0, 0.1, 0.8,2) # Simulated errors from JSEP distribution
y <- 0.5 + 0.8*x + e
xy<-cbind(y,x)
df_reg <- data.frame(y, x)
# Basic exploration
plot(x, y, main = "Regression Data", pch = 19, col = "steelblue")
#save pre-computed data
#saveRDS(xy, file.path(precomp_dir,"jsep_data.rds")) # Save pre-computed data
We specify the regression model using the brms
formula
interface. The regression formula is defined as y ~ x
,
where y
is the response variable and x
is the
predictor variable. The priors for the regression parameters are set as
follows: Prior of \(\alpha\) is set to
log-normal(0, 0.5), \(\beta\) is set to
log-normal(1, 0.5), \(\sigma\) is set
to half-normal(0, 1), and the intercept and slope are set to normal(0,
1). The code syntax for defining the model and these priors is shown
below.
# Define regression formula
formula_reg <- brms::bf(y ~ x)
# Priors for regression
prior_reg <- c(
set_prior("lognormal(log(2),0.25)", class = "alpha"),
set_prior("lognormal(log(2),0.25)", class = "beta"),
set_prior("normal(0,1)", class = "sigma"),
set_prior("normal(0,1)", class = "Intercept"),
set_prior("normal(0,1)", class = "b")
)
We evaluate our chosen priors via a prior predictive check, which
entails drawing simulated datasets solely from those priors to verify
they can plausibly reproduce the characteristics of the actual data. In
practice, this is done by calling
bnrm(..., sample_prior = "only")
, which generates the
prior‐based simulations without fitting the model to the observed
data.
# Fit regression model
fit_ppc_jsep <- bnrm(
formula = formula_reg,
data = df_reg,
family = jsep(),
prior = prior_reg,
chains = 2,
cores = 1,
iter = 2000,
sample_prior = "only", # Prior predictive check
warmup = 1000,
seed = 123,
refresh = 0
)
# prior predictive check fir jesp
ppc_plot_jsep<- pp_check(fit_ppc_jsep, type = "dens_overlay", nsamples = 200)
#saveRDS(ppc_plot_jsep, file.path(precomp_dir,"ppc_plot_jsep.rds")) # Save pre-computed plot
print(ppc_plot_jsep)
# Show prior predictive check plot
The prior predictive check plot overlays the data simulated from our priors onto the real observations, letting us visually judge whether those priors can plausibly reproduce the distribution of the actual data.
We fit the regression model using the bnrm
function,
specifying the formula, data, family, and priors defined earlier. The
fitting process uses MCMC sampling to estimate the posterior
distributions of the parameters.
fit_jsep_bnrm <- bnrm(
formula = formula_reg,
data = df_reg,
family = jsep(),
prior = prior_reg,
chains = 2,
cores = 1,
iter = 2000,
warmup = 1000,
seed = 123,
refresh = 0
)
#saveRDS(fit_jsep_bnrm, file.path(precomp_dir, "fit_jsep_bnrm.rds")) # Save pre-computed fit
After fitting the model, we can summarize the results to see the estimated parameters and their posterior distributions. The summary will include the mean, standard deviation, and credible intervals for each parameter. The convergence diagnostics will also be checked to ensure that the MCMC chains have mixed well and converged to the posterior distribution. Posterior predictive checks will be performed to assess the model fit and the ability of the model to predict new data.
summary(fit_jsep_bnrm)
# Convergence diagnostics
trace_plot <- mcmc_trace(fit_jsep_bnrm, facet_args = list(ncol = 2),pars = c("alpha", "beta", "sigma", "Intercept","b_x"))
print(trace_plot)
# Posterior predictive check
pp_plot <- pp_check(fit_jsep_bnrm)
#saveRDS(pp_plot, file.path(precomp_dir,"pp_plot_jsep.rds")) # Save pre-computed plot
# Model evaluation
loo_result <- loo(fit_jsep_bnrm, moment_match = TRUE)
#saveRDS(loo_result, file.path(precomp_dir,"loo_result_jsep.rds")) # Save pre-computed loo result
print(loo_result)
#> Family: jsep
#> Links: mu = identity; sigma = identity; alpha = identity; beta = identity
#> Formula: y ~ x
#> Data: data (Number of observations: 50)
#> Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 2000
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 0.47 0.04 0.40 0.54 1.00 743 1096
#> x 0.81 0.05 0.71 0.92 1.00 994 1192
#>
#> Further Distributional Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma 0.12 0.03 0.08 0.18 1.00 658 995
#> alpha 0.95 0.12 0.73 1.22 1.00 879 1189
#> beta 2.36 0.52 1.49 3.58 1.00 1188 1433
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
#>
#> Computed from 2000 by 50 log-likelihood matrix.
#>
#> Estimate SE
#> elpd_loo 26.4 12.8
#> p_loo 8.2 4.8
#> looic -52.9 25.7
#> ------
#> MCSE of elpd_loo is 0.1.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.0]).
#>
#> All Pareto k estimates are good (k < 0.7).
#> See help('pareto-k-diagnostic') for details.
brms
codeThe brms
package provides a high-level interface for
Bayesian regression modeling using Stan. It allows users to specify
models using a formula interface similar to that of lm
or
glm
, while also supporting custom families. The neo-normal
distributions, such as FOSSEP and JSEP, can be used with
brms
by defining custom families. The brms
interface requires custom family definitions. Fossep and JSEP
distributions can be defined using the brms_custom_family
function. The following code demonstrates how to set up the custom
family for FOSSEP distributions, fit the model, and summarize the
results.
The following code demonstrates how to set up the custom family for
FOSSEP distributions, fit the model, and summarize the results. The
brms_custom_family
function is used to define the custom
family, and then the brm
function is used to fit the model.
The summary of the fitted model is displayed at the end.
# Define custom family for brms
fossep_custom <- brms_custom_family("fossep")
# Fit model with brms
fit_brms_demo <- brm(
y ~ 1,
data = df,
family = fossep_custom$custom_family,
stanvars = fossep_custom$stanvars_family,
chains = 2,
cores = 2,
iter = 4000,
warmup = 1000,
seed = 123,
refresh = 0
)
summary(fit_brms_demo)
The stanf_fossep
function generates the Stan code for
the FOSSEP distribution. The following code demonstrates how to prepare
the data, define the Stan model, and fit it using the stan
function. The Stan model includes the custom family function and the
data specification.
# Prepare data for Stan
stan_data <- list(
n = length(y),
y = y
)
# Simple Stan model (abbreviated for vignette)
stan_code <- paste0(
"functions { ", stanf_fossep(vectorize = TRUE), " }",
"data { int<lower=1> n; vector[n] y; }",
"parameters { real mu; real<lower=0> sigma; real<lower=0> alpha; real<lower=0> beta; }",
"model { y ~ fossep(rep_vector(mu, n), sigma, alpha, beta); }"
)
# Fit Stan model
fit_stan_demo <- stan(
model_code = stan_code,
data = stan_data,
chains = 2,
cores = 1,
iter = 1000,
warmup = 500,
refresh = 0
)
summary(fit_stan_demo, pars = c("mu", "sigma", "alpha", "beta"))
For routine analysis: - Use bnrm()
for
seamless integration with neo-normal distributions - Start with
iter = 2000, warmup = 1000
for initial exploration -
Increase to iter = 4000, warmup = 2000
for final
results
For custom implementations: - Use
brms_custom_family()
for maximum flexibility with brms -
Direct Stan code offers most control but requires more setup
Performance tips: - Use
cores = parallel::detectCores()
for parallel processing -
Set refresh = 0
to suppress output in production - Consider
adapt_delta = 0.95
if you encounter divergent
transitions
For production use, we recommend starting with the bnrm
interface for its simplicity, then moving to brms
or direct
Stan code for more complex modeling needs.