Installation

Install neodistr from CRAN or GitHub:

# From CRAN
install.packages("neodistr")

# From GitHub
#devtools::install_github("madsyair/neodistr")

Load Required Packages

 library(rstan)
  library(brms)
  library(bayesplot)
  library(loo)
  library(neodistr)
  # Set Stan options for faster computation
  options(mc.cores = min(2, parallel::detectCores()))
  rstan_options(auto_write = TRUE)

A. Parameter Estimation with the FOSSEP Distribution

Data Simulation and Exploration

We simulate a right‐skewed dataset of 100 observations from the FOSSEP distribution (\(\mu=0\), \(\sigma=1\), \(\alpha=2\), and \(\beta=2\)) and perform basic exploratory analysis. In this vignette, pre-computed data are loaded to streamline the presentation. Pre-computed data are generated from the FOSSEP distribution using the rfossep function, which is part of the neodistr package. The simulated data is stored in an RDS file for easy access. The simulated data can be generated using the following code. The exploratory analysis includes summary statistics and a histogram of the simulated data.

#install.packages("R.utils")
#library(R.utils)
set.seed(400)
precomp_dir <- system.file("data_precomputed", package = "neodistr")
# Simulate data from FOSSEP distribution
y <- rfossep(50, 0, 1, 2, 2)  # Simulated data from fossep distribution
df <- data.frame(y)
#saveRDS(y, file.path(precomp_dir, "fossep_data.rds")) # Save pre-computed data
#gzip(file.path(precomp_dir, "fossep_data.rds"))  
# Basic exploration
summary(y)
hist(y, main = "Simulated FOSSEP Data", xlab = "Y", col = "skyblue", breaks = 20)

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -1.0851  0.2071  0.9635  1.1622  1.9002  4.3928

Model Specification

We model y with an intercept-only formula. Priors are informed by the simulation parameters:

\(\alpha\) ~ lognormal(0, 0.5)
\(\beta\) ~ lognormal(log(2), 0.25)
\(\sigma\) ∼ half-normal(0, 1)
\(\mu\) ∼ normal(0, 1)

# Define formula for parameter estimation
formula <- brms::bf(y ~ 1)

# Choose priors
prior <- c(
  set_prior("lognormal(0,0.5)", class = "alpha"),
  set_prior("lognormal(log(2),0.5)", class = "beta"),
  set_prior("normal(0,1)", class = "sigma"),
  set_prior("normal(0,1)", class = "Intercept") # mu
)

Prior Predictive Check

The prior predictive check is performed to assess the plausibility of the priors by simulating data from the specified priors. This step helps to ensure that the priors are reasonable and that they can generate data similar to the observed data. To perform the prior predictive check, the bnrm function is used with the sample_prior = "only" argument. This will generate simulated datasets from the priors without fitting the model to the data.

# Run prior predictive check (computationally intensive)
ppc_fit_fossep <- bnrm(
  formula = formula,
  data = df,
  family = fossep(),
  prior = prior,
  sample_prior = "only",
  chains = 2,           # Reduced for demo
  cores = 1,            # Single core for consistency
  iter = 2000,          # Reduced iterations
  warmup = 1000,
  refresh = 0           # Suppress output
)

# Show prior predictive check plot
ppc_plot_fossep <- pp_check(ppc_fit_fossep, type = "dens_overlay") 
#saveRDS(ppc_plot_fossep, file.path(precomp_dir, "ppc_plot_fossep.rds")) # Save pre-computed plot file.path(precomp_dir, "fossep_data.rds")
print(ppc_plot_fossep)

The plot of the prior predictive check shows the distribution of the simulated data from the priors overlaid the observed data. This allows us to visually assess whether the priors are reasonable and can generate data similar to the observed data.

Model Fitting

We fit the model using the bnrm function, specifying the formula, data, family, and priors defined earlier. The fitting process uses MCMC sampling to estimate the posterior distributions of the parameters.

# Fit the model (computationally intensive)
fit_fossep_bnrm <- bnrm(
  formula = formula,
  data = df,
  family = fossep(),
  prior = prior,
  chains = 2,
  cores = 1,
  iter = 2000,
  warmup = 1000,
  seed = 123,
  refresh = 0
)
#saveRDS(fit_fossep_bnrm, file.path(precomp_dir, "fit_fossep_bnrm.rds")) # Save pre-computed fit

After fitting the model, we can summarize the results to see the estimated parameters and their posterior distributions. The summary will include the mean, standard deviation, and credible intervals for each parameter. The convergence diagnostics will also be checked to ensure that the MCMC chains have mixed well and converged to the posterior distribution. Posterior predictive checks will be performed to assess the model fit and the ability of the model to predict new data.

  summary(fit_fossep_bnrm)
  
  # Convergence diagnostics
  trace_plot <- mcmc_trace(fit_fossep_bnrm)
  print(trace_plot)
  
  # Posterior predictive check
  pp_plot <- pp_check(fit_fossep_bnrm)
  #saveRDS(pp_plot, file.path(precomp_dir, "pp_plot_fossep.rds")) # Save pre-computed plot
  print(pp_plot)

#>  Family: fossep 
#>   Links: mu = identity; sigma = identity; alpha = identity; beta = identity 
#> Formula: y ~ 1 
#>    Data: data (Number of observations: 50) 
#>   Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 2000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.35      0.33    -0.25     1.05 1.00      891      817
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     1.07      0.27     0.55     1.62 1.01      640      692
#> alpha     1.66      0.36     1.10     2.40 1.00      887      697
#> beta      2.14      0.59     1.25     3.34 1.00      663      740
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

B. Regression with the JSEP Distribution

Data Preparation

In the following example, we demonstrate how to perform regression modeling using the JSEP distribution. The code provided below shows how to set up the regression model, specify the priors, and fit the model using the bnrm function.

Data Simulation and Exploration

set.seed(123)
x <- runif(50)
e <- rjsep(50, 0, 0.1, 0.8,2)  # Simulated errors from JSEP distribution
y <- 0.5 + 0.8*x + e
xy<-cbind(y,x)

df_reg <- data.frame(y, x)
# Basic exploration
plot(x, y, main = "Regression Data", pch = 19, col = "steelblue")
#save pre-computed data
#saveRDS(xy, file.path(precomp_dir,"jsep_data.rds")) # Save pre-computed data

Model Specification and Fitting

We specify the regression model using the brms formula interface. The regression formula is defined as y ~ x, where y is the response variable and x is the predictor variable. The priors for the regression parameters are set as follows: Prior of \(\alpha\) is set to log-normal(0, 0.5), \(\beta\) is set to log-normal(1, 0.5), \(\sigma\) is set to half-normal(0, 1), and the intercept and slope are set to normal(0, 1). The code syntax for defining the model and these priors is shown below.

# Define regression formula
formula_reg <- brms::bf(y ~ x)

# Priors for regression
prior_reg <- c(
  set_prior("lognormal(log(2),0.25)", class = "alpha"),
  set_prior("lognormal(log(2),0.25)", class = "beta"),
  set_prior("normal(0,1)", class = "sigma"),
  set_prior("normal(0,1)", class = "Intercept"),
  set_prior("normal(0,1)", class = "b")
)

Prior Predictive Check

We evaluate our chosen priors via a prior predictive check, which entails drawing simulated datasets solely from those priors to verify they can plausibly reproduce the characteristics of the actual data. In practice, this is done by calling bnrm(..., sample_prior = "only"), which generates the prior‐based simulations without fitting the model to the observed data.

# Fit regression model
fit_ppc_jsep <- bnrm(
  formula = formula_reg,
  data = df_reg,
  family = jsep(),
  prior = prior_reg,
  chains = 2,
  cores = 1,
  iter = 2000,
  sample_prior = "only", # Prior predictive check
  warmup = 1000,
  seed = 123,
  refresh = 0
)
# prior predictive check fir jesp


ppc_plot_jsep<- pp_check(fit_ppc_jsep, type = "dens_overlay", nsamples = 200)
#saveRDS(ppc_plot_jsep, file.path(precomp_dir,"ppc_plot_jsep.rds")) # Save pre-computed plot
print(ppc_plot_jsep)
# Show prior predictive check plot

The prior predictive check plot overlays the data simulated from our priors onto the real observations, letting us visually judge whether those priors can plausibly reproduce the distribution of the actual data.

Model Fitting

We fit the regression model using the bnrm function, specifying the formula, data, family, and priors defined earlier. The fitting process uses MCMC sampling to estimate the posterior distributions of the parameters.

fit_jsep_bnrm <- bnrm(
  formula = formula_reg,
  data = df_reg,
  family = jsep(),
  prior = prior_reg,
  chains = 2,
  cores = 1,
  iter = 2000,
   warmup = 1000,
  seed = 123,
  refresh = 0
)
#saveRDS(fit_jsep_bnrm, file.path(precomp_dir, "fit_jsep_bnrm.rds")) # Save pre-computed fit

 
summary(fit_jsep_bnrm)
# Convergence diagnostics
  trace_plot <- mcmc_trace(fit_jsep_bnrm, facet_args = list(ncol = 2),pars = c("alpha", "beta", "sigma", "Intercept","b_x"))
  print(trace_plot)
  
  # Posterior predictive check
  pp_plot <- pp_check(fit_jsep_bnrm)
  
 #saveRDS(pp_plot, file.path(precomp_dir,"pp_plot_jsep.rds")) # Save pre-computed plot

  # Model evaluation
  loo_result <- loo(fit_jsep_bnrm, moment_match = TRUE)
  #saveRDS(loo_result, file.path(precomp_dir,"loo_result_jsep.rds")) # Save pre-computed loo result
  print(loo_result)

#>  Family: jsep 
#>   Links: mu = identity; sigma = identity; alpha = identity; beta = identity 
#> Formula: y ~ x 
#>    Data: data (Number of observations: 50) 
#>   Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 2000
#> 
#> Regression Coefficients:
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     0.47      0.04     0.40     0.54 1.00      743     1096
#> x             0.81      0.05     0.71     0.92 1.00      994     1192
#> 
#> Further Distributional Parameters:
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.12      0.03     0.08     0.18 1.00      658      995
#> alpha     0.95      0.12     0.73     1.22 1.00      879     1189
#> beta      2.36      0.52     1.49     3.58 1.00     1188     1433
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

#> 
#> Computed from 2000 by 50 log-likelihood matrix.
#> 
#>          Estimate   SE
#> elpd_loo     26.4 12.8
#> p_loo         8.2  4.8
#> looic       -52.9 25.7
#> ------
#> MCSE of elpd_loo is 0.1.
#> MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.0]).
#> 
#> All Pareto k estimates are good (k < 0.7).
#> See help('pareto-k-diagnostic') for details.

Usage Example with `brms` code

The brms package provides a high-level interface for Bayesian regression modeling using Stan. It allows users to specify models using a formula interface similar to that of lm or glm, while also supporting custom families. The neo-normal distributions, such as FOSSEP and JSEP, can be used with brms by defining custom families. The brms interface requires custom family definitions. Fossep and JSEP distributions can be defined using the brms_custom_family function. The following code demonstrates how to set up the custom family for FOSSEP distributions, fit the model, and summarize the results.

The following code demonstrates how to set up the custom family for FOSSEP distributions, fit the model, and summarize the results. The brms_custom_family function is used to define the custom family, and then the brm function is used to fit the model. The summary of the fitted model is displayed at the end.

# Define custom family for brms
fossep_custom <- brms_custom_family("fossep")

# Fit model with brms
fit_brms_demo <- brm(
  y ~ 1,
  data = df,
  family = fossep_custom$custom_family,
  stanvars = fossep_custom$stanvars_family,
  chains = 2,
  cores = 2,
  iter = 4000,
  warmup = 1000,
  seed = 123,
  refresh = 0
)
summary(fit_brms_demo)

Usage Example with Stan code

The stanf_fossep function generates the Stan code for the FOSSEP distribution. The following code demonstrates how to prepare the data, define the Stan model, and fit it using the stan function. The Stan model includes the custom family function and the data specification.

# Prepare data for Stan
stan_data <- list(
  n = length(y),
  y = y
)

# Simple Stan model (abbreviated for vignette)
stan_code <- paste0(
  "functions { ", stanf_fossep(vectorize = TRUE), " }",
  "data { int<lower=1> n; vector[n] y; }",
  "parameters { real mu; real<lower=0> sigma; real<lower=0> alpha; real<lower=0> beta; }",
  "model { y ~ fossep(rep_vector(mu, n), sigma, alpha, beta); }"
)

# Fit Stan model
fit_stan_demo <- stan(
  model_code = stan_code,
  data = stan_data,
  chains = 2,
  cores = 1,
  iter = 1000,
  warmup = 500,
  refresh = 0
)
summary(fit_stan_demo, pars = c("mu", "sigma", "alpha", "beta"))

Practical Recommendations

For routine analysis: - Use bnrm() for seamless integration with neo-normal distributions - Start with iter = 2000, warmup = 1000 for initial exploration - Increase to iter = 4000, warmup = 2000 for final results

For custom implementations: - Use brms_custom_family() for maximum flexibility with brms - Direct Stan code offers most control but requires more setup

Performance tips: - Use cores = parallel::detectCores() for parallel processing - Set refresh = 0 to suppress output in production - Consider adapt_delta = 0.95 if you encounter divergent transitions

For production use, we recommend starting with the bnrm interface for its simplicity, then moving to brms or direct Stan code for more complex modeling needs.

Neo-normal Distribution Family for MCMC Models in ‘bnrm’, ‘brms’, and ‘Stan’ Code

Introduction

Installation

Load Required Packages

A. Parameter Estimation with the FOSSEP Distribution

Data Simulation and Exploration

Model Specification

Prior Predictive Check

Model Fitting

B. Regression with the JSEP Distribution

Data Preparation

Data Simulation and Exploration

Model Specification and Fitting

Prior Predictive Check

Model Fitting

Usage Example with `brms` code

Usage Example with Stan code

Practical Recommendations

Neo-normal Distribution Family for MCMC Models in ‘bnrm’, ‘brms’, and ‘Stan’ Code

Introduction

Installation

Load Required Packages

A. Parameter Estimation with the FOSSEP Distribution

Data Simulation and Exploration

Model Specification

Prior Predictive Check

Model Fitting

B. Regression with the JSEP Distribution

Data Preparation

Data Simulation and Exploration

Model Specification and Fitting

Prior Predictive Check

Model Fitting

Usage Example with brms code

Usage Example with Stan code

Practical Recommendations

Usage Example with `brms` code