--- title: "PMM2 for Time Series: AR, MA, ARMA, and ARIMA Models" author: "Serhii Zabolotnii" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{PMM2 for Time Series: AR, MA, ARMA, and ARIMA Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) ``` ## Introduction Time series models are fundamental tools in economics, finance, engineering, and many other fields. Classical estimation methods like **Conditional Sum of Squares (CSS)** and **Maximum Likelihood (ML)** assume Gaussian innovations. However, real-world time series often exhibit: - **Heavy tails** (e.g., financial returns) - **Skewness** (e.g., environmental data, commodity prices) - **Outliers** (e.g., measurement errors) The **PMM2 method** extends to time series models, providing **more efficient parameter estimates** when innovations deviate from normality. This vignette demonstrates PMM2 applications to: 1. **Autoregressive (AR)** models 2. **Moving Average (MA)** models 3. **ARMA** models 4. **ARIMA** models with differencing --- ## Setup ```{r load_package} library(EstemPMM) set.seed(42) ``` --- ## Part 1: Autoregressive (AR) Models ### AR(1) Model with Skewed Innovations We'll simulate an AR(1) process with **gamma-distributed innovations** to demonstrate PMM2's robustness. ```{r ar1_simulation} # Model parameters n <- 300 phi <- 0.6 # AR coefficient intercept <- 5 # Generate skewed innovations using gamma distribution # (Gamma(shape=2) has skewness = sqrt(2) ≈ 1.41) innovations <- rgamma(n, shape = 2, rate = 2) - 1 # Center around 0 # Generate AR(1) series x <- numeric(n) x[1] <- intercept + innovations[1] for(i in 2:n) { x[i] <- intercept + phi * (x[i-1] - intercept) + innovations[i] } # Plot the time series plot(x, type = "l", main = "AR(1) Process with Skewed Innovations", ylab = "Value", xlab = "Time", col = "steelblue", lwd = 1.5) abline(h = intercept, col = "red", lty = 2) ``` ### Estimate AR(1) Model: PMM2 vs CSS ```{r ar1_estimation} # Fit using PMM2 fit_pmm2_ar1 <- ar_pmm2(x, order = 1, method = "pmm2", include.mean = TRUE) # Fit using CSS (classical method) fit_css_ar1 <- ar_pmm2(x, order = 1, method = "css", include.mean = TRUE) # Compare coefficients cat("True AR coefficient:", phi, "\n\n") cat("CSS Estimate:\n") print(coef(fit_css_ar1)) cat("\nPMM2 Estimate:\n") print(coef(fit_pmm2_ar1)) # Summary with diagnostics summary(fit_pmm2_ar1) ``` **Observation:** PMM2 typically provides estimates closer to the true value when innovations are non-Gaussian. --- ### AR(2) Model For higher-order AR models, PMM2 maintains its efficiency advantages. ```{r ar2_example} # AR(2) model: x_t = 0.5*x_{t-1} - 0.3*x_{t-2} + e_t n <- 400 phi <- c(0.5, -0.3) # Generate t-distributed innovations (heavy tails) innovations <- rt(n, df = 4) # Generate AR(2) series x <- arima.sim(n = n, model = list(ar = phi), innov = innovations) # Fit models fit_pmm2_ar2 <- ar_pmm2(x, order = 2, method = "pmm2", include.mean = FALSE) fit_css_ar2 <- ar_pmm2(x, order = 2, method = "css", include.mean = FALSE) # Compare comparison_ar2 <- data.frame( Parameter = c("phi1", "phi2"), True = phi, CSS = coef(fit_css_ar2), PMM2 = coef(fit_pmm2_ar2) ) print(comparison_ar2, row.names = FALSE) ``` --- ## Part 2: Moving Average (MA) Models MA models are challenging because innovations are not directly observable. 
### MA(1) Model

```{r ma1_simulation}
# MA(1) model: x_t = e_t + theta*e_{t-1}
n <- 300
theta <- 0.7

# Generate chi-squared innovations (right-skewed)
innovations <- rchisq(n, df = 3) - 3

# Generate the MA(1) series
x <- numeric(n)
x[1] <- innovations[1]
for (i in 2:n) {
  x[i] <- innovations[i] + theta * innovations[i - 1]
}

# Fit MA(1) models
fit_pmm2_ma1 <- ma_pmm2(x, order = 1, method = "pmm2", include.mean = FALSE)
fit_css_ma1 <- ma_pmm2(x, order = 1, method = "css", include.mean = FALSE)

# Compare estimates
cat("True MA coefficient:", theta, "\n\n")
cat("CSS Estimate:\n")
print(coef(fit_css_ma1))
cat("\nPMM2 Estimate:\n")
print(coef(fit_pmm2_ma1))

# Check convergence
cat("\nPMM2 Convergence:", fit_pmm2_ma1@convergence, "\n")
cat("Iterations:", fit_pmm2_ma1@iterations, "\n")
```

---

### MA(2) Model

```{r ma2_example}
# MA(2) model
n <- 400
theta <- c(0.6, 0.3)

# Mixed distribution: 80% normal + 20% contamination
innovations <- ifelse(runif(n) < 0.8, rnorm(n), rnorm(n, mean = 0, sd = 3))

# Generate the MA(2) series
x <- arima.sim(n = n, model = list(ma = theta), innov = innovations)

# Fit models
fit_pmm2_ma2 <- ma_pmm2(x, order = 2, method = "pmm2", include.mean = FALSE)
fit_css_ma2 <- ma_pmm2(x, order = 2, method = "css", include.mean = FALSE)

# Comparison
comparison_ma2 <- data.frame(
  Parameter = c("theta1", "theta2"),
  True = theta,
  CSS = coef(fit_css_ma2),
  PMM2 = coef(fit_pmm2_ma2)
)
print(comparison_ma2, row.names = FALSE)
```

**Monte Carlo Evidence:** For MA(2) models with non-Gaussian errors, PMM2 achieves a **26-32% variance reduction** compared to CSS (see the roadmap documentation).

---

## Part 3: ARMA Models

ARMA models combine AR and MA components, providing flexible modeling for stationary series.

### ARMA(1,1) Model

```{r arma11_simulation}
# ARMA(1,1) model
n <- 500
phi <- 0.5    # AR coefficient
theta <- 0.4  # MA coefficient

# Generate exponentially distributed innovations (right-skewed)
innovations <- rexp(n, rate = 1) - 1

# Generate the ARMA(1,1) series
x <- arima.sim(n = n, model = list(ar = phi, ma = theta), innov = innovations)

# Keep a copy for later use: `x` is overwritten by the ARIMA example in Part 4
x_arma <- x

# Fit models
fit_pmm2_arma <- arma_pmm2(x, order = c(1, 1), method = "pmm2", include.mean = FALSE)
fit_css_arma <- arma_pmm2(x, order = c(1, 1), method = "css", include.mean = FALSE)

# Extract coefficients
cat("True parameters: AR =", phi, ", MA =", theta, "\n\n")
cat("CSS Estimates:\n")
print(coef(fit_css_arma))
cat("\nPMM2 Estimates:\n")
print(coef(fit_pmm2_arma))

# Summary
summary(fit_pmm2_arma)
```

---

### Diagnostic Plots for ARMA Models

```{r arma_diagnostics, fig.width=7, fig.height=6}
# Plot residuals and ACF
plot(fit_pmm2_arma)
title("ARMA(1,1) Model Diagnostics")
```

The diagnostic plots help verify:

1. **Residuals vs Time**: should show no systematic patterns
2. **ACF of Residuals**: should lie within the confidence bands (white noise)
3. **Q-Q Plot**: assesses the residual distribution

---

## Part 4: ARIMA Models with Differencing

For non-stationary series, ARIMA models incorporate differencing to achieve stationarity.
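As a quick base-R illustration (independent of the model fits above; `rw` and `d_rw` are new, example-only variables), first-differencing turns a random walk into a stationary series:

```{r differencing_sketch}
# A random walk is non-stationary; its first difference is white noise
rw <- cumsum(rnorm(200))
d_rw <- diff(rw)  # first difference: the "d = 1" in ARIMA(p, 1, q)

op <- par(mfrow = c(1, 2))
plot(rw, type = "l", main = "Random Walk (non-stationary)",
     xlab = "Time", ylab = "Value")
plot(d_rw, type = "l", main = "First Difference (stationary)",
     xlab = "Time", ylab = "Value")
par(op)
```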
### ARIMA(1,1,1) Model

```{r arima111_simulation}
# Generate a non-stationary series requiring differencing
n <- 400
phi <- 0.4
theta <- 0.3

# Skewed innovations
innovations <- rgamma(n, shape = 3, rate = 3) - 1

# Generate the integrated series: ARIMA(1,1,1)
# First generate a stationary ARMA(1,1) process
x_stationary <- arima.sim(n = n, model = list(ar = phi, ma = theta),
                          innov = innovations)

# Integrate once (cumulative sum)
x <- cumsum(x_stationary)

# Plot the non-stationary series
plot(x, type = "l",
     main = "ARIMA(1,1,1) Process",
     ylab = "Value", xlab = "Time",
     col = "darkgreen", lwd = 1.5)
```

### Fitting ARIMA Models

```{r arima_estimation}
# Fit ARIMA(1,1,1) using PMM2
fit_pmm2_arima <- arima_pmm2(x, order = c(1, 1, 1), method = "pmm2",
                             include.mean = FALSE)

# Fit using the classical CSS method
fit_css_arima <- arima_pmm2(x, order = c(1, 1, 1), method = "css",
                            include.mean = FALSE)

# Compare coefficients
cat("True parameters: AR =", phi, ", MA =", theta, "\n\n")
cat("CSS Estimates:\n")
print(coef(fit_css_arima))
cat("\nPMM2 Estimates:\n")
print(coef(fit_pmm2_arima))

# Summary
summary(fit_pmm2_arima)
```

**Monte Carlo Evidence:** ARIMA(1,1,1) models show a **24-52% variance reduction** with PMM2 versus CSS when innovations are non-Gaussian.

---

## Part 5: Bootstrap Inference for Time Series

Bootstrap methods provide robust confidence intervals for time series parameters. The call below requests a **block bootstrap** with blocks of length 20; block resampling keeps contiguous stretches of data together, preserving the serial dependence that an ordinary i.i.d. bootstrap would destroy.

```{r ts_bootstrap_inference}
# Perform bootstrap inference for the ARMA(1,1) model
boot_results <- ts_pmm2_inference(fit_pmm2_arma, B = 500, seed = 123,
                                  method = "block", block_length = 20,
                                  parallel = FALSE)

# Display summary with confidence intervals
summary(boot_results)
```

The bootstrap results provide:

- **Standard errors** for each parameter
- **95% confidence intervals** (percentile and bias-corrected)
- **Convergence diagnostics** across bootstrap samples
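For intuition about how block resampling works, here is a minimal, self-contained sketch of a moving-block bootstrap for an AR(1) coefficient, using only base R. It illustrates the resampling scheme conceptually and is not the package's implementation; use `ts_pmm2_inference()` in practice.

```{r block_bootstrap_sketch}
# Moving-block bootstrap sketch for the lag-one AR coefficient.
# NOTE: conceptual illustration only.
block_boot_ar1 <- function(y, B = 200, block_length = 20) {
  n <- length(y)
  n_blocks <- ceiling(n / block_length)
  max_start <- n - block_length + 1
  replicate(B, {
    # Resample contiguous blocks and splice them into a new series
    starts <- sample(max_start, n_blocks, replace = TRUE)
    idx <- as.vector(vapply(starts, function(s) s:(s + block_length - 1),
                            integer(block_length)))
    y_star <- y[idx[1:n]]
    # Re-estimate the AR(1) coefficient on the bootstrap series
    ar(y_star, order.max = 1, aic = FALSE)$ar
  })
}

y_ar <- arima.sim(n = 300, model = list(ar = 0.6))
phi_boot <- block_boot_ar1(y_ar)
quantile(phi_boot, c(0.025, 0.975))  # percentile 95% CI for phi
```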
---

## Part 6: Comparing Methods with Monte Carlo

The package provides tools for systematic comparison through Monte Carlo simulation.

```{r monte_carlo_comparison, eval=FALSE}
# Define model specifications
specs <- list(
  list(
    model = "ma", order = 1, theta = 0.7, label = "MA(1)",
    innovations = list(type = "gamma", shape = 2)
  ),
  list(
    model = "arma", order = c(1, 1), theta = list(ar = 0.5, ma = 0.4),
    label = "ARMA(1,1)",
    innovations = list(type = "student_t", df = 4)
  )
)

# Run the Monte Carlo comparison
mc_results <- pmm2_monte_carlo_compare(
  model_specs = specs,
  methods = c("css", "pmm2"),
  n = 300,
  n_sim = 1000,
  seed = 456,
  progress = FALSE
)

# Display results
print(mc_results$summary)
print(mc_results$gain)
```

**Interpretation:**

- `MSE_ratio < 1.0` indicates that PMM2 has a **lower mean squared error** than CSS
- Efficiency gains typically range from **10-50%**, depending on the innovation distribution

---

## Part 7: Prediction with Time Series Models

PMM2-fitted models support forecasting:

```{r prediction}
# Forecast 10 steps ahead for the ARMA(1,1) model
forecast_horizon <- 10
pred_raw <- predict(fit_pmm2_arma, n.ahead = forecast_horizon)
predictions <- if (is.list(pred_raw)) as.numeric(pred_raw$pred) else as.numeric(pred_raw)

# Display forecasts
cat("Forecasts for the next", forecast_horizon, "periods:\n")
print(predictions)

# Plot the series the model was actually fitted to (x_arma from Part 3,
# not the ARIMA series that overwrote `x` in Part 4) with the forecasts
n_plot <- 100
plot_data <- tail(x_arma, n_plot)
plot(seq_along(plot_data), plot_data, type = "l",
     xlim = c(1, n_plot + forecast_horizon),
     ylim = range(c(plot_data, predictions)),
     main = "ARMA(1,1) Forecasts",
     xlab = "Time", ylab = "Value",
     col = "steelblue", lwd = 1.5)

# Add forecasts
lines(n_plot + seq_len(forecast_horizon), predictions,
      col = "red", lwd = 2, lty = 2)
legend("topleft", legend = c("Observed", "Forecast"),
       col = c("steelblue", "red"), lty = c(1, 2), lwd = c(1.5, 2))
```

---

## Practical Guidelines

### When to Use PMM2 for Time Series

**Recommended when:**

- Innovations exhibit **skewness or heavy tails** (check the residuals from a classical fit)
- Working with **financial data** (returns are often non-Gaussian)
- **Environmental time series** (precipitation, pollution levels)
- **Engineering measurements** with asymmetric errors
- Sample size **n ≥ 200**, for reliable moment estimation

**Stick with classical methods when:**

- Innovations are approximately **Gaussian** (check a Q-Q plot)
- Very **small sample sizes** (n < 100)
- **Computational speed** is critical
- The model is for **pedagogical purposes** only

---

### Model Selection Strategy

1. **Visualize** the time series and check for stationarity
2. **Fit a classical model** (CSS/ML) as a baseline
3. **Examine the residuals** for non-normality:
   - Skewness: `pmm_skewness(residuals)`
   - Kurtosis: `pmm_kurtosis(residuals)`
   - Visual checks: Q-Q plots, histograms
4. **If the residuals are non-Gaussian**, refit with PMM2
5. **Compare standard errors** using the bootstrap
6. **Validate** with out-of-sample forecasting

A sketch of steps 2-4 follows.
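Below is a minimal sketch of steps 2-4, applied to the ARMA(1,1) series `x_arma` from Part 3. The cutoffs (|skewness| > 0.5, excess kurtosis > 1) are illustrative rules of thumb rather than package recommendations, and `pmm_kurtosis()` is assumed here to return excess kurtosis.

```{r selection_workflow_sketch, eval=FALSE}
# Step 2: classical baseline fit (base R arima() used for illustration)
baseline <- arima(x_arma, order = c(1, 0, 1), include.mean = FALSE,
                  method = "CSS")
res <- residuals(baseline)

# Step 3: moment-based and visual normality checks
skew <- pmm_skewness(res)
kurt <- pmm_kurtosis(res)  # assumed to be excess kurtosis
cat("Residual skewness:", skew, " excess kurtosis:", kurt, "\n")
qqnorm(res); qqline(res, col = "red")

# Step 4: refit with PMM2 if the residuals look non-Gaussian
# (the 0.5 and 1 thresholds are illustrative only)
if (abs(skew) > 0.5 || kurt > 1) {
  fit_final <- arma_pmm2(x_arma, order = c(1, 1), method = "pmm2",
                         include.mean = FALSE)
}
```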
---

### Computational Considerations

PMM2 is iterative and may take longer than classical methods:

- **AR models**: typically fast (5-15 iterations)
- **MA models**: moderate (10-30 iterations, depending on order)
- **ARMA/ARIMA**: can be slower (20-50 iterations)

For large datasets (n > 5000), consider:

- Using `verbose = FALSE` to reduce output
- Supplying classical estimates via the `initial` parameter
- Reducing `max_iter` if convergence is slow

---

## Summary Statistics: Efficiency Gains

Based on extensive Monte Carlo studies documented in the package:

| Model Type   | Innovation Distribution | Variance Reduction | Sample Size |
|--------------|-------------------------|--------------------|-------------|
| AR(1)        | Gamma (skew = 1.4)      | 15-20%             | 200-1000    |
| MA(1)        | Chi-squared (df = 3)    | 15-23%             | 200-1000    |
| MA(2)        | Mixed contamination     | 26-32%             | 200-1000    |
| ARMA(1,1)    | Student-t (df = 4)      | 30-45%             | 200-1000    |
| ARIMA(1,1,1) | Exponential (shifted)   | 24-52%             | 300-1000    |

**Key Insight:** The more **non-Gaussian** the innovations, the **larger the efficiency gain** from PMM2.

---

## Advanced Topics

### Custom Innovation Distributions

You can use `pmm2_monte_carlo_compare()` with custom distributions:

```{r custom_innovations, eval=FALSE}
# Custom innovation generator
my_innovations <- function(n) {
  # Mixture: 90% Gaussian + 10% extreme outliers at +/-5
  base <- rnorm(n)
  outliers <- sample(c(-5, 5), n, replace = TRUE)  # outlier values, equally likely
  ifelse(runif(n) < 0.9, base, outliers)
}

# Use in a Monte Carlo comparison
specs <- list(
  list(
    model = "ar", order = 1, theta = 0.6,
    innovations = list(generator = my_innovations)
  )
)

mc_results <- pmm2_monte_carlo_compare(specs, methods = c("css", "pmm2"),
                                       n = 300, n_sim = 500)
```

---

## Conclusion

The **PMM2 method for time series** provides:

1. **Robust estimation** for AR, MA, ARMA, and ARIMA models
2. **10-50% variance reduction** when innovations deviate from normality
3. **Seamless integration** with standard R workflows
4. **Comprehensive diagnostics** and bootstrap inference

For practitioners working with **real-world time series** exhibiting non-Gaussian behavior, PMM2 offers a principled approach to achieving **more precise parameter estimates** without sacrificing interpretability.

---

## Next Steps

- **Bootstrap Inference**: see `vignette("03-bootstrap-inference")` for detailed statistical inference
- **Linear Regression**: see `vignette("01-pmm2-introduction")` for PMM2 basics
- **Advanced Applications**: explore the package demos with `demo(package = "EstemPMM")`

---

## References

1. Zabolotnii, S., Warsza, Z. L., & Tkachenko, O. (2018). "Polynomial Estimation of Linear Regression Parameters for the Asymmetric PDF of Errors". In *Automation 2018*. Springer.
2. Package functions: `?ar_pmm2`, `?ma_pmm2`, `?arma_pmm2`, `?arima_pmm2`, `?ts_pmm2_inference`
3. Monte Carlo tools: `?pmm2_monte_carlo_compare`
4. GitHub: https://github.com/SZabolotnii/EstemPMM