--- title: "PMM2 for Time Series: AR, MA, ARMA, and ARIMA Models" author: "Serhii Zabolotnii" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{PMM2 for Time Series: AR, MA, ARMA, and ARIMA Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) ``` ## Introduction Time series models are fundamental tools in economics, finance, engineering, and many other fields. Classical estimation methods like **Conditional Sum of Squares (CSS)** and **Maximum Likelihood (ML)** assume Gaussian innovations. However, real-world time series often exhibit: - **Heavy tails** (e.g., financial returns) - **Skewness** (e.g., environmental data, commodity prices) - **Outliers** (e.g., measurement errors) The **PMM2 method** extends to time series models, providing **more efficient parameter estimates** when innovations deviate from normality. This vignette demonstrates PMM2 applications to: 1. **Autoregressive (AR)** models 2. **Moving Average (MA)** models 3. **ARMA** models 4. **ARIMA** models with differencing --- ## Setup ```{r load_package} library(EstemPMM) set.seed(42) ``` --- ## Part 1: Autoregressive (AR) Models ### AR(1) Model with Skewed Innovations We'll simulate an AR(1) process with **gamma-distributed innovations** to demonstrate PMM2's robustness. ```{r ar1_simulation} # Model parameters n <- 300 phi <- 0.6 # AR coefficient intercept <- 5 # Generate skewed innovations using gamma distribution # (Gamma(shape=2) has skewness = sqrt(2) ≈ 1.41) innovations <- rgamma(n, shape = 2, rate = 2) - 1 # Center around 0 # Generate AR(1) series x <- numeric(n) x[1] <- intercept + innovations[1] for(i in 2:n) { x[i] <- intercept + phi * (x[i-1] - intercept) + innovations[i] } # Plot the time series plot(x, type = "l", main = "AR(1) Process with Skewed Innovations", ylab = "Value", xlab = "Time", col = "steelblue", lwd = 1.5) abline(h = intercept, col = "red", lty = 2) ``` ### Estimate AR(1) Model: PMM2 vs CSS ```{r ar1_estimation} # Fit using PMM2 fit_pmm2_ar1 <- ar_pmm2(x, order = 1, method = "pmm2", include.mean = TRUE) # Fit using CSS (classical method) fit_css_ar1 <- ar_pmm2(x, order = 1, method = "css", include.mean = TRUE) # Compare coefficients cat("True AR coefficient:", phi, "\n\n") cat("CSS Estimate:\n") print(coef(fit_css_ar1)) cat("\nPMM2 Estimate:\n") print(coef(fit_pmm2_ar1)) # Summary with diagnostics summary(fit_pmm2_ar1) ``` **Observation:** PMM2 typically provides estimates closer to the true value when innovations are non-Gaussian. --- ### AR(2) Model For higher-order AR models, PMM2 maintains its efficiency advantages. ```{r ar2_example} # AR(2) model: x_t = 0.5*x_{t-1} - 0.3*x_{t-2} + e_t n <- 400 phi <- c(0.5, -0.3) # Generate t-distributed innovations (heavy tails) innovations <- rt(n, df = 4) # Generate AR(2) series x <- arima.sim(n = n, model = list(ar = phi), innov = innovations) # Fit models fit_pmm2_ar2 <- ar_pmm2(x, order = 2, method = "pmm2", include.mean = FALSE) fit_css_ar2 <- ar_pmm2(x, order = 2, method = "css", include.mean = FALSE) # Compare comparison_ar2 <- data.frame( Parameter = c("phi1", "phi2"), True = phi, CSS = coef(fit_css_ar2), PMM2 = coef(fit_pmm2_ar2) ) print(comparison_ar2, row.names = FALSE) ``` --- ## Part 2: Moving Average (MA) Models MA models are challenging because innovations are not directly observable. 
### MA(1) Model

```{r ma1_simulation}
# MA(1) model: x_t = e_t + theta*e_{t-1}
n <- 300
theta <- 0.7

# Generate chi-squared innovations (right-skewed)
innovations <- rchisq(n, df = 3) - 3

# Generate the MA(1) series
x <- numeric(n)
x[1] <- innovations[1]
for (i in 2:n) {
  x[i] <- innovations[i] + theta * innovations[i - 1]
}

# Fit MA(1) models
fit_pmm2_ma1 <- ma_pmm2(x, order = 1, method = "pmm2", include.mean = FALSE)
fit_css_ma1 <- ma_pmm2(x, order = 1, method = "css", include.mean = FALSE)

# Compare estimates
cat("True MA coefficient:", theta, "\n\n")
cat("CSS Estimate:\n")
print(coef(fit_css_ma1))
cat("\nPMM2 Estimate:\n")
print(coef(fit_pmm2_ma1))

# Check convergence
cat("\nPMM2 Convergence:", fit_pmm2_ma1@convergence, "\n")
cat("Iterations:", fit_pmm2_ma1@iterations, "\n")
```

---

### MA(2) Model

```{r ma2_example}
# MA(2) model
n <- 400
theta <- c(0.6, 0.3)

# Mixed distribution: 80% normal + 20% contamination
innovations <- ifelse(runif(n) < 0.8, rnorm(n), rnorm(n, mean = 0, sd = 3))

# Generate the MA(2) series
x <- arima.sim(n = n, model = list(ma = theta), innov = innovations)

# Fit models
fit_pmm2_ma2 <- ma_pmm2(x, order = 2, method = "pmm2", include.mean = FALSE)
fit_css_ma2 <- ma_pmm2(x, order = 2, method = "css", include.mean = FALSE)

# Comparison
comparison_ma2 <- data.frame(
  Parameter = c("theta1", "theta2"),
  True = theta,
  CSS = coef(fit_css_ma2),
  PMM2 = coef(fit_pmm2_ma2)
)
print(comparison_ma2, row.names = FALSE)
```

**Monte Carlo Evidence:** For MA(2) models with non-Gaussian errors, PMM2 achieves a **26-32% variance reduction** compared to CSS (see the roadmap documentation).

---

## Part 3: ARMA Models

ARMA models combine AR and MA components, providing flexible modeling for stationary series.

### ARMA(1,1) Model

```{r arma11_simulation}
# ARMA(1,1) model
n <- 500
phi <- 0.5    # AR coefficient
theta <- 0.4  # MA coefficient

# Generate exponentially distributed innovations (right-skewed)
innovations <- rexp(n, rate = 1) - 1

# Generate the ARMA(1,1) series
x <- arima.sim(n = n, model = list(ar = phi, ma = theta), innov = innovations)

# Keep a copy for later use: `x` is overwritten by the ARIMA example in Part 4
x_arma <- x

# Fit models
fit_pmm2_arma <- arma_pmm2(x, order = c(1, 1), method = "pmm2", include.mean = FALSE)
fit_css_arma <- arma_pmm2(x, order = c(1, 1), method = "css", include.mean = FALSE)

# Extract coefficients
cat("True parameters: AR =", phi, ", MA =", theta, "\n\n")
cat("CSS Estimates:\n")
print(coef(fit_css_arma))
cat("\nPMM2 Estimates:\n")
print(coef(fit_pmm2_arma))

# Summary
summary(fit_pmm2_arma)
```

---

### Diagnostic Plots for ARMA Models

```{r arma_diagnostics, fig.width=7, fig.height=6}
# Plot residuals and ACF
plot(fit_pmm2_arma)
title("ARMA(1,1) Model Diagnostics")
```

The diagnostic plots help verify:

1. **Residuals vs Time**: should show no systematic patterns
2. **ACF of Residuals**: should lie within the confidence bands (white noise)
3. **Q-Q Plot**: assesses the residual distribution

---

## Part 4: ARIMA Models with Differencing

For non-stationary series, ARIMA models incorporate differencing to achieve stationarity.
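As a quick base-R illustration (independent of the model fits above; `rw` and `d_rw` are new, example-only variables), first-differencing turns a random walk into a stationary series:

```{r differencing_sketch}
# A random walk is non-stationary; its first difference is white noise
rw <- cumsum(rnorm(200))
d_rw <- diff(rw)  # first difference: the "d = 1" in ARIMA(p, 1, q)

op <- par(mfrow = c(1, 2))
plot(rw, type = "l", main = "Random Walk (non-stationary)",
     xlab = "Time", ylab = "Value")
plot(d_rw, type = "l", main = "First Difference (stationary)",
     xlab = "Time", ylab = "Value")
par(op)
```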
### ARIMA(1,1,1) Model

```{r arima111_simulation}
# Generate a non-stationary series requiring differencing
n <- 400
phi <- 0.4
theta <- 0.3

# Skewed innovations
innovations <- rgamma(n, shape = 3, rate = 3) - 1

# Generate the integrated series: ARIMA(1,1,1)
# First generate a stationary ARMA(1,1) process
x_stationary <- arima.sim(n = n, model = list(ar = phi, ma = theta),
                          innov = innovations)

# Integrate once (cumulative sum)
x <- cumsum(x_stationary)

# Plot the non-stationary series
plot(x, type = "l",
     main = "ARIMA(1,1,1) Process",
     ylab = "Value", xlab = "Time",
     col = "darkgreen", lwd = 1.5)
```

### Fitting ARIMA Models

```{r arima_estimation}
# Fit ARIMA(1,1,1) using PMM2
fit_pmm2_arima <- arima_pmm2(x, order = c(1, 1, 1), method = "pmm2",
                             include.mean = FALSE)

# Fit using the classical CSS method
fit_css_arima <- arima_pmm2(x, order = c(1, 1, 1), method = "css",
                            include.mean = FALSE)

# Compare coefficients
cat("True parameters: AR =", phi, ", MA =", theta, "\n\n")
cat("CSS Estimates:\n")
print(coef(fit_css_arima))
cat("\nPMM2 Estimates:\n")
print(coef(fit_pmm2_arima))

# Summary
summary(fit_pmm2_arima)
```

**Monte Carlo Evidence:** ARIMA(1,1,1) models show a **24-52% variance reduction** with PMM2 versus CSS when innovations are non-Gaussian.

---

## Part 5: Bootstrap Inference for Time Series

Bootstrap methods provide robust confidence intervals for time series parameters. The call below requests a **block bootstrap** with blocks of length 20; block resampling keeps contiguous stretches of data together, preserving the serial dependence that an ordinary i.i.d. bootstrap would destroy.

```{r ts_bootstrap_inference}
# Perform bootstrap inference for the ARMA(1,1) model
boot_results <- ts_pmm2_inference(fit_pmm2_arma, B = 500, seed = 123,
                                  method = "block", block_length = 20,
                                  parallel = FALSE)

# Display summary with confidence intervals
summary(boot_results)
```

The bootstrap results provide:

- **Standard errors** for each parameter
- **95% confidence intervals** (percentile and bias-corrected)
- **Convergence diagnostics** across bootstrap samples
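For intuition about how block resampling works, here is a minimal, self-contained sketch of a moving-block bootstrap for an AR(1) coefficient, using only base R. It illustrates the resampling scheme conceptually and is not the package's implementation; use `ts_pmm2_inference()` in practice.

```{r block_bootstrap_sketch}
# Moving-block bootstrap sketch for the lag-one AR coefficient.
# NOTE: conceptual illustration only.
block_boot_ar1 <- function(y, B = 200, block_length = 20) {
  n <- length(y)
  n_blocks <- ceiling(n / block_length)
  max_start <- n - block_length + 1
  replicate(B, {
    # Resample contiguous blocks and splice them into a new series
    starts <- sample(max_start, n_blocks, replace = TRUE)
    idx <- as.vector(vapply(starts, function(s) s:(s + block_length - 1),
                            integer(block_length)))
    y_star <- y[idx[1:n]]
    # Re-estimate the AR(1) coefficient on the bootstrap series
    ar(y_star, order.max = 1, aic = FALSE)$ar
  })
}

y_ar <- arima.sim(n = 300, model = list(ar = 0.6))
phi_boot <- block_boot_ar1(y_ar)
quantile(phi_boot, c(0.025, 0.975))  # percentile 95% CI for phi
```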
---

## Part 6: Comparing Methods with Monte Carlo

The package provides tools for systematic comparison through Monte Carlo simulation.

```{r monte_carlo_comparison, eval=FALSE}
# Define model specifications
specs <- list(
  list(
    model = "ma", order = 1, theta = 0.7, label = "MA(1)",
    innovations = list(type = "gamma", shape = 2)
  ),
  list(
    model = "arma", order = c(1, 1), theta = list(ar = 0.5, ma = 0.4),
    label = "ARMA(1,1)",
    innovations = list(type = "student_t", df = 4)
  )
)

# Run the Monte Carlo comparison
mc_results <- pmm2_monte_carlo_compare(
  model_specs = specs,
  methods = c("css", "pmm2"),
  n = 300,
  n_sim = 1000,
  seed = 456,
  progress = FALSE
)

# Display results
print(mc_results$summary)
print(mc_results$gain)
```

**Interpretation:**

- `MSE_ratio < 1.0` indicates that PMM2 has a **lower mean squared error** than CSS
- Efficiency gains typically range from **10-50%**, depending on the innovation distribution

---

## Part 7: Prediction with Time Series Models

PMM2-fitted models support forecasting:

```{r prediction}
# Forecast 10 steps ahead for the ARMA(1,1) model
forecast_horizon <- 10
pred_raw <- predict(fit_pmm2_arma, n.ahead = forecast_horizon)
predictions <- if (is.list(pred_raw)) as.numeric(pred_raw$pred) else as.numeric(pred_raw)

# Display forecasts
cat("Forecasts for the next", forecast_horizon, "periods:\n")
print(predictions)

# Plot the series the model was actually fitted to (x_arma from Part 3,
# not the ARIMA series that overwrote `x` in Part 4) with the forecasts
n_plot <- 100
plot_data <- tail(x_arma, n_plot)
plot(seq_along(plot_data), plot_data, type = "l",
     xlim = c(1, n_plot + forecast_horizon),
     ylim = range(c(plot_data, predictions)),
     main = "ARMA(1,1) Forecasts",
     xlab = "Time", ylab = "Value",
     col = "steelblue", lwd = 1.5)

# Add forecasts
lines(n_plot + seq_len(forecast_horizon), predictions,
      col = "red", lwd = 2, lty = 2)
legend("topleft", legend = c("Observed", "Forecast"),
       col = c("steelblue", "red"), lty = c(1, 2), lwd = c(1.5, 2))
```

---

## Practical Guidelines

### When to Use PMM2 for Time Series

**Recommended when:**

- Innovations exhibit **skewness or heavy tails** (check the residuals from a classical fit)
- Working with **financial data** (returns are often non-Gaussian)
- **Environmental time series** (precipitation, pollution levels)
- **Engineering measurements** with asymmetric errors
- Sample size **n ≥ 200**, for reliable moment estimation

**Stick with classical methods when:**

- Innovations are approximately **Gaussian** (check a Q-Q plot)
- Very **small sample sizes** (n < 100)
- **Computational speed** is critical
- The model is for **pedagogical purposes** only

---

### Model Selection Strategy

1. **Visualize** the time series and check for stationarity
2. **Fit a classical model** (CSS/ML) as a baseline
3. **Examine the residuals** for non-normality:
   - Skewness: `pmm_skewness(residuals)`
   - Kurtosis: `pmm_kurtosis(residuals)`
   - Visual checks: Q-Q plots, histograms
4. **If the residuals are non-Gaussian**, refit with PMM2
5. **Compare standard errors** using the bootstrap
6. **Validate** with out-of-sample forecasting

A sketch of steps 2-4 follows.
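Below is a minimal sketch of steps 2-4, applied to the ARMA(1,1) series `x_arma` from Part 3. The cutoffs (|skewness| > 0.5, excess kurtosis > 1) are illustrative rules of thumb rather than package recommendations, and `pmm_kurtosis()` is assumed here to return excess kurtosis.

```{r selection_workflow_sketch, eval=FALSE}
# Step 2: classical baseline fit (base R arima() used for illustration)
baseline <- arima(x_arma, order = c(1, 0, 1), include.mean = FALSE,
                  method = "CSS")
res <- residuals(baseline)

# Step 3: moment-based and visual normality checks
skew <- pmm_skewness(res)
kurt <- pmm_kurtosis(res)  # assumed to be excess kurtosis
cat("Residual skewness:", skew, " excess kurtosis:", kurt, "\n")
qqnorm(res); qqline(res, col = "red")

# Step 4: refit with PMM2 if the residuals look non-Gaussian
# (the 0.5 and 1 thresholds are illustrative only)
if (abs(skew) > 0.5 || kurt > 1) {
  fit_final <- arma_pmm2(x_arma, order = c(1, 1), method = "pmm2",
                         include.mean = FALSE)
}
```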
---

### Computational Considerations

PMM2 is iterative and may take longer than classical methods:

- **AR models**: typically fast (5-15 iterations)
- **MA models**: moderate (10-30 iterations, depending on order)
- **ARMA/ARIMA**: can be slower (20-50 iterations)

For large datasets (n > 5000), consider:

- Using `verbose = FALSE` to reduce output
- Supplying classical estimates via the `initial` parameter
- Reducing `max_iter` if convergence is slow

---

## Summary Statistics: Efficiency Gains

Based on extensive Monte Carlo studies documented in the package:

| Model Type   | Innovation Distribution | Variance Reduction | Sample Size |
|--------------|-------------------------|--------------------|-------------|
| AR(1)        | Gamma (skew = 1.4)      | 15-20%             | 200-1000    |
| MA(1)        | Chi-squared (df = 3)    | 15-23%             | 200-1000    |
| MA(2)        | Mixed contamination     | 26-32%             | 200-1000    |
| ARMA(1,1)    | Student-t (df = 4)      | 30-45%             | 200-1000    |
| ARIMA(1,1,1) | Exponential (shifted)   | 24-52%             | 300-1000    |

**Key Insight:** The more **non-Gaussian** the innovations, the **larger the efficiency gain** from PMM2.

---

## Advanced Topics

### Custom Innovation Distributions

You can use `pmm2_monte_carlo_compare()` with custom distributions:

```{r custom_innovations, eval=FALSE}
# Custom innovation generator
my_innovations <- function(n) {
  # Mixture: 90% Gaussian + 10% extreme outliers at +/-5
  base <- rnorm(n)
  outliers <- sample(c(-5, 5), n, replace = TRUE)  # outlier values, equally likely
  ifelse(runif(n) < 0.9, base, outliers)
}

# Use in a Monte Carlo comparison
specs <- list(
  list(
    model = "ar", order = 1, theta = 0.6,
    innovations = list(generator = my_innovations)
  )
)

mc_results <- pmm2_monte_carlo_compare(specs, methods = c("css", "pmm2"),
                                       n = 300, n_sim = 500)
```

---

## Conclusion

The **PMM2 method for time series** provides:

1. **Robust estimation** for AR, MA, ARMA, and ARIMA models
2. **10-50% variance reduction** when innovations deviate from normality
3. **Seamless integration** with standard R workflows
4. **Comprehensive diagnostics** and bootstrap inference

For practitioners working with **real-world time series** exhibiting non-Gaussian behavior, PMM2 offers a principled approach to achieving **more precise parameter estimates** without sacrificing interpretability.

---

## Next Steps

- **Bootstrap Inference**: see `vignette("03-bootstrap-inference")` for detailed statistical inference
- **Linear Regression**: see `vignette("01-pmm2-introduction")` for PMM2 basics
- **Advanced Applications**: explore the package demos with `demo(package = "EstemPMM")`

---

## References

1. Zabolotnii, S., Warsza, Z. L., & Tkachenko, O. (2018). "Polynomial Estimation of Linear Regression Parameters for the Asymmetric PDF of Errors". In *Automation 2018*. Springer.
2. Package functions: `?ar_pmm2`, `?ma_pmm2`, `?arma_pmm2`, `?arima_pmm2`, `?ts_pmm2_inference`
3. Monte Carlo tools: `?pmm2_monte_carlo_compare`
4. GitHub: https://github.com/SZabolotnii/EstemPMM