PMM3: Linear Regression for Symmetric Platykurtic Errors

Serhii Zabolotnii

2026-04-06

The Problem: When OLS and PMM2 Are Not Enough

Ordinary Least Squares (OLS) is optimal for Gaussian errors. PMM2 improves upon OLS when errors are asymmetric (skewed). But what about errors that are symmetric yet non-Gaussian?

Many real-world processes produce errors with lighter tails than the normal distribution (platykurtic errors, \(\gamma_4 < 0\)). In these cases, OLS remains unbiased but is no longer efficient, and PMM2 offers no improvement because the asymmetry it exploits is absent (\(\gamma_3 \approx 0\)).

Where Platykurtic Errors Arise

Typical sources are bounded or quantized measurements: uniformly distributed errors (e.g., rounding or quantization noise), symmetric beta-distributed errors, and truncated normal errors. These are the same distributions used in the efficiency table at the end of this vignette.

The Solution: PMM3 (Polynomial Maximization Method, S=3)

PMM3 constructs a cubic stochastic polynomial using the fourth and sixth central moments. The key theoretical quantities are the excess kurtosis \(\gamma_4\), the sixth-order cumulant coefficient \(\gamma_6\), and the efficiency factor \(g_3\) built from them.

The factor \(g_3\) is the ratio \(\text{Var}(\hat{\beta}_{\text{PMM3}}) / \text{Var}(\hat{\beta}_{\text{OLS}})\). When \(g_3 < 1\), PMM3 achieves lower variance than OLS.
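The factor can be reproduced in a few lines of base R. The sketch below assumes the relation \(g_3 = 1 - \gamma_4^2 / (6 + 9\gamma_4 + \gamma_6)\), which is numerically consistent with the moment output that summary() reports later in this vignette; g3_factor() is an illustrative helper, not a package function.

```r
# Illustrative computation of g3 from residuals in base R. The relation
# g3 = 1 - gamma4^2 / (6 + 9*gamma4 + gamma6) is an assumption consistent
# with the summary() output shown later; the m3 terms of the sixth cumulant
# are dropped (symmetric-error case).
g3_factor <- function(res) {
  e  <- res - mean(res)
  m2 <- mean(e^2); m4 <- mean(e^4); m6 <- mean(e^6)
  gamma4 <- m4 / m2^2 - 3                      # excess kurtosis
  gamma6 <- m6 / m2^3 - 15 * m4 / m2^2 + 30    # 6th-order cumulant coeff.
  1 - gamma4^2 / (6 + 9 * gamma4 + gamma6)
}
```

For uniform errors (\(\gamma_4 = -1.2\), \(\gamma_6 \approx 6.86\)) this gives about 0.30, i.e. an expected variance reduction of roughly 70%.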

The Newton-Raphson Algorithm

PMM3 iteratively refines OLS estimates using the score equation:

\[ Z_{1\nu} = \varepsilon_\nu (\kappa - \varepsilon_\nu^2), \qquad \kappa = 3 m_2 + \frac{m_6 - 6 m_2 m_4 + 9 m_2^3}{m_4 - 3 m_2^2} \]

with Jacobian \(J = X^\top \text{diag}(3\varepsilon^2 - \kappa) X\). The solver includes step-size limiting and a divergence guard for numerical stability.
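The update itself is compact. The following base-R sketch implements a plain Newton-Raphson loop of this form, without the package's step-size limiting and divergence guard; pmm3_nr() and pmm3_kappa() are illustrative helpers, and the closed form for \(\kappa\) is an assumption chosen to be consistent with the kappa value printed by summary() below (it requires \(\gamma_4 \neq 0\)).

```r
# Illustrative base-R sketch of the PMM3 Newton-Raphson solver described
# above, WITHOUT the package's step-size limiting and divergence guard.
# The closed form for kappa is an assumption consistent with the kappa
# value printed by summary() below; it requires gamma4 != 0.
pmm3_kappa <- function(e) {
  m2 <- mean(e^2); m4 <- mean(e^4); m6 <- mean(e^6)
  3 * m2 + (m6 - 6 * m2 * m4 + 9 * m2^3) / (m4 - 3 * m2^2)
}

pmm3_nr <- function(X, y, max_iter = 50, tol = 1e-8) {
  beta  <- solve(crossprod(X), crossprod(X, y))    # OLS starting values
  kappa <- pmm3_kappa(as.vector(y - X %*% beta))   # held fixed (default mode)
  for (i in seq_len(max_iter)) {
    e    <- as.vector(y - X %*% beta)
    Z    <- e * (kappa - e^2)                      # score components Z_1nu
    J    <- crossprod(X, (3 * e^2 - kappa) * X)    # X' diag(3e^2 - kappa) X
    step <- solve(J, crossprod(X, Z))
    beta <- beta - step
    if (max(abs(step)) < tol) break
  }
  drop(beta)
}
```

On simulated data with uniform errors this should recover coefficients close to the true values in a handful of iterations; the package version adds safeguards for ill-conditioned cases.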

Reference: Based on the theoretical framework of the Polynomial Maximization Method (Zabolotnii et al., 2018).


Getting Started

library(EstemPMM)
set.seed(42)

Example 1: Simple Regression with Uniform Errors

Uniform errors are the canonical platykurtic case: \(\gamma_4 \approx -1.2\), symmetric, bounded.

Data Generation

n <- 300

# Generate predictor
x <- rnorm(n, mean = 5, sd = 2)

# True parameters
beta_0 <- 10
beta_1 <- 2.5

# Generate uniform errors (platykurtic, gamma4 ≈ -1.2)
errors <- runif(n, -sqrt(3), sqrt(3))  # variance = 1

# Response variable
y <- beta_0 + beta_1 * x + errors
dat <- data.frame(y = y, x = x)

Visualize Error Distribution

fit_ols_temp <- lm(y ~ x, data = dat)
res_ols <- residuals(fit_ols_temp)

par(mfrow = c(1, 2))
hist(res_ols, breaks = 30, main = "Residual Distribution (Uniform Errors)",
     xlab = "Residuals", col = "lightblue", border = "white")
qqnorm(res_ols, main = "Q-Q Plot")
qqline(res_ols, col = "red", lwd = 2)

par(mfrow = c(1, 1))

cat("Skewness (gamma3):", round(pmm_skewness(res_ols), 3), "\n")
#> Skewness (gamma3): 0.026
cat("Excess kurtosis (gamma4):", round(pmm_kurtosis(res_ols), 3), "\n")
#> Excess kurtosis (gamma4): -1.183

The Q-Q plot shows lighter tails than normal (an S-shaped curve with compressed ends). The skewness is near zero, but the excess kurtosis is clearly negative.

Automatic Method Selection

recommendation <- pmm_dispatch(res_ols)
#>   n = 300 | gamma3 = +0.026 | gamma4 = -1.183
#>   g2(PMM2) = 0.9992 | g3(PMM3) = 0.3083
#>   >>> |gamma3| = 0.026 < 0.3 (symmetric) and gamma4 = -1.183 < -0.7 (platykurtic). PMM3 expected to reduce variance by 69.2%.

Fit PMM3

fit_pmm3 <- lm_pmm3(y ~ x, data = dat)
fit_ols  <- lm(y ~ x, data = dat)

# Display PMM3 summary
summary(fit_pmm3)
#> PMM3 estimation results (S = 3, symmetric errors)
#> Call:
#> lm_pmm3(formula = y ~ x, data = dat)
#> 
#> Coefficients:
#> (Intercept)           x 
#>    9.916678    2.518456 
#> 
#> Central moments of initial residuals:
#>   m2 = 0.9555941 
#>   m4 = 1.658808 
#>   m6 = 3.424153 
#> 
#> Theoretical characteristics of PMM3 (S = 3):
#>   gamma4 = -1.183442 
#>   gamma6 = 6.675667 
#>   g3     = 0.3082707  (expected ratio Var[PMM3]/Var[OLS])
#>   kappa  = 1.231908 
#> 
#> Algorithm information:
#>   Convergence status: TRUE 
#>   Iterations: 3

Compare with OLS

comparison <- data.frame(
  Parameter = c("Intercept", "Slope"),
  True  = c(beta_0, beta_1),
  OLS   = coef(fit_ols),
  PMM3  = coef(fit_pmm3),
  Diff  = coef(fit_pmm3) - coef(fit_ols)
)
print(comparison, row.names = FALSE)
#>  Parameter True      OLS     PMM3        Diff
#>  Intercept 10.0 9.804314 9.916678  0.11236314
#>      Slope  2.5 2.537942 2.518456 -0.01948666

cat("\nResidual sum of squares:\n")
#> 
#> Residual sum of squares:
cat("  OLS:  ", sum(residuals(fit_ols)^2), "\n")
#>   OLS:   286.6782
cat("  PMM3: ", sum(residuals(fit_pmm3)^2), "\n")
#>   PMM3:  287.1956

cat("\nTheoretical variance ratio g3 =",
    fit_pmm3@g_coefficient, "\n")
#> 
#> Theoretical variance ratio g3 = 0.3082707
cat("Expected variance reduction:",
    round((1 - fit_pmm3@g_coefficient) * 100, 1), "%\n")
#> Expected variance reduction: 69.2 %

Example 2: Three-Way Comparison (OLS vs PMM2 vs PMM3)

For symmetric platykurtic errors, PMM3 should outperform both OLS and PMM2. PMM2 should offer no improvement since \(\gamma_3 \approx 0\).

# Fit all three methods
fit_ols  <- lm(y ~ x, data = dat)
fit_pmm2 <- lm_pmm2(y ~ x, data = dat)
fit_pmm3 <- lm_pmm3(y ~ x, data = dat)

comparison3 <- data.frame(
  Method = c("OLS", "PMM2", "PMM3"),
  Intercept = c(coef(fit_ols)[1], coef(fit_pmm2)[1], coef(fit_pmm3)[1]),
  Slope     = c(coef(fit_ols)[2], coef(fit_pmm2)[2], coef(fit_pmm3)[2])
)
print(comparison3, row.names = FALSE, digits = 5)
#>  Method Intercept  Slope
#>     OLS    9.8043 2.5379
#>    PMM2    9.8058 2.5376
#>    PMM3    9.9167 2.5185

cat("\nTrue values: Intercept =", beta_0, ", Slope =", beta_1, "\n")
#> 
#> True values: Intercept = 10 , Slope = 2.5
cat("\nEfficiency factors:\n")
#> 
#> Efficiency factors:
vf2 <- pmm2_variance_factor(fit_pmm2@m2, fit_pmm2@m3, fit_pmm2@m4)
cat("  PMM2 g2 =", round(vf2$g2, 4),
    " (improvement:", round((1 - vf2$g2) * 100, 1), "%)\n")
#>   PMM2 g2 = 0.9992  (improvement: 0.1 %)
cat("  PMM3 g3 =", fit_pmm3@g_coefficient,
    " (improvement:", round((1 - fit_pmm3@g_coefficient) * 100, 1), "%)\n")
#>   PMM3 g3 = 0.3082707  (improvement: 69.2 %)

With symmetric platykurtic errors, PMM2’s \(g_2 \approx 1\) (no gain), while PMM3’s \(g_3 \ll 1\) (substantial gain).
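The PMM2 side of this comparison can also be checked by hand. The sketch below assumes the relation \(g_2 = 1 - \gamma_3^2 / (2 + \gamma_4)\), which reproduces the g2 values that pmm_dispatch() prints throughout this vignette; g2_factor() is an illustrative helper, not a package function.

```r
# Illustrative PMM2 counterpart: the relation g2 = 1 - gamma3^2/(2 + gamma4)
# is an assumption that reproduces the g2 values printed by pmm_dispatch()
# in this vignette; g2_factor() is not a package function.
g2_factor <- function(res) {
  e  <- res - mean(res)
  m2 <- mean(e^2)
  gamma3 <- mean(e^3) / m2^1.5        # skewness
  gamma4 <- mean(e^4) / m2^2 - 3      # excess kurtosis
  1 - gamma3^2 / (2 + gamma4)
}
```

For symmetric errors \(\gamma_3 \approx 0\), so \(g_2 \approx 1\) and PMM2 cannot improve on OLS, which is exactly what the comparison above shows.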


Example 3: Real Data — Auto MPG (Quadratic Regression)

The auto_mpg dataset contains fuel consumption data for 398 vehicles. The regression of MPG vs Horsepower (quadratic model) produces residuals with \(|\gamma_3| \approx 0.2\) and \(\gamma_4 \approx -1.3\) — a case where PMM3 is appropriate.

data(auto_mpg)

# Remove missing horsepower values
auto_complete <- na.omit(auto_mpg[, c("mpg", "horsepower")])

# Visualize the relationship (clearly nonlinear)
plot(auto_complete$horsepower, auto_complete$mpg,
     main = "MPG vs Horsepower",
     xlab = "Horsepower", ylab = "Miles per Gallon",
     col = "steelblue", pch = 16, cex = 0.8)

Fit Quadratic Model

# Quadratic OLS
fit_auto_ols <- lm(mpg ~ horsepower + I(horsepower^2), data = auto_complete)

# Check residual properties
res_auto <- residuals(fit_auto_ols)
cat("Residual diagnostics:\n")
#> Residual diagnostics:
cat("  Skewness (gamma3):", round(pmm_skewness(res_auto), 3), "\n")
#>   Skewness (gamma3): 0.218
cat("  Excess kurtosis (gamma4):", round(pmm_kurtosis(res_auto), 3), "\n")
#>   Excess kurtosis (gamma4): -1.299
sym_test <- test_symmetry(res_auto)
cat("  Symmetric:", sym_test$is_symmetric, "\n")
#>   Symmetric: TRUE

# Dispatch recommendation
dispatch_auto <- pmm_dispatch(res_auto)
#>   n = 392 | gamma3 = +0.218 | gamma4 = -1.299
#>   g2(PMM2) = 0.9856 | g3(PMM3) = 0.8951
#>   >>> |gamma3| = 0.218 < 0.3 (symmetric) and gamma4 = -1.299 < -0.7 (platykurtic). PMM3 expected to reduce variance by 10.5%.

PMM3 Estimation

# Fit PMM3 (quadratic model)
fit_auto_pmm3 <- lm_pmm3(mpg ~ horsepower + I(horsepower^2),
                          data = auto_complete)

# Fit PMM2 for comparison
fit_auto_pmm2 <- lm_pmm2(mpg ~ horsepower + I(horsepower^2),
                          data = auto_complete, na.action = na.omit)

# Compare all three methods
comparison_auto <- data.frame(
  Method    = c("OLS", "PMM2", "PMM3"),
  Intercept = c(coef(fit_auto_ols)[1], coef(fit_auto_pmm2)[1],
                coef(fit_auto_pmm3)[1]),
  HP        = c(coef(fit_auto_ols)[2], coef(fit_auto_pmm2)[2],
                coef(fit_auto_pmm3)[2]),
  HP_sq     = c(coef(fit_auto_ols)[3], coef(fit_auto_pmm2)[3],
                coef(fit_auto_pmm3)[3])
)
print(comparison_auto, row.names = FALSE, digits = 6)
#>  Method Intercept        HP      HP_sq
#>     OLS   56.9001 -0.466190 0.00123054
#>    PMM2   56.0466 -0.454033 0.00119689
#>    PMM3   58.1803 -0.488974 0.00131442

cat("\nPMM3 g3 =", fit_auto_pmm3@g_coefficient,
    " (improvement:", round((1 - fit_auto_pmm3@g_coefficient) * 100, 1), "%)\n")
#> 
#> PMM3 g3 = 0.895111  (improvement: 10.5 %)

Visualize the Fits

hp_seq <- seq(min(auto_complete$horsepower),
              max(auto_complete$horsepower), length.out = 200)
newdata <- data.frame(horsepower = hp_seq)

pred_ols  <- predict(fit_auto_ols, newdata = newdata)
pred_pmm3 <- predict(fit_auto_pmm3,
                     newdata = data.frame(horsepower = hp_seq,
                                          `I(horsepower^2)` = hp_seq^2))

plot(auto_complete$horsepower, auto_complete$mpg,
     main = "MPG vs Horsepower: OLS and PMM3 Quadratic Fits",
     xlab = "Horsepower", ylab = "Miles per Gallon",
     col = "gray70", pch = 16, cex = 0.7)
lines(hp_seq, pred_ols, col = "blue", lwd = 2)
lines(hp_seq, as.numeric(pred_pmm3), col = "red", lwd = 2, lty = 2)
legend("topright", legend = c("OLS", "PMM3"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)


Example 4: Real Data — Auto MPG (Linear, PMM2 Case)

For contrast, the regression of MPG vs Acceleration produces asymmetric residuals (\(\gamma_3 \approx 0.5\)), where PMM2 is appropriate instead.

# Linear model: MPG vs Acceleration
fit_accel_ols <- lm(mpg ~ acceleration, data = auto_mpg)
res_accel <- residuals(fit_accel_ols)

cat("Acceleration residual diagnostics:\n")
#> Acceleration residual diagnostics:
cat("  Skewness (gamma3):", round(pmm_skewness(res_accel), 3), "\n")
#>   Skewness (gamma3): 0.497
cat("  Excess kurtosis (gamma4):", round(pmm_kurtosis(res_accel), 3), "\n")
#>   Excess kurtosis (gamma4): -0.33

# Dispatch
dispatch_accel <- pmm_dispatch(res_accel)
#>   n = 398 | gamma3 = +0.497 | gamma4 = -0.330
#>   g2(PMM2) = 0.8519 | g3(PMM3) = 0.9779
#>   >>> |gamma3| = 0.497 > 0.3 and g2 = 0.852 < 0.95: moderate asymmetry, PMM2 worthwhile (14.8% reduction).

# Compare
fit_accel_pmm2 <- lm_pmm2(mpg ~ acceleration, data = auto_mpg, na.action = na.omit)
vf2_accel <- pmm2_variance_factor(fit_accel_pmm2@m2, fit_accel_pmm2@m3, fit_accel_pmm2@m4)
cat("\nPMM2 g2 =", round(vf2_accel$g2, 4),
    " (improvement:", round((1 - vf2_accel$g2) * 100, 1), "%)\n")
#> 
#> PMM2 g2 = 0.8519  (improvement: 14.8 %)

This demonstrates how pmm_dispatch() guides users to the correct method: PMM3 for symmetric platykurtic residuals, PMM2 for asymmetric residuals.
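The decision logic implied by these messages can be summarized in a few lines of base R. The thresholds \(|\gamma_3| = 0.3\) and \(\gamma_4 = -0.7\) are taken from the dispatch messages above; the real pmm_dispatch() may apply additional criteria (sample size, the g-factors themselves).

```r
# Sketch of the dispatch logic implied by the messages above; thresholds
# are taken from those messages, and the real pmm_dispatch() may apply
# additional criteria.
recommend_method <- function(res) {
  e  <- res - mean(res)
  m2 <- mean(e^2)
  gamma3 <- mean(e^3) / m2^1.5        # skewness
  gamma4 <- mean(e^4) / m2^2 - 3      # excess kurtosis
  if (abs(gamma3) > 0.3) {
    "PMM2"                            # asymmetric residuals
  } else if (gamma4 < -0.7) {
    "PMM3"                            # symmetric, platykurtic
  } else {
    "OLS"                             # near-Gaussian
  }
}
```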


Example 5: Multiple Regression

PMM3 extends naturally to multiple predictors.

n <- 300
x1 <- rnorm(n)
x2 <- runif(n, -2, 2)
x3 <- rnorm(n, 3, 1)

# True parameters
beta <- c(5, 1.5, -0.8, 2.0)

# Symmetric Beta(3,3) errors, centered and rescaled (platykurtic, gamma4 ≈ -0.67)
errors_beta <- (rbeta(n, 3, 3) - 0.5) * sqrt(12 * 3)

y_multi <- beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x3 + errors_beta
dat_multi <- data.frame(y = y_multi, x1 = x1, x2 = x2, x3 = x3)

# Fit
fit_multi_ols  <- lm(y ~ x1 + x2 + x3, data = dat_multi)
fit_multi_pmm3 <- lm_pmm3(y ~ x1 + x2 + x3, data = dat_multi)

comparison_multi <- data.frame(
  Parameter = c("Intercept", "x1", "x2", "x3"),
  True  = beta,
  OLS   = coef(fit_multi_ols),
  PMM3  = coef(fit_multi_pmm3),
  Diff  = coef(fit_multi_pmm3) - coef(fit_multi_ols)
)
print(comparison_multi, row.names = FALSE, digits = 4)
#>  Parameter True     OLS    PMM3      Diff
#>  Intercept  5.0  4.9943  4.9668 -0.027433
#>         x1  1.5  1.6223  1.6114 -0.010928
#>         x2 -0.8 -0.7644 -0.7735 -0.009073
#>         x3  2.0  1.9923  1.9992  0.006979

cat("\ng3 =", fit_multi_pmm3@g_coefficient,
    " (improvement:", round((1 - fit_multi_pmm3@g_coefficient) * 100, 1), "%)\n")
#> 
#> g3 = 0.9000364  (improvement: 10 %)

Example 6: No-Intercept Model

PMM3 supports models without an intercept term.

# Model through the origin
y_noint <- 3 * x1 + errors_beta[1:n]
dat_noint <- data.frame(y = y_noint, x1 = x1)

fit_noint <- lm_pmm3(y ~ x1 - 1, data = dat_noint)
cat("True slope: 3\n")
#> True slope: 3
cat("PMM3 estimate:", coef(fit_noint), "\n")
#> PMM3 estimate: 3.11516
cat("Converged:", fit_noint@convergence, "\n")
#> Converged: TRUE

Interpreting the PMM3 Summary

The summary() method reports several PMM3-specific quantities:

summary(fit_pmm3)
#> PMM3 estimation results (S = 3, symmetric errors)
#> Call:
#> lm_pmm3(formula = y ~ x, data = dat)
#> 
#> Coefficients:
#> (Intercept)           x 
#>    9.916678    2.518456 
#> 
#> Central moments of initial residuals:
#>   m2 = 0.9555941 
#>   m4 = 1.658808 
#>   m6 = 3.424153 
#> 
#> Theoretical characteristics of PMM3 (S = 3):
#>   gamma4 = -1.183442 
#>   gamma6 = 6.675667 
#>   g3     = 0.3082707  (expected ratio Var[PMM3]/Var[OLS])
#>   kappa  = 1.231908 
#> 
#> Algorithm information:
#>   Convergence status: TRUE 
#>   Iterations: 3
Quantity      Meaning
m2, m4, m6    Central moments of the initial (OLS) residuals
gamma4        Excess kurtosis: negative = platykurtic, zero = Gaussian
gamma6        Sixth-order cumulant coefficient
g3            Theoretical efficiency factor: \(\text{Var}(\hat\beta_{\text{PMM3}}) / \text{Var}(\hat\beta_{\text{OLS}})\)
kappa         Coefficient \(\kappa\) of the linear term in the NR score, computed from m2, m4, m6
Convergence   Whether the Newton-Raphson algorithm converged
Iterations    Number of NR iterations performed

Rule of thumb: If \(g_3 < 0.90\), PMM3 provides a meaningful improvement over OLS (\(> 10\%\) variance reduction).


Adaptive Mode

By default, PMM3 computes \(\kappa\) from OLS residuals and holds it fixed throughout the Newton-Raphson iterations. In adaptive mode, \(\kappa\) is re-estimated from the current residuals at each iteration:

fit_fixed    <- lm_pmm3(y ~ x, data = dat, adaptive = FALSE)
fit_adaptive <- lm_pmm3(y ~ x, data = dat, adaptive = TRUE)

comparison_adapt <- data.frame(
  Mode       = c("Fixed kappa", "Adaptive kappa"),
  Intercept  = c(coef(fit_fixed)[1], coef(fit_adaptive)[1]),
  Slope      = c(coef(fit_fixed)[2], coef(fit_adaptive)[2]),
  Iterations = c(fit_fixed@iterations, fit_adaptive@iterations)
)
print(comparison_adapt, row.names = FALSE, digits = 5)
#>            Mode Intercept  Slope Iterations
#>     Fixed kappa    9.9167 2.5185          3
#>  Adaptive kappa    9.9177 2.5183          4

Adaptive mode can improve estimates when the error distribution departs significantly from what the initial OLS residuals suggest, at the cost of additional iterations.
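In terms of the Newton-Raphson loop, adaptive mode amounts to moving the \(\kappa\) computation inside the iteration. A minimal base-R sketch follows; it is illustrative only (no step-size limiting or divergence guard), and the closed form for \(\kappa\) is an assumption consistent with the kappa value printed by summary() earlier (valid for \(\gamma_4 \neq 0\)).

```r
# Illustrative sketch of the adaptive variant: kappa is re-estimated from
# the current residuals on every pass. The closed form for kappa is an
# assumption consistent with the summary() output earlier; no step-size
# limiting or divergence guard is included.
pmm3_nr_adaptive <- function(X, y, max_iter = 50, tol = 1e-8) {
  beta <- solve(crossprod(X), crossprod(X, y))     # OLS starting values
  for (i in seq_len(max_iter)) {
    e  <- as.vector(y - X %*% beta)
    m2 <- mean(e^2); m4 <- mean(e^4); m6 <- mean(e^6)
    kappa <- 3 * m2 + (m6 - 6 * m2 * m4 + 9 * m2^3) / (m4 - 3 * m2^2)
    step <- solve(crossprod(X, (3 * e^2 - kappa) * X),
                  crossprod(X, e * (kappa - e^2)))
    beta <- beta - step
    if (max(abs(step)) < tol) break
  }
  drop(beta)
}
```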


Diagnostic Plots

PMM3 provides four diagnostic plots:

plot(fit_pmm3)

  1. Residuals vs Fitted — check for non-linearity and heteroscedasticity
  2. Q-Q Plot — platykurtic residuals show lighter tails (compressed ends)
  3. Scale-Location — check homoscedasticity
  4. Residual Histogram — platykurtic distributions appear flatter than a bell curve, often with “chopped” tails

Utility Functions

Automatic Method Selection

# Symmetric platykurtic errors → PMM3
pmm_dispatch(runif(500, -1, 1))
#>   n = 500 | gamma3 = -0.026 | gamma4 = -1.272
#>   g2(PMM2) = 0.9990 | g3(PMM3) = 0.2438
#>   >>> |gamma3| = 0.026 < 0.3 (symmetric) and gamma4 = -1.272 < -0.7 (platykurtic). PMM3 expected to reduce variance by 75.6%.

# Asymmetric errors → PMM2
pmm_dispatch(rexp(500) - 1)
#>   n = 500 | gamma3 = +2.351 | gamma4 = +8.946
#>   g2(PMM2) = 0.4953 | g3(PMM3) = 0.7901
#>   >>> |gamma3| = 2.351 > 1.0: strong asymmetry detected. PMM2 expected to reduce variance by 50.5%.

# Gaussian errors → OLS
pmm_dispatch(rnorm(500))
#>   n = 500 | gamma3 = -0.001 | gamma4 = +0.007
#>   g2(PMM2) = 1.0000 | g3(PMM3) = 1.0000
#>   >>> gamma3 = -0.001, gamma4 = 0.007: near-Gaussian residuals. No PMM advantage expected. Use OLS.

Testing Symmetry

test_symmetry(residuals(fit_ols))
#> $gamma3
#> [1] 0.02563207
#> 
#> $is_symmetric
#> [1] TRUE
#> 
#> $message
#> [1] "|gamma3| = 0.026 <= 0.3: residuals are symmetric. PMM3 may be appropriate."

Computing Moments

mom <- compute_moments_pmm3(residuals(fit_ols))
cat("m2 =", mom$m2, "\n")
#> m2 = 0.9555941
cat("m4 =", mom$m4, "\n")
#> m4 = 1.658808
cat("m6 =", mom$m6, "\n")
#> m6 = 3.424153
cat("gamma4 =", mom$gamma4, "\n")
#> gamma4 = -1.183442
cat("kappa =", mom$kappa, "\n")
#> kappa = 1.231908
cat("g3 =", mom$g3, "\n")
#> g3 = 0.3082707

Sixth-Order Cumulant

pmm_gamma6(residuals(fit_ols))
#> [1] 6.675667

Variance Factor

pmm3_variance_factor(mom$m2, mom$m4, mom$m6)
#> $gamma4
#> [1] -1.183442
#> 
#> $gamma6
#> [1] 6.675667
#> 
#> $g3
#> [1] 0.3082707

When to Use PMM3

Use PMM3 when:

  - residuals are symmetric (\(|\gamma_3| < 0.3\)) and platykurtic (\(\gamma_4 < -0.7\))
  - the theoretical factor satisfies \(g_3 < 0.90\)

Use PMM2 instead when:

  - residuals are clearly asymmetric (\(|\gamma_3| > 0.3\)) and \(g_2 < 0.95\)

Stick with OLS when:

  - residuals are near-Gaussian (\(\gamma_3 \approx 0\), \(\gamma_4 \approx 0\)), so neither PMM variant offers an advantage

Decision Workflow

  1. Fit OLS as baseline
  2. Check residuals: pmm_skewness(), pmm_kurtosis(), test_symmetry()
  3. Use pmm_dispatch() for automatic recommendation
  4. Refit with the recommended method
  5. Compare coefficients and efficiency factors

Efficiency Summary (Monte Carlo Evidence)

Based on Monte Carlo simulations (R = 1000 replications):

Error Distribution                 \(\gamma_4\)   \(g_3\)   Variance Reduction
Uniform                               -1.20        0.64            36%
Beta(2,2) symmetric                   -0.86        0.78            22%
Beta(3,3) symmetric                   -0.57        0.88            12%
Truncated Normal (\(|x| \leq 2\))     -0.47        0.91             9%
Gaussian (control)                     0.00        1.00             0%

Key insight: The more platykurtic the errors (\(\gamma_4\) further from 0), the larger the efficiency gain from PMM3.
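A simulation of this kind is easy to rerun. The sketch below compares the empirical variance of the OLS slope against a simplified PMM3-type Newton-Raphson estimator under uniform errors; it is illustrative base R, not the package's simulation code, the closed form for kappa is an assumption consistent with the summary() output earlier, and the exact ratio will vary with n and the number of replications.

```r
# Illustrative Monte Carlo check (base R): empirical variance of the OLS
# slope vs a simplified PMM3-type estimator under uniform errors.
mc_ratio <- function(R = 300, n = 200) {
  slope_ols <- slope_pmm <- numeric(R)
  for (r in seq_len(R)) {
    x <- rnorm(n)
    X <- cbind(1, x)
    y <- 1 + 2 * x + runif(n, -sqrt(3), sqrt(3))
    b <- solve(crossprod(X), crossprod(X, y))          # OLS fit
    slope_ols[r] <- b[2]
    e <- as.vector(y - X %*% b)
    m2 <- mean(e^2); m4 <- mean(e^4); m6 <- mean(e^6)
    # assumed closed form for kappa (consistent with summary() output)
    kappa <- 3 * m2 + (m6 - 6 * m2 * m4 + 9 * m2^3) / (m4 - 3 * m2^2)
    for (i in 1:25) {                                  # Newton-Raphson
      e <- as.vector(y - X %*% b)
      step <- solve(crossprod(X, (3 * e^2 - kappa) * X),
                    crossprod(X, e * (kappa - e^2)))
      step <- pmin(pmax(step, -0.5), 0.5)              # crude step limiting
      b <- b - step
      if (max(abs(step)) < 1e-8) break
    }
    slope_pmm[r] <- b[2]
  }
  var(slope_pmm) / var(slope_ols)                      # empirical ratio
}
```

With uniform errors the ratio should land well below 1, in line with the variance reductions reported in the table above.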


Next Steps

  - Explore the help pages for the functions used here: ?lm_pmm3, ?pmm_dispatch, ?compute_moments_pmm3, ?test_symmetry, ?pmm_gamma6
  - For asymmetric errors, follow the PMM2 workflow illustrated in Example 4
  - Browse the package sources on GitHub (see References)

References

  1. Zabolotnii S., Warsza Z.L., Tkachenko O. (2018). Polynomial Estimation of Linear Regression Parameters for the Asymmetric PDF of Errors. Springer AISC, vol 743. doi:10.1007/978-3-319-77179-3_75

  2. Package functions: ?lm_pmm3, ?pmm_dispatch, ?compute_moments_pmm3, ?test_symmetry, ?pmm_gamma6

  3. GitHub: https://github.com/SZabolotnii/EstemPMM