---
title: "The Algebra of MLEs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Algebra of MLEs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
oldopts <- options(digits = 4)
```

## Introduction

The maximum likelihood estimator (MLE) is a technology. Under regularity conditions,
any MLE is asymptotically normal: $\hat\theta \sim N(\theta, I(\theta)^{-1})$, where
$I(\theta)$ is the Fisher information matrix. This fact is remarkable because it
holds regardless of the underlying model -- exponential, Weibull, normal, or anything
else. The MLE reduces the problem of inference to a mean vector and a covariance
matrix.

`algebraic.mle` is the algebra that follows from this fact. It does not find
MLEs -- it takes them as input and provides operations for composition,
transformation, and inference. Given two independent experiments, `joint()` composes
their MLEs into a single joint estimator with block-diagonal covariance. Given three
labs estimating the same quantity, `combine()` produces the optimal inverse-variance
weighted estimate. Given a transformation $g(\theta)$, `rmap()` propagates
uncertainty through the delta method. And `as_dist()` bridges to the distribution
algebra in `algebraic.dist`, where normal distributions can be further composed.

This vignette walks through each operation with concrete examples, culminating
in a full pipeline from independent experiments to system reliability inference.

```{r setup}
library(algebraic.mle)
```

## Creating MLEs

The package provides three constructors for wrapping estimation results into the
MLE algebra. Each produces an object that supports the full interface: `params()`,
`vcov()`, `confint()`, `se()`, `AIC()`, and the algebraic operations.

### Direct construction with `mle()`

When you know the point estimate and its variance-covariance matrix (e.g., from
analytical results or published tables), construct an MLE directly:

```{r mle-direct}
# Exponential rate from a published study: lambda_hat = 0.5, se = 0.07, n = 200
fit_pub <- mle(
  theta.hat = c(lambda = 0.5),
  sigma = matrix(0.07^2),
  nobs = 200L
)
params(fit_pub)
se(fit_pub)
```

### From numerical optimization with `mle_numerical()`

The most common path: maximize a log-likelihood with `optim()`, then wrap the
result. The Hessian at the optimum gives the variance-covariance matrix.

```{r mle-numerical}
set.seed(42)
x <- rexp(80, rate = 2)

loglik <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x, rate = rate, log = TRUE))
}

result <- optim(
  par = c(lambda = 1),
  fn = loglik,
  method = "Brent", lower = 0.01, upper = 20,
  hessian = TRUE,
  control = list(fnscale = -1)
)

fit_num <- mle_numerical(result, options = list(nobs = length(x)))
params(fit_num)
se(fit_num)
```

### From bootstrap with `mle_boot()`

When the sample is small or regularity conditions are questionable, bootstrap
the MLE and wrap the `boot` object:

```{r mle-boot}
set.seed(42)
y <- rexp(30, rate = 2)

boot_result <- boot::boot(
  data = y,
  statistic = function(d, i) c(lambda = 1 / mean(d[i])),
  R = 999
)

fit_boot <- mle_boot(boot_result)
params(fit_boot)
se(fit_boot)
confint(fit_boot, type = "perc")
```

## Composing Independent MLEs

When independent experiments measure different parameters of a system, `joint()`
composes their MLEs into a single joint estimator. The result has a block-diagonal
variance-covariance matrix because the experiments are independent -- there is no
cross-covariance between their parameter estimates.

**Motivating example**: A series system has two components. Lab A tests component 1
and estimates its failure rate $\lambda_1$. Lab B tests component 2 and estimates
$\lambda_2$. Neither lab knows about the other component, but we need the joint
parameter vector $(\lambda_1, \lambda_2)$ for system-level inference.

```{r joint-example}
# Lab A: component 1 failure rate
fit_A <- mle(
  theta.hat = c(lambda1 = 0.02),
  sigma = matrix(0.001^2),
  nobs = 150L
)

# Lab B: component 2 failure rate
fit_B <- mle(
  theta.hat = c(lambda2 = 0.05),
  sigma = matrix(0.003^2),
  nobs = 80L
)

# Joint MLE: block-diagonal covariance
fit_joint <- joint(fit_A, fit_B)
params(fit_joint)
vcov(fit_joint)
```

The off-diagonal entries are zero -- the hallmark of independence. The joint MLE
supports the full interface:

```{r joint-methods}
confint(fit_joint)
se(fit_joint)
```

Use `marginal()` to recover individual component estimates from the joint:

```{r joint-marginal}
# Recover component 2 parameters
fit_B_recovered <- marginal(fit_joint, 2)
params(fit_B_recovered)
se(fit_B_recovered)
```

`joint()` extends to any number of independent MLEs:

```{r joint-three}
fit_C <- mle(
  theta.hat = c(lambda3 = 0.01),
  sigma = matrix(0.0005^2),
  nobs = 200L
)

fit_system <- joint(fit_A, fit_B, fit_C)
params(fit_system)
```

## Combining Repeated Estimates

When multiple independent experiments estimate the **same** parameter,
`combine()` produces the optimal estimator via inverse-variance weighting.
The combined estimate weights more precise estimates more heavily, and the
combined variance is always less than any individual variance.

**Motivating example**: Three labs independently estimate the failure rate
$\lambda$ of the same component type.

```{r combine-example}
lab1 <- mle(theta.hat = c(lambda = 0.050), sigma = matrix(0.004^2), nobs = 50L)
lab2 <- mle(theta.hat = c(lambda = 0.048), sigma = matrix(0.002^2), nobs = 200L)
lab3 <- mle(theta.hat = c(lambda = 0.053), sigma = matrix(0.003^2), nobs = 100L)

combined <- combine(lab1, lab2, lab3)
params(combined)
se(combined)
```

The combined standard error is smaller than any individual standard error:

```{r combine-precision}
cat("Lab SEs:     ", se(lab1), se(lab2), se(lab3), "\n")
cat("Combined SE: ", se(combined), "\n")
```

The combined estimate is pulled toward the more precise estimates. Lab 2 has
the smallest variance, so it receives the most weight.

When all estimates have equal precision, `combine()` reduces to the simple
average:

```{r combine-equal}
eq1 <- mle(theta.hat = c(mu = 10.0), sigma = matrix(1))
eq2 <- mle(theta.hat = c(mu = 12.0), sigma = matrix(1))
eq3 <- mle(theta.hat = c(mu = 11.0), sigma = matrix(1))

eq_combined <- combine(eq1, eq2, eq3)
params(eq_combined)  # (10 + 12 + 11) / 3 = 11
```

## Transformations via Invariance

By the invariance property of the MLE, if $\hat\theta$ is the MLE of $\theta$,
then $g(\hat\theta)$ is the MLE of $g(\theta)$ for any function $g$. The
`rmap()` function computes this transformation and propagates uncertainty
using the delta method:
$$
\text{Var}(g(\hat\theta)) \approx J \, \text{Var}(\hat\theta) \, J^\top,
$$
where $J$ is the Jacobian of $g$ evaluated at $\hat\theta$.

### Univariate example: rate to mean lifetime

An exponential component has rate $\lambda$. The mean lifetime is
$\mu = 1/\lambda$. Transform the MLE:

```{r rmap-univariate}
fit_rate <- mle(
  theta.hat = c(lambda = 0.02),
  sigma = matrix(0.001^2),
  nobs = 150L
)

# g: rate -> mean lifetime
fit_lifetime <- rmap(fit_rate, function(theta) c(mean_life = 1 / theta[1]),
  method = "delta")
params(fit_lifetime)
se(fit_lifetime)
confint(fit_lifetime)
```

### Multivariate example: component rates to system reliability

For a series system with independent exponential components, the system
reliability at time $t$ is
$$
R(t) = \exp(-\lambda_1 t) \cdot \exp(-\lambda_2 t) = \exp(-(\lambda_1 + \lambda_2) t).
$$

Starting from the joint MLE of $(\lambda_1, \lambda_2)$:

```{r rmap-system}
fit_rates <- joint(
  mle(theta.hat = c(lambda1 = 0.02), sigma = matrix(0.001^2), nobs = 150L),
  mle(theta.hat = c(lambda2 = 0.05), sigma = matrix(0.003^2), nobs = 80L)
)

# System reliability at t = 10 hours
t0 <- 10
R_mle <- rmap(fit_rates,
  function(theta) c(R_t = exp(-(theta[1] + theta[2]) * t0)),
  method = "delta"
)

params(R_mle)
se(R_mle)
confint(R_mle)
```

The delta method automatically computes the Jacobian via numerical differentiation
(using `numDeriv::jacobian`) and propagates the covariance.

## Bridging to Distribution Algebra

`as_dist()` converts an MLE to its asymptotic normal (or multivariate normal)
distribution, bridging into the `algebraic.dist` package where distributions
can be further composed.

```{r as-dist-basic}
fit1 <- mle(theta.hat = c(lambda = 0.02), sigma = matrix(0.001^2))
d1 <- as_dist(fit1)
d1
```

This is useful when you want to reason about the sampling distribution of the
MLE as a distribution object. For instance, you can compute the probability that
the true rate exceeds a regulatory threshold:

```{r as-dist-prob}
# P(lambda > 0.021) under the asymptotic distribution
1 - cdf(d1)(0.021)
```

For bootstrap MLEs, `as_dist()` returns an empirical distribution built from
the bootstrap replicates:

```{r as-dist-boot}
d_boot <- as_dist(fit_boot)
d_boot
```

## Full Pipeline

Here is an end-to-end example tying together all the algebraic operations.

**Scenario**: A series system has two independent components with exponential
lifetimes. Separate accelerated life tests estimate each component's failure
rate. We want to infer the system reliability at a mission time of $t = 50$
hours, including a confidence interval.

```{r pipeline-setup}
set.seed(123)

# --- Step 1: Independent MLEs from separate experiments ---

# Component 1: 200 observed lifetimes
x1 <- rexp(200, rate = 0.003)
loglik1 <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x1, rate = rate, log = TRUE))
}
fit1 <- mle_numerical(
  optim(par = c(lambda1 = 0.001), fn = loglik1,
        method = "Brent", lower = 1e-6, upper = 1,
        hessian = TRUE, control = list(fnscale = -1)),
  options = list(nobs = length(x1))
)

# Component 2: 120 observed lifetimes
x2 <- rexp(120, rate = 0.008)
loglik2 <- function(rate) {
  if (rate <= 0) return(-Inf)
  sum(dexp(x2, rate = rate, log = TRUE))
}
fit2 <- mle_numerical(
  optim(par = c(lambda2 = 0.001), fn = loglik2,
        method = "Brent", lower = 1e-6, upper = 1,
        hessian = TRUE, control = list(fnscale = -1)),
  options = list(nobs = length(x2))
)

cat("Component 1 rate:", params(fit1), "  SE:", se(fit1), "\n")
cat("Component 2 rate:", params(fit2), "  SE:", se(fit2), "\n")
```

```{r pipeline-compose}
# --- Step 2: Compose into joint parameter space ---
fit_system <- joint(fit1, fit2)
cat("Joint parameters:", params(fit_system), "\n")
cat("Joint vcov:\n")
vcov(fit_system)
```

```{r pipeline-transform}
# --- Step 3: Transform to system reliability at t = 50 ---
mission_time <- 50
R_system <- rmap(fit_system,
  function(theta) c(R_sys = exp(-(theta[1] + theta[2]) * mission_time)),
  method = "delta"
)

cat("System reliability R(50):", params(R_system), "\n")
cat("SE of R(50):             ", se(R_system), "\n")
```

```{r pipeline-inference}
# --- Step 4: Inference ---
cat("95% CI for R(50):\n")
confint(R_system)
```

```{r pipeline-dist}
# --- Step 5: Bridge to distribution algebra ---
R_dist <- as_dist(R_system)
cat("Asymptotic distribution of R(50):", "\n")
R_dist

# Probability that system reliability exceeds 55%
cat("P(R(50) > 0.55):", 1 - cdf(R_dist)(0.55), "\n")
```

The pipeline reads as a chain of algebraic operations: two independent MLEs
are composed via `joint()`, transformed to the quantity of interest via `rmap()`,
and then reasoned about as a distribution via `as_dist()`. Each step preserves
the uncertainty structure inherited from the original experiments.

```{r, include = FALSE}
options(oldopts)
```