---
title: "Mathematical Foundations of Series Systems"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Mathematical Foundations of Series Systems}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Series System Theory

A **series system** consists of $m$ independent components, each with lifetime $T_j$ and survival function $S_j(t) = P(T_j > t)$. The system fails when *any* component fails, so the system lifetime is:

$$T_{sys} = \min(T_1, \ldots, T_m)$$

Since the components are independent:

$$S_{sys}(t) = P(T_{sys} > t) = P(T_1 > t, \ldots, T_m > t) = \prod_{j=1}^{m} S_j(t)$$

Taking the negative logarithm gives the cumulative hazard:

$$H_{sys}(t) = -\log S_{sys}(t) = \sum_{j=1}^{m} H_j(t)$$

Differentiating yields the hazard rate:

$$h_{sys}(t) = \frac{d}{dt} H_{sys}(t) = \sum_{j=1}^{m} h_j(t)$$

This is the fundamental result: **the system hazard is the sum of component hazards**.
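These identities can be sanity-checked with base R alone, using the closed-form survival functions `pweibull()` and `pexp()` (the component choices below are arbitrary):

```{r base-r-check}
# Two-component series: Weibull(shape = 2, scale = 100) and exponential(rate = 0.01).
# The survival product must equal exp(-[H_1(t) + H_2(t)]).
t0 <- 50
S1 <- pweibull(t0, shape = 2, scale = 100, lower.tail = FALSE)
S2 <- pexp(t0, rate = 0.01, lower.tail = FALSE)
H_sum <- (t0 / 100)^2 + 0.01 * t0   # analytical H_1(t) + H_2(t)
all.equal(S1 * S2, exp(-H_sum))
```

The sections below verify the same relationships through the package interface.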

## Key Properties

Let's verify these relationships numerically.

### Property 1: Hazard Sum

```{r hazard-sum}
library(serieshaz)

sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01),
    dfr_gompertz(a = 0.001, b = 0.05)
))

h_sys <- hazard(sys)
h1 <- component_hazard(sys, 1)
h2 <- component_hazard(sys, 2)
h3 <- component_hazard(sys, 3)

t_vals <- c(10, 50, 100, 200)
for (t0 in t_vals) {
    lhs <- h_sys(t0)
    rhs <- h1(t0) + h2(t0) + h3(t0)
    cat(sprintf("t=%3d: h_sys=%.6f, sum h_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 2: Survival Product

```{r surv-product}
S_sys <- surv(sys)
comp1 <- component(sys, 1)
comp2 <- component(sys, 2)
comp3 <- component(sys, 3)
S1 <- surv(comp1)
S2 <- surv(comp2)
S3 <- surv(comp3)

for (t0 in t_vals) {
    lhs <- S_sys(t0)
    rhs <- S1(t0) * S2(t0) * S3(t0)
    cat(sprintf("t=%3d: S_sys=%.6f, prod S_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 3: Cumulative Hazard Sum

```{r cumhaz-sum}
# The system's cumulative hazard is the sum of component cumulative hazards
H_sys <- cum_haz(sys)
H1 <- cum_haz(comp1)
H2 <- cum_haz(comp2)
H3 <- cum_haz(comp3)

for (t0 in t_vals) {
    lhs <- H_sys(t0)
    rhs <- H1(t0) + H2(t0) + H3(t0)
    cat(sprintf("t=%3d: H_sys=%.6f, sum H_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
```

### Property 4: Density Formula

The system density is:

$$f_{sys}(t) = h_{sys}(t) \cdot S_{sys}(t) = \left[\sum_{j=1}^m h_j(t)\right] \exp\left[-\sum_{j=1}^m H_j(t)\right]$$

```{r density-formula}
f_sys <- density(sys)
for (t0 in t_vals) {
    from_density <- f_sys(t0)
    from_hS <- h_sys(t0) * S_sys(t0)
    cat(sprintf("t=%3d: f(t)=%.8f, h(t)*S(t)=%.8f, diff=%.2e\n",
                t0, from_density, from_hS, abs(from_density - from_hS)))
}
```

The `density()` method and the manual $h(t) \cdot S(t)$ calculation agree to machine precision. Note that at $t = 200$, both the density and survival are effectively zero --- by that time, the combined Weibull + exponential + Gompertz hazard has driven the cumulative hazard so high that virtually all systems have already failed.

## The Parameter Layout

The series system stores parameters from all components as a single flat vector. This is critical for optimization --- MLE fitting via `optim()` requires a single parameter vector.

```{r layout-demo}
sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),    # 2 params
    dfr_exponential(0.05),                  # 1 param
    dfr_gompertz(a = 0.01, b = 0.1)         # 2 params
))

# Full parameter vector (5 elements)
params(sys)

# Layout maps global indices to components
param_layout(sys)

# Component 1 uses par[1:2], component 2 uses par[3], component 3 uses par[4:5]
```

When an optimizer proposes a new parameter vector `par = c(p1, p2, p3, p4, p5)`, the series system internally slices it: component 1 gets `par[1:2]`, component 2 gets `par[3]`, and component 3 gets `par[4:5]`.
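In base R terms, the slicing is plain vector indexing over a per-component index list (a sketch with illustrative names, not the serieshaz internals):

```{r layout-slice}
layout <- list(1:2, 3, 4:5)            # global indices for each component
par    <- c(2, 100, 0.05, 0.01, 0.1)   # flat vector proposed by the optimizer
component_pars <- lapply(layout, function(idx) par[idx])
component_pars[[1]]   # Weibull shape and scale
component_pars[[2]]   # exponential rate
```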

## Analytical vs. Numerical Cumulative Hazard

The cumulative hazard $H(t) = \int_0^t h(u)\,du$ is needed for survival, CDF, and log-likelihood computations. When all components provide an analytical `cum_haz_rate`, the series system computes $H_{sys}(t) = \sum_j H_j(t)$ exactly. Otherwise, it falls back to numerical integration.

```{r analytical-check}
# All standard distributions provide analytical cumulative hazard
sys_analytical <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_analytical$cum_haz_rate), "\n")

# A custom dfr_dist without cum_haz_rate forces numerical fallback
custom <- dfr_dist(
    rate = function(t, par, ...) par[1] * t^(par[1] - 1),
    par = c(1.5)
)
sys_numerical <- dfr_dist_series(list(
    custom,
    dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_numerical$cum_haz_rate), "\n")
```

The analytical path is faster and more precise, but the numerical fallback works correctly for any valid hazard function.
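The fallback can be mimicked with base R's `integrate()`; here a numerically integrated Weibull hazard is compared against the closed form $H(t) = (t/\lambda)^k$ (a sketch of the idea, not the package's integration routine):

```{r cumhaz-integrate}
# Weibull(shape = 2, scale = 100) hazard: h(t) = (k / lambda) * (t / lambda)^(k - 1)
h <- function(t) (2 / 100) * (t / 100)^(2 - 1)
H_num   <- integrate(h, lower = 0, upper = 50)$value
H_exact <- (50 / 100)^2
c(numerical = H_num, analytical = H_exact)
```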

## Score Function: Decomposed Gradient

The log-likelihood for a series system with observed data $(t_i, \delta_i)$ is:

$$\ell(\boldsymbol{\theta}) = \sum_{i} \left[\delta_i \log h_{sys}(t_i; \boldsymbol{\theta}) - H_{sys}(t_i; \boldsymbol{\theta})\right]$$

where $\delta_i = 1$ marks an exact observation and $\delta_i = 0$ a right-censored one. (Left-censored observations, coded $\delta_i = -1$, contribute $\log[1 - S_{sys}(t_i; \boldsymbol{\theta})]$ rather than the terms shown above.)
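For exact and right-censored data the formula is easy to evaluate by hand. A minimal base-R example for a single exponential component, where $h(t) = \lambda$ and $H(t) = \lambda t$ (written directly, not via `loglik()`):

```{r loglik-by-hand}
lambda <- 0.1
t      <- c(5, 8, 12)
delta  <- c(1, 1, 0)   # third observation right-censored
# Exact observations contribute log h(t_i); every observation contributes -H(t_i)
ll <- sum(delta * log(lambda) - lambda * t)
ll
```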

### The naive approach

A generic gradient computation would apply `numDeriv::grad` to the full
log-likelihood, perturbing the entire parameter vector $\boldsymbol{\theta}$
of length $P$.  Each perturbation re-evaluates all $m$ component hazards at all
$n$ observations, for a cost of $O(P \cdot n \cdot m)$.

### How `serieshaz` decomposes the gradient

The series structure admits a per-component decomposition.  Because
$H_{sys} = \sum_j H_j$ and $h_{sys} = \sum_j h_j$, the gradient with respect to
component $k$'s parameters $\boldsymbol{\theta}_k$ separates into two terms:

$$\frac{\partial \ell}{\partial \boldsymbol{\theta}_k} =
\underbrace{
  \sum_{i:\,\delta_i=1} \frac{1}{h_{sys}(t_i)}
  \frac{\partial h_k(t_i)}{\partial \boldsymbol{\theta}_k}
}_{\text{hazard derivative term}}
\;-\;
\underbrace{
  \sum_{i}
  \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}
}_{\text{cumulative hazard derivative term}}$$

The two terms are computed differently:

| Term | Method | Cost |
|------|--------|------|
| Hazard derivative | `numDeriv::jacobian` on $h_k(t, \boldsymbol{\theta}_k)$ | $O(p_k \cdot n_{\text{exact}})$ per component |
| Cumulative hazard derivative | **Analytical** via all-censored trick (when the component provides `score_fn`) | $O(n)$ per component |
| Cumulative hazard derivative | `numDeriv::grad` on $\sum_i H_k(t_i)$ (fallback) | $O(p_k \cdot n)$ per component |

Total cost: $O(P \cdot n)$, versus $O(P \cdot n \cdot m)$ for the naive approach ---
roughly an $m$-fold speedup.

### The all-censored trick

The key insight for the cumulative hazard derivative is that a component's own
`score_fn` computes:

$$\text{score}_k(\boldsymbol{\theta}_k) =
  \sum_i \delta_i \frac{\partial \log h_k(t_i)}{\partial \boldsymbol{\theta}_k}
  - \sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}$$

If we call `score_fn` with all observations marked as right-censored
($\delta_i = 0$), the first sum vanishes and we get:

$$\text{score}_k\big|_{\delta=0} =
  -\sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}$$

This extracts the cumulative hazard derivative analytically --- no numerical
differentiation needed.  All built-in distributions (exponential, Weibull,
Gompertz, log-logistic) provide `score_fn`, so this analytical path is the
common case.
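The trick is easy to verify by hand for an exponential component, where $\partial \log h / \partial \lambda = 1/\lambda$ and $\partial H / \partial \lambda = t$. (The `exp_score` function below is a hand-written stand-in, not the package's `score_fn`.)

```{r all-censored-exp}
exp_score <- function(t, delta, lambda) sum(delta) / lambda - sum(t)

t <- c(3, 7, 12)
exp_score(t, delta = c(1, 1, 1), lambda = 0.1)   # full score
exp_score(t, delta = c(0, 0, 0), lambda = 0.1)   # all-censored: -sum(t)
```

With all observations censored, the hazard term vanishes and only $-\sum_i t_i = -22$ remains.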

### Numerical demonstration

```{r score-decomposed}
sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01),
    dfr_gompertz(a = 0.001, b = 0.05)
))

set.seed(42)
times <- sampler(sys)(200)
df <- data.frame(t = times, delta = rep(1L, 200))

par0 <- params(sys)
sc_fn <- score(sys)
ll_fn <- loglik(sys)

# Decomposed score (what serieshaz computes internally)
score_val <- sc_fn(df, par = par0)

# Independent finite-difference verification
eps <- 1e-6
fd_score <- numeric(length(par0))
for (k in seq_along(par0)) {
    par_p <- par_m <- par0
    par_p[k] <- par0[k] + eps
    par_m[k] <- par0[k] - eps
    fd_score[k] <- (ll_fn(df, par = par_p) - ll_fn(df, par = par_m)) / (2 * eps)
}

result <- data.frame(
    Parameter = c("shape", "scale", "rate", "a", "b"),
    Decomposed = round(score_val, 6),
    FiniteDiff = round(fd_score, 6),
    AbsDiff = formatC(abs(score_val - fd_score), format = "e", digits = 1)
)
print(result, row.names = FALSE)
```

The decomposed score and the finite-difference approximation agree to high
precision across all five parameters, spanning three different distribution
families.

### Score at the MLE

At the maximum likelihood estimate, the score should be approximately zero
(the first-order optimality condition).  We compare the score magnitude at the
true parameters (generally nonzero with finite $n$) versus at the MLE:

```{r score-at-mle}
sys2 <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01)
))

set.seed(42)
df2 <- data.frame(t = sampler(sys2)(300), delta = rep(1L, 300))

sc2 <- score(sys2)
cat("Score at true params:", round(sc2(df2), 4), "\n")

solver <- fit(sys2)
result <- suppressWarnings(solver(df2, par = c(1.5, 80, 0.02)))

cat("MLE:                ", round(coef(result), 4), "\n")
cat("Score at MLE:       ", round(sc2(df2, par = coef(result)), 6), "\n")
```

The score at the MLE is orders of magnitude smaller than at the true parameters,
confirming the optimizer has found the stationary point.  (The residual is due
to finite optimizer tolerance, not a problem with the score computation.)

### Decomposed Hessian

The Hessian also exploits the series structure.  It has a natural block
decomposition into within-component and cross-component terms:

**Within-component block** ($k = l$):
$$\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top}
= \sum_{i:\,\delta_i=1} \left[
  \frac{1}{h_{sys}(t_i)} \frac{\partial^2 h_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top}
  - \frac{1}{h_{sys}(t_i)^2}
    \frac{\partial h_k}{\partial \boldsymbol{\theta}_k}
    \frac{\partial h_k^\top}{\partial \boldsymbol{\theta}_k}
\right]
- \sum_i \frac{\partial^2 H_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top}$$

**Cross-component block** ($k \neq l$):
$$\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_l^\top}
= -\sum_{i:\,\delta_i=1}
  \frac{1}{h_{sys}(t_i)^2}
  \frac{\partial h_k}{\partial \boldsymbol{\theta}_k}
  \frac{\partial h_l^\top}{\partial \boldsymbol{\theta}_l}$$

The cross-component blocks are free --- they reuse the rate Jacobians already
computed for the score.  The cumulative hazard Hessian
$\partial^2 H_k / \partial \boldsymbol{\theta}_k^2$ is extracted analytically
via the all-censored trick on `hess_fn` when available.  The rate Hessian
$\partial^2 h_k / \partial \boldsymbol{\theta}_k^2$ uses `numDeriv::hessian`
per component, per observation.
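The reuse can be sketched directly: given the per-observation rate Jacobians already in hand, each cross block is a single matrix product (the numbers below are made up for illustration):

```{r cross-block-sketch}
h_sys <- c(0.02, 0.05)                          # system hazard at two exact observations
J_k <- matrix(c(0.1, 0.3, 0.2, 0.4), nrow = 2)  # n x p_k Jacobian of h_k
J_l <- matrix(c(0.01, 0.02), nrow = 2)          # n x p_l Jacobian of h_l
# -sum_i (1 / h_sys(t_i)^2) * grad h_k(t_i) outer grad h_l(t_i)
cross_block <- -t(J_k / h_sys^2) %*% J_l
cross_block
```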

```{r hessian-demo}
H_fn <- hess_loglik(sys2)
hess_val <- H_fn(df2, par = coef(result))
evals <- round(eigen(hess_val, symmetric = TRUE)$values, 2)
cat("Hessian eigenvalues at MLE:", evals, "\n")
cat("Hessian is symmetric:", all.equal(hess_val, t(hess_val)), "\n")
```

All eigenvalues are negative, confirming the MLE is a local maximum.  (For
non-identifiable models, such as an all-exponential series where the likelihood
depends only on the sum of the rates, the flat directions would show up as
near-zero eigenvalues.)
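The non-identifiable case is easy to exhibit by hand. For a two-exponential series the log-likelihood depends on the rates only through their sum $s = \lambda_1 + \lambda_2$, so every second derivative equals $-n/s^2$ and the Hessian has a flat direction (a hand calculation, independent of serieshaz):

```{r flat-direction}
n <- 100
hess <- function(l1, l2) {
    s <- l1 + l2
    matrix(-n / s^2, 2, 2)   # all four second derivatives equal -n / s^2
}
ev <- eigen(hess(0.05, 0.06))$values
ev   # one near-zero eigenvalue (the flat direction l1 - l2)
```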

## Special Cases

### Exponential Series = Exponential(sum of rates)

When all components are exponential with rates $\lambda_1, \ldots, \lambda_m$, the system hazard is constant: $h_{sys}(t) = \sum_j \lambda_j$. This is equivalent to a single exponential with rate $\lambda = \sum_j \lambda_j$.

```{r exp-special}
rates <- c(0.1, 0.2, 0.3)
sys <- dfr_dist_series(lapply(rates, dfr_exponential))
equiv <- dfr_exponential(sum(rates))

h_sys <- hazard(sys)
h_eq  <- hazard(equiv)
S_sys <- surv(sys)
S_eq  <- surv(equiv)

for (t0 in c(1, 5, 10, 50)) {
    cat(sprintf("t=%2d: h_sys=%.4f h_eq=%.4f | S_sys=%.6f S_eq=%.6f\n",
                t0, h_sys(t0), h_eq(t0), S_sys(t0), S_eq(t0)))
}
```

### Identical Weibull Components

When all $m$ components are Weibull with the same shape $k$ and scale $\lambda$, the survival product collapses to a single Weibull:

$$S_{sys}(t) = \left[e^{-(t/\lambda)^k}\right]^m = \exp\left[-m\,(t/\lambda)^k\right] = \exp\left[-\left(\frac{t}{\lambda\, m^{-1/k}}\right)^k\right]$$

so the system is also Weibull with shape $k$ and scale $\lambda / m^{1/k}$:

```{r weibull-identical}
m <- 3
shape <- 2
scale <- 100

sys <- dfr_dist_series(replicate(
    m, dfr_weibull(shape = shape, scale = scale), simplify = FALSE
))
equiv <- dfr_weibull(shape = shape, scale = scale / m^(1/shape))

S_sys <- surv(sys)
S_eq  <- surv(equiv)

for (t0 in c(10, 30, 50)) {
    cat(sprintf("t=%2d: S_sys=%.6f S_equiv=%.6f diff=%.2e\n",
                t0, S_sys(t0), S_eq(t0), abs(S_sys(t0) - S_eq(t0))))
}
```

### Single Component (Degenerate Case)

A series system with one component is equivalent to that component:

```{r single-component}
single <- dfr_dist_series(list(dfr_weibull(shape = 2, scale = 100)))
direct <- dfr_weibull(shape = 2, scale = 100)

h1 <- hazard(single)
h2 <- hazard(direct)
S1 <- surv(single)
S2 <- surv(direct)

for (t0 in c(10, 50, 100)) {
    cat(sprintf("t=%3d: h=%.6f/%.6f S=%.6f/%.6f\n",
                t0, h1(t0), h2(t0), S1(t0), S2(t0)))
}
```
