---
title: "Metafrontier Methods: Theory and Computation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Metafrontier Methods: Theory and Computation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

```{r setup}
library(metafrontier)
```

This vignette provides a detailed exposition of the metafrontier methods
implemented in the package, linking the econometric theory to the
computational approach used at each step.


## 1. The metafrontier framework

### 1.1 Group-specific stochastic frontiers

Consider $J$ groups of firms, where group $j$ contains $n_j$ firms. For
group $j$, the stochastic frontier model is:

$$\ln y_{ij} = x_{ij}'\beta_j + v_{ij} - u_{ij}, \quad i = 1, \ldots, n_j$$

where $y_{ij}$ is the output of firm $i$ in group $j$, $x_{ij}$ is a
vector of (logged) inputs including a constant, $\beta_j$ is the
group-specific parameter vector, $v_{ij} \sim N(0, \sigma_{v,j}^2)$ is
symmetric noise, and $u_{ij} \ge 0$ is one-sided inefficiency.

The group-specific technical efficiency is:

$$TE_{ij} = \exp(-u_{ij}) \in (0, 1]$$

estimated via the Jondrow et al. (1982) conditional mean estimator.
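
As a minimal sketch of this estimator, the chunk below computes the conditional mean $E[u \mid \varepsilon]$ in base R. It assumes a normal/half-normal error specification (the text above does not fix the distribution of $u_{ij}$) and treats the variance parameters as known. The helper `jlms_halfnormal()` is hypothetical and is not part of the package.

```{r jlms-sketch}
# Hypothetical helper: JLMS conditional mean E[u | eps] under the
# normal/half-normal model, with eps = v - u the composed error.
jlms_halfnormal <- function(eps, sigma_u, sigma_v) {
  sigma2     <- sigma_u^2 + sigma_v^2
  mu_star    <- -eps * sigma_u^2 / sigma2
  sigma_star <- sigma_u * sigma_v / sqrt(sigma2)
  z          <- mu_star / sigma_star
  # Mean of a normal(mu_star, sigma_star^2) truncated at zero
  mu_star + sigma_star * dnorm(z) / pnorm(z)
}

# Simulated composed errors (illustrative values only)
set.seed(42)
n   <- 500
u   <- abs(rnorm(n, sd = 0.3))   # one-sided inefficiency
v   <- rnorm(n, sd = 0.15)       # symmetric noise
eps <- v - u

u_hat  <- jlms_halfnormal(eps, sigma_u = 0.3, sigma_v = 0.15)
te_hat <- exp(-u_hat)            # plug-in technical efficiency
summary(te_hat)
```

In applied work the variance parameters are replaced by their maximum likelihood estimates, and the package may use a different distribution or estimator internally.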


### 1.2 The metafrontier

The metafrontier is defined as a function $f^*(x) = \exp(x'\beta^*)$ such
that:

$$x'\beta^* \ge x'\beta_j \quad \text{for all } x \text{ and all } j$$

That is, the metafrontier weakly dominates all group frontiers. It
represents the production technology available to firms with unrestricted
access to all technologies.


### 1.3 The efficiency decomposition

For each firm, efficiency relative to the metafrontier decomposes as:

$$TE^*_{ij} = TE_{ij} \times TGR_{ij}$$

where the **technology gap ratio** is:

$$TGR_{ij} = \frac{\exp(x_{ij}'\beta_j)}{\exp(x_{ij}'\beta^*)} = \exp\left(x_{ij}'(\beta_j - \beta^*)\right) \in (0, 1]$$

A $TGR$ of 1 means the group frontier coincides with the metafrontier at
that input mix; values below 1 indicate a technology gap.
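
The decomposition can be checked numerically for a single firm. The following sketch uses made-up coefficient vectors and an assumed inefficiency draw, all purely illustrative, and confirms that $TE \times TGR$ equals the ratio of observed (noise-free) output to metafrontier output.

```{r decomposition-sketch}
# Hypothetical coefficients (constant, log_x1, log_x2); illustrative only
beta_group <- c(0.8, 0.5, 0.3)
beta_meta  <- c(1.0, 0.5, 0.3)
x <- c(1, log(2), log(3))   # constant plus two logged inputs
u <- 0.1                    # assumed inefficiency of this firm

te_group <- exp(-u)                                # TE relative to group frontier
tgr      <- exp(sum(x * (beta_group - beta_meta))) # technology gap ratio
te_meta  <- te_group * tgr                         # TE relative to metafrontier

# Same quantity computed directly from the two frontiers
te_direct <- exp(sum(x * beta_group) - u) / exp(sum(x * beta_meta))
all.equal(te_meta, te_direct)

c(TE = te_group, TGR = tgr, TE_meta = te_meta)
```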


## 2. Deterministic metafrontier (Battese, Rao, and O'Donnell, 2004)

### 2.1 Estimation

After obtaining group estimates $\hat\beta_j$ in Stage 1, the metafrontier
parameters $\hat\beta^*$ are estimated by solving:

$$\min_{\beta^*} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(x_{ij}'\beta^* - x_{ij}'\hat\beta_j\right)^2$$
$$\text{subject to: } x_{ij}'\beta^* \ge x_{ij}'\hat\beta_j \quad \forall\, i, j$$

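Because the objective is quadratic and the constraints are linear in
$\beta^*$, this is a convex quadratic program. The `metafrontier` package
solves it using `constrOptim()` from the `stats` package (part of base R),
which implements an adaptive barrier algorithm for linearly constrained
optimisation.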
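
To make the optimisation concrete, here is a minimal sketch of the same quadratic program on simulated inputs. The Stage 1 coefficients in `beta_groups` are hypothetical, and the starting value and variable names are assumptions made for illustration; this is not the package's internal code.

```{r qp-sketch}
set.seed(1)
n     <- 60
group <- rep(1:2, each = n / 2)

# Design matrix: constant plus two logged inputs
X <- cbind(const = 1, log_x1 = runif(n, 0, 2), log_x2 = runif(n, 0, 2))

# Hypothetical Stage 1 estimates, one row per group
beta_groups <- rbind(c(1.0, 0.50, 0.30),
                     c(0.7, 0.55, 0.30))

# Fitted group frontier at each observation: x_ij' beta_j
frontier_g <- rowSums(X * beta_groups[group, ])

# Sum of squared deviations and its gradient
obj  <- function(b) sum((X %*% b - frontier_g)^2)
grad <- function(b) as.vector(2 * crossprod(X, X %*% b - frontier_g))

# constrOptim() needs a strictly feasible start: take the least-squares
# fit and raise the intercept until every constraint holds with slack
b0    <- qr.solve(X, frontier_g)
b0[1] <- b0[1] + max(frontier_g - X %*% b0) + 0.1

# Constraints X b - frontier_g >= 0 are passed as ui %*% theta - ci >= 0
qp <- constrOptim(theta = b0, f = obj, grad = grad, ui = X, ci = frontier_g)
beta_meta_hat <- qp$par

# Implied technology gap ratios at the sample points
tgr_qp <- exp(frontier_g - as.vector(X %*% beta_meta_hat))
range(tgr_qp)
```

Because the barrier method keeps every iterate inside the feasible region, the implied TGR values stay at or below 1 at all sample points.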

### 2.2 Properties

- The deterministic metafrontier is a **point estimate** with no associated
  standard errors (since Stage 2 is a deterministic optimisation, not a
  statistical model).
- The enveloping constraints guarantee $TGR_{ij} \le 1$ for all
  observations in the estimation sample.
- The enveloping constraints are imposed only at the observed input
  vectors, so the estimated metafrontier is guaranteed to dominate the
  group frontiers within the sample, not for every conceivable $x$.

### 2.3 Example

```{r det-example}
sim <- simulate_metafrontier(
  n_groups = 2, n_per_group = 300,
  tech_gap = c(0, 0.4),
  sigma_u = c(0.2, 0.35),
  seed = 123
)

fit_det <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "deterministic"
)

# Metafrontier coefficients (no standard errors)
coef(fit_det, which = "meta")

# Group coefficients for comparison
coef(fit_det, which = "group")
```

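The constraints in Section 2.1 apply to fitted frontier values at the
observed inputs, not to individual coefficients. As a quick heuristic
check, we would nevertheless expect the metafrontier intercept to be at
least as large as each group intercept when the groups differ mainly by
a level shift: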

```{r verify-envelop}
meta_b0 <- coef(fit_det, which = "meta")[1]
group_b0 <- sapply(coef(fit_det, which = "group"), `[`, 1)
meta_b0 >= group_b0
```


## 3. Stochastic metafrontier (Huang, Huang, and Liu, 2014)

### 3.1 Estimation

Huang, Huang, and Liu (2014) propose treating the technology gap as a
stochastic variable. In Stage 2, the fitted group frontier values become
the dependent variable in a second SFA:

$$\ln \hat{f}(x_{ij}; \hat\beta_j) = x_{ij}'\beta^* + v^*_{ij} - u^*_{ij}$$

where $u^*_{ij} \ge 0$ captures the technology gap and $v^*_{ij}$ is a
noise term. This is estimated via MLE, yielding:

- Point estimates $\hat\beta^*$ with standard errors
- A variance-covariance matrix for inference
- A distributional $\widehat{TGR}$ with associated uncertainty
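
The Stage 2 estimation can be sketched with base R alone. The chunk below is an illustration under stated assumptions: it uses a normal/half-normal likelihood, simulates a stand-in for the Stage 1 fitted values instead of estimating them, and maximises the likelihood with `optim()`. The package's own likelihood, parameterisation, and optimiser may differ.

```{r stage2-sfa-sketch}
set.seed(2014)
n <- 1000
X <- cbind(const = 1, log_x1 = runif(n, 0, 2), log_x2 = runif(n, 0, 2))
beta_true <- c(1.0, 0.5, 0.3)   # assumed metafrontier coefficients

# Simulated stand-in for the Stage 1 fitted values ln f-hat:
# metafrontier plus noise v* minus technology gap u*
ln_f_hat <- as.vector(X %*% beta_true) +
  rnorm(n, sd = 0.1) - abs(rnorm(n, sd = 0.3))

# Negative log-likelihood of the normal/half-normal frontier model.
# par = (beta, log sigma_u, log sigma_v); logs keep both scales positive.
negll <- function(par) {
  k       <- ncol(X)
  eps     <- ln_f_hat - as.vector(X %*% par[1:k])
  sigma_u <- exp(par[k + 1])
  sigma_v <- exp(par[k + 2])
  sigma   <- sqrt(sigma_u^2 + sigma_v^2)
  lambda  <- sigma_u / sigma_v
  -sum(log(2) - log(sigma) + dnorm(eps / sigma, log = TRUE) +
         pnorm(-eps * lambda / sigma, log.p = TRUE))
}

# Start from OLS, splitting the residual spread between the two errors
ols       <- lm.fit(X, ln_f_hat)
start_par <- c(ols$coefficients, rep(log(sd(ols$residuals) / sqrt(2)), 2))

mle <- optim(start_par, negll, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse Hessian of the negative log-likelihood
est <- mle$par[1:3]
se  <- sqrt(diag(solve(mle$hessian)))[1:3]
cbind(estimate = est, std_error = se)
```

These standard errors come from the Stage 2 Hessian only, which is the situation discussed in Section 3.3.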

### 3.2 Advantages over the deterministic approach

1. **Inference**: Standard errors, confidence intervals, and hypothesis
   tests on metafrontier parameters are available.
2. **Robustness**: The noise term $v^*_{ij}$ absorbs part of the sampling
   variation carried over from Stage 1, so the metafrontier is less
   sensitive to estimation error in individual group frontiers.
3. **Flexibility**: The metafrontier need not envelop every group
   frontier at every observation in finite samples, which can be more
   realistic.

### 3.3 Caveat: the generated-regressor problem

The stochastic metafrontier is a two-stage estimator. In Stage 2, the
dependent variable $\ln \hat{f}(x_{ij}; \hat\beta_j)$ is itself an
estimate from Stage 1, so Stage 2 inherits the same two-step inference
problem as a *generated regressor* (Murphy and Topel, 1985). The
uncorrected standard errors reported by the package are derived from the
Stage 2 Hessian alone and **do not account for the sampling uncertainty
in the Stage 1 group frontier estimates**.

As a result:

- Standard errors may be **understated**, so confidence intervals
  (`confint()`) can be narrower than their nominal coverage warrants
  and hypothesis tests can reject too often.
- This issue does **not** affect point estimates of $\hat\beta^*$ or
  efficiency scores, only inference.
- The understatement tends to shrink as group sample sizes grow relative
  to the number of frontier parameters, because the Stage 1 estimates
  become more precise.
- The Murphy--Topel (1985) correction is available via
  `vcov(fit, correction = "murphy-topel")` and
  `confint(fit, correction = "murphy-topel")`. This adjusts the Stage 2
  variance-covariance matrix to account for Stage 1 estimation
  uncertainty.

### 3.4 Example

```{r sto-example}
fit_sto <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "stochastic"
)

summary(fit_sto)
```

The stochastic metafrontier provides standard errors:

```{r sto-inference}
# Variance-covariance matrix
vcov(fit_sto)

# Log-likelihood of the metafrontier model
logLik(fit_sto)
```

### 3.5 A note on TGR values

Under the stochastic metafrontier, TGR values are not constrained to be
$\le 1$ in finite samples, since the estimated metafrontier need not
envelop every group frontier at every observation. Values slightly above
1 can occur and are consistent with the stochastic framework.

```{r tgr-range}
tgr_vals <- efficiencies(fit_sto, type = "tgr")
summary(tgr_vals)
```


## 4. DEA-based metafrontier

### 4.1 Approach

For a nonparametric metafrontier:

1. Compute group-specific DEA efficiencies $\hat\theta_{ij}^{group}$ using
   only observations from group $j$.
2. Compute pooled DEA efficiencies $\hat\theta_{ij}^{pool}$ using all
   observations.
3. The TGR is: $TGR_{ij} = \hat\theta_{ij}^{pool} / \hat\theta_{ij}^{group}$.

The package solves the DEA linear programs using `lpSolveAPI`.
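
The three steps above can be illustrated without an LP solver in one special case. With a single input, a single output, and constant returns to scale, the DEA score reduces to the observed output/input ratio divided by the largest ratio in the reference set. The sketch below uses that closed form on simulated data; the general multi-input case requires linear programming, as in the package.

```{r dea-tgr-sketch}
set.seed(7)
n   <- 40
grp <- rep(c("A", "B"), each = n / 2)
x   <- runif(n, 1, 10)

# Illustrative DGP: group B has lower best-practice productivity
prod_max <- ifelse(grp == "A", 1.0, 0.7)
y        <- x * prod_max * runif(n, 0.5, 1)

ratio <- y / x   # average productivity of each firm

# Step 1: efficiency against the firm's own group frontier
theta_group <- ratio / ave(ratio, grp, FUN = max)

# Step 2: efficiency against the pooled frontier
theta_pool <- ratio / max(ratio)

# Step 3: technology gap ratio
tgr_dea_sketch <- theta_pool / theta_group
tapply(tgr_dea_sketch, grp, mean)
```

Because the pooled reference set contains every group, the pooled score can never exceed the group score, so the TGR is at most 1 by construction.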

### 4.2 Returns to scale

The `rts` argument controls the technology assumption:

- `"crs"` (constant returns to scale): the standard CCR model
- `"vrs"` (variable returns to scale): the BCC model
- `"drs"` / `"irs"` (decreasing / increasing returns)

```{r dea-example}
# CRS metafrontier
fit_crs <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "crs"
)

# VRS metafrontier
fit_vrs <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "vrs"
)

# Compare mean TGR
cbind(
  CRS = tapply(fit_crs$tgr, fit_crs$group_vec, mean),
  VRS = tapply(fit_vrs$tgr, fit_vrs$group_vec, mean)
)
```


## 5. Comparing methods

The choice between deterministic, stochastic, and DEA metafrontiers
involves trade-offs:

| Feature | Deterministic SFA | Stochastic SFA | DEA |
|---|---|---|---|
| Functional form | Parametric | Parametric | Nonparametric |
| Noise handling | Stage 1 only | Both stages | None |
| Inference on TGR | No | Yes | No |
| TGR $\le$ 1 guaranteed | Yes | No | Yes |
| Small sample performance | Moderate | Moderate | Poor |
| Reference | Battese et al. (2004) | Huang et al. (2014) | O'Donnell et al. (2008) |

```{r compare-methods}
# Compare TGR estimates across methods
tgr_det <- tapply(fit_det$tgr, fit_det$group_vec, mean)
tgr_sto <- tapply(fit_sto$tgr, fit_sto$group_vec, mean)
tgr_dea <- tapply(fit_crs$tgr, fit_crs$group_vec, mean)
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)

comparison <- data.frame(
  True = true_tgr,
  Deterministic = tgr_det,
  Stochastic = tgr_sto,
  DEA_CRS = tgr_dea
)
round(comparison, 4)
```


## 6. Choosing a method: practical guidance

Selecting between deterministic SFA, stochastic SFA, and DEA
metafrontiers depends on the research question, data characteristics,
and inferential requirements.

**Use the deterministic SFA metafrontier (Battese, Rao, and O'Donnell, 2004) when:**

- You need guaranteed $TGR \le 1$ in the estimation sample (the
  metafrontier envelops all group frontiers at the observed inputs).
- Inference on metafrontier parameters is not required.
- The goal is descriptive decomposition of efficiency into within-group
  and between-group components.

**Use the stochastic SFA metafrontier (Huang, Huang, and Liu, 2014) when:**

- You need standard errors, confidence intervals, or hypothesis tests
  on the metafrontier parameters.
- You want a distributional framework for the technology gap ratio.
- Sample sizes per group are moderate to large (at least 50--100
  observations per group is recommended).
- You are comfortable with the generated-regressor caveat (Section 3.3).

**Use the DEA metafrontier when:**

- You prefer a nonparametric approach with no functional form
  assumptions.
- Multiple inputs and/or multiple outputs are involved.
- Sample sizes are large enough to support DEA (a rough guideline:
  $n \ge 3 \times (m + s)$ per group, where $m$ is the number of
  inputs and $s$ the number of outputs).
- The returns-to-scale assumption (`rts`) is well-justified by the
  application context.

In many applied studies, it is informative to estimate multiple methods
and compare TGR estimates for robustness (as shown in Section 5).


## 7. Testing for technology heterogeneity

Before estimating a metafrontier, it is useful to test whether separate
group frontiers are actually needed. The **poolability test** uses a
likelihood ratio statistic:

$$LR = -2\left[LL_{pooled} - \sum_{j=1}^{J} LL_j\right] \sim \chi^2_{df}$$

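where $LL_{pooled}$ is the log-likelihood of a single frontier estimated
on the pooled sample, $LL_j$ are the group-specific log-likelihoods, and
$df$ is the number of additional parameters estimated by the $J$ group
models relative to the pooled model. The $\chi^2$ distribution holds
asymptotically under the null hypothesis of a common frontier.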

```{r poolability}
poolability_test(fit_det)
```

A significant test (p < 0.05) rejects the null hypothesis of a common
frontier. This suggests that the groups operate under different
technologies and supports the use of the metafrontier decomposition.
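
The arithmetic of the statistic can be reproduced by hand. The sketch below uses ordinary least squares as a stand-in for the frontier likelihood, an assumption made only to keep the example self-contained in base R, and simulated data with a deliberate intercept shift between groups.

```{r lr-sketch}
set.seed(11)
n <- 200
d <- data.frame(
  group  = rep(c("g1", "g2"), each = n / 2),
  log_x1 = runif(n),
  log_x2 = runif(n)
)
# Illustrative DGP: the two groups differ by an intercept shift
d$log_y <- ifelse(d$group == "g1", 1.0, 0.6) +
  0.5 * d$log_x1 + 0.3 * d$log_x2 + rnorm(n, sd = 0.2)

# Restricted model: one set of coefficients for everyone
pooled <- lm(log_y ~ log_x1 + log_x2, data = d)

# Unrestricted model: separate coefficients per group
by_group <- lapply(split(d, d$group),
                   function(g) lm(log_y ~ log_x1 + log_x2, data = g))

ll_pooled <- logLik(pooled)
ll_groups <- sapply(by_group, function(m) as.numeric(logLik(m)))
k_groups  <- sapply(by_group, function(m) attr(logLik(m), "df"))

lr      <- -2 * (as.numeric(ll_pooled) - sum(ll_groups))
df_lr   <- sum(k_groups) - attr(ll_pooled, "df")
p_value <- pchisq(lr, df = df_lr, lower.tail = FALSE)

c(LR = lr, df = df_lr, p_value = p_value)
```

With frontier models the same calculation applies, using the frontier log-likelihoods and parameter counts in place of the OLS ones.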


## 8. Simulation for Monte Carlo studies

The `simulate_metafrontier()` function generates data from a known DGP,
enabling parameter recovery studies:

```{r monte-carlo, eval=FALSE}
# Monte Carlo: check parameter recovery over 100 replications
set.seed(1)
n_rep <- 100
beta_hat <- matrix(NA_real_, n_rep, 3)  # one row per replication

for (r in seq_len(n_rep)) {
  sim_r <- simulate_metafrontier(
    n_groups = 2, n_per_group = 200,
    tech_gap = c(0, 0.3),
    sigma_u = c(0.2, 0.3),
    sigma_v = 0.15
  )
  fit_r <- metafrontier(
    log_y ~ log_x1 + log_x2,
    data = sim_r$data,
    group = "group",
    meta_type = "deterministic"
  )
  beta_hat[r, ] <- coef(fit_r, which = "meta")
}

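# Bias: mean estimate minus the true metafrontier coefficients.
# true_beta must match the beta_meta used by simulate_metafrontier().
true_beta <- c(1.0, 0.5, 0.3)
colMeans(beta_hat) - true_beta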
```

The `simulate_metafrontier()` function supports:

- Arbitrary number of groups (`n_groups`)
- Unequal group sizes (`n_per_group` as a vector)
- Custom metafrontier coefficients (`beta_meta`)
- Group-specific technology gaps (`tech_gap`)
- Group-specific inefficiency dispersion (`sigma_u`)
- Reproducible results via `seed`


## References

- Battese, G.E., Rao, D.S.P. and O'Donnell, C.J. (2004). A metafrontier
  production function for estimation of technical efficiencies and
  technology gaps for firms operating under different technologies.
  *Journal of Productivity Analysis*, 21(1), 91--103.

- Huang, C.J., Huang, T.-H. and Liu, N.-H. (2014). A new approach to
  estimating the metafrontier production function based on a stochastic
  frontier framework. *Journal of Productivity Analysis*, 42(3), 241--254.

- Jondrow, J., Lovell, C.A.K., Materov, I.S. and Schmidt, P. (1982). On
  the estimation of technical inefficiency in the stochastic frontier
  production function model. *Journal of Econometrics*, 19(2--3), 233--238.

- Murphy, K.M. and Topel, R.H. (1985). Estimation and inference in
  two-step econometric models. *Journal of Business & Economic
  Statistics*, 3(4), 370--379.

- O'Donnell, C.J., Rao, D.S.P. and Battese, G.E. (2008). Metafrontier
  frameworks for the study of firm-level efficiencies and technology
  ratios. *Empirical Economics*, 34(2), 231--255.