---
title: "Advanced Workflows for High-Level Users"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Workflows for High-Level Users}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
library(betaregscale)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 5,
  fig.align = "center",
  warning = FALSE,
  message = FALSE
)
options(digits = 4)
set.seed(2026)

kbl10 <- function(x, n = 10, digits = 4, ...) {
  knitr::kable(utils::head(as.data.frame(x), n), digits = digits, align = "c", ...)
}
```

## Audience and scope

This vignette is designed for analysts who are already familiar with beta regression and require a production-oriented workflow featuring:

1. reproducible model selection;
2. inferential and predictive diagnostics;
3. mixed-effects escalation (`brs` -> `brsmm`) with likelihood-ratio (LR) comparisons;
4. high-signal reporting tables for technical decision-making.

```{r library}
library(betaregscale)
```

## 1) Reproducible simulation and data checks

```{r data-sim}
n <- 260
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  z1 = rnorm(n)
)

sim <- brs_sim(
  formula = ~ x1 + x2 | z1,
  data = d,
  beta = c(0.15, 0.55, -0.30),
  zeta = c(-0.10, 0.35),
  link = "logit",
  link_phi = "logit",
  ncuts = 100,
  repar = 2
)

kbl10(sim)
kbl10(
  data.frame(
    n = nrow(sim),
    exact = sum(sim$delta == 0),
    left = sum(sim$delta == 1),
    right = sum(sim$delta == 2),
    interval = sum(sim$delta == 3)
  ),
  digits = 4
)
```
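As a quick sanity check beyond the table above, the censoring-type shares can be computed directly with base R; this relies only on the `delta` codes already summarized (0 = exact, 1 = left, 2 = right, 3 = interval):

```{r delta-shares}
# Proportion of each censoring type; forcing all four levels keeps
# zero-count categories visible in the output.
round(prop.table(table(factor(sim$delta, levels = 0:3))), 4)
```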

## 2) Fixed-effects candidate set and model ranking

```{r fit-candidates}
fit_logit <- brs(y ~ x1 + x2 | z1, data = sim, link = "logit", repar = 2)
fit_probit <- brs(y ~ x1 + x2 | z1, data = sim, link = "probit", repar = 2)
fit_cauchit <- brs(y ~ x1 + x2 | z1, data = sim, link = "cauchit", repar = 2)

tab_rank <- brs_table(
  logit = fit_logit,
  probit = fit_probit,
  cauchit = fit_cauchit
)
kbl10(tab_rank)
```
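If the fitted objects implement `logLik()` methods (an assumption about the package API here; `brs_table()` is the packaged route), the ranking can be cross-checked with the generic `AIC()`/`BIC()` from base R:

```{r ic-crosscheck}
# Cross-check the link-function ranking with generic information
# criteria, ordered best-first by AIC.
fits <- list(logit = fit_logit, probit = fit_probit, cauchit = fit_cauchit)
ic <- data.frame(AIC = sapply(fits, AIC), BIC = sapply(fits, BIC))
ic[order(ic$AIC), ]
```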

## 3) Inference stack: Wald + bootstrap + AME



```{r inference-stack}
kbl10(brs_est(fit_logit))
kbl10(confint(fit_logit))

boot_tab <- brs_bootstrap(fit_logit, R = 80, level = 0.95)
kbl10(boot_tab)

set.seed(2026) # For marginal effects simulation
ame_mu <- brs_marginaleffects(
  fit_logit,
  model = "mean",
  type = "response",
  interval = TRUE,
  n_sim = 120
)
kbl10(ame_mu)
```
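For transparency, the Wald results from `brs_est()` can be reproduced by hand, assuming `coef()` and `vcov()` methods are available for `brs` fits (an assumption about the package API):

```{r wald-manual}
# Hand-rolled Wald z statistics from the coefficient vector and the
# estimated covariance matrix.
b  <- coef(fit_logit)
se <- sqrt(diag(vcov(fit_logit)))
z  <- b / se
kbl10(data.frame(estimate = b, std.error = se, z = z,
                 p.value = 2 * pnorm(-abs(z))))
```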

```{r inference-visual}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  boot_tab_bca <- brs_bootstrap(
    fit_logit,
    R = 120,
    level = 0.95,
    ci_type = "bca",
    keep_draws = TRUE
  )
  autoplot.brs_bootstrap(
    boot_tab_bca,
    type = "ci_forest",
    title = "Bootstrap (BCa) vs Wald intervals"
  )

  set.seed(2026) # For marginal effects simulation
  ame_mu_draws <- brs_marginaleffects(
    fit_logit,
    model = "mean",
    type = "response",
    interval = TRUE,
    n_sim = 160,
    keep_draws = TRUE
  )
  autoplot.brs_marginaleffects(ame_mu_draws, type = "forest")
}
```

## 4) Prediction layer on analyst scale

```{r prediction-layer}
score_prob <- brs_predict_scoreprob(fit_logit, scores = 0:10)
kbl10(score_prob[1:8, 1:7])
```
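A minimal consistency check, assuming `brs_predict_scoreprob()` returns a numeric matrix with one probability column per score: each row sum is the total probability mass assigned to the listed scores, so values near one indicate that the `scores = 0:10` grid covers most of the support.

```{r scoreprob-check}
# Total probability mass captured by the chosen score grid, per
# observation; row sums far below 1 flag truncated support.
range(rowSums(as.matrix(score_prob)))
```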

## 5) Out-of-sample validation

```{r cv-layer}
set.seed(2026) # For cross-validation reproducibility
cv_tab <- brs_cv(
  y ~ x1 + x2 | z1,
  data = sim,
  k = 5,
  repeats = 5,
  repar = 2
)
kbl10(cv_tab)
```
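Fold-level output can be collapsed into a single summary row without committing to package-specific column names, by averaging whatever numeric metrics `brs_cv()` returns:

```{r cv-summary}
# Mean of every numeric fold-level metric across folds and repeats.
num <- vapply(cv_tab, is.numeric, logical(1))
round(colMeans(cv_tab[, num, drop = FALSE]), 4)
```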

## 6) Escalation to mixed models

### 6.1 Simulate clustered data with random intercept + slope

```{r mixed-data}
g <- 10
ni <- 120
id <- factor(rep(seq_len(g), each = ni))
n_mm <- length(id)
x1 <- rnorm(n_mm)
x2 <- rbinom(n_mm, size = 1, prob = 1 / 2)

b0 <- rnorm(g, sd = 0.40)
b1 <- rnorm(g, sd = 0.22)

eta_mu <- 0.20 + 0.65 * x1 - 0.30 * x2 + b0[id] + b1[id] * x1
eta_phi <- rep(-0.20, n_mm)

mu <- plogis(eta_mu)
phi <- plogis(eta_phi)
shp <- brs_repar(mu = mu, phi = phi, repar = 2)
y <- round(stats::rbeta(n_mm, shp$shape1, shp$shape2) * 100)

dmm <- data.frame(y = y, id = id, x1 = x1, x2 = x2)
kbl10(dmm)
```
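Before fitting, it is worth confirming that the simulated random intercepts actually induce between-cluster heterogeneity; base R `aggregate()` is enough:

```{r cluster-means}
# Raw within-cluster means of the response; these should spread out
# across id if the random intercepts are doing their job.
aggregate(y ~ id, data = dmm, FUN = mean)
```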

### 6.2 Fit evolutionary sequence

```{r mixed-fit}
fit_brs <- brs(y ~ x1 + x2, data = dmm, repar = 2)
fit_ri <- brsmm(y ~ x1 + x2, random = ~ 1 | id, data = dmm, repar = 2)
fit_rs <- brsmm(y ~ x1 + x2, random = ~ 1 + x1 | id, data = dmm, repar = 2)

tab_lr <- anova(fit_brs, fit_ri, fit_rs, test = "Chisq")
kbl10(data.frame(model = rownames(tab_lr), tab_lr, row.names = NULL))
```

### 6.3 Model choice by likelihood-ratio test (ANOVA)

The `anova()` methods provide a practical likelihood-ratio test (LRT) workflow:

* `M0 = brs` (no random effects);
* `M1 = brsmm` with random intercept (`~ 1 | id`);
* `M2 = brsmm` with random intercept + slope (`~ 1 + x1 | id`).

In nested comparisons, the test statistic is:
$$LR=2\{\ell(\hat\theta_{\text{complex}})-\ell(\hat\theta_{\text{simple}})\}$$

For the first step (`M0 -> M1`), the null hypothesis places a variance component on the boundary of the parameter space ($\sigma_b^2 = 0$), so the naive chi-square p-value is conservative and should be interpreted with caution.
The same boundary caveat applies to the `M1 -> M2` step, which tests the slope variance at zero; the chi-square approximation nonetheless remains a practical, if conservative, decision aid.
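The LR statistic can also be computed by hand, assuming `logLik()` methods exist for `brs`/`brsmm` fits (an assumption about the package API; `anova()` is the packaged route). For the `M1 -> M2` comparison, which adds a slope variance and a covariance, a common boundary-aware alternative to the naive chi-square(2) p-value is an equal mixture of chi-square(1) and chi-square(2) tails:

```{r lr-manual}
# Manual LR statistic for M1 -> M2.
lr <- as.numeric(2 * (logLik(fit_rs) - logLik(fit_ri)))

# Naive p-value: 2 extra parameters under the usual counting.
p_naive <- pchisq(lr, df = 2, lower.tail = FALSE)

# 50:50 chi-square mixture, accounting for the slope variance sitting
# on the boundary of the parameter space under the null.
p_mix <- 0.5 * (pchisq(lr, df = 1, lower.tail = FALSE) +
                  pchisq(lr, df = 2, lower.tail = FALSE))

c(LR = lr, p_naive = p_naive, p_mixture = p_mix)
```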

```{r llr-decision}
tab_lr_df <- data.frame(model = rownames(tab_lr), tab_lr, row.names = NULL)
tab_lr_df$decision <- c(
  "baseline",
  ifelse(is.na(tab_lr_df$`Pr(>Chisq)`[2]), "inspect AIC/BIC + diagnostics",
    ifelse(tab_lr_df$`Pr(>Chisq)`[2] < 0.05, "prefer M1 over M0", "prefer M0 (parsimony)")
  ),
  ifelse(is.na(tab_lr_df$`Pr(>Chisq)`[3]), "inspect AIC/BIC + diagnostics",
    ifelse(tab_lr_df$`Pr(>Chisq)`[3] < 0.05, "prefer M2 over M1", "prefer M1 (parsimony)")
  )
)
kbl10(tab_lr_df)
```

## 7) Random-effects study (numeric + visual)

```{r ranef-study}
rs <- brsmm_re_study(fit_rs)
print(rs)
kbl10(rs$summary)
kbl10(rs$D)
kbl10(rs$Corr)
```
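Since `rs$D` is the estimated random-effects covariance matrix shown above, a quick positive-definiteness check is useful; near-zero or negative eigenvalues flag boundary estimates or convergence trouble:

```{r ranef-d-check}
# Eigenvalues of the estimated D matrix; all should be clearly positive.
eigen(as.matrix(rs$D), symmetric = TRUE)$values
```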

```{r ranef-plots}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  autoplot.brsmm(fit_rs, type = "ranef_caterpillar")
  autoplot.brsmm(fit_rs, type = "ranef_density")
  autoplot.brsmm(fit_rs, type = "ranef_pairs")
  autoplot.brsmm(fit_rs, type = "ranef_qq")
}
```

## 8) Practical decision checklist

* Start with `brs` candidates (`link`, `repar`) and rank them using `brs_table()`.
* Add bootstrap analysis and Average Marginal Effects (AME) before escalating complexity.
* Use `brs_cv()` for out-of-sample stability checks.
* Escalate to `brsmm` only when LR/AIC/BIC metrics and diagnostics clearly support it.
* For random slopes, always inspect `brsmm_re_study()` and the associated random-effects plots.

## References

Ferrari, S. L. P., and Cribari-Neto, F. (2004). Beta regression for modelling
rates and proportions. *Journal of Applied Statistics*, 31(7), 799-815.
<https://doi.org/10.1080/0266476042000214501>

Smithson, M., and Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood
regression with beta-distributed dependent variables. *Psychological Methods*,
11(1), 54-71. <https://doi.org/10.1037/1082-989X.11.1.54>

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference
for latent Gaussian models by using integrated nested Laplace approximations.
*Journal of the Royal Statistical Society: Series B*, 71(2), 319-392.
<https://doi.org/10.1111/j.1467-9868.2008.00700.x>
