---
title: "Simple global models"
author: 
 - name: Leonardo Ramirez-Lopez
   email: ramirez.lopez.leo@gmail.com
date: today
bibliography: resemble.bib
csl: elsevier-harvard.csl  
format:
  html:
    toc: true
    toc-depth: 3
    number-sections: true
    toc-location: left
    code-overflow: wrap
    smooth-scroll: true
    html-math-method: mathjax
knitr:
  opts_chunk:
    class-output: "ansi"
vignette: >
  %\VignetteIndexEntry{5 Simple global models}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{quarto::html}
---

```{r}
#| echo: false
options(cli.num_colors = 1)
Sys.setenv(OMP_NUM_THREADS = 2)
```

:::: {.columns}
::: {.column width="70%"}
> *It is vain to do with more what can be done with less* -- William of Ockham
:::
::: {.column width="30%"}
<img src="logo.png" style="width: 150%; max-width: 150px;">
:::
::::


```{r}
#| message: false
#| echo: false
library(resemble)
```

# Introduction

The `model()` function provides a simple interface for building global calibration models using partial least squares (PLS) or Gaussian process regression (GPR). Unlike `mbl()`, which builds local models for each prediction, `model()` fits a single model to the entire reference set.

# Fitting methods

Regression methods are specified via constructor functions. The available methods are:

| Constructor | Method | Description | Reference |
|-------------|--------|-------------|-----------|
| `fit_pls(method = "pls")` | PLS | Non-linear iterative partial least squares (NIPALS) | @wold1975soft |
| `fit_pls(method = "simpls")` | SIMPLS | Straightforward implementation of PLS | @de1993simpls |
| `fit_pls(method = "mpls")` | Modified PLS | Uses correlation instead of covariance for weights | @shenk1991populations |
| `fit_wapls(method = "mpls")` | Weighted average PLS | Predictions are a weighted average of multiple models from multiple PLS factors. __NOT YET AVAIABLE FOR THE__ `model()` __FUNCTION__ | @shenk1997investigation |
| `fit_gpr()` | GPR | Gaussian process regression with dot product kernel | @rasmussen2006gaussian |

```{r}
#| label: fit-methods
# PLS with 15 components
fit_pls(ncomp = 15)

# SIMPLS with scaling
fit_pls(ncomp = 15, method = "simpls", scale = TRUE)

# mPLS with scaling
fit_pls(ncomp = 15, method = "mpls")

# GPR with default noise variance
fit_gpr()
```

# Example

## Data preparation

```{r}
#| label: data-prep
#| message: false
library(prospectr)

wavs <- as.numeric(colnames(NIRsoil$spc))

# Preprocess: detrend + first derivative
NIRsoil$spc_pr <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = wavs),
  m = 1, p = 1, w = 7
)

# Split data
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_y <- NIRsoil$Ciso[NIRsoil$train == 1]
test_x  <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_y  <- NIRsoil$Ciso[NIRsoil$train == 0]

# Remove missing values
ok_train <- !is.na(train_y)
ok_test  <- !is.na(test_y)
train_x <- train_x[ok_train, ]
train_y <- train_y[ok_train]
test_x  <- test_x[ok_test, ]
test_y  <- test_y[ok_test]
```

## Fitting a PLS model

```{r}
#| label: fit-pls

set.seed(1124) # guarantee same CV splits for all methods
pls_mod <- model(
 Xr = train_x,
 Yr = train_y,
 fit_method = fit_pls(ncomp = 12, method = "mpls", scale = TRUE),
 control = model_control(validation_type = "lgo", number = 10)
)

pls_mod
```

## Fitting a GPR model

```{r}
#| label: fit-gpr
set.seed(1124) # guarantee same CV splits for all methods
gpr_mod <- model(
 Xr = train_x,
 Yr = train_y,
 fit_method = fit_gpr(noise_variance = 0.5),
 control = model_control(validation_type = "lgo", number = 10)
)

gpr_mod
```

## Prediction

```{r}
#| label: predict
# Predict using optimal number of components (from CV)
pls_pred <- predict(pls_mod, newdata = test_x)

# Predict with GPR
gpr_pred <- predict(gpr_mod, newdata = test_x)

# Compare predictions
data.frame(
  observed = test_y,
  pls = pls_pred[, which(pls_mod$cv_results$optimal)],
  gpr = as.vector(gpr_pred)
) |> head(10)
```

## Validation statistics

```{r}
#| label: validation
# Function to compute stats
eval_pred <- function(obs, pred) {
  data.frame(
    rmse = sqrt(mean((obs - pred)^2)),
    r2 = cor(obs, pred)^2
  )
}
```

```{r}
## PLS evaluation
eval_pred(test_y, pls_pred[, which(pls_mod$cv_results$optimal)])
```

```{r}
## GPR evaluation
eval_pred(test_y, gpr_pred)
```

# References {-}
