---
title: "Fitting many curves using rTPC"
author: "Daniel Padfield"
output: rmarkdown::html_vignette
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Fitting many curves using rTPC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
#### A brief example of how to fit many models to multiple TPCs using rTPC, nls.multstart and the tidyverse.

***

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  #tidy.opts=list(width.cutoff=60),
  #tidy=TRUE,
  fig.align = 'center'
)
```

```{r runtimer, include=FALSE}
runstart <- lubridate::now()
```

## Things to consider 

- Go through all the other **Things to consider** sections in the other vignettes.
- Think carefully about how to do model selection on the whole dataset. 
- If you care about specific parameters (e.g. optimum temperature or activation energy) then you might want to pick a single "best" model for the whole dataset.
- If you care about the overall fit and its shape, then you might want to do pick a "best" model on each individual curve.
- You should read resources about model selection to inform your decisions.

***


```{r setup, message=FALSE}
# load packages
library(rTPC)
library(nls.multstart)
library(broom)
library(tidyverse)
```

In the final part of the general pipeline, we demonstrate how multiple models can be fitted to multiple TPCs. Instead of picking all `r length(rTPC::get_model_names())` model formulations to demonstrate this approach, we will use only 2 models in this example: **gaussian_1987()** and **sharpeschoolhigh_1981()**. 

We can demonstrate the fitting of multiple curves by modelling all 60 TPCs from the example dataset of **rTPC** curve from the example dataset **rTPC**. These TPCs are of respiration and photosynthesis of the aquatic algae, *Chlorella vulgaris*. These algae differed in their growth temperature, `growth_temp`, and how long they had been grown at that temperature, `process`, either for ~100 or ~10 generations.

Using a similar approach to `vignette('fit_many_models')`, models can be fitted to each curve using list columns and **purrr::map()** to fit and store multiple models in a data frame.

When fitting lots of models at once, it is useful to know the progress the code as it may take a long time to run.The R package **purrr** now has progress bars built in; you can add a progress bar to the **map()** function using the argument *.progress*. To make this work, we need to only do a single **map()** call, so we will write a function that fits both models that we will then pass to **purrr**. This is how we _strongly_ recommend fitting multiple models using **map()**.

```{r fit_multiple_models, warning=FALSE, message=FALSE}
# load in data
data("chlorella_tpc")
d <- chlorella_tpc

# write function to fit both models
fit_TPCs <- function(d, ...) {
  # get start values and fit gaussian
  start_vals <- rTPC::get_start_vals(
    d$temp,
    d$rate,
    model_name = 'gaussian_1987'
  )

  mod_gaussian <- nls.multstart::nls_multstart(
    rate ~ rTPC::gaussian_1987(temp = temp, rmax, topt, a),
    data = d,
    iter = c(4, 4, 4),
    start_lower = start_vals - 10,
    start_upper = start_vals + 10,
    lower = rTPC::get_lower_lims(d$temp, d$rate, model_name = 'gaussian_1987'),
    upper = rTPC::get_upper_lims(d$temp, d$rate, model_name = 'gaussian_1987'),
    supp_errors = 'Y',
    convergence_count = FALSE,
    ...
  )

  # get start values and fit sharpe schoolfield
  start_vals <- rTPC::get_start_vals(
    d$temp,
    d$rate,
    model_name = 'sharpeschoolhigh_1981'
  )

  # fit model
  mod_ss <- nls.multstart::nls_multstart(
    rate ~
      rTPC::sharpeschoolhigh_1981(temp = temp, r_tref, e, eh, th, tref = 20),
    data = d,
    iter = c(3, 3, 3, 3),
    start_lower = start_vals - 10,
    start_upper = start_vals + 10,
    lower = rTPC::get_lower_lims(
      d$temp,
      d$rate,
      model_name = 'sharpeschoolhigh_1981'
    ),
    upper = rTPC::get_upper_lims(
      d$temp,
      d$rate,
      model_name = 'sharpeschoolhigh_1981'
    ),
    supp_errors = 'Y',
    convergence_count = FALSE,
    ...
  )
  
  return(tibble::tibble(
    gaussian = list(mod_gaussian),
    sharpeschoolhigh = list(mod_ss)
  ))
}

# fit two chosen model formulation in rTPC
d_fits <- nest(d, data = c(temp, rate)) %>%
  mutate(
    mods = map(
      data,
      ~ fit_TPCs(d = .x),
      .progress = TRUE
    )
  ) %>%
  unnest(mods)
```

```{r, echo=FALSE}
cli::cli_text(cli::col_green(strrep("█", 30)), " 100% | ETA:  0s")
```
 


Like previous vignettes, the predictions of each model can be estimated using **broom::augment()**. To do this, we first create a new list column containing high resolution temperature values by taking the `min` and `max` of each curve. Next we stack the models and finally we get the new predictions using the **map2()**, which allows us to use both `fit` and `new_data` list columns. After unnesting the `preds` column, we are then left with high resolution predictions for each curve. As this code covers a lot of steps, each line of the code is commented.

```{r preds}
# create new list column of for high resolution data
d_preds <- mutate(d_fits, new_data = map(data, ~tibble(temp = seq(min(.x$temp), max(.x$temp), length.out = 100)))) %>%
  # get rid of original data column
  select(., -data) %>%
  # stack models into a single column, with an id column for model_name
  pivot_longer(., names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>%
  # create new list column containing the predictions
  # this uses both fit and new_data list columns
  mutate(preds = map2(fit, new_data, ~augment(.x, newdata = .y))) %>%
  # select only the columns we want to keep
  select(curve_id, growth_temp, process, flux, model_name, preds) %>%
  # unlist the preds list column
  unnest(preds)

glimpse(d_preds)

```

We can then plot the predictions of each curve using **ggplot2**.

```{r plot, fig.height = 12, fig.width=8}
# plot
ggplot(d_preds) +
  geom_line(aes(temp, .fitted, col = model_name)) +
  geom_point(aes(temp, rate), d) +
  facet_wrap(~curve_id, scales = 'free_y', ncol = 6) +
  theme_bw() +
  theme(legend.position = 'none') +
  scale_color_brewer(type = 'qual', palette = 2) +
  labs(x = 'Temperature (ºC)',
       y = 'Metabolic rate',
       title = 'All fitted thermal performance curves',
       subtitle = 'gaussian in green; sharpeschoolhigh in orange')

```

The traits of each thermal performance curve can also easily be calculated.

```{r params}
# stack models and calculate extra params
d_params <- pivot_longer(d_fits, names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>%
  mutate(params = map(fit, calc_params, .progress = TRUE)) %>%
  select(curve_id, growth_temp, process, flux, model_name, params) %>%
  unnest(params)

glimpse(d_params)
```

```{r tot_time, include=FALSE}
tot_time <- lubridate::as.duration(lubridate::now() - runstart)
```

[Built in `r tot_time`s]{style="opacity: 0.1;font-size: small;"}