---
title: "2. Parallelize Computation of Indices"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{2. Parallelize Computation of Indices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

<!-- 
# Pre-render with 
knitr::knit("vignettes/_fundiversity_1-parallel.Rmd", output = "vignettes/fundiversity_1-parallel.Rmd")
-->



Note: This vignette presents performance tests comparing the non-parallel and parallel versions of `fundiversity` functions. To avoid depending on additional packages, it is [**pre-computed**](https://ropensci.org/technotes/2019/12/08/precompute-vignettes/).

Within `fundiversity` the computation of most indices can be parallelized using the `future` package. The indices that currently support parallelization are: **FRic**, **FDis**, **FDiv**, and **FEve**. The goal of this vignette is to explain how to toggle and use parallelization in `fundiversity`.

The `future` package provides a simple and general framework for asynchronous computation that adapts to the resources available to the user. The [first vignette of `future`](https://cran.r-project.org/package=future) gives a general overview of its features. The main idea is that the user writes the code once and it then runs unchanged sequentially, in parallel on a single computer, on a cluster, or distributed over several computers. `fundiversity` can thus run on any of these backends, depending on the user's choice.


```r
library("fundiversity")

data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")
```

# Running code in parallel

By default, `fundiversity` code runs sequentially on a single core. To trigger parallelization, the user needs to set a plan with `future::plan()`, using a parallel backend such as `future::multisession` to split the execution across multiple R sessions.


```r
# Sequential execution
fric1 <- fd_fric(traits_birds)

# Parallel execution
future::plan(future::multisession)  # Plan definition
fric2 <- fd_fric(traits_birds)  # The code resolves in the same way

identical(fric1, fric2)
#> [1] TRUE
```

With the `future::multisession` backend, you can specify the number of cores over which the computation is parallelized through the `workers` argument of the `future::plan()` call:


```r
future::plan(future::multisession, workers = 2)  # Only 2 cores are used
fric3 <- fd_fric(traits_birds)

identical(fric3, fric2)
#> [1] TRUE
```
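
Rather than hard-coding the number of workers, one option is to derive it from the machine and to restore the previous plan once the computation is done. The sketch below assumes that `future::availableCores()` reports a sensible core count on your system; the names `n_workers`, `old_plan`, and `fric4` are only illustrative.

```r
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")

# Leave one core free for the rest of the system (illustrative choice)
n_workers <- max(1, future::availableCores() - 1)

# plan() invisibly returns the previous plan, so it can be restored later
old_plan <- future::plan(future::multisession, workers = n_workers)

fric4 <- fd_fric(traits_birds, site_sp_birds)

# Restore the plan that was active before
future::plan(old_plan)
```

Restoring the previous plan avoids leaving idle worker sessions running after the computation.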

To learn more about the different backends available and the related arguments needed, please refer to the documentation of `future::plan()` and the [overview vignette of `future`](https://cran.r-project.org/package=future).
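
The plan is set once and applies to every index that supports parallelization, not only to FRic. As a minimal sketch, assuming two workers are available on your machine, the four indices listed in the introduction can be computed under the same plan (the object names are illustrative):

```r
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")

# One plan for all subsequent calls
future::plan(future::multisession, workers = 2)

fric <- fd_fric(traits_birds, site_sp_birds)  # Functional richness
fdis <- fd_fdis(traits_birds, site_sp_birds)  # Functional dispersion
fdiv <- fd_fdiv(traits_birds, site_sp_birds)  # Functional divergence
feve <- fd_feve(traits_birds, site_sp_birds)  # Functional evenness

# Go back to sequential execution
future::plan(future::sequential)
```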


# Performance comparison

We can now measure how parallelization affects performance:


```r
future::plan(future::sequential)
non_parallel_bench <- microbenchmark::microbenchmark(
  non_parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)

future::plan(future::multisession)
parallel_bench <- microbenchmark::microbenchmark(
  parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)

rbind(non_parallel_bench, parallel_bench)
#> Unit: milliseconds
#>          expr       min         lq       mean     median         uq       max neval cld
#>  non_parallel  8.756378   8.892243   9.841818   9.072241   9.218554   23.9519    20  a 
#>      parallel 56.374332 167.680385 218.073077 172.888927 185.670312 1247.8534    20   b
```

The non-parallelized code runs faster than the parallelized one! Indeed, `fundiversity` parallelizes the computation across sites, and the call above does not provide a site-species matrix, so there is nothing to split between workers and only the overhead of parallelization remains. Parallelization should thus be used when you have many sites for which you want to compute indices.
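
To make the role of sites concrete, the sketch below passes the site-species matrix `site_sp_birds` as second argument, so that there are several sites to distribute across workers, and then checks that both plans give the same values. The object names `fric_seq` and `fric_par` are illustrative, and the choice of two workers is an assumption about your machine.

```r
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")

# Sequential: sites are processed one after the other
future::plan(future::sequential)
fric_seq <- fd_fric(traits_birds, site_sp_birds)

# Parallel: sites are distributed across two R sessions
future::plan(future::multisession, workers = 2)
fric_par <- fd_fric(traits_birds, site_sp_birds)

future::plan(future::sequential)

# Both approaches should return one row per site with the same values
all.equal(fric_seq, fric_par)
```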


```r
# Function to make a bigger site-sp dataset
make_more_sites <- function(n) {
  site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
  rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))

  site_sp
}
```

For example, with a dataset containing 5000 times more sites:


```r
bigger_site <- make_more_sites(5000)

microbenchmark::microbenchmark(
  seq = { 
    future::plan(future::sequential) 
    fd_fric(traits_birds, bigger_site) 
  },
  multisession = { 
    future::plan(future::multisession, workers = 4)
    fd_fric(traits_birds, bigger_site) 
  },
  multicore = { 
    future::plan(future::multicore, workers = 4) 
    fd_fric(traits_birds, bigger_site) 
  }, times = 20
)
#> Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio
#> because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R
#> sessions, see ?parallelly::supportsMulticore
#> Unit: seconds
#>          expr      min       lq     mean   median       uq      max neval cld
#>           seq 15.58688 15.67587 15.97552 15.97047 16.24568 16.54392    20  a 
#>  multisession 21.17851 21.75313 22.02965 21.88691 22.26971 23.50062    20   b
#>     multicore 15.53872 15.75567 16.06103 16.01595 16.35790 16.98102    20  a
```
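
Two details matter when reading these timings. First, `future::plan()` is called inside each timed expression, so the `multisession` timings may include the cost of starting the worker sessions. Second, as the warning indicates, `multicore` is not supported when running R from RStudio, where `future` falls back to sequential processing, which is consistent with its timings matching those of `seq`. The sketch below shows one way to time only the computation by setting the plan beforehand. It uses base `system.time()` and a much smaller dataset to keep the run short; the names `some_sites`, `time_seq`, `time_par`, and `timings` are illustrative, and the results will depend on your machine.

```r
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")

# Same helper as above, repeated so that the sketch is self-contained
make_more_sites <- function(n) {
  site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
  rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))

  site_sp
}

# Much smaller than the dataset above, to keep the run short
some_sites <- make_more_sites(10)

# Sequential timing
future::plan(future::sequential)
time_seq <- system.time(fd_fric(traits_birds, some_sites))

# Start the workers first, then time only the computation
future::plan(future::multisession, workers = 2)
time_par <- system.time(fd_fric(traits_birds, some_sites))

future::plan(future::sequential)

timings <- rbind(sequential = time_seq, multisession = time_par)
timings
```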

<details>
<summary>Session info of the machine on which the benchmark was run and the time it took to run</summary>


```
#>  seconds needed to generate this document: 1095.27 sec elapsed
#> ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       Ubuntu 20.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       RStudio
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2022-11-15
#>  rstudio  2022.02.0+443 Prairie Trillium (server)
#>  pandoc   2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  !  package        * version    date (UTC) lib source
#>  P  abind            1.4-5      2016-07-21 [3] CRAN (R 4.2.0)
#>     assertthat       0.2.1      2019-03-21 [3] CRAN (R 4.1.3)
#>     cachem           1.0.6      2021-08-19 [3] CRAN (R 4.1.3)
#>  VP cli              3.4.0      2022-09-23 [?] CRAN (R 4.2.1) (on disk 3.4.1)
#>     codetools        0.2-18     2020-11-04 [5] CRAN (R 4.0.3)
#>     colorspace       2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>     crayon           1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>     DBI              1.1.2      2021-12-20 [3] CRAN (R 4.1.3)
#>  P  digest           0.6.29     2021-12-01 [3] CRAN (R 4.2.0)
#>     dplyr          * 1.0.10     2022-09-01 [1] CRAN (R 4.2.1)
#>     ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>     evaluate         0.18       2022-11-07 [1] CRAN (R 4.2.1)
#>     fansi            1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  P  fastmap          1.1.0      2021-01-25 [3] CRAN (R 4.2.1)
#>     fundiversity   * 0.2.1.9000 2022-04-12 [3] Github (bisaloo/fundiversity@87652ba)
#>  VP future           1.26.1     2022-09-02 [3] CRAN (R 4.2.1) (on disk 1.28.0)
#>  VP future.apply     1.9.0      2022-11-05 [3] CRAN (R 4.2.1) (on disk 1.10.0)
#>     generics         0.1.2      2022-01-31 [1] CRAN (R 4.2.0)
#>  P  geometry         0.4.6      2022-04-18 [3] CRAN (R 4.2.0)
#>     ggplot2        * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
#>  VP globals          0.15.0     2022-08-28 [3] CRAN (R 4.2.1) (on disk 0.16.1)
#>     glue             1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>     gtable           0.3.0      2019-03-25 [1] CRAN (R 4.2.0)
#>     htmltools        0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>     knitr            1.40       2022-08-24 [1] CRAN (R 4.2.1)
#>     lattice          0.20-45    2021-09-22 [3] CRAN (R 4.1.3)
#>     lifecycle        1.0.3      2022-10-07 [1] CRAN (R 4.2.1)
#>  P  listenv          0.8.0      2019-12-05 [3] CRAN (R 4.2.1)
#>  P  magic            1.6-0      2022-02-09 [3] CRAN (R 4.2.0)
#>     magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>     MASS             7.3-58.1   2022-08-03 [3] CRAN (R 4.2.1)
#>     Matrix           1.4-1      2022-03-23 [3] CRAN (R 4.1.3)
#>     memoise          2.0.1      2021-11-26 [3] CRAN (R 4.1.3)
#>     microbenchmark   1.4.9      2021-11-09 [3] CRAN (R 4.1.3)
#>     multcomp         1.4-19     2022-04-26 [1] CRAN (R 4.2.0)
#>     munsell          0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>     mvtnorm          1.1-3      2021-10-08 [1] CRAN (R 4.2.0)
#>  VP parallelly       1.31.1     2022-07-21 [3] CRAN (R 4.2.1) (on disk 1.32.1)
#>     pillar           1.7.0      2022-02-01 [1] CRAN (R 4.2.0)
#>     pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>     R6               2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  P  Rcpp             1.0.8.3    2022-03-17 [3] CRAN (R 4.2.0)
#>     rlang            1.0.6      2022-09-24 [1] CRAN (R 4.2.1)
#>     rmarkdown        2.13       2022-03-10 [3] CRAN (R 4.1.3)
#>     rstudioapi       0.14       2022-08-22 [1] CRAN (R 4.2.1)
#>     sandwich         3.0-2      2022-06-15 [1] CRAN (R 4.2.0)
#>     scales           1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>     sessioninfo      1.2.2      2021-12-06 [3] CRAN (R 4.1.3)
#>     stringi          1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>     stringr          1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>     survival         3.3-1      2022-03-03 [3] CRAN (R 4.1.3)
#>     TH.data          1.1-1      2022-04-26 [1] CRAN (R 4.2.0)
#>     tibble           3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
#>     tictoc           1.0.1      2021-04-19 [3] CRAN (R 4.1.3)
#>     tidyselect       1.2.0      2022-10-10 [1] CRAN (R 4.2.1)
#>     utf8             1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>     vctrs            0.5.0      2022-10-22 [1] CRAN (R 4.2.1)
#>     withr            2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>     xfun             0.34       2022-10-18 [1] CRAN (R 4.2.1)
#>     yaml             2.3.6      2022-10-18 [1] CRAN (R 4.2.1)
#>     zoo              1.8-10     2022-04-15 [1] CRAN (R 4.2.0)
#> 
#>  [1] /home/ke76dimu/R-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /data/library/4.1
#>  [4] /usr/lib/R/site-library
#>  [5] /usr/lib/R/library
#> 
#>  V ── Loaded and on-disk version mismatch.
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
```

</details>
