---
title: "6. Selecting the optimal bin width"
output:
  rmarkdown::html_vignette:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{6. Selecting the optimal bin width}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

```{r setup}
library(quollr)
library(dplyr)
library(ggplot2)

set.seed(20240110)
```

This vignette demonstrates how to identify the optimal bin width for hexagonal binning in the 2-D embedding space. A suitable bin width balances model complexity against prediction accuracy when comparing the structure of high-dimensional data with its 2-D layout.

We begin by computing model errors across a range of bin width values using the `gen_diffbin1_errors()` function. This function fits models for multiple bin widths and returns hexbin error (HBE) values for each configuration.

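```{r}
# Fit the model over a range of bin widths and collect the HBE for each
error_df_all <- gen_diffbin1_errors(highd_data = scurve,
                                    nldr_data = scurve_umap)

error_df_all <- error_df_all |>
  mutate(a1 = round(a1, 2)) |>   # round bin widths to two decimal places
  filter(b1 >= 5) |>             # keep configurations with b1 of at least 5
  group_by(a1) |>
  filter(HBE == min(HBE)) |>     # keep the lowest-HBE row per bin width
  ungroup()
```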

We round the bin width values (`a1`) to two decimal places so that near-identical widths fall into the same group, keep only configurations with sufficient bin resolution (`b1 >= 5`), and retain the configuration with the lowest HBE for each unique bin width.
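To make the effect of these three steps concrete, here is a minimal sketch on a toy table. The data frame `toy_errors` and its values are made up for illustration and are not output from `gen_diffbin1_errors()`; only the column names `a1`, `b1`, and `HBE` are taken from the code above.

```{r}
library(dplyr)

# Toy error table with made-up values (not output from gen_diffbin1_errors()).
toy_errors <- data.frame(
  a1  = c(0.1012, 0.1049, 0.2031, 0.2968, 0.3011),
  b1  = c(12, 11, 6, 4, 4),
  HBE = c(0.31, 0.28, 0.22, 0.35, 0.40)
)

toy_best <- toy_errors |>
  mutate(a1 = round(a1, 2)) |>  # 0.1012 and 0.1049 both become 0.10
  filter(b1 >= 5) |>            # the two rows with b1 = 4 are dropped
  group_by(a1) |>
  filter(HBE == min(HBE)) |>    # one row survives per rounded bin width
  ungroup()

toy_best
```

Note that `filter(HBE == min(HBE))` keeps every row tied for the minimum within a group. If exactly one row per bin width is required, `slice_min(HBE, n = 1, with_ties = FALSE)` is a possible alternative.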

```{r}
# Inspect the five largest bin widths
error_df_all |>
  arrange(desc(a1)) |>
  head(5)
```

The plot below shows the relationship between bin width (`a1`) and HBE. The goal is to identify a bin width that minimizes HBE while avoiding overly coarse or fine binning.

```{r, fig.alt="Line plot of HBE against bin width a1, with one point per bin width."}
#| fig-height: 4
#| fig-width: 6
#| out-width: 100%

# HBE as a function of bin width, one point per rounded value of a1
ggplot(error_df_all,
       aes(x = a1,
           y = HBE)) +
  geom_point(size = 0.8) +
  geom_line(linewidth = 0.3) +
  ylab("HBE") +
  xlab(expression(paste("bin width (", a[1], ")"))) +
  theme_minimal() +
  theme(panel.border = element_rect(fill = 'transparent'),
        plot.title = element_text(size = 12, hjust = 0.5, vjust = -0.5),
        axis.ticks.x = element_line(),
        axis.ticks.y = element_line(),
        legend.position = "none",
        axis.text.x = element_text(size = 7),
        axis.text.y = element_text(size = 7),
        axis.title.x = element_text(size = 7),
        axis.title.y = element_text(size = 7))
```
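Reading a bin width off the plot can be complemented by a simple rule. The sketch below is one possible heuristic, not a method provided by quollr: among bin widths whose HBE is within a relative tolerance of the smallest HBE, it picks the largest one, on the assumption that a coarser binning gives a simpler model. The helper `pick_binwidth()`, its `tol` argument, and the toy values are all hypothetical, and the rule assumes HBE is positive.

```{r}
library(dplyr)

# Hypothetical helper (not part of quollr): return the row with the largest
# bin width whose HBE is within `tol` (relative) of the smallest HBE.
pick_binwidth <- function(errors, tol = 0.05) {
  errors |>
    filter(HBE <= min(HBE) * (1 + tol)) |>
    slice_max(a1, n = 1, with_ties = FALSE)
}

# Toy error curve with made-up values.
toy_curve <- data.frame(
  a1  = c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30),
  HBE = c(0.40, 0.30, 0.24, 0.20, 0.205, 0.33)
)

pick_binwidth(toy_curve)
```

Under the same assumptions, the helper could be applied to `error_df_all`, since it only relies on the `a1` and `HBE` columns. The tolerance is a judgment call and should be checked against the plot rather than trusted on its own.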

