---
title: "Datasets and shipped artifacts"
author: "elcf4R package"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Datasets and shipped artifacts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(elcf4R)

.elcf4r_available_data <- local({
  pkg_data <- utils::data(package = "elcf4R")
  if (is.null(pkg_data$results)) {
    character()
  } else {
    as.character(pkg_data$results[, "Item"])
  }
})

.elcf4r_load_data <- function(name) {
  if (!(name %in% .elcf4r_available_data)) {
    return(FALSE)
  }
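  # Load into the knitr evaluation environment. Passing environment() here
  # would target this function's own frame, which is discarded on return.
  utils::data(list = name, package = "elcf4R", envir = knitr::knit_global())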
  TRUE
}

.elcf4r_example_size_row <- function(object_name, dataset_label) {
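  # Look in the knitr evaluation environment; a bare
  # exists(..., inherits = FALSE) would only search this function's own frame.
  env <- knitr::knit_global()
  if (!exists(object_name, envir = env, inherits = FALSE)) {
    return(NULL)
  }
  x <- get(object_name, envir = env, inherits = FALSE)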
  if (!is.data.frame(x)) {
    return(NULL)
  }
  data.frame(
    dataset = dataset_label,
    rows = nrow(x),
    ids = length(unique(x$entity_id)),
    stringsAsFactors = FALSE
  )
}

.elcf4r_benchmark_summary <- function(object_name, dataset_label) {
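  # Look in the knitr evaluation environment; a bare
  # exists(..., inherits = FALSE) would only search this function's own frame.
  env <- knitr::knit_global()
  if (!exists(object_name, envir = env, inherits = FALSE)) {
    return(NULL)
  }
  x <- get(object_name, envir = env, inherits = FALSE)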
  if (!is.data.frame(x) || nrow(x) == 0L) {
    return(NULL)
  }
  transform(
    aggregate(
      cbind(nmae, nrmse, smape, mase) ~ method,
      data = x,
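      FUN = function(v) round(mean(v, na.rm = TRUE), 4),
      # Keep rows with a missing metric so na.rm applies per metric; the
      # default na.omit would drop such rows before averaging.
      na.action = stats::na.pass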
    ),
    dataset = dataset_label
  )
}

invisible(lapply(
  c(
    "elcf4r_iflex_example",
    "elcf4r_iflex_benchmark_results",
    "elcf4r_storenet_example",
    "elcf4r_storenet_benchmark_results",
    "elcf4r_lcl_example",
    "elcf4r_lcl_benchmark_results",
    "elcf4r_refit_example",
    "elcf4r_refit_benchmark_results"
  ),
  .elcf4r_load_data
))
```

# Overview

`elcf4R` supports four household-oriented public data sources, each read into a
common normalized panel schema by its own reader:

- `elcf4r_read_iflex()`
- `elcf4r_read_storenet()`
- `elcf4r_read_lcl()`
- `elcf4r_read_refit()`

The package also ships compact example panels and saved benchmark results so
the main vignettes run without external downloads. Raw source files stay in
`data-raw/` and are not redistributed; only compact derived artifacts that have
been explicitly built and saved are included in the package.
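
To check which shipped data objects are present in an installed copy, the
package data index can be queried with base R. The following is a minimal
sketch; `list_package_data()` is a hypothetical helper name used for
illustration and is not part of `elcf4R`.

```r
# Hypothetical helper: return the names of the data objects a package ships.
list_package_data <- function(pkg) {
  info <- utils::data(package = pkg)$results
  if (is.null(info) || nrow(info) == 0L) {
    return(character())
  }
  as.character(info[, "Item"])
}

# Only query elcf4R when it is actually installed.
if (requireNamespace("elcf4R", quietly = TRUE)) {
  print(list_package_data("elcf4R"))
}
```

The helper works for any installed package, so it can also be tried against
the base `datasets` package.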

Two further datasets have download and read scaffolds but no shipped artifacts:

- `elcf4r_download_ideal()` / `elcf4r_read_ideal()` for aggregate-electricity
  hourly summaries from IDEAL.
- `elcf4r_download_gx()` / `elcf4r_read_gx()` for the GX
  transformer/community-level dataset.

# Supported dataset matrix

The package currently covers the following datasets:

| Dataset | Reader | Resolution | Temperature in normalized panel | Shipped example | Shipped benchmark |
|:--|:--|:--|:--|:--|:--|
| iFlex | `elcf4r_read_iflex()` | hourly | yes | `elcf4r_iflex_example` | `elcf4r_iflex_benchmark_results` |
| StoreNet (`H6_W`) | `elcf4r_read_storenet()` | 1 minute | optional, source-dependent | `elcf4r_storenet_example` | `elcf4r_storenet_benchmark_results` |
| Low Carbon London | `elcf4r_read_lcl()` | 30 minutes | no | `elcf4r_lcl_example` | `elcf4r_lcl_benchmark_results` |
| REFIT | `elcf4r_read_refit()` | user-selected resample | no | `elcf4r_refit_example` | `elcf4r_refit_benchmark_results` |
| ELMAS | not part of the common household reader set | hourly | no | `elcf4r_elmas_toy` | none |

All four household readers return the same core columns:

- `dataset`
- `entity_id`
- `timestamp`
- `date`
- `time_index`
- `y`
- `temp`
- `dow`
- `month`
- `resolution_minutes`

Dataset-specific metadata columns are preserved when available.
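
As a rough illustration of the schema, the sketch below builds a one-day toy
panel with the ten core columns and checks that none is missing. The values,
the 30-minute resolution, and the encodings chosen for `time_index`, `dow`, and
`month` are assumptions made for this example; the actual readers may encode
these columns differently.

```r
core_cols <- c(
  "dataset", "entity_id", "timestamp", "date", "time_index",
  "y", "temp", "dow", "month", "resolution_minutes"
)

# One day of half-hourly timestamps for a single made-up household.
timestamps <- seq(
  from = as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
  by = "30 min",
  length.out = 48
)
lt <- as.POSIXlt(timestamps, tz = "UTC")

toy_panel <- data.frame(
  dataset = "toy",
  entity_id = "household_01",
  timestamp = timestamps,
  date = as.Date(timestamps, tz = "UTC"),
  time_index = seq_along(timestamps),  # assumed: slot number within the day
  y = round(0.3 + 0.2 * sin(2 * pi * seq_along(timestamps) / 48), 3),
  temp = NA_real_,                     # no temperature in this toy source
  dow = lt$wday,                       # assumed: 0 = Sunday, ..., 6 = Saturday
  month = lt$mon + 1L,                 # assumed: 1 = January
  resolution_minutes = 30L,
  stringsAsFactors = FALSE
)

# Columns that a downstream function would find missing (none expected here).
missing_cols <- setdiff(core_cols, names(toy_panel))
missing_cols
```

A check of this kind is a cheap guard before passing a hand-built panel to
functions that expect the normalized schema.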

# Scaffolded, unshipped datasets

`IDEAL` and `GX` are intentionally documented separately from the core shipped
household matrix.

| Dataset | Helper surface | Level | Current scaffold scope | Shipped example | Shipped benchmark | Licence note |
|:--|:--|:--|:--|:--|:--|:--|
| IDEAL | `elcf4r_download_ideal()`, `elcf4r_read_ideal()` | household | aggregate-electricity hourly summaries from `auxiliarydata.zip` | no | no | the current Edinburgh DataShare record states `CC BY 4.0` |
| GX | `elcf4r_download_gx()`, `elcf4r_read_gx()` | transformer/community | SQLite or flat-export normalization to the common panel schema | no | no | treat licence terms as dataset-record specific and recheck before redistribution |

Notes:

- IDEAL support in this release is limited to aggregate electricity and does
  not attempt to parse the raw 1 Hz stream.
- GX is not an individual-household dataset. It is useful as a secondary
  benchmark source for weather and community-level demand behavior, but it is
  not folded into the package's core household benchmark claims.

# Shipped example panels

The shipped examples are small normalized panels intended for package examples
and vignette code.

```{r}
example_sizes <- Filter(
  Negate(is.null),
  list(
    .elcf4r_example_size_row("elcf4r_iflex_example", "iflex"),
    .elcf4r_example_size_row("elcf4r_storenet_example", "storenet"),
    .elcf4r_example_size_row("elcf4r_lcl_example", "lcl"),
    .elcf4r_example_size_row("elcf4r_refit_example", "refit")
  )
)

if (length(example_sizes) == 0L) {
  data.frame()
} else {
  do.call(rbind, example_sizes)
}
```

These objects can be passed directly to:

- `elcf4r_build_daily_segments()`
- `elcf4r_build_benchmark_index()`
- `elcf4r_benchmark()`

# Shipped benchmark result datasets

Each supported household dataset now has a saved benchmark-result object built
from a fixed local cohort and a deterministic rolling-origin design.

```{r}
benchmark_summary <- Filter(
  Negate(is.null),
  list(
    .elcf4r_benchmark_summary("elcf4r_iflex_benchmark_results", "iflex"),
    .elcf4r_benchmark_summary("elcf4r_storenet_benchmark_results", "storenet"),
    .elcf4r_benchmark_summary("elcf4r_lcl_benchmark_results", "lcl"),
    .elcf4r_benchmark_summary("elcf4r_refit_benchmark_results", "refit")
  )
)

if (length(benchmark_summary) == 0L) {
  data.frame()
} else {
  benchmark_summary <- do.call(rbind, benchmark_summary)
  benchmark_summary[, c("dataset", "method", "nmae", "nrmse", "smape", "mase")]
}
```

These shipped benchmark tables are poster-style summaries built from a fixed
local cohort. They are not a substitute for full local benchmarking on the raw
datasets.
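
The summary table above averages each metric per method. The sketch below
reproduces that kind of aggregation on a small table of made-up numbers; the
values are invented for illustration and are not package results. Using
`na.action = stats::na.pass` is a choice made here so that a missing value in
one metric does not discard the other metrics in the same row.

```r
# Made-up per-fold results for two methods; one MASE value is missing.
toy_results <- data.frame(
  method = c("gam", "gam", "kwf", "kwf"),
  nmae   = c(0.20, 0.30, 0.25, 0.35),
  nrmse  = c(0.30, 0.40, 0.35, 0.45),
  smape  = c(0.18, 0.22, 0.20, 0.30),
  mase   = c(0.90, NA, 1.10, 1.30),
  stringsAsFactors = FALSE
)

# Mean of each metric per method, rounded to four decimals.
toy_summary <- aggregate(
  cbind(nmae, nrmse, smape, mase) ~ method,
  data = toy_results,
  FUN = function(v) round(mean(v, na.rm = TRUE), 4),
  na.action = stats::na.pass
)

toy_summary
```

With the default `na.action`, the second `gam` row would be removed entirely
before averaging, which changes every metric for that method rather than only
`mase`.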

# Rebuilding the shipped artifacts

Each shipped dataset is reproducible from a `data-raw/` script:

- `data-raw/elcf4r_iflex_subsets.R`
- `data-raw/elcf4r_iflex_benchmark_results.R`
- `data-raw/elcf4r_storenet_artifacts.R`
- `data-raw/elcf4r_lcl_artifacts.R`
- `data-raw/elcf4r_refit_artifacts.R`

The general pattern is:

1. Place the original raw files in `data-raw/`.
2. Read them through `elcf4r_read_*()`.
3. Build a normalized day index with `elcf4r_build_benchmark_index()`.
4. Save a compact example panel.
5. Run `elcf4r_benchmark()` on a fixed cohort and save the result table.

This keeps the package lightweight while making the shipped examples and
benchmark summaries reproducible.
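
Step 4 of the pattern above could look roughly like the following. This is a
sketch under stated assumptions, not the code in the `data-raw/` scripts:
`save_compact_panel()` is a hypothetical helper, the panel is assumed to carry
`entity_id` and `date` columns, and `saveRDS()` to a temporary file is used
only to keep the example free of side effects. A real script would more
typically write to `data/` with `save()` or `usethis::use_data()`.

```r
# Hypothetical helper: keep a few entities and the first n_days dates,
# then write the reduced panel with strong compression.
save_compact_panel <- function(panel, ids, n_days, path) {
  subset_panel <- panel[panel$entity_id %in% ids, , drop = FALSE]
  keep_dates <- utils::head(sort(unique(subset_panel$date)), n_days)
  compact <- subset_panel[subset_panel$date %in% keep_dates, , drop = FALSE]
  rownames(compact) <- NULL
  saveRDS(compact, file = path, compress = "xz")
  invisible(compact)
}

# Toy hourly panel: two entities, three days, constant load.
toy <- expand.grid(
  hour = 0:23,
  day = 0:2,
  entity_id = c("a", "b"),
  stringsAsFactors = FALSE
)
toy$date <- as.Date("2024-01-01") + toy$day
toy$y <- 0.5

out <- tempfile(fileext = ".rds")
compact <- save_compact_panel(toy, ids = "a", n_days = 2, path = out)
dim(compact)
```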

# Example: daily segments from a shipped panel

```{r}
if (exists("elcf4r_iflex_example", inherits = FALSE)) {
  iflex_segments <- elcf4r_build_daily_segments(
    elcf4r_iflex_example,
    carry_cols = c("dataset", "participation_phase", "price_signal")
  )

  # Inside an if block only the last value auto-prints, so print explicitly.
  print(dim(iflex_segments$segments))
  head(iflex_segments$covariates[, c("entity_id", "date", "temp_mean", "price_signal")])
} else {
  data.frame()
}
```

# Example: rerun a tiny benchmark locally

```{r}
if (exists("elcf4r_lcl_example", inherits = FALSE)) {
  tiny_index <- elcf4r_build_benchmark_index(
    elcf4r_lcl_example,
    carry_cols = "dataset"
  )

  tiny_benchmark <- elcf4r_benchmark(
    panel = elcf4r_lcl_example,
    benchmark_index = tiny_index,
    methods = c("gam", "kwf"),
    cohort_size = 1,
    train_days = 10,
    test_days = 2,
    include_predictions = FALSE
  )

  tiny_benchmark$results[, c("entity_id", "method", "test_date", "nmae", "mase")]
} else {
  data.frame()
}
```
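
The `train_days` and `test_days` arguments above describe a rolling-origin
split. The sketch below shows one way such folds can be enumerated. It assumes
a fixed-length training window that slides forward one day per test day;
`rolling_origin_folds()` is a hypothetical helper, and the fold construction
inside `elcf4r_benchmark()` may differ (for example, it could use an expanding
window).

```r
# Hypothetical helper: one fold per test day, each trained on the
# train_days dates immediately preceding it.
rolling_origin_folds <- function(dates, train_days, test_days) {
  dates <- sort(unique(as.Date(dates)))
  n <- length(dates)
  if (n < train_days + test_days) {
    stop("Not enough distinct dates for the requested design.")
  }
  lapply(seq_len(test_days), function(k) {
    test_pos <- n - test_days + k
    list(
      train = dates[(test_pos - train_days):(test_pos - 1L)],
      test = dates[test_pos]
    )
  })
}

# Twelve consecutive days with the same settings as the tiny benchmark.
dates <- as.Date("2024-01-01") + 0:11
folds <- rolling_origin_folds(dates, train_days = 10, test_days = 2)

# Show the training range and test date of each fold.
for (fold in folds) {
  cat(
    format(min(fold$train)), "to", format(max(fold$train)),
    "->", format(fold$test), "\n"
  )
}
```

Because the folds depend only on the sorted dates and the two window sizes,
the design is deterministic: rerunning it on the same panel yields the same
train and test dates.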