---
title: "MeSH Tables"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MeSH Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  comment = "#>"
)
```

`puremoe` ships three MeSH reference tables: a thesaurus of descriptors and entry terms, a tree of hierarchical classifications, and a frequency table of descriptor occurrence across PubMed.

```{r libs}
library(puremoe)
library(dplyr)
library(DT)
```

## MeSH thesaurus

`data_mesh_thesaurus()` downloads and combines the MeSH Descriptor Thesaurus and Supplementary Concept Records (SCR). The result has one row per term, so each descriptor appears alongside its synonyms and entry terms.

```{r thesaurus}
thesaurus <- puremoe::data_mesh_thesaurus()
```

```{r thesaurus-table}
thesaurus |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
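
Because the table has one row per term, it can serve as a lookup from free-text synonyms to their descriptors. The sketch below illustrates the idea on a small toy data frame rather than the downloaded table. The column names `DescriptorName` and `TermName` and the helper `lookup_descriptor()` are assumptions made for illustration; check `names(thesaurus)` and adapt accordingly.

```{r thesaurus-lookup}
library(dplyr)

# Toy stand-in for the thesaurus. Column names are assumed, not confirmed.
toy_thesaurus <- data.frame(
  DescriptorName = c("Neoplasms", "Neoplasms", "Neoplasms",
                     "Hypertension", "Hypertension"),
  TermName       = c("Neoplasms", "Cancer", "Tumors",
                     "Hypertension", "High Blood Pressure"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a term to its descriptor(s), ignoring case.
lookup_descriptor <- function(query, thesaurus) {
  thesaurus |>
    filter(tolower(TermName) == tolower(query)) |>
    distinct(DescriptorName) |>
    pull(DescriptorName)
}

lookup_descriptor("cancer", toy_thesaurus)

# Collapse all terms for each descriptor into a single string.
entry_terms <- toy_thesaurus |>
  group_by(DescriptorName) |>
  summarise(terms = paste(TermName, collapse = "; "), .groups = "drop")

entry_terms
```

Matching on lower-cased strings is a deliberately simple choice; real queries may also need whitespace or punctuation normalization.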

## MeSH trees

`data_mesh_trees()` provides the hierarchical classification structure. Each descriptor can appear in multiple branches; `tree_location` encodes the full path (e.g., `I01.880.604` = Social Sciences > Political Science > Political Systems).

```{r trees}
trees <- puremoe::data_mesh_trees()
```

```{r trees-table}
trees |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
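
Since `tree_location` is a dot-separated path, depth and parent nodes can be derived with plain string operations. The following is a minimal sketch on a hand-written vector of tree numbers, not on the downloaded table; the variable names are illustrative only.

```{r tree-paths}
# A few example tree numbers, typed by hand for illustration.
tree_locations <- c("I01", "I01.880", "I01.880.604", "C04.588.180")

# Depth is the number of dot-separated segments.
tree_depth <- lengths(strsplit(tree_locations, ".", fixed = TRUE))

# The parent path drops the last segment; top-level nodes have no parent.
tree_parent <- ifelse(
  grepl(".", tree_locations, fixed = TRUE),
  sub("\\.[^.]+$", "", tree_locations),
  NA_character_
)

# The leading letter identifies the top-level MeSH category.
tree_category <- substr(tree_locations, 1, 1)

data.frame(
  tree_location = tree_locations,
  depth         = tree_depth,
  parent        = tree_parent,
  category      = tree_category
)
```

Applied to the real table, the same expressions could be used inside `dplyr::mutate()` to filter descriptors by depth or to collect every node under a given branch.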

## MeSH descriptor frequencies

`data_mesh_frequencies` is a bundled dataset giving the frequency of each MeSH
descriptor across the full PubMed corpus (39.7 M PMIDs, April 2026). Proportions
use the total corpus as denominator, making them suitable as a baseline for
enrichment analyses against arbitrary PubMed subsets.

```{r frequencies}
puremoe::data_mesh_frequencies |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
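
To make the enrichment use case concrete, the sketch below compares descriptor proportions in a PubMed subset against corpus-wide baseline proportions. All numbers are made up, and the column names (`descriptor`, `prop`, `n`) are assumptions for illustration; they are not taken from `data_mesh_frequencies`, so inspect its columns before adapting this.

```{r enrichment-sketch}
library(dplyr)

# Toy baseline: share of the full corpus tagged with each descriptor (made up).
baseline <- data.frame(
  descriptor = c("Humans", "Neoplasms", "Hypertension"),
  prop       = c(0.50, 0.04, 0.01)
)

# Toy subset: descriptor counts within 200 hypothetical search results.
subset_counts <- data.frame(
  descriptor = c("Humans", "Neoplasms", "Hypertension"),
  n          = c(120, 40, 2)
)
subset_size <- 200

# Enrichment ratio = subset proportion / baseline proportion.
enrichment <- subset_counts |>
  mutate(subset_prop = n / subset_size) |>
  inner_join(baseline, by = "descriptor") |>
  mutate(
    ratio      = subset_prop / prop,
    log2_ratio = log2(ratio)
  ) |>
  arrange(desc(ratio))

enrichment
```

A ratio above 1 means the descriptor is more common in the subset than in the corpus as a whole. A simple ratio ignores sampling noise, so small counts may warrant a formal test or a minimum-count threshold.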

## Persistent storage

The thesaurus and tree tables are ~10 MB and are fetched from GitHub on each call by default. To avoid re-downloading every session, set `use_persistent_storage = TRUE`: the files are cached to a system data directory and reused on subsequent calls.

```{r persistent, eval=FALSE}
thesaurus <- puremoe::data_mesh_thesaurus(use_persistent_storage = TRUE)
trees     <- puremoe::data_mesh_trees(use_persistent_storage = TRUE)
```
