---
title: "Execution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Execution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Executing the study

Now that we have initiated the [database connection](https://healthinformaticsut.github.io/CohortContrast/articles/a00_introduction.html) and created the `targetTable` and the `controlTable`, we are ready to execute the study.

The chunk below shows what a saved study looks like after execution by loading the bundled `lc500` example results:

```{r}
if (requireNamespace("nanoparquet", quietly = TRUE)) {
  studyDir <- system.file("example", "st", package = "CohortContrast")
  study <- CohortContrast::loadCohortContrastStudy("lc500", pathToResults = studyDir)

  # Inspect the main exported components created by a completed run.
  names(study)
}
```

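This is the same type of output object you can reload from your own saved study directory after running `CohortContrast()`. The call that produces such a study is shown next; the chunk is not evaluated when this vignette is built.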


```{r, include = TRUE, eval=FALSE, echo=TRUE}

################################################################################
#
# Execute
#
################################################################################

data <- CohortContrast::CohortContrast(
  cdm,
  targetTable = targetTable,
  controlTable = controlTable,
  pathToResults = file.path(getwd(), "studies"),
  domainsIncluded = c(
    "Drug",
    "Condition",
    "Measurement",
    "Observation",
    "Procedure",
    "Visit",
    "Visit detail",
    "Death"
  ),
  prevalenceCutOff = 2.5, # Keep concepts more than 2.5 times as prevalent in target
  topK = FALSE, # Maximum number of features to export, FALSE for no limit
  presenceFilter = 0.2, # Fraction (0-1) of target subjects who must have the feature
  complementaryMappingTable = NULL, # Optional manual concept mapping table
  getSourceData = FALSE, # If TRUE, also generates summaries with source data
  runChi2YTests = TRUE, # Chi-squared tests with Yates continuity correction
  runLogitTests = FALSE, # Logit tests on prevalence
  createOutputFiles = TRUE, # Write the study directory to pathToResults
  complName = "LungCancer_1Y" # Name of the output study directory
)

```

## The parameters

There are multiple parameters we can tweak for different outcomes:

### Mandatory:

`cdm` Connection to the database

`targetTable` Table for target cohort

`controlTable` Table for control cohort

`pathToResults` Path to the results folder; this can be the project's working directory

`domainsIncluded` CDM domains to include, chosen from Drug, Condition, Measurement, Observation, Procedure, Visit, Visit detail, Death

`complName` Name of the output study directory


### Customization:

`runChi2YTests` Boolean for running CHI2Y tests (chi-squared tests for two proportions with Yates continuity correction)

`runLogitTests` Boolean for running logit tests on the prevalence; these build a model predicting whether a patient belongs to the target or the control cohort

`getAllAbstractions` Boolean for creating abstraction levels for the imported data; this is useful when exploring the data in the GUI

`maximumAbstractionLevel` Maximum level of abstraction allowed when `getAllAbstractions` is TRUE; the hierarchy is taken from the concept_hierarchy table

`getSourceData` Boolean for fetching source data, that is, the abstraction level from which data is mapped to the OMOP CDM

`prevalenceCutOff` Numeric or FALSE. If set, removes all concepts that are not more than `prevalenceCutOff` times as prevalent in the target cohort as in the control cohort. E.g. if set to 2, only concepts more than twice as prevalent in target are exported.

`topK` Numeric or FALSE. If set, keeps at most this number of features in the analysis and the export.

`presenceFilter` Numeric or FALSE. If set, removes all features present in fewer target cohort subjects than the given fraction (0-1, e.g. 0.2 for 20%)

`complementaryMappingTable` Data frame or NULL. Mapping table for concept merges. Columns: CONCEPT_ID, CONCEPT_NAME, NEW_CONCEPT_ID, NEW_CONCEPT_NAME, ABSTRACTION_LEVEL, TYPE

`numCores` Number of cores to allocate to parallel processing; by default one fewer than the maximum number of cores

`createOutputFiles` Boolean for creating output files; the default value is TRUE

`runRemoveTemporalBias` Boolean for an optional temporal-bias reduction step after the main workflow

`runAutomaticHierarchyCombineConcepts` Boolean for optional hierarchy-based post-processing

`runAutomaticCorrelationCombineConcepts` Boolean for optional correlation-based post-processing
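
To build intuition for how `prevalenceCutOff` and `presenceFilter` interact, the sketch below applies both rules to a small made-up table. It is an illustration in base R under stated assumptions, not the package's internal implementation: the `features` data frame, its column names and all numbers are hypothetical, and the prevalence ratio is assumed to be target prevalence divided by control prevalence.

```{r}
# Toy per-concept summary; every name and number here is hypothetical.
features <- data.frame(
  concept = c("A", "B", "C", "D"),
  targetPrevalence = c(0.50, 0.30, 0.10, 0.40),
  controlPrevalence = c(0.10, 0.20, 0.02, 0.10),
  stringsAsFactors = FALSE
)

prevalenceCutOff <- 2.5
presenceFilter <- 0.2

# Assumed definition: how many times more prevalent a concept is in target.
features$prevalenceRatio <- features$targetPrevalence / features$controlPrevalence

# Rule 1: keep concepts more than `prevalenceCutOff` times as prevalent in target.
keepRatio <- features$prevalenceRatio > prevalenceCutOff

# Rule 2: keep concepts present in at least `presenceFilter` of target subjects.
keepPresence <- features$targetPrevalence >= presenceFilter

filtered <- features[keepRatio & keepPresence, ]
filtered$concept
```

In this toy table, concept B fails the ratio rule and concept C fails the presence rule, so only A and D remain.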

### Notes:

When using the GUI, `prevalenceCutOff` and `presenceFilter` can be adjusted with sliders.

The effects of `runChi2YTests` and `runLogitTests` can be toggled as filters.
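
For readers unfamiliar with the test behind `runChi2YTests`, a chi-squared test for two proportions with Yates continuity correction can be reproduced for a single concept with base R's `prop.test()`. The counts below are hypothetical, and this sketch does not claim to mirror how the package computes or adjusts its tests internally.

```{r}
# Hypothetical counts for one concept: subjects with the concept per cohort.
subjectsWithConcept <- c(target = 60, control = 30)
cohortSizes <- c(target = 200, control = 200)

# Two-proportion chi-squared test; correct = TRUE applies Yates' correction.
test <- stats::prop.test(
  x = subjectsWithConcept,
  n = cohortSizes,
  correct = TRUE
)

test$statistic
test$p.value
```

A small p-value suggests the concept's prevalence differs between the two cohorts.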

The function writes a study directory named after `complName`, in this case `LungCancer_1Y`, inside `pathToResults`.
The study directory contains Parquet files (for example `data_patients.parquet`) and a metadata file, `metadata.json`.
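
If you want to look at the exported files directly, something like the following may help. It is a minimal sketch that assumes the study was written to `studies/LungCancer_1Y` under the working directory and that the `jsonlite` and `nanoparquet` packages are installed; the structure of `metadata.json` and the columns of the Parquet tables are not described here, so the code only prints whatever it finds.

```{r, eval=FALSE}
studyPath <- file.path(getwd(), "studies", "LungCancer_1Y")

if (dir.exists(studyPath)) {
  # List everything the run exported.
  print(list.files(studyPath))

  # Read the study metadata and show its top-level structure.
  metadata <- jsonlite::fromJSON(file.path(studyPath, "metadata.json"))
  str(metadata, max.level = 1)

  # Read one of the Parquet tables into a data frame.
  patients <- nanoparquet::read_parquet(file.path(studyPath, "data_patients.parquet"))
  print(head(patients))
}
```

For regular use, loading the whole study with `loadCohortContrastStudy()` is likely more convenient than reading files one by one.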

## Reloading a saved study

```{r, include = TRUE, eval=FALSE, echo=TRUE}
reloaded <- CohortContrast::loadCohortContrastStudy(
  studyName = "LungCancer_1Y",
  pathToResults = file.path(getwd(), "studies")
)
```