---
title: "mut"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mut}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

vignette_file <- function(...) {
  candidates <- c(
    file.path(...),
    file.path("vignettes", ...),
    file.path("inst", "extdata", ...),
    file.path(Sys.getenv("PWD"), "inst", "extdata", ...),
    system.file("extdata", ..., package = "oncoPredict"),
    system.file("doc", ..., package = "oncoPredict")
  )
  candidates <- candidates[nzchar(candidates) & file.exists(candidates)]
  if (!length(candidates)) {
    stop("Could not find vignette file: ", file.path(...), call. = FALSE)
  }
  candidates[[1]]
}
```

```{r setup}
library(oncoPredict)

#This vignette demonstrates how to prepare predicted drug response and mutation
#data for mutation-based IDWAS with idwas(cnv=FALSE).

#Set the parameters of the idwas() function.
#drug_prediction: a data frame whose rownames() are samples and whose colnames() are drugs.
#The example file stores drugs in rows and samples in columns, so it is transposed below.
drug_prediction<-as.data.frame(read.table(vignette_file("DrugPredictions.txt"), header=TRUE, row.names=1))
#By default, read.table() converts '-' in column names to '.'. Replace '.' with '-'
#so the TCGA sample identifiers match the format used in the mutation data.
colnames(drug_prediction)<-gsub(".", "-", colnames(drug_prediction), fixed=TRUE)
#Make sure the sample identifiers in 'drug_prediction' have the same format as those in the 'data' parameter.
#For this example file, that means removing the first two characters of each identifier.
cols<-colnames(drug_prediction)
colnames(drug_prediction)<-substring(cols, 3, nchar(cols))
#Transpose so that rows are samples and columns are drugs.
drug_prediction<-as.data.frame(t(drug_prediction))

```
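
The identifier clean-up above can be tried on a tiny table first. This is a minimal sketch, assuming (as the transpose implies) that the raw prediction file stores drugs in rows and TCGA samples in columns; the drug names, barcodes, and values are made up. It shows that `read.table()` with its default `check.names = TRUE` turns `-` into `.` in column names, how `gsub()` restores the separators, and how `t()` puts samples in rows. The `substring()` step is omitted because it depends on how identifiers are stored in the example file.

```r
# Hypothetical prediction table: drugs in rows, TCGA samples in columns.
txt <- "drug TCGA-AA-0001-01A TCGA-AA-0002-01A
Drug_A 0.52 1.17
Drug_B 2.08 0.31"

# With the default check.names = TRUE, '-' in column names becomes '.'.
pred <- read.table(text = txt, header = TRUE, row.names = 1)
colnames(pred)   # "TCGA.AA.0001.01A" "TCGA.AA.0002.01A"

# Restore the '-' separators, then transpose so samples are rows and drugs are columns.
colnames(pred) <- gsub(".", "-", colnames(pred), fixed = TRUE)
pred <- as.data.frame(t(pred))
rownames(pred)   # "TCGA-AA-0001-01A" "TCGA-AA-0002-01A"
colnames(pred)   # "Drug_A" "Drug_B"

# Alternative: keep the original names by turning off name checking.
pred_raw <- read.table(text = txt, header = TRUE, row.names = 1, check.names = FALSE)
colnames(pred_raw)   # "TCGA-AA-0001-01A" "TCGA-AA-0002-01A"
```

Reading with `check.names = FALSE` avoids the round trip through `.` entirely, although code that accesses such columns with `$` would then need backtick quoting.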


This vignette shows how to prepare glioblastoma (GBM) mutation data from the GDC database and how to use `idwas()` to test predicted drug response for associations with somatic mutations.

Because the GDC data model and the TCGAbiolinks interface change over time, the download and formatting code below is not executed when the vignette is built; treat it as guidance.

Download mutation data for your cancer of interest from the GDC database, for example with TCGAbiolinks. Its mutation vignette describes the available query options:

https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/mutation.html

https://rdrr.io/bioc/TCGAbiolinks/f/vignettes/mutation.Rmd


The code would look something like this:

```{r mutation-download, eval=FALSE}
library(TCGAbiolinks)

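#The GDC legacy archive (legacy = TRUE) is no longer available in recent GDC and
#TCGAbiolinks releases, so this query requests the harmonized, open-access MAF.
query_maf <- GDCquery(project = "TCGA-GBM",
                      data.category = "Simple Nucleotide Variation",
                      access = "open",
                      data.type = "Masked Somatic Mutation",
                      workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking")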

GDCdownload(query_maf)
maf <- GDCprepare(query_maf)
```

After downloading the mutation data, format the mutation table before running IDWAS.

```{r mutation-formatting, eval=FALSE}
#Make sure this data is a data frame with mutation annotations in columns.
#For idwas(cnv=FALSE), the data should include Variant_Classification,
#Hugo_Symbol, and Tumor_Sample_Barcode.
data<-as.data.frame(maf)
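samps<-data$Tumor_Sample_Barcode
#Drop the last 12 characters of each aliquot barcode (the portion/analyte, plate, and center fields),
#leaving the 16-character project, tissue source site, participant, and sample/vial fields.
#Make sure these sample IDs have the same format as the sample IDs in your prediction data.
data$Tumor_Sample_Barcode<-substr(samps, 1, nchar(samps)-12)

#Set n, the minimum number of samples in which a gene must be mutated to be tested. The default is 10.
n<-10

#Set cnv=FALSE to test mutation data; cnv=TRUE tests CNA amplification data instead.
cnv<-FALSE

#Run mutation-based IDWAS with the parameters defined above.
idwas(drug_prediction=drug_prediction, data=data, n=n, cnv=cnv)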
```
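
Before running the formatting code on a full GDC download, the same steps can be checked on a toy MAF-like table. In this minimal sketch the barcodes, genes, and the names `maf_toy`, `pred_samples`, and `gene_counts` are hypothetical. It checks the required columns, trims the barcodes, confirms they overlap with the prediction sample IDs, and counts how many distinct samples carry a variant in each gene. The count is only a rough preview of which genes might meet the `n` threshold; how `idwas()` itself counts mutations (for example, whether it filters on `Variant_Classification`) may differ.

```r
# Toy MAF-like table (hypothetical barcodes and genes, for illustration only).
maf_toy <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "PTEN", "EGFR"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Frame_Shift_Del", "Missense_Mutation"),
  Tumor_Sample_Barcode = c("TCGA-AA-0001-01A-11D-A000-08",
                           "TCGA-AA-0002-01A-11D-A000-08",
                           "TCGA-AA-0001-01A-11D-A000-08",
                           "TCGA-AA-0003-01A-11D-A000-08"),
  stringsAsFactors = FALSE
)

# Check that the columns used for mutation-based IDWAS are present.
required <- c("Variant_Classification", "Hugo_Symbol", "Tumor_Sample_Barcode")
stopifnot(all(required %in% colnames(maf_toy)))

# Drop the last 12 characters (portion/analyte, plate, and center fields),
# leaving 16-character identifiers such as "TCGA-AA-0001-01A".
samps <- maf_toy$Tumor_Sample_Barcode
maf_toy$Tumor_Sample_Barcode <- substr(samps, 1, nchar(samps) - 12)

# Hypothetical sample IDs from the prediction table (its rownames after transposing).
pred_samples <- c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0009-01A")
shared <- intersect(pred_samples, maf_toy$Tumor_Sample_Barcode)
shared

# Number of distinct samples with at least one listed variant in each gene.
gene_counts <- tapply(maf_toy$Tumor_Sample_Barcode, maf_toy$Hugo_Symbol,
                      function(x) length(unique(x)))
gene_counts
```

If `shared` is empty or much smaller than expected, the two sets of identifiers are probably still in different formats.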