%\VignetteIndexEntry{pedgene_manual}
%\VignetteDepends{pedgene}
%\VignetteKeywords{pedigree, variant, kinship}
%\VignettePackage{pedgene}

%**************************************************************************
%
% # $Id:$
% $Revision:  $
% $Author: Jason Sinnwell$
% $Date:  $

\documentclass[letterpaper]{article}

\usepackage{Sweave}
\textwidth 7.5in
\textheight 8.9in
\topmargin  0.1in
\headheight 0.0in
\headsep 0.0in
\oddsidemargin  -30pt
\evensidemargin  -30pt

<<desc, include=FALSE, echo=FALSE>>=
options(width = 100)
desc <- packageDescription("pedgene")
@

\title{pedgene package vignette \\ 
        Version \Sexpr{desc$Version}}
\author{Jason Sinnwell and Daniel Schaid}
\begin{document}
\maketitle

\section{Introduction}
This document is a brief tutorial for the pedgene package, which implements methods described
in Schaid et al., 2013~\cite{Schaid}, entitled {\sl ``Multiple Genetic Variant Association 
  Testing by Collapsing and Kernel Methods With Pedigree or Population Structured Data''}.   

We begin by showing the data structure and requirements of the input data, and continue with 
examples on running the method.  If the pedgene package is not loaded, load it now.

<<echo=TRUE, eval=TRUE>>=
library(pedgene)
@


\section{Example Data}
Three example data frames are provided within the pedgene package: 
\begin{itemize}
\item{\em example.ped: }{ data.frame with columns ped, person, father, mother, sex, and trait;
        for 3 families of 10 subjects each, and 9 unrelated subjects}
\item{\em example.geno:}{ data.frame with ped and person to match lines to example.ped, and 
  minor allele count for 20 simulated variant positions}
\item{\em example.map: }{data frame with columns for gene name and chromosome matching 
  columns in example.geno}
\end{itemize}

\noindent Load the example data and look at some of the values.

<<exdata>>=
data(example.ped)
example.ped[c(1:10,31:39),]
data(example.geno)
dim(example.geno)
example.geno[,c(1:2,10:14)]
data(example.map)
example.map
@ 

Note the following features:
\begin{itemize}
\item The pedigree data has more subjects than subjects that have genotypes, to keep
  pedigrees connected.
\item Subjects with genotype data are matched to the pedigree data by the ped 
  and person IDs in both data sets
\item The number of non-id columns in the genotype data frame must match the number 
  of rows in the map data frame
\item The two genes provided are actually the same simulated data, but we show they are 
  analyzed differently for chromosome 1 versus chromosome X
\end{itemize}

\section{Case-Control}

\subsection*{No covariate adjustment}

We perform a typical use of pedgene with default settings on the case-control
status provided in the example data.

<<ccbase>>=
pg.cc <- pedgene(ped=example.ped, geno=example.geno, map=example.map)

print(pg.cc)
@ 


\noindent The results show the output for the two genes in a table, 
containing gene, chromosome, the number of variants used, the non-informative variants 
that were removed, the kernel statistic and p-value, and the burden 
statistic and p-value.  The kernel p-value is calculated using Kounen's saddlepoint 
method~\cite{Kounen}, and the burden p-value is based on the normal distribution.

\subsection*{With covariate adjustment}

The pedgene function can work with case-control data and an adjusted trait using 
a covariate.  The data contains the sex variable, so we use it in a logistic regression to get 
adjusted fitted values for the trait, which we add to example.ped at a new column named
{\sl trait.adjusted}, which pedgene recognizes and uses in calculating residuals.

<<cccov>>=
binfit <-  glm(trait ~ (sex-1),data=example.ped, family="binomial")

example.ped$trait.adjusted[!is.na(example.ped$trait)] <- fitted.values(binfit) 
example.ped[1:10,]

pg.cc.adj <- pedgene(ped=example.ped, geno=example.geno, map=example.map)

summary(pg.cc.adj, digits=4)
@ 

\noindent The results are now shown from the summary method, which
adds the call to the pedgene function.  The results are different because we now 
input the fitted values from the logistic regression, which of course can be a more 
complex model than what we use here.

\section{Continuous Trait}

\subsection*{No covariate adjustment}

The pedgene method works similarly for continous traits. We first create a different
version of the pedigree data to have a continous trait, which we simulate.  We simulate 
complete trait data for all subjects, but pedgene will only use the subjects for which we 
have genotype data.

<<continuous>>=
set.seed(10)
cont.ped <- example.ped[,c("famid", "person", "father", "mother", "sex")]
beta.sex <- 0.3
cont.ped$trait <- (cont.ped$sex-1)*beta.sex + rnorm(nrow(cont.ped))

pg.cont <- pedgene(ped = cont.ped, geno = example.geno, map = example.map)
print(pg.cont, digits=4)
@ 

\noindent The results are now non-significant because we simulated
a trait that is slightly influenced by sex, but not the genotype data.

\subsection*{With covariate adjustment}

Now we add the {\sl trait.adjusted} column to the cont.ped data set and run pedgene
again. 

<<continuousadj>>=
gausfit <-  glm(trait ~ (sex-1),data=cont.ped, family="gaussian")

cont.ped$trait.adjusted <- fitted.values(gausfit) 
cont.ped[1:10,]

pg.cont.adj <- pedgene(ped=cont.ped, geno=example.geno, map=example.map)

summary(pg.cont.adj, digits=5)
@

\section{Remarks}

\begin{itemize}
\item The pedgene package is deigned to be rigid in what it expects for pedigree 
    and genotype data, which puts a little extra burden on the user.  However it 
    should reduce confusion when running the method.  
\item One potential run-time bottleneck is a set of pedigree checks performed on all pedigrees.
    If the pedigrees are clean and you have many subjects, you may wish to skip this step with 
    the argument checkpeds=FALSE. 
\item By default, the method uses Madsen-Browning weights~\cite{Madsen}, but it allows user-defined weights 
    per variant with the weights argument.
\end{itemize}


\section{R Session Information}
<<results=tex>>=
toLatex(sessionInfo())
@

\begin{thebibliography}{}
\bibitem{Schaid}
Schaid DJ, McDonnell SK., Sinnwell JP, Thibodeau SN. (2013)
{\bf Multiple Genetic Variant Association Testing by Collapsing and Kernel Methods 
  With Pedigree or Population Structured Data} {\em Genet Epidemiol}, 37(5):409-18.

\bibitem{Kounen}
Kounen D (1999) {\bf Saddle point approximatinos for distributions of
quadratic forms in normal variables} {\em Biometrika}, 86:929 -935.

\bibitem{Davies}
Davies RB (1980) {\bf Algorithm AS 155: The Distribution of a Linear Combination
of chi-2 Random Variables}  {\em Journal of the Royal Statistical
Society. Series C (Applied Statistics)}, 29(3):323-33
 
\bibitem{Madsen}
Madsen BE, Browning SR (2009). 
{\bf A groupwise association test for rare mutations using a weighted sum statistic} 
{\em PLoS Genet} 5(2):e1000384.


\end{thebibliography}
   
\end{document}
