---
title: "Introduction to the Unifed Distribution"
author: "Oscar Alberto Quijano Xacur"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{Introduction to the Unifed Distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

\newcommand{\esp}{\mathbb{E}}
\newcommand{\var}{\mathbb{V}}
\newcommand{\varf}{{\bf V}}
\newcommand{\dkappa}{\dot{\kappa}}
\newcommand{\ddkappa}{\ddot{\kappa}}


# Introduction

The unifed distribution is the Exponential Dispersion Family (EDF)
generated by the uniform distribution [see @Jorgensen-book Chapters 2
and 3 to see how an EDF can be generated from a distribution]. It has
support on the interval (0,1).

The [unifed R
 package](https://CRAN.R-project.org/package=unifed)   focuses on
 the case where the dispersion parameter is equal to one. This is due
 to some numerical problems encountered when trying to implement the
 general case [see @unifed-article for details]. The package also
 includes code for using this distribution with [stan](https://mc-stan.org/).

The unifed can be used as the response distribution of a GLM. Note
that although the functions in this package implement the case with
dispersion 1, it is still possible to have classes with weights
different than one in a unifed GLM. This is because for a GLM we only
need to minimize the deviance and disregard the rest of the density
(which is the part that gives numerical stability).

We start with a section about Exponential Dispersion Families and
their properties. Then we introduce the unifed density, it's cumulant
generator, and unit deviance function. This is followed by an example,
where we use the package for fitting a frequentist and a Bayesian
unifed GLM to a publicly available dataset. We close with a brief
comparison between the unifed GLM and the beta regression.


# Exponential Dispersion Families

An Exponential Dispersion Family (EDF) is a set of distributions whose
densities are given by

\begin{equation}
f(y|\theta,\phi) = a(y,\phi)\exp\left(\frac{1}{\phi} \left\{y\theta
     - \kappa(\theta) \right\}\right),\qquad \theta\in\Theta, \phi\in\Phi.
\end{equation}

$\theta$ and $\Theta$ are called the canonical parameter and
canonical space, respectively and $\phi$ is known as the dispersion parameter. The parametrization above is
know as the canonical parametrization. For$\theta\in\mbox{int} \left(
\Theta\right)$ (here int stands for interior),
$$  \esp[Y] = \dot{\kappa}\left(\theta\right)\qquad \mbox{and}\qquad  \var[Y]=\phi\ddot{\kappa}\left(\theta\right),$$
where $\dot{\kappa}=\kappa'$ and
$\ddot{\kappa}=\dot{\kappa}'$. This motivates the following
definitions.

**Definition** Given an exponential dispersion family, the mean domain
of the family is defined as
$$    \Omega = \left\{\mu = \dot{\kappa}\left(\theta\right) : \theta \in \textrm{int}\left(\Theta\right) \right\}.$$

Another important property is that the support of the distribution
only depends on $\phi$ (and not on $\theta$). For a given family, let
$C_\phi$ be the convex support of any member of the family with
dispersion parameter $\phi$. We define the convex support of the family as
$$C_\Phi=\bigcup_{\phi\in\Phi} C_\phi.$$

**Definition** The unit deviance function of an exponential dispersion
  family is defined as$d: C_\Phi\times \Omega \rightarrow [0,\infty)$ with
$$    d\left(y,\mu\right) = 2 \left[ \sup_{\theta\in\Theta}\{\theta y - \kappa (\theta)\} -
      y \dot{\kappa}^{-1}(\mu) + \kappa\big(\dot{\kappa}^{-1}(\mu)\big)
 \right].$$
 
The unit deviance function allows to re-parametrize the densities of the family as
\begin{equation}
  \label{eq:mean-value-par}
  f(y|\mu,\phi) = c(y,\phi)\exp\left(-\frac{1}{2\phi}d(y,\mu) \right)
\end{equation}
for some function $c$. This is known as the mean--value parametrization. 

## Weights and Data Aggregation

In many applications it is useful to include a known positive weight
to each observation. When this is done, the dispersion parameter is
divided by the weight $w$, and the canonical and mean--value
parametrizations become
\begin{align*}
f(y|\theta,\phi) &= a(y,\phi/w)\exp\left(\frac{w}{\phi} \left\{y\theta
     - \kappa(\theta) \right\}\right),\\
f(y|\mu,\phi) &= c(y,\phi/w)\exp\left(-\frac{w}{2\phi}d(y,\mu) \right).
\end{align*}


There is a useful property of reproductive exponential dispersion
families that allows for data aggregation. Jørgensen's notation [from
@Jorgensen-book] is very convenient to express this property: given a
fixed exponential family, if \(Y\) belongs to that family with mean \(\mu\), dispersion parameter \(\phi\) and weight \(w\), we say that it is
\(ED(\mu,\phi/w)\) distributed. The property is then as follows: if
\(Y_1,Y_2,\cdots,Y_n\) are independent, and \(Y_i \sim
ED(\mu,\phi/w_i)\), then
\begin{equation}
  \label{eq:aggregate-equation}
  \bar{Y}=\frac{w_1Y_1+\cdots+w_nY_n}{w_+}\sim ED(\mu,\phi/w_+),\qquad
  w_+=\sum_{i=1}^n w_i.
\end{equation}


# The Unifed Family

The unifed can be characterized as the only EDF containing the uniform
distribution. In fact, *unifed* = ***uni**form* + ***ed**f*

Due to the numerical instability [see @unifed-article] of the density
when the dispersion parameter is different than 1, we focus here in
the case where the dispersion parameter is equal to 1. In this case,
the density is given by

$$f(x;\theta) = \left\{
  \begin{array}{ll}
    \frac{\theta}{e^\theta - 1} e^{x \theta} & \mbox{if } \theta \neq 0\\
    1 & \mbox{if } \theta = 0
  \end{array}
  \right. \quad \mbox{for } x \in (0,1),$$

where $\theta$ is called the canonical parameter. The `unifed` package
provides the functions `dunifed`, `punifed`, `qunfied` and `runifed`
for the density, distribution, quantile and random generation functions
respectively. Here are some examples of use of these functions taken
from the documentation

```{r,results="hide"}
library(unifed)
dunifed( c(0.1,0.3,0.7), 10)

x <- c(0.1,0.4,0.7,1)
punifed(x,-5)

p <- 1:9/10
qunifed(p,5)

runifed(20,-3.3)
```

The mean of this distribution can be any value in (0,1). In the next
a section explicit formula for the mean in terms of $\theta$ is given. A with
any proper exponential family, there is a one-to-one correspondence
between the mean $\mu$ and $\theta$. The next graphs show the density
of the unifed as the mean varies.

```{r,echo=F,fig.align='center'}
library(unifed)

add.theta.plot <- function(theta,mu,right=F,...){
    N <- 100
    x <- seq(0,1,length.out=N)
    y <- dunifed(x,theta=theta)
    points(x,y,type="l",...)
    label <- substitute(expression(mu == val),env=list(val=mu))
    if(right)
        text(x[N],y[N],labels=eval(label),pos="2",cex=0.8)
    else
        text(x[1],y[1],labels=eval(label),pos="4",cex=0.8)
       
}

unifed.density.plots1 <- function(){
    par(mar=c(2,2,0.5,0.5))
    plot(x=c(0,1),y=c(0,10),type="n",xlab="",ylab="",ylim=c(0,10.2),xlim=c(-0.05,1))
    mu.vector <- seq(0.1,0.5,by=0.1)
    theta.vector <- unifed.kappa.prime.inverse(mu.vector)
    for(i in 1:length(theta.vector)){
        theta <- add.theta.plot(theta.vector[i],mu.vector[i],lty=i)}
    
}

unifed.density.plots2 <- function(){
    par(mar=c(2,2,0.5,0.5))
    plot(x=c(0,1),y=c(0,10),type="n",xlab="",ylab="",ylim=c(0,10.2),xlim=c(0,1.05))
    mu.vector <- seq(0.5,0.9,by=0.1)
    theta.vector <- unifed.kappa.prime.inverse(mu.vector)
    for(i in 1:length(theta.vector)){
        theta <- add.theta.plot(theta.vector[i],mu.vector[i],T,lty=length(theta.vector)+1-i )}
    
}


par(mfrow=c(1,2))
## Densities with mean less than one
unifed.density.plots1()

## Densities with mean greater than one
unifed.density.plots2()

```

# Cumulant generator

The cumulant generator function characterizes and exponential
family. For the unifed it is given by

$$\kappa(\theta)=\left\{
    \begin{array}{ll}
      \log\left(\frac{e^\theta-1}{\theta}\right)& \mbox{if }\theta\neq
                                                  0\\
      0 & \mbox{if }\theta=0
    \end{array}
    \right..$$
	
One difficulty in implementing $\kappa$ in a computer function is that
$e^\theta$ above takes the value infinity even for relatively small
$\theta$. For instance `exp(1000)` in R gives `Inf`.

To avoid this problem the function that implements $\kappa$ in the package,
`unifed.kappa`, uses an approximation for values of $\theta$ that are
greater than 50. This function is implemented as follows

$$\kappa_{mod}(\theta)=\left\{
    \begin{array}{ll}
      \kappa(\theta) & \mbox{if }\theta \leq 50\\
       \theta - \log(\theta) & \mbox{if }\theta > 50
    \end{array}
    \right..$$

The justification for this approximation is that if $y$ is defined as 
$$ y = \log\left(\frac{e^\theta - 1 }{\theta}\right),$$

then also, 

$$ y = \theta  -  \log( \theta + e^{-y}).$$

Now, $\kappa$ is an increasing function and $y$ goes to infinity as
$\theta$ goes to infinity. Thus for large values of $\theta$ the term
$e^{-y}$ is close to zero. We use 50 as the threshold for the approximation
since evaluating the following code in R already gives zero
```{r,results='hide'}
 log( ( exp(50) - 1) / 50  ) - ( 50 - log (50) )
```

# The Variance and Unit Deviance Functions

From the properties of exponential families we know then that if $X$
follows a unifed distribution distribution with canonical parameter
$\theta$, its mean and variance are given by

$$\begin{aligned}
  \esp[X]&= \dkappa(\theta) = \left\{
  \begin{array}{ll}
   \frac{(\theta-1)e^\theta + 1}{\theta(e^\theta -1)} & \mbox{if
                                                        }\theta \neq
                                                        0\\
    \frac{1}{2} & \mbox{if }\theta=0
  \end{array}
  \right., \qquad \\
  \var[X]&= \ddkappa(\theta)=\left\{
    \begin{array}{ll}
      \left(\frac{ e^{2\theta} - (\theta+2)e^\theta + 1}
      {\theta^2 (e^\theta-1)^2}\right)& \mbox{if }\theta \neq 0 \\
      \frac{1}{12} & \mbox{if }\theta=0.
    \end{array}
    \right.
\end{aligned}$$

Where $\dkappa$ and $\ddkappa$ are the first and second derivative of
$\kappa$, respectively. The variance function allows us to express the
variance in terms of the mean. It is defined as 

$$\varf(\mu) = \ddkappa(\dkappa^{-1}(\mu)).$$

We have not been able to find an analytical expression for
$\dkappa^{-1}(\mu)$. Thus, the function `unifed.kappa.prime.inverse`implements it using the [Newton-Raphson
    method](https://en.wikipedia.org/wiki/Newton%27s_method) for
    solving equations. It takes a value between 0 and 1 and it returns
    the value of $\theta$ that corresponds to that mean. There are
    some numerical stability problems around 0 and 0.5.
 
In order to solve the problem around 0.5, this function returns 0
(which is the image of 0.5), for every $\mu$ with $|\mu -
0.5|<0.0001$.
 
To address the problems around 0, `unifed.kappa.prime.inverse` returns
$-\dkappa^{-1}(1-\mu)$ for $\mu<0.1$. The justification for this is that for every $\theta$

$$\dkappa(\theta) = 1 - \dkappa(-\theta).$$

In the package the variance function is implemented by the function
`unifed.varf`. The following code plots of the values of the variance
function.


```{r,fig.align='center'}
x <- seq(0,1,length.out=100)
y <- unifed.varf(x)
plot(x,y,type="l",xlab=expression(mu),ylab=expression(bold(V)(mu)),main="Unifed Variance Function")
```

We do not have an analytical expression for the unit deviance. The
package provides the function `unifed.deviance` that computes the
deviance numerically. It uses `unifed.kappa.prime.inverse` and the
fact that the deviance can be expressed as

$$d\left(y,\mu\right) = 2 \left[ y\{ \dot{\kappa}^{-1}(y)
        -\dot{\kappa}^{-1}(\mu) \}  - 
        \kappa\big(\dot{\kappa}^{-1}(y)\big) +
        \kappa\big(\dot{\kappa}^{-1}(\mu)\big)
 \right].$$
 
# Unifed GLM - An Illustrative Example

The package provides the function `unifed` which returns a family that
can be used with the `glm` function of R. This section gives an
example of how to prepare a dataset and fit a unifed GLM.

The data from this section appears in @glm-insurance-book. It is based
on 67,856 one–year auto in- surance policies from 2004 or 2005. It
comes with the package in the variable `car.insurance` and it is
lazily loaded. The original csv file can be downloaded from the [companion site of the
book](http://www.businessandeconomics.mq.edu.au/our_departments/Applied_Finance_and_Actuarial_Studies/research/books/GLMsforInsuranceData), as the dataset
called Car. The following text is the description of the variables
provided by the website of the book
```
This data set is based on  one-year vehicle insurance
policies taken out in 2004 or 2005. There are 67856 policies, of
which  4624 (6.8%) had at least one claim. 

Variables:

veh_value	vehicle value, in $10,000s
exposure	0-1
clm		occurrence of claim (0 = no, 1 = yes)
numclaims	number of claims
claimcst0	claim amount (0 if no claim)
veh_body	vehicle body, coded as
              BUS
              CONVT = convertible  
              COUPE 
              HBACK = hatchback              
              HDTOP = hardtop
              MCARA = motorized caravan
              MIBUS = minibus
              PANVN = panel van
              RDSTR = roadster
              SEDAN    
              STNWG = station wagon
              TRUCK           
              UTE - utility
veh_age	age of vehicle: 1 (youngest), 2, 3, 4
gender		gender of driver: M, F
area		driver's area of residence: A, B, C, D, E, F
agecat		driver's age category: 1 (youngest), 2, 3, 4, 5, 6
```

We are interested in modeling the exposure; which is the proportion of
time in a year in which the insurance policy is in-force for a given
client. We use `gender`, `agecat`, `area` and `veh_age` as the
explanatory variables.

## Preparing the Data

The following block aggregates the data using data frames

```{r,results="hide",eval=TRUE}
car.insurance$agecat <- factor(car.insurance$agecat)
car.insurance$veh_age <- factor(car.insurance$veh_age)

agg.car.data <- aggregate( cbind(exposure,rep(1, dim(car.insurance)[1] )) ~ gender + agecat + area + veh_age,
                          data=car.insurance,
                          FUN=sum)

colnames(agg.car.data)[colnames(agg.car.data) == "V2"] <- "weight"
colnames(agg.car.data)[colnames(agg.car.data) == "exposure"] <- "class.exposure"
agg.car.data$class.exposure <- agg.car.data$class.exposure / agg.car.data$weight
```

Using `data.table` the same can be achieved with the following

```{r,results="hide",eval=TRUE}
library(data.table)
car.insurance <- data.table(car.insurance)
car.insurance[,agecat:=factor(agecat)]
car.insurance[,veh_age:=factor(veh_age)]

agg.car.data <- car.insurance[,.(class.exposure=sum(exposure)/.N,
                              weight=.N),
                              by=.(gender,agecat,area,veh_age)]
```

## Fitting and Diagnostics

Now that we have an aggregated version of our data in the variable
`agg.car.data`, we can fit a unifed GLM as follows

```{r,results="hide",eval=TRUE}
exposure.model <- glm(class.exposure ~ gender + agecat + area + veh_age,
                 family=unifed(),
                 weights=weight,
                 data=agg.car.data)
```
The default link function for the unifed family is the logistic
function. Since no argument as passed to `unifed` above it is the one
being here. The documentation of `unifed` contains the other functions
that can be used as link function and how to select them.

To see the glm summary of this model, we use the `summary`
function. For the unifed family, it is necessary to pass the argument
`dispersion=1` explicitly. Otherwise we would be getting information
for a unifed quasi family.

```{r,eval=TRUE}
summary(exposure.model,dispersion=1)
```

The following shows a plot of the deviance residuals for this model

```{r,fig.align='center',eval=TRUE}
plot(predict(exposure.model, type="response"),
     residuals(exposure.model, type="deviance"),
     xlab="Predicted value",
     ylab="Deviance residuals")
```

## Verifying Data Aggregation

We mentioned before that GLMs allow for data aggregation. We use the
example from the previous section to show explicitly what that means.

Let us fit the glm again but without aggregating the data:

```{r,eval=FALSE}
exposure.model.2 <- glm(exposure ~ gender + agecat + area + veh_age,
                        family=unifed(),
                        data=car.insurance)
```

If you see the coefficients of the model above and compare them to the
coefficients of the model in the previous section you will see that
there are practically the same. Thus, when we said that the unifed GLM
is suitable for data aggregation we meant that the fitted coefficients
are the same for the original and the aggregated data. Note that this
property is not exclusive for the unifed GLM. It is true for every GLM
where the response is a continuous distribution.

Now, not all the printed decimals of the coefficients of both models
are the same. This is because the algorithm for fitting glms stops
after a convergence tolerance condition in the variation of the
coefficients between iterations is met. If we want to decrease the
variation between both models we can simply make the tolerance
condition more strict by using the `control` argument of the `glm`
function for both models:

```{r,eval=FALSE}
exposure.model <- glm(class.exposure ~ gender + agecat + area + veh_age,
                      family=unifed(),
                      weights=weight,
                      control=list(epsilon=1e-20,maxit=1e5),
                      data=agg.car.data)

exposure.model.2 <- glm(exposure ~ gender + agecat + area + veh_age,
                        family=unifed(),
                        control=list(epsilon=1e-20,maxit=1e5),
                        data=car.insurance)
```

## Bayesian Unifed GLM

The package provides stan code for working with the unifed. It
includes functions for computing the density and cumulative
distribution function, random number generator, the cumulant generator
and two of it's derivatives. See the section unifed.stan in the manual
of the package to see a comprehensive list. If you would like to get
familiar with stan you could start with [this tutorial](https://mc-stan.org/rstan/articles/rstan.html).

We show now how to fit a Bayesian version of the unifed GLM example
shown above in stan. For this we use the function `unifed_glm_lp`,
which receives three arguments: a vector with the observed responses,
the vector of canonical parameters and a vector of weights.

First save the following stan code in a file
`unifed_example.stan`. Note that a normal distribution with mean 0 and
standard deviation 20 is used as the prior of the regression
coefficients.

```
functions{
#include unifed.stan
}

data{
  int<lower=1> M; //Rows in the design matrix
  int<lower=1> P; //Columns in the design matrix
  matrix[M,P] X;
  vector<lower=0,upper=1>[M] y;
  int<lower=1> ws[M]; //Number of observations in each class
}


parameters{
  vector[P] beta;
}

transformed parameters{
  vector[M] theta;
  theta = X*beta;   
}


model{ 
  beta ~ normal(0,20);
  unifed_glm_lp(y, theta , to_vector(ws));      
}

generated quantities{
  vector[M] replicated_samples=rep_vector(0,M);
  vector[M] mu;
  
  for(i in 1:M){
    int Nobs;
    Nobs=ws[i];
    for(n in 1:Nobs ){
      replicated_samples[i]+=unifed_rng(theta[i]);
    }
    replicated_samples[i]/=Nobs;
    mu[i]=unifed_kappa_prime(theta[i]);
  }}
```

The include statement makes reference to the stan file provided in the
unifed function. When the code is compiled we have to tell stan where
to find that file. The R functions `unifed.stan.path` and
`unifed.stan.folder` return the full path to the file and to the
folder containing it respectively.

We now format data for the stan code. For this we use the variable
`agg.car.data` defined before.

```{r,eval=FALSE}
X <- model.matrix( ~ gender + agecat + area + veh_age, agg.car.data)

model.data <- list(M=dim(X)[1],
                   P=dim(X)[2],
                   X=X,
                   y= agg.car.data$class.exposure,
                   ws= agg.car.data$weight)

```
Finally, the following code compiles the stan code, gets the
simulations and saves them in the variable `m1`.

```{r,eval=FALSE}
library(rstan)
model.list <- stanc_builder("unifed_example.stan",
                            isystem=unifed.stan.folder())

m1 <- stan(model_code=model.list$model_code,
           data=model.data,
           warmup=3e3,
           iter=2e4,
           chains=4)
```


# Comparison to the Beta Regression

The beta regression [@beta-regression] is a model for applications
with a response variable in (0,1). In R it is implemented in the
package `betareg` [@r-beta-regression]. That package includes the
enhancements of the model from @beta-double-regression, where
explanatory variables are also used for the dispersion parameter.

The density of the beta distribution is much more flexible than the
density of the unifed distribution. To see this compare the plot of
the unifed densities shown here with the one provided for the beta in
@beta-regression.

In the cases where both a unifed GLM and a beta regression give a
similar fit, the parsimony principle suggests to take the unifed GLM
since it has less parameters. 

On the computational side, when using an unifed GLM one can apply
the properties of reproductive EDFs for data reduction.


`r if (knitr::is_html_output()) '# References {-}'`

<!--  LocalWords:  frequentist
 -->
