---
title: "Introduction to ineq.2d package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to ineq.2d package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(ineq.2d)
```

This is the introduction to the ineq.2d package.

The package contains functions that perform a two-dimensional decomposition of 
the Theil index (see Giammatteo, 2007) and of the squared coefficient of variation 
(see Garcia-Penalosa & Orgiazzi, 2013). Both measures can be decomposed 
simultaneously by a characteristic of the members of the studied population 
(e.g., sex, education, age) and by income source.

Researchers and students interested in studying income or wealth inequality can 
benefit from fast and simple inequality decomposition offered by this package. 


First, let us load the example dataset into the environment and examine its contents.
```{r}
data(us16)
str(us16)
```

This dataset contains several income variables: hitotal, hilabour, hicapital,
and hitransfer. These are household-level data, which is why the names of these
variables begin with "h". hitotal represents the total income of a given household.
The other three income variables are components of hitotal (i.e., their sum 
equals hitotal).

Additionally, the dataset contains three variables describing characteristics
of the household head: sex, educ, and age.

Finally, the dataset contains population weights for every household: hpopwgt.

Let us first decompose both indexes by sex only. This is an example of a
one-dimensional decomposition.

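We decompose the Theil index first. Because total income ("hitotal") is also
passed as the only income source, income is not split into components: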
```{r}
theil.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
```

Remember that the formula of the Theil index contains the natural logarithm,
which is undefined for zero and negative numbers. This is why observations with
non-positive income are automatically removed during the calculation.
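To see why non-positive incomes have to go, here is a minimal base-R sketch of
the weighted Theil index, $T = \sum_i w_i \frac{y_i}{\mu} \ln \frac{y_i}{\mu}$,
where $\mu$ is the weighted mean income and the weights $w_i$ sum to one. The
helper theil_sketch() is hypothetical and not part of ineq.2d; it assumes that
population weights are normalised to shares, which may differ in detail from how
theil.2d() handles them.

```{r}
# Hypothetical helper (not part of ineq.2d): weighted Theil index of incomes y.
theil_sketch <- function(y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))  # equal weights by default
  keep <- y > 0                           # log(y / mu) is undefined for y <= 0
  y <- y[keep]
  w <- w[keep] / sum(w[keep])             # normalise weights to population shares
  mu <- sum(w * y)                        # weighted mean income
  sum(w * (y / mu) * log(y / mu))
}

theil_sketch(c(10, 20, 30, 40))
theil_sketch(c(0, -5, 10, 20, 30, 40))  # same result: non-positive incomes are dropped
```

If every household had the same income, each $y_i/\mu$ would equal 1, every
logarithm would be zero, and the index would be zero.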

Decomposition of the squared coefficient of variation (SCV) is done similarly:
```{r}
scv.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
```
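For comparison, the squared coefficient of variation is the weighted variance
divided by the squared weighted mean. The sketch below is a hypothetical helper
written for illustration, not the package's implementation. Because its formula
contains no logarithm, zero incomes pose no problem for the measure itself.

```{r}
# Hypothetical helper (not part of ineq.2d): weighted squared coefficient of
# variation, i.e. the weighted variance divided by the squared weighted mean.
scv_sketch <- function(y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))  # equal weights by default
  w <- w / sum(w)                         # normalise weights to shares
  mu <- sum(w * y)                        # weighted mean income
  sum(w * (y - mu)^2) / mu^2              # population (not sample) variance over mu^2
}

scv_sketch(c(10, 20, 30, 40))
scv_sketch(c(0, 10, 20, 30, 40))  # a zero income is fine here
```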

Every column of the output data frame corresponds to a value of the feature used
for the decomposition (here, sex). Inequality can arise both within the groups
formed by this feature and between them, so there are twice as many columns as
there are values of the feature. The suffixes ".W" and ".B" indicate whether a
column contains within-group or between-group inequality, respectively.
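The within/between split can be illustrated with the textbook decomposition of
the Theil index by population groups: each group $g$ contributes $s_g T_g$ to
within-group inequality and $s_g \ln(\mu_g / \mu)$ to between-group inequality,
where $s_g$ is the group's share of total income, $T_g$ its own Theil index and
$\mu_g$ its mean income. The helper theil_groups_sketch() and the toy data are
hypothetical, and the assumption that ineq.2d attributes the between-group term
to each group in exactly this way would need to be checked against the package
documentation.

```{r}
# Hypothetical helper: one-dimensional within/between split of the Theil index.
theil_groups_sketch <- function(y, g, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))  # equal weights by default
  keep <- y > 0                           # drop non-positive incomes (logarithm)
  y <- y[keep]
  g <- g[keep]
  w <- w[keep] / sum(w[keep])             # population shares
  mu <- sum(w * y)                        # overall mean income
  sapply(split(seq_along(y), g), function(idx) {
    p_g  <- sum(w[idx])                   # population share of the group
    mu_g <- sum(w[idx] * y[idx]) / p_g    # group mean income
    s_g  <- p_g * mu_g / mu               # group's share of total income
    t_g  <- sum(w[idx] / p_g * (y[idx] / mu_g) * log(y[idx] / mu_g))  # group Theil
    c(W = s_g * t_g, B = s_g * log(mu_g / mu))
  })
}

parts <- theil_groups_sketch(c(10, 20, 30, 40), g = c("a", "a", "b", "b"))
parts
sum(parts)  # equals the Theil index of the four incomes
```

The W and B rows play the role of the ".W" and ".B" columns. The poorer group
"a" gets a negative between-group term because its mean income lies below the
overall mean.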

Now we can try a two-dimensional decomposition, that is, decompose both 
inequality measures by sex and by income source at the same time.

First, we decompose the Theil index:
```{r}
theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                   "hitransfer"), "hpopwgt")
```

Then, we decompose SCV:
```{r}
scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                 "hitransfer"), "hpopwgt")
```

The output now has rows as well as columns: every row of the data frame
represents an income source. Thus, in a two-dimensional decomposition, the value
in the row for source i and a column for group j is the contribution to overall
income inequality made by inequality in the income that members of the j-th
population group receive from the i-th source.

Remember that the overall Theil index, which is the sum of all values in the
data frame, is never negative (it is zero only when all incomes are equal).
However, some of its components can make a negative contribution to inequality.
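The idea of a source-by-group table whose cells sum to the overall index, some
of them negative, can be sketched with the so-called natural decomposition of
the Theil index by income source, in which source $i$ of household $h$ enters as
$w_h (x_{ih} / \mu) \ln(y_h / \mu)$. This is an illustration under that
assumption, not necessarily the formula used by theil.2d(): it does not separate
within- and between-group terms, so its columns do not map one-to-one onto the
".W" and ".B" columns. theil_2d_sketch() and the toy data are hypothetical.

```{r}
# Hypothetical sketch of a source-by-group split of the Theil index using the
# natural source decomposition; not necessarily the formula used by theil.2d().
theil_2d_sketch <- function(sources, g, w = NULL) {
  if (is.null(w)) w <- rep(1, nrow(sources))  # equal weights by default
  y <- rowSums(sources)                       # total income = sum of the sources
  keep <- y > 0                               # log() needs positive total income
  sources <- sources[keep, , drop = FALSE]
  y <- y[keep]
  g <- g[keep]
  w <- w[keep] / sum(w[keep])                 # population shares
  mu <- sum(w * y)                            # weighted mean total income
  term <- w * log(y / mu) / mu                # household factor applied to each source
  # Cell (source i, group j): sum over households in j of x_i * w * log(y / mu) / mu
  sapply(split(seq_along(y), g), function(idx)
    colSums(sources[idx, , drop = FALSE] * term[idx]))
}

toy <- data.frame(labour = c(5, 15, 40, 60), transfer = c(10, 5, 0, 0))
cells <- theil_2d_sketch(toy, g = c("a", "a", "b", "b"))
cells
sum(cells)  # equals the Theil index of total income
```

In this toy example transfers go only to households below mean income, so their
contribution is negative, while the sum of all cells stays positive.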

If you want the functions to return the percentage share of every inequality
component in overall inequality rather than the index values, set the argument
"perc" to TRUE:
```{r}
theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                   "hitransfer"), "hpopwgt", perc = TRUE)

scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                 "hitransfer"), "hpopwgt", perc = TRUE)
```

Overall inequality measures can be obtained in two ways. The first is to sum the
values in the output data frame, excluding the first column, which labels the
rows (income sources) rather than holding index values:
```{r}
theil1 <- theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                             "hitransfer"), "hpopwgt")
sum(theil1[,-1])

scv1 <- scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                         "hitransfer"), "hpopwgt")
sum(scv1[,-1])
```

The second is to call the functions without specifying a feature or income
sources:
```{r}
theil.2d(us16, "hitotal", weights = "hpopwgt")

scv.2d(us16, "hitotal", weights = "hpopwgt")
```

Decomposition by education level is done the same way as demonstrated above.
You only need to specify "educ" instead of "sex" in the function calls.
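For instance, the two-dimensional decomposition by education level and income
source reuses the calls shown earlier for sex, with only the feature changed.
The library and data are loaded again only so that the chunk can run on its own.

```{r}
library(ineq.2d)  # already attached in the setup chunk
data(us16)

# Same calls as for sex, with "educ" as the feature
theil.2d(us16, "hitotal", "educ", c("hilabour", "hicapital", 
                                    "hitransfer"), "hpopwgt")

scv.2d(us16, "hitotal", "educ", c("hilabour", "hicapital", 
                                  "hitransfer"), "hpopwgt")
```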

Decomposition by age is a more complicated example. Unlike sex and educ, which
take two and three values respectively, age can take many values because it is
measured in years. To decompose the indexes by age, one first needs to add a
column that assigns each household to an age cohort. This can be done as
follows:
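```{r}
# Assign each household to an age cohort based on the age of its head
us16$cohort <- NA_character_
us16[us16$age < 25, "cohort"] <- "t24"                     # under 25
us16[us16$age >= 25 & us16$age < 50, "cohort"] <- "f25t49" # 25 to 49
us16[us16$age >= 50 & us16$age < 75, "cohort"] <- "f50t74" # 50 to 74
us16[us16$age >= 75, "cohort"] <- "f75"                    # 75 and over
```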
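A more compact alternative is base R's cut(). The sketch below applies it to a
few hypothetical ages; with right = FALSE the intervals are closed on the left,
which matches the < and >= comparisons used above.

```{r}
# Hypothetical ages, used only to illustrate the binning
ages <- c(19, 25, 49, 50, 74, 75, 90)
toy_cohort <- as.character(cut(ages,
                               breaks = c(-Inf, 25, 50, 75, Inf),
                               right = FALSE,  # intervals like [25, 50)
                               labels = c("t24", "f25t49", "f50t74", "f75")))
toy_cohort
```

Applied to the dataset, this would read
us16$cohort <- as.character(cut(us16$age, ...)) with the same breaks and labels.
Converting the factor to character keeps cohort a character column, as in the
code above, and a missing age would simply yield NA.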

After this variable has been created, we can decompose the indexes by the age
cohorts and income sources:
```{r}
theil.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital", 
                                      "hitransfer"), "hpopwgt")

scv.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital", 
                                    "hitransfer"), "hpopwgt")
```

References:

Garcia-Penalosa, C., & Orgiazzi, E. (2013). Factor Components of Inequality:
A Cross-Country Study. Review of Income and Wealth, 59(4), 689-727.

Giammatteo, M. (2007). The Bidimensional Decomposition of Inequality:
A nested Theil Approach. LIS Working papers, Article 466, 1-30.