---
title: "Getting Started with trendseries"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with trendseries}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  message = FALSE,
  warning = FALSE
)
```

# What is trendseries?

The `trendseries` package helps you extract trends from economic time
series data. A trend can be broadly understood as the underlying
"direction" of the data once noise and seasonal patterns are stripped
away.

The goal of `trendseries` is to provide a modern, pipe-friendly
interface for exploratory analysis of time series data in conventional
`data.frame` format. Throughout this vignette, the terms `data.frame`
and "data frame" refer to any rectangular dataset, such as a
`data.frame`, `tibble`, or `data.table`.

Most trend extraction methods in R expect `ts` objects, but real-world
data typically lives in data frames. Converting back and forth between
`ts` and `data.frame` is tedious and error-prone. `trendseries` bridges
this gap, letting you work directly with data frames and `tidyverse`
tools like `dplyr` and `ggplot2`.

This package was designed with economic time series in mind. It includes
methods commonly used in economics (e.g., Hodrick-Prescott filter) as
well as general-purpose smoothing methods (e.g., LOESS, moving
averages).

# Getting started

`trendseries` revolves around a general wrapper function,
`augment_trends`, that adds the estimated trends to a data frame as new
columns.

```{r}
library(trendseries)
library(dplyr)
library(ggplot2)

theme_series <- theme_minimal(paper = "#fefefe") +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    # Default palette for discrete colour scales
    palette.colour.discrete = c(
      "#2c3e50",
      "#e74c3c",
      "#f39c12",
      "#1abc9c",
      "#9b59b6"
    )
  )
```

The `electric` dataset contains monthly residential electricity
consumption in Brazil from 1979 to 2025.

```{r}
head(electric)

ggplot(electric, aes(date, consumption)) +
  geom_line() +
  theme_series
```

To extract the trend, call `augment_trends` and choose a method, in
this case STL (see `stats::stl`). The `date_col` (default `"date"`) and
`value_col` (default `"value"`) arguments identify the relevant
columns.

```{r}
elec_trend <- augment_trends(
  electric,
  value_col = "consumption",
  methods = "stl"
)

head(elec_trend)
```

`augment_trends` tries to infer the frequency of the series, but the
frequency can also be supplied manually through the `frequency`
argument.
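
How `augment_trends` infers the frequency is not covered in this
vignette. As a rough illustration only, one plausible heuristic is to
look at the typical gap between consecutive dates. The helper
`guess_frequency()` below is hypothetical, is not part of
`trendseries`, and only handles a few common regular frequencies.

```{r}
# Hypothetical helper, for illustration only: NOT the trendseries
# implementation, just one plausible way to guess the frequency of a
# regularly spaced Date vector from the median gap between dates.
guess_frequency <- function(dates) {
  dates <- sort(as.Date(dates))
  # Median gap between consecutive observations, in days
  gap <- stats::median(as.numeric(diff(dates)))
  if (gap <= 1) return(365)  # daily
  if (gap <= 7) return(52)   # weekly
  if (gap <= 31) return(12)  # monthly
  if (gap <= 92) return(4)   # quarterly
  1                          # treat anything sparser as annual
}

monthly <- seq(as.Date("2020-01-01"), by = "month", length.out = 24)
quarterly <- seq(as.Date("2020-01-01"), by = "quarter", length.out = 8)
guess_frequency(monthly)
guess_frequency(quarterly)
```

Using the median gap rather than the minimum keeps the guess stable even
though calendar months and quarters vary in length by a few days.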

```{r, eval = FALSE}
elec_trend <- augment_trends(
  electric,
  date_col = "date",
  value_col = "consumption",
  methods = "stl",
  frequency = 12
)
```

There are two ways to plot the original series and its trend with
`ggplot2`. The first is to reshape the data into a "long" format.

```{r}
# Prepare data for plotting
plot_data <- elec_trend |>
  tidyr::pivot_longer(
    cols = -date,
    names_to = "series",
    values_to = "value"
  ) |>
  mutate(
    series = case_when(
      series == "consumption" ~ "Data (original)",
      series == "trend_stl" ~ "Trend (STL)"
    )
  )

# Create the plot
ggplot(plot_data, aes(x = date, y = value, color = series)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Residential Electricity Consumption",
    x = NULL,
    y = "Electric Consumption (GWh)",
    color = NULL
  ) +
  theme_series
```

An alternative is to add the trend as an additional `geom_line` layer.
This is quicker but does not produce a color legend.

```{r}
ggplot(elec_trend, aes(x = date)) +
  geom_line(
    aes(y = consumption),
    linewidth = 0.8,
    alpha = 0.5,
    color = "#024873FF") +
  geom_line(
    aes(y = trend_stl),
    linewidth = 1,
    color = "#024873FF") +
  labs(
    title = "Residential Electricity Consumption",
    subtitle = "Decomposition using an STL trend",
    x = NULL,
    y = "Electric Consumption (GWh)",
    color = NULL
  ) +
  theme_series
```

## Multiple time series

`trendseries` makes it easy to compute trends for several series at
once. One or more grouping columns can be passed to the `group_cols`
argument, and a trend is then computed separately for each group.
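
Conceptually, grouped computation amounts to splitting the data by
group, estimating a trend within each piece, and stacking the results.
The base R sketch below only illustrates that idea and is not the
`trendseries` internals: the toy data and the helper
`add_group_trend()` are hypothetical, and a centered 3-point moving
average stands in for the trend method.

```{r}
# Illustrative sketch only (not trendseries internals): compute a trend
# separately within each group using base R. A centered 3-point moving
# average stands in for the actual trend method.
toy <- data.frame(
  city  = rep(c("A", "B"), each = 6),
  date  = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 6), 2),
  value = c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60)
)

add_group_trend <- function(df) {
  df <- df[order(df$date), ]  # filters assume chronological order
  # stats::filter is written out in full to avoid dplyr::filter
  df$trend <- as.numeric(stats::filter(df$value, rep(1 / 3, 3), sides = 2))
  df
}

pieces <- lapply(split(toy, toy$city), add_group_trend)
toy_trend <- do.call(rbind, pieces)
rownames(toy_trend) <- NULL
toy_trend
```

Because each city is filtered on its own, the trend for the last month
of city A is `NA` instead of borrowing the first observation of city B.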

```{r}
cities <- c("Houston", "San Antonio", "Dallas", "Austin")

txtrend <- txhousing |>
  filter(city %in% cities, year >= 2010) |>
  mutate(date = lubridate::make_date(year, month, 1)) |>
  augment_trends(
    value_col = "median",
    group_cols = "city"
  )

ggplot(txtrend, aes(date)) +
  geom_line(aes(y = median), alpha = 0.5, color = "#024873FF") +
  geom_line(aes(y = trend_stl), color = "#024873FF") +
  facet_wrap(vars(city)) +
  theme_series
```

## Multiple trend methods

`trendseries` also facilitates extracting trends with different methods
simultaneously. The next example uses a chained index of retail sales of
automotive fuel in the UK. The original data comes from the UK Office
for National Statistics.

```{r}
ggplot(retail_autofuel, aes(date, value)) +
  geom_line(linewidth = 0.8, color = "#024873FF") +
  theme_series
```

This example also highlights how `augment_trends` fits neatly in a pipe
workflow.

```{r compare-methods}
fuel_trends <- retail_autofuel |>
  filter(date >= as.Date("2012-01-01")) |>
  augment_trends(
    methods = c("stl", "hp", "loess")
  )

comparison_plot <- fuel_trends |>
  tidyr::pivot_longer(
    cols = c(value, starts_with("trend_")),
    names_to = "method"
  ) |>
  mutate(
    method = case_when(
      method == "value" ~ "Data (original)",
      method == "trend_hp" ~ "HP Filter",
      method == "trend_stl" ~ "STL",
      method == "trend_loess" ~ "LOESS"
    )
  )

ggplot(comparison_plot, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Comparing Different Trend Extraction Methods",
    subtitle = "Same data, different methods",
    x = "Date",
    y = "Retail Sales Index",
    color = "Method"
  ) +
  theme_series
```

# Finer control

Trend-extraction methods are spread across different packages and thus
use different conventions for parameter names. `trendseries` tries to
simplify this when possible. For example, moving averages and moving
medians share a common `window` argument that sets the size of the
rolling window.
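
For an odd window such as 5, a centered moving average averages each
observation with the two on either side. With an even window such as 12
there is no middle observation, so a common construction (and the one
the "2x12" label refers to) averages two adjacent 12-term means, which
equals a 13-term weighted average with half weights on the two
endpoints. The base R sketch below, built on `stats::filter`, assumes
this standard construction; the helper `centered_ma()` is hypothetical
and `trendseries` may implement it differently.

```{r}
# Sketch of centered moving averages in base R (illustrative; not the
# trendseries code). stats::filter with sides = 2 centers the weights.
centered_ma <- function(x, window) {
  if (window %% 2 == 1) {
    # Odd window: equal weights, one observation at the center
    w <- rep(1 / window, window)
  } else {
    # Even window: 2 x window average, i.e. window + 1 weights with
    # half weight on the two endpoints
    w <- c(0.5, rep(1, window - 1), 0.5) / window
  }
  as.numeric(stats::filter(x, w, sides = 2))
}

x <- c(3, 5, 4, 6, 8, 7, 9)
centered_ma(x, 5)

# A 2x12 average removes a fixed monthly pattern that sums to zero and
# leaves a linear trend untouched
seasonal <- rep(c(5, -3, 2, 0, -4, 1, 3, -2, 0, 4, -5, -1), 3)
trend_line <- seq(100, by = 0.5, length.out = 36)
centered_ma(trend_line + seasonal, 12)[7:12]
trend_line[7:12]
```

Note that the first and last six values of the 2x12 average are `NA`:
centered filters lose observations at both ends of the series.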

```{r}
elec_trends <- electric |>
  rename(value = consumption) |>
  # window controls the s.window argument by default
  augment_trends(methods = "stl", window = 17) |>
  # Creates a 11-month moving median
  augment_trends(methods = "median", window = 11) |>
  # Creates a (centered) 5-month moving average
  augment_trends(methods = "ma", window = 5) |>
  # Creates a (centered) 2x12 moving average
  augment_trends(methods = "ma", window = 12)
```

```{r echo = FALSE}
comparison_plot <- elec_trends |>
  tidyr::pivot_longer(
    cols = c(value, starts_with("trend_")),
    names_to = "method"
  ) |>
  mutate(
    method = case_when(
      method == "value" ~ "Data (original)",
      method == "trend_median" ~ "Median",
      method == "trend_stl" ~ "STL",
      method == "trend_ma" ~ "MA (5)",
      method == "trend_ma_1" ~ "MA (2x12)"
    )
  ) |>
  filter(date >= as.Date("2018-01-01"))

ggplot(comparison_plot, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Comparing Different Trend Extraction Methods",
    subtitle = "Same data, different methods",
    x = "Date",
    y = "Electric Consumption (GWh)",
    color = "Method"
  ) +
  theme_series
```

`trendseries` simplifies trend extraction at the cost of some
flexibility. For instance, `stats::stl` has both a `t.window` and an
`s.window` argument, while the single `window` argument in `trendseries`
controls `s.window` by default. This is an opinionated choice that
favors simplicity.
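
In `stats::stl`, `s.window` is the span of the loess smoother applied to
each seasonal sub-series (how quickly the seasonal pattern may evolve),
while `t.window` is the span of the trend smoother; when `t.window` is
left unset, `stl` derives a default from the series period and
`s.window`. If you need to set both, one option is to call `stats::stl`
directly, as in this base R sketch on the built-in monthly `nottem`
temperature series.

```{r}
# Base R illustration of the two STL smoothing windows (stats::stl),
# using the built-in monthly `nottem` series.
fit_default <- stats::stl(nottem, s.window = 17)
fit_smooth  <- stats::stl(nottem, s.window = 17, t.window = 61)

trend_default <- fit_default$time.series[, "trend"]
trend_smooth  <- fit_smooth$time.series[, "trend"]

# A larger t.window gives a smoother trend: compare the variability of
# month-to-month changes in each trend
sd(diff(trend_default))
sd(diff(trend_smooth))
```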

# How does `trendseries` compare to the traditional workflow?

The usual workflow involves:

1. Converting pairs of `date` and `numeric` columns to `ts` objects. This usually means manually specifying both the `frequency` and `start` parameters.
2. Applying a filter to the `ts` object.
3. Converting the `ts` object back to the original `data.frame`.

This can be cumbersome, especially when working with multiple series or
grouped data. Merging the results back into the original data can also
be error-prone because of misaligned dates and the additional `NA`
values that some filters introduce.

For instance, consider applying an HP filter to `gdp_construction`. The first step is to convert the data frame to a `ts` object, again specifying both `frequency` and `start` by hand.

```{r}
gdp_cons <- ts(
  gdp_construction$index,
  frequency = 4,
  start = c(1996, 1)
)

# Or, using lubridate to extract the year and quarter of the first date
gdp_cons <- ts(
  gdp_construction$index,
  frequency = 4,
  start = c(lubridate::year(min(gdp_construction$date)),
            lubridate::quarter(min(gdp_construction$date)))
)
```

Next, apply the HP filter with the `mFilter` package, using the
conventional smoothing parameter of 1600 for quarterly data.

```{r}
gdp_trend_hp <- mFilter::hpfilter(gdp_cons, freq = 1600, type = "lambda")
```
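
The `1600` passed to `hpfilter` is the smoothing parameter $\lambda$.
The HP trend $\tau_t$ of a series $y_t$ solves

$$
\min_{\tau} \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\right]^2,
$$

whose solution is $\tau = (I + \lambda D^\top D)^{-1} y$, where $D$ is
the $(T-2) \times T$ second-difference matrix. The minimal sketch below
implements this closed form with dense matrices, so it only suits short
series; the helper `hp_trend()` is hypothetical, and `mFilter` may
differ in numerical details.

```{r}
# Minimal HP filter via the closed-form solution (illustrative sketch;
# dense matrices, so only suitable for short series).
hp_trend <- function(y, lambda = 1600) {
  n <- length(y)
  # (n - 2) x n matrix of second differences: each row is (1, -2, 1)
  D <- diff(diag(n), differences = 2)
  as.numeric(solve(diag(n) + lambda * crossprod(D), y))
}

# A straight line has zero second differences, so it is its own trend
hp_trend(seq(1, 10, by = 1))

# A noisier series is smoothed, but its sum (and mean) is preserved
y <- c(2, 5, 3, 8, 6, 9, 7, 12)
hp_trend(y)
```

Larger values of $\lambda$ penalize changes in the trend's slope more
heavily, and $\lambda = 0$ returns the data unchanged.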

Finally, convert the trend back to a data frame and merge it with the
original data.

```{r}
# Convert back to data frame using tsbox
trend_df <- tsbox::ts_df(gdp_trend_hp$trend)
names(trend_df) <- c("date", "trend_hp")

# Join with original data
gdp_manual <- left_join(gdp_construction, trend_df, by = "date")
```
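
`tsbox::ts_df()` hides the date arithmetic involved in this step. For
readers who prefer to avoid the extra dependency, here is a base R
sketch for the quarterly case. The helper `ts_to_quarterly_df()` is
hypothetical and assumes each quarter is dated by its first day; the
join above only succeeds if that convention matches how
`gdp_construction` stores its dates.

```{r}
# Base R sketch (no tsbox): turn a quarterly ts into a data frame whose
# dates mark the first day of each quarter.
ts_to_quarterly_df <- function(x) {
  # Small offset guards against floating-point times like 1996.9999999
  year <- as.integer(floor(as.numeric(time(x)) + 1e-8))
  quarter <- as.integer(cycle(x))
  first_month <- (quarter - 1L) * 3L + 1L
  data.frame(
    date = as.Date(sprintf("%d-%02d-01", year, first_month)),
    value = as.numeric(x)
  )
}

example_ts <- ts(c(10, 11, 12, 13, 14), frequency = 4, start = c(1996, 3))
ts_to_quarterly_df(example_ts)
```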

## What are the alternatives to `trendseries`?

The closest alternative to `trendseries` is the `tsibble`/`fable`
ecosystem, which provides a `model()` function for applying models —
including some trend extraction methods — to grouped time series. Like
`trendseries`, these packages integrate well with `tidyverse` tools and
pipes.

However, `fable` was designed primarily for forecasting, so its trend
extraction capabilities are more limited. The ecosystem also lacks some
popular methods commonly used by economists, such as the HP filter and
the Hamilton filter.

Additionally, these packages require using the `tsibble` data structure,
which pulls users away from the familiar `data.frame`/`tibble` format.
For users working with just a few time series and relying on R's
built-in `ts` functionality, the `tsibble` structure can feel
unnecessarily complex.

## Acknowledgements

This package was inspired by the need for a simpler workflow for trend
extraction in R. It builds upon many existing packages, including:

-   `mFilter` for economic filters.
-   `hpfilter` for Hodrick-Prescott filtering.
-   `tsbox` for time series conversions.

## Getting Help

If you run into issues:

-   Check the documentation: `?augment_trends`
-   View examples: `example(augment_trends)`
-   Read other vignettes: `vignette(package = "trendseries")`
-   Report bugs: GitHub issues
