---
title: "Getting started with the R-package hystReet"
author: "Johannes Friedrich"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    fig_caption: yes
    fig_height: 7
    fig_width: 7
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{Getting started with the R-package hystReet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, 
                      eval = FALSE, 
                      fig.align = "center")
```


# Introduction

[hystreet](https://hystreet.com) is a company collecting pedestrains in german cities. After registering you can download the data for free from 19 cities.


# Installation

Until now the package is not on CRAN but you can download it via GitHub with the following command:

```{r message=FALSE, warning=FALSE, include=FALSE, eval = TRUE}
library(lubridate)
library(scales)
library(ggplot2)
library(dplyr)
library(hystReet)
```


# API Keys

To use this package, you will first need to get a hystreet API key. To do so, you first need to set up an account on [https://hystreet.com/](https://hystreet.com/). After that you can request an API key via [e-mail](mailto:info@hystreet.com). Once your request has been granted, you will find you key in your hystreet account profile.

Now you have three options:

(1)
Once you have your key, save it as an environment variable for the current session by running the following:

```{r}
Sys.setenv(HYSTREET_API_TOKEN = "PASTE YOUR API TOKEN HERE")
```


(2)
Alternatively, you can set it permanently with the help of `usethis::edit_r_environ()` by adding the line to your `.Renviron`: 

```
HYSTREET_API_TOKEN = PASTE YOUR API TOKEN HERE
```

(3)
If you don't want to save it here, you can input it in each function using the `API_token` parameter.

# Usage

Function name       | Description                                        | Example
--------------------|----------------------------------------------------| -------
get_hystreet_stats() | request common statistics about the hystreet project | get_hystreet_stats() 
get_hystreet_locations() | request all available locations | get_hystreet_locations() 
get_hystreet_station_data() | request data from a stations  | get_hystreet_station_data(71)
set_hystreet_token() | set your API token | set_hystreet_token(123456789)

## Load some statistics

The function 'get_hystreet_stats()' summaries the number of available stations and the sum of all counted pedestrians.

```{r}
library(hystReet)

stats <- get_hystreet_stats()
```

## Request all stations

The function 'get_hystreet_locations()' requests all available stations of the project.

```{r}
locations <- get_hystreet_locations()
```

```{r, eval = TRUE, echo=FALSE}
knitr::kable(
  locations[1:10,],
  format = "html"
)
```


## Request data from a specific station

The (probably) most interesting function is 'get_hystreet_station_data()'. With the hystreetID it is possible to request a specific station. By default, all the data from the current day are received.
With the 'query' argument it is possible to set the received data more precise: 

* from: datetime of earliest measurement (default: today 00:00:00:): e.g. "2021-10-01 12:00:00" or "2021-10-01"
* to : datetime of latest measurement (default: today 23:59:59): e.g. "2021-12-01 12:00:00" or "2021-12-01"
* resolution: Resolution for the measurement grouping (default: hour): "day", "hour", "month", "week"

```{r}
location_71 <- get_hystreet_station_data(
  hystreetId = 71,
  query = list(from = "2021-12-01", to = "2022-01-01", resolution = "day"))
```


# Some ideas to visualise the data

Let´s see if we can see the most frequent days before Christmas ... I think it could be Saturday ;-). Also nice to see the 25th and 26th of December ... holidays in Germany :-).

```{r}
location_71 <- get_hystreet_station_data(
    hystreetId = 71, 
    query = list(from = "2021-12-01", to = "2022-01-01", resolution = "hour"))
```

```{r, eval = TRUE}
ggplot(location_71$measurements, aes(x = timestamp, y = pedestrians_count, colour = weekdays(timestamp))) +
  geom_path(group = 1) +
  scale_x_datetime(date_breaks = "7 days") +
  scale_x_datetime(labels = date_format("%d.%m.%Y")) +
  labs(x = "Date",
       y = "Pedestrians",
       colour = "Day")
```

## Compare different stations

Now let´s compare different stations:

1) Load the data

```{r}
location_73 <- get_hystreet_station_data(
    hystreetId = 73, 
    query = list(from = "2022-01-01", to = "2022-01-31", resolution = "day"))$measurements %>% 
  select(pedestrians_count, timestamp) %>% 
  mutate(station = 73)

location_74 <- get_hystreet_station_data(
    hystreetId = 74, 
    query = list(from = "2022-01-01", to = "2019-01-22", resolution = "day"))$measurements %>% 
    select(pedestrians_count, timestamp) %>% 
  mutate(station = 74)

data_73_74 <- bind_rows(location_73, location_74)
```

```{r,eval =TRUE}
ggplot(data_73_74, aes(x = timestamp, y = pedestrians_count, fill = weekdays(timestamp))) +
  geom_bar(stat = "identity") +
  scale_x_datetime(labels = date_format("%d.%m.%Y")) +
  facet_wrap(~station, scales = "free_y") +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 45, hjust = 1))
```

## Highest ratio (pedestrians/day)

Now a little bit of big data analysis. Let´s find the station with the highest pedestrians per day ratio:

```{r message=FALSE, warning=FALSE}
hystreet_ids <- get_hystreet_locations()

all_data <- lapply(hystreet_ids[,"id"], function(ID){
  temp <- get_hystreet_station_data(
    hystreetId = ID,
    query = list(from = "2021-01-01", to = "2021-12-31", resolution = "day"))
  
    lifetime_count <- temp$statistics$timerange_count
    days_counted <- as.integer(ymd(temp$metadata$measured_to)  - ymd(temp$metadata$measured_from))
    
    return(data.frame(
      id = ID,
      station = paste0(temp$city, " (",temp$name,")"),
      ratio = lifetime_count/days_counted))
  
})

ratio <- bind_rows(all_data)
```

What stations have the highest ratio?

```{r, eval = TRUE}
ratio %>% 
  top_n(5, ratio) %>% 
  arrange(desc(ratio))
```

Now let´s visualise the top 10 cities:

```{r, eval = TRUE}
ggplot(ratio %>% 
         top_n(10,ratio), aes(station, ratio)) +
  geom_bar(stat = "identity") +
  labs(x = "City",
       y = "Pedestrians per day") + 
    theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 45, hjust = 1))
```

## Corona effects 

The Hystreet-API is a great source of analysing the social effects of the Corona pandemic in 2020.
Let´s collect all german stations since March 2020 and analyse the pedestrian count until 10th June 2020.

```{r corona_effects_data}
data <- lapply(hystreet_ids[,"id"], function(ID){

    temp <- get_hystreet_station_data(
        hystreetId = ID,
        query = list(from = "2020-03-01", to = "2020-06-10", resolution = "day")
    )
    
    return(data.frame(
    name = temp$name,
    city = temp$city,
    timestamp = format(as.POSIXct(temp$measurements$timestamp), "%Y-%m-%d"),
    pedestrians_count = temp$measurements$pedestrians_count,
    legend = paste(temp$city, temp$name, sep = " - ")
  ))
   
}) 

corona_data_all <- bind_rows(data)
```

```{r corona_effects_plot, eval = TRUE}
corona_data_all %>% 
ggplot(aes(ymd(timestamp), pedestrians_count, colour = legend)) +
geom_line(alpha = 0.2) +
    scale_x_date(labels = date_format("%d.%m.%Y"),
               breaks = date_breaks("7 days")
     ) +
  theme(legend.position = "none",
        legend.title = element_text("Legende"),
         axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Date",
       y = "Persons/Day")
```