Alcohol Counts

Counts involving alcohol are complicated due to a high rate of missing BAC values. FARS and GESCRSS include the variables alc_res (BAC values) and dr_drink (police-reported alcohol involvement). NHTSA also uses multiple imputation (MI) to estimate missing BAC values in FARS records (not GES/CRSS records). Traffic Safety Facts uses MI to report the number of fatalities from crashes involving a driver with a BAC of .08 g/dL or higher. Using alc_res and dr_drink produce smaller counts than those produced via MI.

rfars::counts() counts alcohol-involved crashes, fatalities and injuries using alc_res and dr_drink. However, the flat object returned by get_fars() includes the MI variables a1:a10 from the Midrvacc file, and p1:p10 from the Miper file. Below we explore the differences and show how to use the MI variables to replicate NHTSA’s reporting. See Appendix E of the FARS Analytical User’s Manual for more information on these files.

library(dplyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
library(tidyr)
library(rfars)

Here we use rfars::counts() to count alcohol-involved crashes in 2023:

myFARS <- get_fars(years = 2023, proceed = TRUE)
counts(myFARS, involved = 'alcohol') %>% knitr::kable(format = "html")
year what states region urb who involved n
2020 crashes all all all all alcohol 259
2021 crashes all all all all alcohol 267
2022 crashes all all all all alcohol 259
2023 crashes all all all all alcohol 235
2024 crashes all all all all alcohol 8990
## Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.

And here we use rfars::counts() to count the number of alcohol-involved fatalities in the same year:

counts(
  df = myFARS, 
  what = "fatalities",
  involved = 'alcohol'
) %>%
  knitr::kable(format = "html")
year what states region urb who involved n
2020 fatalities all all all all alcohol 282
2021 fatalities all all all all alcohol 282
2022 fatalities all all all all alcohol 274
2023 fatalities all all all all alcohol 254
2024 fatalities all all all all alcohol 9979
## Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.

The MI process produces 10 separate BAC estimates. The Miper file provides p1:p10, person level BAC estimates for each driver and non-occupant in the FARS Person data file. The Midrvacc file provides variables a1:a10, crash level BAC estimates derived from driver records in the Miper file.

To replicate the estimated number of fatalities reported in Traffic Safety Facts, we implement the procedure below.

Step 1. Start with the flat object and filter to fatalities. Then generate 30 indicator variables, 10 FPC_i, 10 SPC_i, and 10 TPC_i variables. i corresponds to the 10 a_i and p_i variables. FPC indicates that BAC = 0.00, SPC that BAC is between 0.01 and 0.07, and TPC that BAC is >= 0.08. So if a_1 = 5 (representing a BAC of 0.05), then FPC_1 = 0, SPC_1 = 1, and TPC_1 = 0.

temp <- myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)")

for(i in 1:10) {
  imputation_col <- paste0("a", i)
  temp[[paste0("FPC", i)]] <- ifelse(temp[[imputation_col]] == 0, 1, 0)  # BAC = 0.00
  temp[[paste0("SPC", i)]] <- ifelse(temp[[imputation_col]] >= 1 & temp[[imputation_col]] <= 7, 1, 0)  # BAC = 0.01-0.07
  temp[[paste0("TPC", i)]] <- ifelse(temp[[imputation_col]] >= 8, 1, 0)  # BAC = 0.08+
}

This produces the table below (transposed for easier viewing).

temp %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>% 
  slice(1:10) %>% 
  t() %>%
  knitr::kable(format = "html")
st_case 510001 510002 510003 510004 510005 510006 510007 510008 510009 510011
a1 0 0 23 28 21 0 0 6 23 0
a2 0 0 0 28 21 0 0 6 23 0
a3 0 0 5 28 21 0 0 6 23 0
a4 0 0 0 28 21 0 0 6 23 0
a5 0 0 13 28 21 0 0 6 23 0
a6 0 0 21 28 21 0 0 6 23 0
a7 21 0 28 28 21 0 0 6 23 0
a8 0 0 0 28 21 0 0 6 23 0
a9 0 5 23 28 21 0 0 6 23 0
a10 0 0 0 28 21 0 0 6 23 0
FPC1 1 1 0 0 0 1 1 0 0 1
FPC2 1 1 1 0 0 1 1 0 0 1
FPC3 1 1 0 0 0 1 1 0 0 1
FPC4 1 1 1 0 0 1 1 0 0 1
FPC5 1 1 0 0 0 1 1 0 0 1
FPC6 1 1 0 0 0 1 1 0 0 1
FPC7 0 1 0 0 0 1 1 0 0 1
FPC8 1 1 1 0 0 1 1 0 0 1
FPC9 1 0 0 0 0 1 1 0 0 1
FPC10 1 1 1 0 0 1 1 0 0 1
SPC1 0 0 0 0 0 0 0 1 0 0
SPC2 0 0 0 0 0 0 0 1 0 0
SPC3 0 0 1 0 0 0 0 1 0 0
SPC4 0 0 0 0 0 0 0 1 0 0
SPC5 0 0 0 0 0 0 0 1 0 0
SPC6 0 0 0 0 0 0 0 1 0 0
SPC7 0 0 0 0 0 0 0 1 0 0
SPC8 0 0 0 0 0 0 0 1 0 0
SPC9 0 1 0 0 0 0 0 1 0 0
SPC10 0 0 0 0 0 0 0 1 0 0
TPC1 0 0 1 1 1 0 0 0 1 0
TPC2 0 0 0 1 1 0 0 0 1 0
TPC3 0 0 0 1 1 0 0 0 1 0
TPC4 0 0 0 1 1 0 0 0 1 0
TPC5 0 0 1 1 1 0 0 0 1 0
TPC6 0 0 1 1 1 0 0 0 1 0
TPC7 1 0 1 1 1 0 0 0 1 0
TPC8 0 0 0 1 1 0 0 0 1 0
TPC9 0 0 1 1 1 0 0 0 1 0
TPC10 0 0 0 1 1 0 0 0 1 0

We can examine one crash to see the relation between the a and FPC, SPS, and TPC variables. The table below shows the 10 iterations for a, and the corresponding indicator variables.

temp %>% 
  slice(1) %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>%
  pivot_longer(-1) %>%
  mutate(
    iter = gsub("\\D", "", name),
    name = gsub("[^A-Za-z]", "", name)
  ) %>%
  pivot_wider() %>%
  knitr::kable(format = "html")
st_case iter a FPC SPC TPC
510001 1 0 1 0 0
510001 2 0 1 0 0
510001 3 0 1 0 0
510001 4 0 1 0 0
510001 5 0 1 0 0
510001 6 0 1 0 0
510001 7 21 0 0 1
510001 8 0 1 0 0
510001 9 0 1 0 0
510001 10 0 1 0 0

Step 2. Sum the indicator variables in each iteration.

case_results <- list()

for(i in 1:10) {
  fpc_col <- paste0("FPC", i)
  spc_col <- paste0("SPC", i)
  tpc_col <- paste0("TPC", i)
  
  case_results[[i]] <-
    temp %>%
    summarise(
      TOTAL = n(),
      !!paste0("FSBAC", i) := sum(!!sym(fpc_col), na.rm = TRUE),
      !!paste0("SSBAC", i) := sum(!!sym(spc_col), na.rm = TRUE),
      !!paste0("TSBAC", i) := sum(!!sym(tpc_col), na.rm = TRUE),
      .groups = 'drop'
    )
}

This produces 10 estimates of the number of fatalities from crashes where BAC = 0 (FSBAC), where BAC was between 0.01 and 0.07 (SSBAC), and where BAC >= 0.08 (TSBAC). The table below shows all of these estimates.

bind_rows(
  data.frame(case_results[[1]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=1),
  data.frame(case_results[[2]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=2),
  data.frame(case_results[[3]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=3),
  data.frame(case_results[[4]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=4),
  data.frame(case_results[[5]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=5),
  data.frame(case_results[[6]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=6),
  data.frame(case_results[[7]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=7),
  data.frame(case_results[[8]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=8),
  data.frame(case_results[[9]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=9),
  data.frame(case_results[[10]]) %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=10)
  ) %>%
  knitr::kable(format = "html")
TOTAL FSBAC SSBAC TSBAC iter
42997 27704 2180 13030 1
42997 27698 2208 13008 2
42997 27624 2246 13044 3
42997 27784 2218 12912 4
42997 27326 2237 13351 5
42997 27637 2240 13037 6
42997 27758 2159 12997 7
42997 27669 2198 13047 8
42997 27820 2273 12821 9
42997 27721 2213 12980 10

Step 3. Take the average of each set of estimates.

calc <- case_results[[1]]

for(i in 2:10) {
  calc <- calc %>% bind_cols(case_results[[i]] %>% select(-TOTAL))
}

calc <-
  calc %>%
  rowwise() %>%
  mutate(
    SBAC0 = round(mean(c_across(starts_with("FSBAC")), na.rm = TRUE)), # BAC 0.00
    SBAC1 = round(mean(c_across(starts_with("SSBAC")), na.rm = TRUE)), # BAC 0.01-0.07
    SBAC2 = round(mean(c_across(starts_with("TSBAC")), na.rm = TRUE))  # BAC 0.08+
  ) %>%
  ungroup()

This gives us our answer. The values below give the final estimates of the number of fatalities from crashes where BAC = 0 (SBAC0), where BAC was between 0.01 and 0.07 (SBAC1), and where BAC >= 0.08 (SBAC2). The value for SBAC2 is 12,429, which matches Traffic Safety Facts for 2023.

select(calc, SBAC0:SBAC2) %>% knitr::kable(format = "html")
SBAC0 SBAC1 SBAC2
27674 2217 13023

Shortcut. We can do this more directly if we’re only interested in the number of fatalities from crashes with alcohol-impaired drivers:

x <-
  myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)") %>%
  mutate_at(paste0("a", 1:10), function(x) 1*(x>=8)) %>%
  group_by(year) %>%
  summarize_at(paste0("a", 1:10), sum, na.rm=T) %>%
  rowwise() %>%
  mutate(a = round(mean(c_across(a1:a10)))) 

x$a
## [1]   282   284   289   264 11904

Please see Transitioning to Multiple Imputation – A New Method to Impute Missing BAC values in FARS for more guidance on generating other counts using the MI data.