Alcohol Counts

Counts involving alcohol are complicated due to a high rate of missing BAC values. FARS and GESCRSS include the variables alc_res (BAC values) and dr_drink (police-reported alcohol involvement). NHTSA also uses multiple imputation (MI) to estimate missing BAC values in FARS records (not GES/CRSS records). Traffic Safety Facts uses MI to report the number of fatalities from crashes involving a driver with a BAC of .08 g/dL or higher. Using alc_res and dr_drink produce smaller counts than those produced via MI.

rfars::counts() counts alcohol-involved crashes, fatalities and injuries using alc_res and dr_drink. However, the flat object returned by get_fars() includes the MI variables a1:a10 from the Midrvacc file, and p1:p10 from the Miper file. Below we explore the differences and show how to use the MI variables to replicate NHTSA’s reporting. See Appendix E of the FARS Analytical User’s Manual for more information on these files.

library(dplyr)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.4.3

library(tidyr)
library(rfars)

Here we use rfars::counts() to count alcohol-involved crashes in 2023:

myFARS <- get_fars(years = 2023, proceed = TRUE)
counts(myFARS, involved = 'alcohol') %>% knitr::kable(format = "html")

year	what	states	region	urb	who	involved	n
2020	crashes	all	all	all	all	alcohol	259
2021	crashes	all	all	all	all	alcohol	267
2022	crashes	all	all	all	all	alcohol	259
2023	crashes	all	all	all	all	alcohol	235
2024	crashes	all	all	all	all	alcohol	8990

## Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.

And here we use rfars::counts() to count the number of alcohol-involved fatalities in the same year:

counts(
  df = myFARS, 
  what = "fatalities",
  involved = 'alcohol'
) %>%
  knitr::kable(format = "html")

year	what	states	region	urb	who	involved	n
2020	fatalities	all	all	all	all	alcohol	282
2021	fatalities	all	all	all	all	alcohol	282
2022	fatalities	all	all	all	all	alcohol	274
2023	fatalities	all	all	all	all	alcohol	254
2024	fatalities	all	all	all	all	alcohol	9979

## Note: rfars::counts() uses the variables alc_res and dr_drink to determine alcohol involvement. NHTSA reports counts using multiple imputation to estimate missing BAC values. See vignette('Alcohol Counts', package = 'rfars') for more information.

The MI process produces 10 separate BAC estimates. The Miper file provides p1:p10, person level BAC estimates for each driver and non-occupant in the FARS Person data file. The Midrvacc file provides variables a1:a10, crash level BAC estimates derived from driver records in the Miper file.

To replicate the estimated number of fatalities reported in Traffic Safety Facts, we implement the procedure below.

Step 1. Start with the flat object and filter to fatalities. Then generate 30 indicator variables, 10 FPC_i, 10 SPC_i, and 10 TPC_i variables. i corresponds to the 10 a_i and p_i variables. FPC indicates that BAC = 0.00, SPC that BAC is between 0.01 and 0.07, and TPC that BAC is >= 0.08. So if a_1 = 5 (representing a BAC of 0.05), then FPC_1 = 0, SPC_1 = 1, and TPC_1 = 0.

temp <- myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)")

for(i in 1:10) {
  imputation_col <- paste0("a", i)
  temp[[paste0("FPC", i)]] <- ifelse(temp[[imputation_col]] == 0, 1, 0)  # BAC = 0.00
  temp[[paste0("SPC", i)]] <- ifelse(temp[[imputation_col]] >= 1 & temp[[imputation_col]] <= 7, 1, 0)  # BAC = 0.01-0.07
  temp[[paste0("TPC", i)]] <- ifelse(temp[[imputation_col]] >= 8, 1, 0)  # BAC = 0.08+
}

This produces the table below (transposed for easier viewing).

temp %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>% 
  slice(1:10) %>% 
  t() %>%
  knitr::kable(format = "html")

st_case	510001	510002	510003	510004	510005	510006	510007	510008	510009	510011
a1	0	0	23	28	21	0	0	6	23	0
a2	0	0	0	28	21	0	0	6	23	0
a3	0	0	5	28	21	0	0	6	23	0
a4	0	0	0	28	21	0	0	6	23	0
a5	0	0	13	28	21	0	0	6	23	0
a6	0	0	21	28	21	0	0	6	23	0
a7	21	0	28	28	21	0	0	6	23	0
a8	0	0	0	28	21	0	0	6	23	0
a9	0	5	23	28	21	0	0	6	23	0
a10	0	0	0	28	21	0	0	6	23	0
FPC1	1	1	0	0	0	1	1	0	0	1
FPC2	1	1	1	0	0	1	1	0	0	1
FPC3	1	1	0	0	0	1	1	0	0	1
FPC4	1	1	1	0	0	1	1	0	0	1
FPC5	1	1	0	0	0	1	1	0	0	1
FPC6	1	1	0	0	0	1	1	0	0	1
FPC7	0	1	0	0	0	1	1	0	0	1
FPC8	1	1	1	0	0	1	1	0	0	1
FPC9	1	0	0	0	0	1	1	0	0	1
FPC10	1	1	1	0	0	1	1	0	0	1
SPC1	0	0	0	0	0	0	0	1	0	0
SPC2	0	0	0	0	0	0	0	1	0	0
SPC3	0	0	1	0	0	0	0	1	0	0
SPC4	0	0	0	0	0	0	0	1	0	0
SPC5	0	0	0	0	0	0	0	1	0	0
SPC6	0	0	0	0	0	0	0	1	0	0
SPC7	0	0	0	0	0	0	0	1	0	0
SPC8	0	0	0	0	0	0	0	1	0	0
SPC9	0	1	0	0	0	0	0	1	0	0
SPC10	0	0	0	0	0	0	0	1	0	0
TPC1	0	0	1	1	1	0	0	0	1	0
TPC2	0	0	0	1	1	0	0	0	1	0
TPC3	0	0	0	1	1	0	0	0	1	0
TPC4	0	0	0	1	1	0	0	0	1	0
TPC5	0	0	1	1	1	0	0	0	1	0
TPC6	0	0	1	1	1	0	0	0	1	0
TPC7	1	0	1	1	1	0	0	0	1	0
TPC8	0	0	0	1	1	0	0	0	1	0
TPC9	0	0	1	1	1	0	0	0	1	0
TPC10	0	0	0	1	1	0	0	0	1	0

We can examine one crash to see the relation between the a and FPC, SPS, and TPC variables. The table below shows the 10 iterations for a, and the corresponding indicator variables.

temp %>% 
  slice(1) %>% 
  select(st_case, a1:a10, starts_with("FPC"), starts_with("SPC"), starts_with("TPC")) %>%
  pivot_longer(-1) %>%
  mutate(
    iter = gsub("\\D", "", name),
    name = gsub("[^A-Za-z]", "", name)
  ) %>%
  pivot_wider() %>%
  knitr::kable(format = "html")

st_case	iter	a	FPC	TPC
510001	1	0	1	0
510001	2	0	1	0
510001	3	0	1	0
510001	4	0	1	0
510001	5	0	1	0
510001	6	0	1	0
510001	7	21	0	1
510001	8	0	1	0
510001	9	0	1	0
510001	10	0	1	0

Step 2. Sum the indicator variables in each iteration.

case_results <- list()

for(i in 1:10) {
  fpc_col <- paste0("FPC", i)
  spc_col <- paste0("SPC", i)
  tpc_col <- paste0("TPC", i)
  
  case_results[[i]] <-
    temp %>%
    summarise(
      TOTAL = n(),
      !!paste0("FSBAC", i) := sum(!!sym(fpc_col), na.rm = TRUE),
      !!paste0("SSBAC", i) := sum(!!sym(spc_col), na.rm = TRUE),
      !!paste0("TSBAC", i) := sum(!!sym(tpc_col), na.rm = TRUE),
      .groups = 'drop'
    )
}

This produces 10 estimates of the number of fatalities from crashes where BAC = 0 (FSBAC), where BAC was between 0.01 and 0.07 (SSBAC), and where BAC >= 0.08 (TSBAC). The table below shows all of these estimates.

bind_rows(
  data.frame(case_results[[1]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=1),
  data.frame(case_results[[2]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=2),
  data.frame(case_results[[3]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=3),
  data.frame(case_results[[4]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=4),
  data.frame(case_results[[5]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=5),
  data.frame(case_results[[6]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=6),
  data.frame(case_results[[7]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=7),
  data.frame(case_results[[8]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=8),
  data.frame(case_results[[9]])  %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=9),
  data.frame(case_results[[10]]) %>% select(TOTAL, FSBAC=2, SSBAC=3, TSBAC=4) %>% mutate(iter=10)
  ) %>%
  knitr::kable(format = "html")

TOTAL	FSBAC	SSBAC	TSBAC	iter
42997	27704	2180	13030	1
42997	27698	2208	13008	2
42997	27624	2246	13044	3
42997	27784	2218	12912	4
42997	27326	2237	13351	5
42997	27637	2240	13037	6
42997	27758	2159	12997	7
42997	27669	2198	13047	8
42997	27820	2273	12821	9
42997	27721	2213	12980	10

Step 3. Take the average of each set of estimates.

calc <- case_results[[1]]

for(i in 2:10) {
  calc <- calc %>% bind_cols(case_results[[i]] %>% select(-TOTAL))
}

calc <-
  calc %>%
  rowwise() %>%
  mutate(
    SBAC0 = round(mean(c_across(starts_with("FSBAC")), na.rm = TRUE)), # BAC 0.00
    SBAC1 = round(mean(c_across(starts_with("SSBAC")), na.rm = TRUE)), # BAC 0.01-0.07
    SBAC2 = round(mean(c_across(starts_with("TSBAC")), na.rm = TRUE))  # BAC 0.08+
  ) %>%
  ungroup()

This gives us our answer. The values below give the final estimates of the number of fatalities from crashes where BAC = 0 (SBAC0), where BAC was between 0.01 and 0.07 (SBAC1), and where BAC >= 0.08 (SBAC2). The value for SBAC2 is 12,429, which matches Traffic Safety Facts for 2023.

select(calc, SBAC0:SBAC2) %>% knitr::kable(format = "html")

SBAC0	SBAC1	SBAC2
27674	2217	13023

Shortcut. We can do this more directly if we’re only interested in the number of fatalities from crashes with alcohol-impaired drivers:

x <-
  myFARS$flat %>% 
  select(year:per_no, age, sex, per_typ, inj_sev, alc_res, dr_drink, a1:a10) %>%
  filter(inj_sev == "Fatal Injury (K)") %>%
  mutate_at(paste0("a", 1:10), function(x) 1*(x>=8)) %>%
  group_by(year) %>%
  summarize_at(paste0("a", 1:10), sum, na.rm=T) %>%
  rowwise() %>%
  mutate(a = round(mean(c_across(a1:a10)))) 

x$a

## [1]   282   284   289   264 11904

Please see Transitioning to Multiple Imputation – A New Method to Impute Missing BAC values in FARS for more guidance on generating other counts using the MI data.