drhutools: Political Science Academic Research Gears
The drhutools package is designed to support political science research and academic tasks by providing a set of practical tools. The functions are developed to streamline routine data analysis and visualization while accommodating domain-specific requirements.
You can install the stable version of drhutools from
CRAN or the development version from GitHub using the following
commands:
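A minimal pair of commands; the GitHub repository path is an assumption and may differ:

```r
# Stable release from CRAN
install.packages("drhutools")

# Development version from GitHub (repository path is an assumption)
# install.packages("remotes")
remotes::install_github("sammo3182/drhutools")
```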
folderSystem
A well-organized folder system enhances research efficiency and
ensures continuity, allowing you to easily resume work at any stage of
the project. The folderSystem function establishes a
standardized folder structure tailored for research projects,
particularly those involving empirical studies in social science.
When the function is executed, it creates the following folder structure in the working directory. Each folder includes a brief guide explaining its recommended usage. Users may delete these instructional files once they have organized their actual project files. If folders with the same names already exist in the directory, the function will not recreate or overwrite them, ensuring no accidental loss of existing data.
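The call itself is a single line:

```r
library(drhutools)
folderSystem()  # creates the folder tree in the current working directory
```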
## +- paper
## | |
## | +- submission
## | | |
## | | `- files for submission here; delete this file after locating the real files here.txt
## | |
## | `- images
## | |
## | `- non-code-generated images here; delete this file after locating the real files.txt
## |
## +- output
## | |
## | `- image, results, and other output files here; delete this file after locating the real files here.txt
## |
## +- document
## | |
## | `- documents and materials here; delete this file after locating the real files.txt
## |
## +- data
## | |
## | `- all data file here.csv
## |
## `- codes
## |
## `- put codes here; delete this file after locating the real files.txt
cdplot
cdplot enables the comparison of empirical cumulative
distribution functions (ECDFs) between treatment and control groups in
experiments or quasi-experiments. Unlike conventional bar plots or
difference-in-mean statistics, ECDFs provide a comprehensive,
non-parametric view of differences between the treatment and control
groups, capturing the entire distribution of outcomes.
The function generates a ggplot object that displays the ECDFs of the treatment and control groups.
Before using cdplot, users should organize the
experimental data in a “long” format: the first column contains
the outcome variable, and the second column contains the group
assignment, stored as a factor whose first level is treated as the
control group.
Users can customize the appearance of the plot by adjusting:
- point_size to control the size of the points.
- point_color to define the color of the points.
- link_color to set the color of the dashed lines.
Additionally, the function can perform and display the results of a
Kolmogorov-Smirnov (K-S) test to compare the distributions. Set the
ks_test argument to TRUE to show the test
result in the bottom-right corner of the plot.
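A sketch of a full call, assuming cdplot accepts the long-format data frame as its first argument; the toy data here are simulated:

```r
library(drhutools)

# Simulated long-format data: outcome first, then the group factor,
# whose first level ("Control") is treated as the control group
set.seed(114)
toy_exp <- data.frame(
  outcome = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5)),
  group = factor(rep(c("Control", "Treatment"), each = 50))
)

cdplot(toy_exp, point_size = 2, link_color = "gray", ks_test = TRUE)
```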
While everyone has their preferred colors, this package includes a
palette that I personally use and recommend. The primary colors are gold
(#FFCD00) and black (#000000), which inspired
the name _gb.
This palette integrates seamlessly with ggplot2
visualizations, allowing users to apply it as they would any other
palette. The visualizations shown above were created using this palette.
Below is an additional example demonstrating how to use it in
practice:
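A sketch of one way to apply the palette; I assume here that gb_cols() with no arguments returns a named vector of the palette's hex colors (the text notes all colors are available through it), which can then feed a manual ggplot2 scale:

```r
library(ggplot2)
library(drhutools)

# Assumption: gb_cols() returns the full named vector of palette colors
cols <- gb_cols()

ggplot(mtcars, aes(factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  scale_fill_manual(values = unname(cols)[1:3])  # first three palette colors
```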
In addition to the primary palette (main), the package
offers four alternatives to suit various visualization needs:
- tricol: A gradient effect using gold, black, and dark grey.
- digitMixed: A set of five colors optimized for digital publications.
- printMixed: A set of five colors optimized for printed materials.
- full: A comprehensive palette containing all colors available through gb_cols.
I also invite users to contribute their favorite palettes. You can customize and add your own palette by assigning it a unique name and providing a list of colors.
goodmap
Drawing maps can often be a challenge for Chinese scholars. The
goodmap function simplifies this process by creating
national maps based on a template provided by Amap.com. This
function is inspired by Dawei Lang’s excellent package leafletCN
and optimizes leafletCN::geojsonMap to focus specifically
on national maps. It also incorporates geodata updated in 2020 by Yang Cao (details
here).
The current version of goodmap allows users to draw
points or fill polygons based on the full names of prefectures or
provinces. Here is an example workflow for generating such maps.
To draw a polygon map, the dataset should be formatted with full city
or provincial names. If your data lacks this format, tools such as regioncodes
can help convert the data. The data structure should resemble the
example below (toy_poly):1
With properly structured data, users can easily generate a national map of China at the provincial or prefectural level:2
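A hypothetical sketch of such a call; the column names and the type value "polygon" are my assumptions rather than the documented interface (the real toy data are in the example of goodmap):

```r
# Hypothetical polygon data with full provincial names; column names
# are assumptions, not the package's documented interface
toy_poly <- data.frame(
  prov = c("北京市", "上海市", "广东省"),
  value = c(3, 5, 8)
)
goodmap(toy_poly, type = "polygon")  # "polygon" is an assumed type value
```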
To create a map with points, set type = "point". The
data should follow this structure:
toy_point <- data.frame(
  g_lat = c(
    39.947298, 39.830932, 39.159621, 38.745234, 34.705527, 23.090849,
    20.008295, 31.564526, 29.153561, 30.368317, 27.302689, 41.850161,
    41.7295, 49.977569, 31.220653, 29.962122, 29.865772
  ),
  g_lon = c(
    116.322434, 116.20602, 117.196032, 113.58242, 113.755818, 108.685362,
    109.715334, 105.974878, 112.248827, 102.811716, 105.28199, 123.801936,
    125.962291, 127.493741, 121.47536, 121.349437, 118.436866
  ),
  value_set = c(8, 4, 4, 4, 8, 6, 6, 5, 2, 4, 4, 9, 5, 8, 4, 1, 3)
)
The g_lat and g_lon columns define the
latitude and longitude of the points, while the value_set
column contains the variable to be displayed. If value_set
contains discrete variables, set color_type = "factor". The
legend can be named using the legend_name argument.
goodmap(
  toy_point,
  type = "point",
  color_type = "factor",
  point_radius = 7,
  legend_name = "Number"
)
goodmap can also create animations to illustrate
geographic dynamics over time. To do this, set
animate = TRUE and specify the time variable. Here is an
example:
toy_point$year <- c(rep(2021, 7), rep(2017, 4), rep(1997, 6))
goodmap(
toy_point,
type = "point",
color_type = "factor",
animate = TRUE,
animate_var = "year"
)
Currently, animated plots are stored in a temporary file. If satisfied with the result, users should save the animation to a desired location before rerunning the function.
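One way to keep the result; both the GIF format and the temporary location are assumptions here, so check tempdir() for the actual file the function writes:

```r
# Copy the rendered animation out of the temporary directory
# (GIF format and location are assumptions, not documented behavior)
gif_files <- list.files(tempdir(), pattern = "\\.gif$", full.names = TRUE)
file.copy(gif_files, "output/map_animation.gif", overwrite = TRUE)
```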
traits
The traits function calculates personality trait scores
based on psychological survey responses. The current version supports
scoring for two widely used scales: the TOSCA-3SC (shame and guilt
proneness) and the Grit scale.
To use traits, the survey data must include specific
column names corresponding to the questions in each scale:
- TOSCA-3SC: Q3|R3 through Q13|R4.
- Grit: Q14|1 through Q25|1.
The following example demonstrates how to prepare and analyze a
dataset using traits:
column_names <- c(
"Q3|R3", "Q3|R4", "Q4|R3", "Q4|R4", "Q5|R5", "Q5|R6", "Q6|R3", "Q6|R4", "Q7|R3",
"Q7|R4", "Q8|R5", "Q8|R6", "Q9|R5", "Q9|R6", "Q10|R5", "Q10|R6", "Q11|R5", "Q11|R6", "Q12|R3",
"Q12|R4", "Q13|R3", "Q13|R4", "Q14|1", "Q15|1", "Q16|1", "Q17|1", "Q18|1", "Q19|1", "Q20|1",
"Q21|1", "Q22|1", "Q23|1", "Q24|1", "Q25|1"
)
toy_data <- data.frame(matrix(sample(1:5, 10 * length(column_names), replace = TRUE),
ncol = length(column_names)
))
names(toy_data) <- column_names
traits(toy_data)
## score_shame score_guilt score_gritO score_gritS
## 1 36 36 2.166667 1.750
## 2 36 28 2.500000 2.875
## 3 33 33 2.916667 3.000
## 4 27 30 2.583333 2.500
## 5 25 39 3.000000 3.500
## 6 34 37 3.916667 4.125
## 7 24 32 3.000000 3.000
## 8 37 32 2.583333 2.250
## 9 36 33 4.083333 4.250
## 10 40 37 3.000000 2.500
This example generates random data for the required columns and calculates the scores for the TOSCA-3SC and Grit scales. Adjust your dataset to match the column structure and format for accurate scoring.
Qualitative Comparative Analysis (QCA) evaluates multiple
configurations simultaneously, which raises a multiple-testing problem:
the more configurations examined in a truth table, the greater the
chance that at least one appears significant purely by chance. The
functions below, originally from Bear Braumoeller’s
QCAfalsePositive package, address this by providing
adjusted tests for three QCA variants: crisp-set (csQCA), multi-value
(mvQCA), and fuzzy-set (fsQCA).
For csQCA, csQCAbinTest calculates the probability that
each supporting configuration arose by chance, given how often the
outcome occurs in the sample. It then adjusts those p-values to account
for all configurations in the truth table.
The key arguments are:
- freq.y: the proportion of cases where the outcome equals 1.
- configs: a named list mapping each configuration to its number of supporting cases.
- total.configs: the total number of rows in the truth table (including those not in configs).
test_cs <- csQCAbinTest(
freq.y = 0.7,
configs = list(aB = 5, bCD = 3, Ce = 2),
total.configs = 20
)
summary(test_cs)
## Call:
## csQCAbinTest(freq.y = 0.7, configs = list(aB = 5, bCD = 3, Ce = 2),
## total.configs = 20)
##
## Counterexamples
## Number of cases p-raw p-adj
## aB 5 0.00243 0.0486 *
## bCD 3 0.02700 0.5130
## Ce 2 0.09000 1.0000
## Total number of configurations: 20
## p-value adjustment method: holm
The summary reports, for each configuration, the raw binomial p-value
(p-raw) and the Holm-adjusted p-value (p-adj)
that corrects for the full set of truth table rows. A small adjusted
p-value indicates the result is unlikely to be a false positive.
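The adjustment can be reproduced with base R's p.adjust, setting n = 20 so that all truth table rows count toward the correction, not just the three tested configurations:

```r
# Raw binomial p-values from the summary above
p_raw <- c(aB = 0.00243, bCD = 0.027, Ce = 0.09)
p.adjust(p_raw, method = "holm", n = 20)
# -> aB 0.0486, bCD 0.5130, Ce 1.0000 (matching the p-adj column)
```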
mvQCAbinTest follows the same logic for multi-value QCA
and accepts identical arguments.
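Since the interface is identical, a parallel call would look like this; the multi-value configuration names and counts here are illustrative only:

```r
# Same arguments as csQCAbinTest; names and counts are illustrative
test_mv <- mvQCAbinTest(
  freq.y = 0.7,
  configs = list(A1B2 = 5, B1C0 = 3),
  total.configs = 20
)
summary(test_mv)
```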
Fuzzy-set QCA requires a different approach because membership scores
are continuous rather than binary. fsQCApermTest builds a
null distribution by repeatedly shuffling the outcome variable and
recomputing consistency and counterexample counts for each
configuration. The observed values are then compared against this
distribution, with p-values adjusted for the total number of
configurations tested.
The social.revolutions dataset (Ragin 2000) provides a
classic illustration. It contains fuzzy-set membership scores for state
breakdown, popular insurrection, and social revolution across 20
hypothetical cases:
## soc.rev breakdown pop.ins
## 1 0.10 0.41 0.83
## 2 0.34 0.69 0.42
## 3 0.13 0.72 0.71
## 4 0.10 0.78 0.34
## 5 0.04 0.15 0.47
## 6 0.11 0.36 0.15
We test four configurations formed by all combinations of
breakdown (B) and pop.ins (I) and their
complements:
intersect <- pmin(social.revolutions$breakdown, social.revolutions$pop.ins)
intersect2 <- pmin(social.revolutions$breakdown, 1 - social.revolutions$pop.ins)
intersect3 <- pmin(1 - social.revolutions$breakdown, social.revolutions$pop.ins)
intersect4 <- pmin(1 - social.revolutions$breakdown, 1 - social.revolutions$pop.ins)
# num.iter is reduced here for illustration; use the default 10,000 in practice
test_fs <- fsQCApermTest(
y = social.revolutions$soc.rev,
configs = list(BI = intersect, Bi = intersect2,
bI = intersect3, bi = intersect4),
total.configs = 4,
num.iter = 500
)
summary(test_fs)
## Call:
## fsQCApermTest(y = social.revolutions$soc.rev, configs = list(BI = intersect,
## Bi = intersect2, bI = intersect3, bi = intersect4), total.configs = 4,
## num.iter = 500)
##
## Counterexamples
## Observed Upper Bound Lower c.i. p-adj se(p-adj)
## BI 0.000 11.000 4.000 0.000 0.0000
## Bi 8.000 14.000 8.000 0.210 0.0911
## bI 9.000 12.000 6.000 0.926 0.0585
## bi 7.000 12.000 5.000 0.884 0.0716
##
## Consistency
## Observed Lower Bound Upper c.i. p-adj se(p-adj)
## BI 1.00000 0.64133 0.86533 0.00000 0
## Bi 0.52933 0.48400 0.69467 1.00000 0
## bI 0.67336 0.56934 0.82664 1.00000 0
## bi 0.61029 0.56801 0.82904 1.00000 0
##
## Total number of configurations: 4
## Number of permutations: 500
## p-value adjustment method: holm
The summary presents two panels. The Counterexamples panel shows, for each configuration, how many cases contradict it and whether that count is low enough to be implausible under the null distribution. The Consistency panel shows observed consistency scores and whether they exceed what would be expected by chance. Both panels display raw and Holm-adjusted p-values.
Calling plot() on the result overlays the observed value
(black dot) on the null distribution, with the critical region (adjusted
for multiple inference) shaded in dark blue:
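For the fuzzy-set result above:

```r
plot(test_fs)  # observed values (black dots) over the permutation nulls
```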
Any configuration whose observed value falls well within the light-blue region—far from the critical dark-blue region—warrants scrutiny as a potential false positive that survives only because many configurations were evaluated simultaneously.
HU Yue
Department of Political Science,
Tsinghua University,
Email: yuehu@tsinghua.edu.cn
Website: https://www.drhuyue.site
QIU Qian
Department of Political Science,
Tsinghua University,
Email: mathildaqiu@tsinghua.edu.cn
DENG Wen
College of Public Administration,
Huazhong University of Science and Technology,
Email: dengwenjoy@outlook.com
The CRAN check does not seem to allow Chinese characters in the
vignette, since it compiles a PDF version. To pass the check, I
inserted a screenshot rather than the real toy data. Users who want to
try the toy data can find the code to create it in the example of
goodmap.↩︎
If errors occur or the output is unreadable, adjusting the encoding may resolve the issue.↩︎