Coarse-to-fine spatial downscaling (CF-DS): Application example

This vignette demonstrates how to perform spatial downscaling (areal interpolation) using the spCF package. Spatial downscaling aims to predict disaggregate-level (fine-scale) responses from aggregate-level (coarse-scale) observations. CF-DS predicts the disaggregate-level spatial process through a coarse-to-fine approach that sequentially captures spatial patterns across increasingly finer scales, while imposing an aggregation constraint so that the disaggregate-level predictions aggregate exactly to the observed aggregate-level values. This framework is particularly suitable for moderate- to large-sized datasets. See Murakami et al. (2026) for further details.

Data and setup

Let us load the required packages

library(spCF)
library(sf)
#> Linking to GEOS 3.14.1, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(CARBayesdata)

The pollutionhealthdata and GGHB.IZ datasets included in the CARBayesdata package are used in this example. The data comprise the observed respiratory hospitalisation counts (observed) across 271 intermediate zones in Greater Glasgow. The 271 zones are treated as disaggregate-level units and grouped into 30 artificial aggregate-level regions. The goal is to downscale the responses observed for these 30 regions to the 271 zones.

set.seed(123)
data(GGHB.IZ)
data(pollutionhealthdata)
d  <- pollutionhealthdata[pollutionhealthdata$year == 2010, ]
ar <- merge(GGHB.IZ, d, by = "IZ")

The disaggregate-level data consist of center coordinates, covariates (pm10, jsa, and price), and a proportional allocation weight (prop_weight) used to distribute the aggregate-level response across disaggregate units. Here, expected number of hospitalisation counts (expected) is used.

### Disaggregate-level data (271 units)
coords      <- st_coordinates(suppressWarnings(st_centroid(ar)))
x           <- data.frame(pm10 = ar$pm10, jsa = ar$jsa, price = ar$price)
prop_weight <- as.numeric(ar$expected)

For demonstration purposes, the aggregate-level data are generated by clustering the disaggregate-level units into 30 regions. The variable agg_id indicates the corresponding aggregate-level unit:

### Aggregate-level ID (30 units)
agg_id      <- as.integer(stats::kmeans(coords, centers = 30)$cluster)

The aggregate-level response Y is then obtained by aggregating the observed counts within each aggregate-level unit. Two types of response are possible: Y_type = "sum" for downscaling extensive (count-like) data (e.g., population), where \(Y_I\) is the sum over each aggregate-level unit, and Y_type = "mean" for intensive (density-like) data (e.g., population density), where \(Y_I\) is the mean. Here the count is extensive, so "sum" is used:

### Y_type = "sum"  : Y_I = sum(response for each aggregate-level unit)
### Y_type = "mean" : Y_I = mean(response for each aggregate-level unit)
Y_type      <- "sum"
Y           <- as.numeric(stats::aggregate(ar$observed, by = list(agg_id),
                          FUN = if (Y_type == "sum") sum else mean)[, 2])

Coarse-to-fine spatial downscaling (CF-DS)

In CF-DS, the spatial process is defined as a sum of scale-wise processes, where the number of spatial scales, R, is optimized via holdout validation. A smaller R, corresponding to early stopping, allows the spatial process to capture only large-scale patterns, whereas a larger R enables the spatial process to represent small-scale patterns. The cf_downscale_hv function performs holdout validation as follows:

mod_hv   <- cf_downscale_hv(Y = Y, Y_type = Y_type, x = x,
                            prop_weight = prop_weight,
                            coords = coords, agg_id = agg_id)
#> [1] --- SSE: Linear regression ---
#> [1] 0.6070381
#> [1] --- SSE: Learning multi-scale spatial process ---
#> [1] 0.6092286 (Scale  1) agg constraint not yet satisfied
#> [1] 0.610391 (Scale  2) agg constraint not yet satisfied
#> [1] 0.610114 (Scale  3) agg constraint not yet satisfied
#> [1] 0.6111434 (Scale  4) agg constraint not yet satisfied
#> [1] 0.6116811 (Scale  5) agg constraint not yet satisfied
#> [1] 0.608303 (Scale  6) agg constraint not yet satisfied
#> [1] 0.6004919 (Scale  7) agg constraint not yet satisfied
#> [1] 0.5916741 (Scale  8) agg constraint not yet satisfied
#> [1] 0.5832815 (Scale  9) agg constraint not yet satisfied
#> [1] 0.5723028 (Scale 10) agg constraint not yet satisfied
#> [1] 0.5591086 (Scale 11) agg constraint not yet satisfied
#> [1] 0.5437407 (Scale 12) agg constraint not yet satisfied
#> [1] 0.5261863 (Scale 13) agg constraint not yet satisfied
#> [1] 0.5062412 (Scale 14) agg constraint not yet satisfied
#> [1] 0.4881999 (Scale 15) agg constraint not yet satisfied
#> [1] 0.4696572 (Scale 16) agg constraint not yet satisfied
#> [1] 0.450284 (Scale 17) agg constraint not yet satisfied
#> [1] 0.4416905 (Scale 18) agg constraint not yet satisfied
#> [1] 0.4253678 (Scale 19) agg constraint not yet satisfied
#> [1] 0.4103166 (Scale 20)
#> [1] 0.3967811 (Scale 21)
#> [1] 0.3797582 (Scale 22)
#> [1] 0.3764305 (Scale 23)
#> [1] 0.3764412 (Scale 24)
#> [1] 0.3796882 (Scale 25)
#> [1] 0.3797895 (Scale 26)
#> [1] 0.3763135 (Scale 27)
#> [1] 0.3815458 (Scale 28)
#> [1] 0.3813764 (Scale 29)
#> [1] 0.3836858 (Scale 30)
#> [1] 0.3843489 (Scale 31)
#> [1] 0.3846322 (Scale 32)
#> [1] 
#> [1] -> Selected finest scale: 32 (bandwidth: 698.2244)
#> [1]

By default, 75% of the aggregate-level units are used for training and the remaining units are used for validation (train_rat = 0.75). Alternatively, the training units can be specified explicitly using the id_train argument. As shown in the output, the validation score is monitored as learning proceeds from the coarsest scale (Scale 1) to finer scales, and the learning terminates once the aggregation constraint is satisfied and the validation score no longer improves.

After holdout validation, the full model is trained using the cf_downscale function as follows:

mod      <- cf_downscale(Y = Y, x = x, prop_weight = prop_weight,
                         coords = coords, agg_id = agg_id, mod_hv = mod_hv)
#> [1] --- Learning multi-scale spatial process ---
#> [1]  Scale  1 (bandwidth:18301.1)
#> [1]  Scale  2 (bandwidth:16470.99)
#> [1]  Scale  3 (bandwidth:14823.89)
#> [1]  Scale  4 (bandwidth:13341.5)
#> [1]  Scale  5 (bandwidth:12007.35)
#> [1]  Scale  6 (bandwidth:10806.62)
#> [1]  Scale  7 (bandwidth:9725.956)
#> [1]  Scale  8 (bandwidth:8753.36)
#> [1]  Scale  9 (bandwidth:7878.024)
#> [1]  Scale 10 (bandwidth:7090.222)
#> [1]  Scale 11 (bandwidth:6381.2)
#> [1]  Scale 12 (bandwidth:5743.08)
#> [1]  Scale 13 (bandwidth:5168.772)
#> [1]  Scale 14 (bandwidth:4651.895)
#> [1]  Scale 15 (bandwidth:4186.705)
#> [1]  Scale 16 (bandwidth:3768.035)
#> [1]  Scale 17 (bandwidth:3391.231)
#> [1]  Scale 18 (bandwidth:3052.108)
#> [1]  Scale 19 (bandwidth:2746.897)
#> [1]  Scale 20 (bandwidth:2472.208)
#> [1]  Scale 21 (bandwidth:2224.987)
#> [1]  Scale 22 (bandwidth:2002.488)
#> [1]  Scale 23 (bandwidth:1802.239)
#> [1]  Scale 24 (bandwidth:1622.015)
#> [1]  Scale 25 (bandwidth:1459.814)
#> [1]  Scale 26 (bandwidth:1313.832)
#> [1]  Scale 27 (bandwidth:1182.449)
#> [1]  Scale 28 (bandwidth:1064.204)
#> [1]  Scale 29 (bandwidth:957.7839)
#> [1]  Scale 30 (bandwidth:862.0055)
#> [1]  Scale 31 (bandwidth:775.8049)
#> [1]  Scale 32 (bandwidth:698.2244)

By default, a per-area multiplicative adjustment (adj = TRUE, the default) is applied to satisfy the aggregation constraint, so that the downscaled predictions aggregate exactly to the observed Y. Alternatively, if adj = FALSE, the constraint is satisfied only approximately, which may be preferable when Y contains noise.

The estimated regression coefficients, standard deviations of the scalewise components, and error statistics are displayed as follows:

mod
#> Call:
#> cf_downscale(Y = Y, x = x, prop_weight = prop_weight, coords = coords, 
#>     agg_id = agg_id, mod_hv = mod_hv)
#> 
#> ----Coefficients---------------------------------------
#>              coef      coef_se lower_95CI  upper_95CI
#> V1     0.67451717 0.0048002354  0.6651087  0.68392563
#> pm10   0.03270926 0.0003761497  0.0319720  0.03344651
#> jsa    0.02627716 0.0005328386  0.0252328  0.02732153
#> price -0.28666895 0.0019616896 -0.2905139 -0.28282404
#> 
#> ----Standard deviations (influential elements only)----
#>           elements standard_deviation
#> 1               xb        0.232156910
#> 2   spatial_scale1        0.000855227
#> 3   spatial_scale2        0.001110786
#> 4   spatial_scale3        0.001344851
#> 5   spatial_scale4        0.001399523
#> 6   spatial_scale5        0.001549541
#> 7   spatial_scale6        0.001963986
#> 8   spatial_scale7        0.001890540
#> 9   spatial_scale8        0.002101424
#> 10  spatial_scale9        0.002549838
#> 11 spatial_scale10        0.002875470
#> 12 spatial_scale11        0.003456967
#> 13 spatial_scale12        0.004201119
#> 14 spatial_scale13        0.005050087
#> 15 spatial_scale14        0.006596558
#> 16 spatial_scale15        0.007170035
#> 17 spatial_scale16        0.007339478
#> 18 spatial_scale17        0.006469095
#> 19 spatial_scale18        0.005905852
#> 20 spatial_scale19        0.005994244
#> 21 spatial_scale20        0.006630537
#> 22 spatial_scale21        0.006154833
#> 23 spatial_scale22        0.005308058
#> 24 spatial_scale23        0.004531535
#> 25 spatial_scale24        0.004246483
#> 26 spatial_scale25        0.004018343
#> 27 spatial_scale26        0.004211106
#> 28 spatial_scale27        0.004927188
#> 29 spatial_scale28        0.004270462
#> 30 spatial_scale29        0.004592801
#> 31 spatial_scale30        0.003461485
#> 32 spatial_scale31        0.002850030
#> 33 spatial_scale32        0.001625115
#> 34       residuals        1.115114050
#> 
#> ----Error statistics ----------------------------------
#>              stat      value
#> 1   validation_R2  0.9975636
#> 2 validation_RMSE 54.8224099
#> 3  validation_MAE 50.3789532

Here, validation_R2 and validation_RMSE are evaluated by aggregating the predictions to the aggregate-level validation units obtained from the holdout validation in cf_downscale_hv. The high R2 value in this example is mainly due to the small number of aggregate-level units: because there are only 30 units in total, the validation set contains only 8 units.

Mapping

Downscaling result

The aggregate-level response and the downscaled disaggregate-level predictive means are mapped as follows:

### Aggregate-level polygons and response
ar$agg_id <- agg_id
agg_poly  <- stats::aggregate(ar["agg_id"], by = list(agg_id = agg_id),
                              FUN = function(z) z[1])
agg_poly$Y<- Y

### Disaggregate-level predictive mean
ar$pred   <- mod$pred$pred

plot(agg_poly["Y"], nbreaks = 20, key.pos = 4, axes = TRUE, lwd=0.2,
     main = "Aggregated data (30 units)")

plot(ar["pred"], nbreaks = 20, key.pos = 4, axes = TRUE, border = NA,
     main = "Downscaling result (271 units)")

The downscaled map recovers the disaggregate-level spatial pattern of respiratory hospitalisations while preserving consistency with the aggregate-level observations.

Predictive standard deviations

The standard deviations of the disaggregate-level predictions, which quantify the downscaling uncertainty, are mapped as follows:

ar$pred_sd <- mod$pred$pred_sd
plot(ar["pred_sd"], pal = function(n) hcl.colors(n, "Viridis"), nbreaks = 9, border = NA,
     key.pos = 4, axes = TRUE)

Reference

Murakami, Daisuke, Yongwan Chun, Takahiro Yoshida, and Hajime Seya. 2026. Scalable Coarse-to-Fine Spatial Downscaling. ArXiv.