# Chapter 20 Spatial response surface sampling

As with conditioned Latin hypercube sampling, spatial response surface sampling is an experimental design adapted for spatial surveys. Experimental response surface designs aim at finding an optimum of the response within specified ranges of the factors. There are many types of response surface designs, see Myers, Montgomery, and Anderson-Cook (2009). With response surface sampling one assumes that some type of low order regression model can be used to accurately approximate the relationship between the study variable and the covariates. A commonly used design is the central composite design. The data obtained with this design are used to fit a multiple linear regression model with quadratic terms, yielding a curved, quadratic surface of the response.

The response surface sampling approach is an adaptation of an experimental design, but at the same time it is an example of a model-based sampling design. Sampling units are selected to implicitly optimise the estimation of the quadratic regression model. However, this optimisation is done under one or more spatial constraints. Unconstrained optimisation of the sampling design under the linear regression model will not prevent the units from spatial clustering, see the optimal sample in Figure 16.1. The assumption of independent data might be violated when the sampling units are spatially clustered. For this reason, the response surface sampling design selects samples with good spatial coverage, so that the design becomes robust against violation of the independence assumption.

Lesch, Strauss, and Rhoades (1995) adapted the response surface methodology for observational studies. Several problems needed to be tackled. First, when multiple covariates are used, the covariates must be decorrelated. Second, when sampling units are spatially clustered, the assumption in linear regression modelling of spatially uncorrelated model residuals can be violated. To address these two problems, Lesch, Strauss, and Rhoades (1995) proposed the following procedure; see also Lesch (2005):

1. Transform the covariate matrix into a scaled, centred, decorrelated matrix by principal components analysis (PCA).
2. Choose the response surface design type.
3. Select candidate sampling units based on the distance from the design points in the space spanned by the principal components. Select multiple sampling units per design point.
4. Select the combination of candidate sampling units with the highest value for a criterion that quantifies how uniform the sample is spread across the study area.

This design has been applied, among others, for mapping soil salinity (ECe), using electromagnetic (EM) induction measurements and surface array conductivity measurements as predictors in multiple linear regression models. For applications, see Corwin and Lesch (2005), Lesch (2005), Fitzgerald et al. (2006), Corwin et al. (2010), and Fitzgerald (2010).

Spatial response surface sampling is illustrated with the EM measurements (mS m-1) of the apparent electrical conductivity on the 80 ha Cotton Research Farm in Uzbekistan. The EM measurements in vertical dipole mode, with transmitter at 1 m and 0.5 m from the receiver, are on transects covering the Cotton Research Farm (Figure 20.1). As a first step, the natural log of the two EM measurements, denoted by lnEM, are interpolated by ordinary kriging to a fine grid (Figure 20.2). These ordinary kriging predictions of lnEM are used as covariates in response surface sampling. The two covariates are strongly correlated, $$r=0.73$$, as expected since they are interpolations of measurements of the same variable but of different overlapping layers.

Function prcomp of the stats package is used to compute the principal component scores for all units in the population (grid cells). The two covariates are centred and scaled, i.e., standardised principal components are computed.

pc <- grdCRF %>%
dplyr::select(lnEM100cm, lnEM50cm) %>%
prcomp(center = TRUE, scale = TRUE)

The means of the two principal component scores are 0; however, their standard deviations are not zero but 1.330 and 0.480. Therefore, the principal component scores are divided by these standard deviations. They then will have the same weight in the following steps.

grdCRF <- grdCRF %>%
mutate(
PC1 = pc$x[, 1] / pc$sdev[1],
PC2 = pc$x[, 2] / pc$sdev[2])

Function ccd of package rsm is now used to generate a central composite response surface design (CCRSD). Argument basis specifies the number of factors, which is two in our case. Argument n0 is the number of centre points, and argument alpha determines the position of the star points (explained hereafter).

library(rsm)
set.seed(314)
print(ccdesign <- ccd(basis = 2, n0 = 1, alpha = "rotatable"))
   run.order std.order  x1.as.is  x2.as.is Block
1          1         4  1.000000  1.000000     1
2          2         5  0.000000  0.000000     1
3          3         2  1.000000 -1.000000     1
4          4         1 -1.000000 -1.000000     1
5          5         3 -1.000000  1.000000     1
6          1         5  0.000000  0.000000     2
7          2         3  0.000000 -1.414214     2
8          3         1 -1.414214  0.000000     2
9          4         4  0.000000  1.414214     2
10         5         2  1.414214  0.000000     2

Data are stored in coded form using these coding formulas ...
x1 ~ x1.as.is
x2 ~ x2.as.is

The experiment consists of two blocks, each of five experimental units. Block 1, the so-called cube block, consists of one centre point and four cube points. In the experimental unit represented by the centre point, both factors have levels in the centre of the experimental range. In the experimental units represented by the cube points, the levels of both factors is either -1 or +1 unit in the design space. Block 2, referred to as the star block, consists of one centre point and four star points. With alpha = "rotatable" the star points are on the circle circumscribing the square (Figure 20.3).

To adapt this design for an observational study, we drop one of the centre points (0,0).

ccd_df <- data.frame(x1 = ccdesign$x1, x2 = ccdesign$x2)
ccd_df <- ccd_df[-6, ]

The coordinates of the CCRSD points are multiplied by a factor so that a large proportion $$p$$ of the bivariate standardised principal component scores of the population units is covered by the circle that passes through the design points (Figure 20.3). The factor is computed as a sample quantile of the empirical distribution of the distances of the points in the scatter to the centre. For $$p$$, I chose 0.7.

d <- sqrt(grdCRF$PC1^2 + grdCRF$PC2^2)
fct <- quantile(d, p = 0.7)
print(fct)
     70%
1.472547 
ccd_df <- ccd_df %>%
mutate(x1 = x1 * fct, x2 = x2 * fct)

The next step is to select for each design point several candidate sampling points. For each of the nine design points, eight points are selected that are closest to that design point. This results in 9 $$\times$$ 8 candidate sampling points.

candi_all <- NULL
for (i in seq_len(nrow(ccd_df))) {
d2dpnt <- sqrt((grdCRF$PC1 - ccd_df$x1[i])^2 +
(grdCRF$PC2 - ccd_df$x2[i])^2)
grdCRF <- grdCRF[order(d2dpnt), ]
candi <- grdCRF[c(1:8), c("point_id", "x", "y", "PC1", "PC2")]
candi$dpnt <- i candi_all <- rbind(candi_all, candi) } Figure 20.4 shows the nine clusters of candidate sampling points around the design points. Note that the location of the candidate sampling points associated with the design points with coordinates (0,-2.13), (1.51,-1.51), and (2.13,0) are all far inside the circle that passes through the design points. So, for the optimised sample, there will be three points with principal component scores that considerably differ from the ideal values according to the CCRSD design. Figure 20.5 shows that in geographical space for most design points there are multiple spatial clusters of candidate units. For instance, for design point nine, there are three clusters of candidate sampling units. Therefore, there is scope to optimise the sample computationally. As a first step, an initial subsample from the candidate sampling units is selected by stratified simple random sampling, using the levels of factor dpnt as strata. Function strata of package sampling is used for stratified random sampling . library(sampling) set.seed(314) units_stsi <- sampling::strata( candi_all, stratanames = "dpnt", size = rep(1, 9)) mysample0 <- getdata(candi_all, units_stsi) %>% dplyr::select(-(ID_unit:Stratum)) The locations of the nine sampling units are now optimised by minimising a criterion that is a function of the distance between the nine sampling points. Two minimisation criteria are implemented, a geometric criterion and a model-based criterion. In the geometric criterion (as proposed by Lesch (2005)) for each sampling point the log of the shortest distance to the other points is computed. The minimisation criterion is the negative of the sample mean of these distances. The model-based minimisation criterion is the average correlation of the sampling points. This criterion requires as input the parameters of a residual correlogram (see Section 21.3). I assume an exponential correlogram without nugget, so that the only parameter to be chosen is the distance parameter $$\phi$$ (Equation (21.13)). Three times $$\phi$$ is referred to as the effective range of the exponential correlogram. The correlation of the random variables at two points separated by this distance is 0.05. A penalty term is added to the geometric or the model-based minimisation criterion, equal to the average distance of the sampling points to the associated design points, multiplied by a weight. With weights $$>0$$, sampling points close to the design points are preferred over more distant points. In the next code chunk, a function is defined for computing the minimisation criterion. Given a chosen value for $$\phi$$, the 9 $$\times$$ 9 distance matrix of the sampling points can be converted into a correlation matrix, using function variogramLine of package gstat . Argument weight is an optional argument with default value 0. getCriterion <- function(mysample, dpnt, weight = 0, phi = NULL) { D2dpnt <- sqrt((mysample$PC1 - dpnt$x1)^2 + (mysample$PC2 - dpnt$x2)^2) D <- as.matrix(dist(mysample[, c("x", "y")])) if (is.null(phi)) { diag(D) <- NA logdmin <- apply(D, MARGIN = 1, FUN = min, na.rm = TRUE) %>% log criterion_cur <- mean(-logdmin) + mean(D2dpnt) * weight } else { vgmodel <- vgm(model = "Exp", psill = 1, range = phi) C <- variogramLine(vgmodel, dist_vector = D, covariance = TRUE) criterion_cur <- mean(C) + mean(D2dpnt) * weight } return(criterion_cur) } Function getCriterion is used to compute the geometric criterion for the initial sample. criterion_geo <- getCriterion(mysample = mysample0, dpnt = ccd_df) The initial value of the geometric criterion is -4.829. In the next code chunk, the initial value for the model-based criterion is computed for an effective range of 150 m. It does not make sense to make the effective range smaller than the size of the grid cells, which is 25 m in our case. For smaller ranges, the correlation matrix is for any sample a matrix with zeroes. If the effective range is smaller than the smallest distance between two points in a cluster, the mean correlation is equal for all samples. phi <- 50 criterion_mb <- getCriterion(mysample = mysample0, dpnt = ccd_df, phi = phi) The initial value of the model-based criterion is 0.134. The objective function defining the minimisation criterion is minimised with simulated annealing (Kirkpatrick, Gelatt, and Vecchi (1983), Aarts and Korst (1989)). One sampling point is randomly selected and replaced by another candidate sampling point from the same cluster. If the criterion of the new sample is smaller than that of the current sample, the new sample is accepted. If it is larger, it is accepted with a probability that is a function of the change in the criterion (the larger the increase, the smaller the acceptance probability) and of an annealing parameter named the temperature (the higher the temperature, the larger the probability of accepting a new, poorer sample, given an increase of the criterion). See Section 23.1 for a more detailed introduction to simulated annealing. The sampling pattern can be optimised with function anneal of package sswr. The arguments of this function will be clear from the description of the sampling procedure above. set.seed(314) mySRSsample <- anneal( mysample = mysample0, candidates = candi_all, dpnt = ccd_df, phi = 50, T_ini = 1, coolingRate = 0.9, maxPermuted = 25 * nrow(mysample0), maxNoChange = 20, verbose = TRUE) Figure 20.6 shows the optimised CCRSD samples plotted in the space spanned by the two principal components, obtained with the geometric and the model-based criterion, plotted together with the design points. The two optimised samples are very similar. Figure 20.7 shows the two optimised CCRSD samples plotted in geographical space on the first standardised principal component scores. ## 20.1 Increasing the sample size Nine points are rather few for fitting a polynomial regression model, especially for a second-order polynomial with interaction. Therefore, in experiments often multiple observations are done for each design point. Increasing the sample size of a response surface sample in observational studies is not straightforward. The challenge is to avoid spatial clustering of sampling points. A simple solution is to select multiple points from each subset of candidate sampling units. The success of this solution depends on how strong candidate sampling units are spatially clustered. For the Cotton Research Farm for most design points the candidate sampling units are not in one spatial cluster; so in this case, the solution may work properly. I increased the number of candidate sampling units per design point to 16, so that there is a larger choice in the optimisation of the sampling pattern. candi_all <- NULL for (i in seq_len(nrow(ccd_df))) { d2dpnt <- sqrt((grdCRF$PC1 - ccd_df$x1[i])^2 + (grdCRF$PC2 - ccd_df$x2[i])^2) grdCRF <- grdCRF[order(d2dpnt), ] candi <- grdCRF[c(1:16), c("point_id", "x", "y", "PC1", "PC2")] candi$dpnt <- i
candi_all <- rbind(candi_all, candi)
}

A stratified simple random subsample of two points per stratum is selected, which serves as an initial sample.

set.seed(314)
units_stsi <- sampling::strata(
candi_all, stratanames = "dpnt", size = rep(2, 9))
mysample0 <- getdata(candi_all, units_stsi) %>%
dplyr::select(-(ID_unit:Stratum))

The data frame with the design points must be doubled. Note that the order of the design points must be equal to the order in the stratified subsample.

tmp <- data.frame(ccd_df, dpnt = 1:9)
ccd_df2 <- rbind(tmp, tmp)
ccd_df2 <- ccd_df2[order(ccd_df2$dpnt), ] Figures 20.8 and 20.9 show the optimised CCRSD sample of 18 points in geographical and principal component space, respectively, obtained with the model-based criterion, an effective range of 150 m, and zero weight for the penalty term. Sampling points are not spatially clustered, so I do not expect violation of the assumption of independent residuals. In principal component space, all points are pretty close to the design points, except for the four design points in the lower right corner, where no candidate units near these design points are available. ## 20.2 Stratified spatial response surface sampling The sample size can also be increased by stratified spatial response surface sampling. The strata are subareas of the study area. When the subsets of candidate sampling units for some design points are strongly spatially clustered, the final optimised sample obtained with the method of the previous section may also show strong spatial clustering. An alternative is then to split the study area into two or more subareas (strata) and to select from each stratum candidate sampling units. This guarantees that for each design point we have at least as many spatial clusters of candidate units as we have strata. The spatial strata are not used for fitting separate regression models. All data are used to fit one (second-order) polynomial regression model. Figure 20.10 shows two subareas used as strata in stratified response surface sampling of the Cotton Research Farm. The candidate sampling units are selected in a double for-loop. The outer loop is over the strata, the inner loop over the design points. Note that variable dpnt continues to increase by 1 after the inner loop over the nine design points in subarea 1 is completed, so that variable dpnt (used as a stratification variable in subsampling the sample of candidate sampling points) now has values $$1,2, \dots , 18$$. An equal number of candidate sampling points per design point in both strata (eight points) is selected by sorting the points of a stratum by the distance to a design point using function order. Figure 20.11 shows the candidate sampling points for stratified CCRSD sampling. candi_all <- NULL for (h in c(1, 2)) { data_stratum <- grdCRF %>% filter(subarea == h) candi_stratum <- NULL for (i in seq_len(nrow(ccd_df))) { d2dpnt <- sqrt((data_stratum$PC1 - ccd_df$x1[i])^2 + (data_stratum$PC2 - ccd_df$x2[i])^2) data_stratum <- data_stratum[order(d2dpnt), ] candi <- data_stratum[c(1:8), c("point_id", "x", "y", "PC1", "PC2", "subarea")] candi$dpnt <- i + (h - 1) * nrow(ccd_df)
candi_stratum <- rbind(candi_stratum, candi)
}
candi_all <- rbind(candi_all, candi_stratum)
}

As before, dpnt is used as a stratum identifier to subsample the candidate sampling units. Finally, the number of rows in data.frame ccd_df with the design points is doubled.

set.seed(314)
units_stsi <- sampling::strata(
candi_all, stratanames = "dpnt", size = rep(1, 18))
mysample0 <- getdata(candi_all, units_stsi) %>%
dplyr::select(-(ID_unit:Stratum))
ccd_df2 <- rbind(ccd_df, ccd_df)

Figures 20.12 and 20.13 show the optimised sample of 18 points in geographical and principal component space, obtained with the model-based criterion with an effective range of 150 m. The pattern in the principal component space is worse compared to the pattern in Figure 20.9. In stratum 1, the distance to the star point at the top and the upper left and upper right cube points is very large. In this stratum no population units are present that are close to these design points. By adding a penalty term to the minimisation criterion that is proportional to the distance to the design points, the distance is somewhat decreased, but not really for the three design points mentioned above (Figure 20.9). Also note the spatial cluster of three sampling units in Figure 20.12 obtained with a weight equal to 5.

## 20.3 Mapping

Once the data are collected, the study variable is mapped by fitting a multiple linear regression model using the two covariates, in our case the two EM measurements, as predictors. The fitted model is then used to predict the study variable for all unsampled population units.

The value of the study variable at an unsampled prediction location $$\mathbf{s}_0$$ is predicted by $$$\widehat{Z}(\mathbf{s}_0) = \mathbf{x}_0 \hat{\pmb{\beta}}_{\text{OLS}}\;, \tag{20.1}$$$ with $$\mathbf{x}_0$$ the ($$p+1$$)-vector with covariate values at prediction location $$\mathbf{s}_0$$ and 1 in the first entry ($$p$$ is the number of covariates) and $$\hat{\pmb{\beta}}$$ the vector with ordinary least squares estimates of the regression coefficients: $$$\hat{\pmb{\beta}}_{\text{OLS}} = (\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1} (\mathbf{X}^{\mathrm{T}}\mathbf{z})\;, \tag{20.2}$$$ with $$\mathbf{X}$$ the $$(n \times (p+1))$$ matrix with covariate values and ones in the first column ($$n$$ is the sample size, and $$p$$ is the number of covariates) and $$\mathbf{z}$$ the $$n$$-vector with observations of the study variable.

Although the principal component scores are used to select the sampling locations, there is no need to use these scores as predictors in the linear regression model. When all principal components derived from the covariates are used as predictors, the predicted values and standard errors obtained with the model using the principal components as predictors are equal to those obtained with the model using the covariates as predictors.

The variance of the prediction error can be estimated by $$$\widehat{V}(\widehat{Z}(\mathbf{s}_0)) = \hat{\sigma}^2_{\epsilon}(1+\mathbf{x}_0^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}\mathbf{x}_0)\;, \tag{20.3}$$$ with $$\hat{\sigma}^2_{\epsilon}$$ the estimated variance of the residuals.

In R the model can be calibrated with function lm, and the predictions can be obtained with function predict. The standard errors of the estimated means can be obtained with argument se.fit = TRUE. The variances of Equation (20.3) can be computed by squaring these standard errors and adding the squared value of the estimated residual variance, which can be extracted with sigma().

mdl <- lm(lnECe ~ lnEM100cm + lnEM50cm, data = mysample)
zpred <- predict(mdl, newdata = grdCRF, se.fit = TRUE)
v_zpred <- zpred\$se.fit^2+sigma(mdl)^2

The assumption underlying Equations (20.2) and (20.3) is that the model residuals are independent. We assume that all the spatial structure of the study variable is explained by the covariates. Even the residuals at two locations close to each other are assumed to be uncorrelated. A drawback of the spatial response surface design is that it is hard or even impossible to check this assumption, as the sampling locations are spread throughout the study area. If the residuals are not independent, the covariance of the residuals can be accounted for by generalised least squares estimation of the regression coefficients (Equation (21.24)). The study variable can then be mapped by kriging with an external drift (Section 21.3). However, this requires an estimate of the semivariogram of the residuals (Section 21.5).

### References

Aarts, E. H. L., and J. H. M. Korst. 1989. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Chichester: John Wiley & Sons.
Corwin, D. L., and S. M. Lesch. 2005. “Characterizing Soil Spatial Variability with Apparent Soil Electrical Conductivity: Part II. Case study.” Computers and Electronics in Agriculture 46 (1–3): 135–52.
Corwin, D. L., S. M. Lesch, E. Segal, T. H. Skaggs, and S. A. Bradford. 2010. “Comparison of Sampling Strategies for Characterizing Spatial Variability with Apparent Soil Electrical Conductivity Directed Soil Sampling.” Journal of Environmental and Engineering Geophysics 15 (3): 147–62. https://doi.org/10.2113/JEEG15.3.147.
Fitzgerald, G. J. 2010. “Response Surface Sampling of Remotely Sensed Imagery for Precision Agriculture.” In Proximal Soil Sensing, edited by R. A. Viscarra Rossel, A. B. McBratney, and B. Minasny, 121–29. Dordrecht: Springer.
Fitzgerald, G. J., S. M. Lesch, E. M. Barnes, and W. E. Luckett. 2006. “Directed Sampling Using Remote Sensing with a Response Surface Sampling Design for Site-Specific Agriculture.” Computers and Electronics in Agriculture 53 (2): 98–112.
Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. 1983. “Optimization by Simulated Annealing.” Science 220 (4598): 671–80. https://doi.org/https://doi.org/10.1126/science.220.4598.671.
Lenth, R. V. 2009. Response-Surface Methods in R, Using rsm.” Journal of Statistical Software 32 (7): 1–17.
Lesch, S. M. 2005. “Sensor-Directed Response Surface Sampling Designs for Characterizing Spatial Variation in Soil Properties.” Computers and Electronics in Agriculture 46 (1–3): 153–79.
Lesch, S. M., D. J. Strauss, and J. D. Rhoades. 1995. “Spatial Prediction of Soil Salinity Using Electromagnetic Induction Techniques, 2. An Efficient Spatial Sampling Algorithm Suitable for Multiple Linear Regression Model Identification and Estimation.” Water Resources Research 31 (2): 387–98.
Myers, R. H., D. C. Montgomery, and C. M. Anderson-Cook. 2009. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Third. Hoboken: John Wiley & Sons.
Pebesma, E. J. 2004. “Multivariable Geostatistics in S: The Gstat Package.” Computers & Geosciences 30 (7): 683–91.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Tillé, Y., and A. Matei. 2021. sampling: Survey Sampling. https://CRAN.R-project.org/package=sampling.