In this chapter, we have discussed methods that address the two problems of cluster detection: determining whether any significant clusters exist, and pinpointing the spatial location and extent of clusters. Many other spatial methods can be found in the literature on spatial epidemiology and spatial statistics, although most such methods either do not find specific clusters or do not evaluate the statistical significance of the clusters they discover. Beyond the spatial cluster detection methods discussed here, these other methods include general and focused clustering methods, disease mapping approaches, and spatial cluster modeling. More general overviews of the literature on spatial statistical methods can be found in Lawson (2001) and Elliott et al. (2000).

General clustering methods are hypothesis testing methods that test for a general tendency of the data to cluster; in other words, they attempt to answer the question, "Is this data set more spatially clustered than we would expect?" Such methods do not identify specific clusters, but instead give a single result of "spatially clustered" or "not spatially clustered." These methods are useful if we want to know whether anything unexpected is going on, but do not care about the specific locations of unexpected events. Examples of such methods include Whittemore et al. (1987), Cuzick and Edwards (1990), and Tango (1995); see Lawson (2001) and Elliott et al. (2000) for more details. We also refer the interested reader to two general tests for space-time clustering: Knox (1964) and Mantel (1967).
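
To make the idea of a single "clustered or not" answer concrete, here is a minimal sketch of a Knox-style space-time statistic: it counts pairs of cases that are close in both space and time. The function names, the two thresholds, and the use of a random permutation of case times to obtain a p-value are assumptions made for illustration; they are not taken from the cited papers, and the original test may assess significance differently.

```python
import math
import random


def knox_statistic(points, times, space_thresh, time_thresh):
    """Count pairs of cases that are close in both space and time."""
    count = 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            close_in_space = math.dist(points[i], points[j]) <= space_thresh
            close_in_time = abs(times[i] - times[j]) <= time_thresh
            if close_in_space and close_in_time:
                count += 1
    return count


def knox_permutation_test(points, times, space_thresh, time_thresh,
                          n_perm=999, seed=0):
    """Return (statistic, p-value); the p-value comes from shuffling times.

    Shuffling the times across cases breaks any space-time interaction
    while keeping the spatial and temporal patterns themselves unchanged.
    """
    rng = random.Random(seed)
    observed = knox_statistic(points, times, space_thresh, time_thresh)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = knox_statistic(points, shuffled, space_thresh, time_thresh)
        if stat >= observed:
            at_least_as_large += 1
    # Count the observed arrangement as one of the permutations.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: two pairs of cases, each pair close in space and time.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
times = [1, 2, 30, 31]
stat, p = knox_permutation_test(points, times, space_thresh=1.0, time_thresh=2)
```

Note that the output is one statistic and one p-value for the whole data set; nothing in it says where the clustering occurs.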

Focused clustering methods are hypothesis testing methods that, given a prespecified spatial location, attempt to answer the question, "Is there an increase in risk in areas near this location?" These methods can be used to examine potential environmental hazards, such as testing for an increased risk of lung cancer near a coal-burning power plant. Since the locations are specified in advance, these methods cannot be used to identify specific cluster locations, but are instead used to test locations that have been identified by other means. Examples of such methods include Stone (1988), Besag and Newell (1991), and Lawson (1993); see Lawson (2001) and Elliott et al. (2000) for more details.
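
As a rough illustration of a focused test, the sketch below computes a score-type statistic that weights each area's excess of observed over expected cases by an exposure that decays with distance from the prespecified focus. The exposure function `1 / (1 + distance)`, the unconditional Poisson variance, the normal approximation, and all names are assumptions chosen for simplicity; this is not the procedure of any one of the papers cited above.

```python
import math


def focused_score_test(areas, focus):
    """Score-type test for raised risk near a prespecified focus.

    areas: list of (x, y, observed_cases, expected_cases) tuples.
    focus: (x, y) location of the putative hazard.
    Returns (z, one_sided_p_value).
    """
    score = 0.0
    variance = 0.0
    for x, y, observed, expected in areas:
        distance = math.dist((x, y), focus)
        exposure = 1.0 / (1.0 + distance)  # assumed distance-decay exposure
        score += exposure * (observed - expected)
        # Under the null, counts are treated as Poisson with mean `expected`.
        variance += exposure ** 2 * expected
    z = score / math.sqrt(variance)
    # Upper-tail probability of a standard normal variable.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical areas along a line, with an excess of cases near the focus.
areas = [(0.0, 0.0, 20, 10.0), (1.0, 0.0, 12, 10.0),
         (5.0, 0.0, 10, 10.0), (10.0, 0.0, 9, 10.0)]
z, p = focused_score_test(areas, focus=(0.0, 0.0))
```

The focus is an input rather than an output, which is why a test of this kind can confirm or reject a suspected location but cannot discover one.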

Disease mapping approaches have the goal of producing a spatially smoothed map of the variation in disease risk. For example, a very simple disease mapping approach might plot the observed disease rate (number of observed cases per unit population) in each area; more advanced approaches use a variety of Bayesian models and other spatial smoothing techniques to estimate the underlying risk of disease in each area. These methods do not explicitly identify cluster locations, but disease clusters may be inferred manually by identifying high-risk areas on the resulting map. However, no hypothesis testing is typically done, so we cannot draw statistical conclusions as to whether these high-risk areas have resulted from true disease clusters or from chance fluctuations. Examples of such methods include Clayton and Kaldor (1987), Besag et al. (1991), and Clayton and Bernardinelli (1992); see Lawson (2001) and Elliott et al. (2000) for more details.
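
The contrast between a raw rate map and a smoothed one can be sketched as follows. The first function computes the observed rate in each area; the second shrinks each rate toward the overall rate, with more shrinkage for areas with small populations. The shrinkage rule is a simple method-of-moments estimator in the spirit of empirical Bayes smoothing, chosen here as an assumption for illustration; it ignores spatial adjacency and is not the specific model of any paper cited above.

```python
def raw_rates(cases, populations):
    """Observed disease rate (cases per unit population) in each area."""
    return [c / n for c, n in zip(cases, populations)]


def shrunken_rates(cases, populations):
    """Shrink each area's raw rate toward the overall rate.

    Areas with small populations have noisy raw rates, so they are pulled
    more strongly toward the overall rate than large areas are.
    """
    total_pop = sum(populations)
    overall = sum(cases) / total_pop
    rates = raw_rates(cases, populations)
    mean_pop = total_pop / len(populations)
    # Population-weighted variance of the raw rates around the overall rate.
    weighted_var = sum(n * (r - overall) ** 2
                       for n, r in zip(populations, rates)) / total_pop
    # Estimated between-area variance of true risk, floored at zero.
    between_var = max(weighted_var - overall / mean_pop, 0.0)
    smoothed = []
    for n, r in zip(populations, rates):
        if between_var > 0.0:
            weight = between_var / (between_var + overall / n)
        else:
            weight = 0.0  # no evidence of real variation: use overall rate
        smoothed.append(overall + weight * (r - overall))
    return smoothed


# Hypothetical data: one small area with a high raw rate, two large areas.
cases = [2, 50, 120]
populations = [100, 10000, 20000]
rates = raw_rates(cases, populations)
smoothed = shrunken_rates(cases, populations)
```

In this hypothetical data, the small area's raw rate of 0.02 is based on only two cases, so the smoothed estimate sits much closer to the overall rate. The output is still just a map of estimates, with no test of whether any high value is more than chance.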

Finally, spatial cluster modeling methods attempt to combine the benefits of disease mapping and spatial cluster detection, by constructing a probabilistic model in which the underlying clusters of disease are explicitly represented. A typical approach is to assume that cases are generated by some underlying process model which depends on a set of cluster centers, where the number and locations of cluster centers are unknown. Then we attempt to simultaneously infer all the parameters of the model, including the cluster centers and the disease risks in each area, using a simulation method such as reversible jump Markov chain Monte Carlo (Green, 1995). Thus, precise cluster locations are inferred, and while no formal significance testing is done, the method is able to compare models with different numbers of cluster centers, giving an indication of both whether there are any clusters and where each cluster is located. One typical disadvantage of such methods is computational: the underlying models rarely have closed-form solutions, and the Markov chain Monte Carlo methods used to approximate the model parameters are often computationally intensive. Examples of such methods include Lawson (1995), Lawson and Clark (1999), and Gangnon and Clayton (2000). For a more detailed discussion of spatial cluster modeling, see Lawson and Denison (2002).
