## The Spatial Scan Statistic

The spatial scan statistic (Kulldorff and Nagarwalla, 1995; Kulldorff, 1997) is a powerful and general method for spatial cluster detection. It is commonly used by epidemiologists to find statistically significant spatial clusters of disease cases, which may indicate an outbreak. In this section, we present the spatial scan statistic as originally described by Kulldorff, along with a number of generalizations, extensions, and variants that broaden the scope and applicability of the method.

In its original formulation, Kulldorff's statistic assumes that we have a set of spatial locations $s_i$ and are given a count $c_i$ and a population $p_i$ corresponding to each location. For example, each $s_i$ may represent the centroid of a census tract, the corresponding count $c_i$ may represent the number of respiratory emergency department visits in that census tract, and the corresponding population $p_i$ may represent the "at-risk population" of that census tract, derived from census population and possibly adjusted for covariates.

The statistic assumes that each observed count $c_i$ is drawn randomly from a Poisson distribution with mean $q_i p_i$, that is, $c_i \sim \mathrm{Poisson}(q_i p_i)$, where $p_i$ is the (known) at-risk population of that area and $q_i$ is the (unknown) risk, or underlying disease rate, of that area. The risk is the expected number of cases per unit population: we expect to see a number of cases equal to the product of the population and the risk, but the observed number of cases may be higher or lower than this expectation due to chance. Thus, our goal is to determine whether an observed increase in counts in a region is due to increased risk or to chance fluctuations.

The Poisson distribution is commonly used in epidemiology to model the underlying randomness of observed case counts; it assumes that the variance is equal to the mean. If this assumption is not reasonable (that is, if counts are "overdispersed," with variance greater than the mean, or "underdispersed," with variance less than the mean), we should instead use a distribution that models the mean and variance separately, such as the normal or negative binomial distribution. We also assume that each count $c_i$ is drawn independently, although the model can be extended to account for spatial correlations between nearby locations.
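To make the Poisson assumption concrete, here is a minimal Python sketch of the count model only (it is not the scan statistic itself). It assumes a single hypothetical census tract with at-risk population $p_i = 10{,}000$, risk $q_i = 0.001$, and an observed count of 18; these numbers and the helper names `poisson_pmf`, `poisson_upper_tail`, and `sample_poisson` are illustrative assumptions, not taken from the text. The sketch computes the expected count $q_i p_i$, the probability of seeing at least that many cases by chance alone if the risk really is $q_i$, and checks by simulation that Poisson counts have variance approximately equal to their mean.

```python
import math
import random
from statistics import mean, pvariance


def poisson_pmf(k: int, lam: float) -> float:
    """P(C = k) for C ~ Poisson(lam), computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_upper_tail(c: int, lam: float) -> float:
    """P(C >= c) = 1 - P(C <= c - 1): chance of c or more cases with mean lam."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(c))


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count with Knuth's method (only suitable for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Hypothetical census tract (illustrative numbers, not from the text).
p_i = 10_000          # known at-risk population
q_i = 0.001           # assumed risk (expected cases per unit population)
expected = q_i * p_i  # Poisson mean q_i * p_i, here 10 expected cases
observed = 18         # hypothetical observed count c_i

# How surprising is c_i if the risk really is q_i?
tail = poisson_upper_tail(observed, expected)

# Under the Poisson model the variance equals the mean (no over/underdispersion).
rng = random.Random(0)
draws = [sample_poisson(expected, rng) for _ in range(20_000)]

print(f"expected={expected:.1f}  P(C >= {observed})={tail:.4f}")
print(f"sample mean={mean(draws):.2f}  sample variance={pvariance(draws):.2f}")
```

A small tail probability suggests the excess is unlikely to be chance alone under the assumed risk. Note that this checks one tract in isolation; the scan statistic evaluates many candidate regions at once, so a per-tract tail probability like this one should not be read as its significance test. If real data showed a sample variance well above the sample mean, that would be the overdispersion described above.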