The biggest challenge for an evaluator who wishes to study an outbreak detection algorithm is obtaining surveillance data for a sufficiently large number of outbreaks with which to measure sensitivity and timeliness. The exact number of outbreaks required depends on how tight the evaluator needs the statistical error bounds on these measurements to be, but as a rough approximation, 10 outbreaks is a bare minimum.
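To see why 10 outbreaks is only a bare minimum, consider the width of a confidence interval on sensitivity at that sample size. The sketch below (an illustration, not from the source; the 8-of-10 detection figure is hypothetical) uses the standard Wilson score interval for a binomial proportion:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion
    (z = 1.96 gives an approximate 95% interval)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical result: 8 of 10 outbreaks detected.
# The point estimate of sensitivity is 0.80, but the 95% interval
# spans roughly 0.49 to 0.94 -- too wide to distinguish a mediocre
# algorithm from an excellent one.
lo, hi = wilson_interval(8, 10)
print(f"sensitivity = 0.80, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With so few outbreaks, even a seemingly strong detection rate is consistent with near-coin-flip performance, which is why larger samples are needed for tight error bounds.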
To have the greatest validity, an evaluator tests an algorithm using surveillance data collected from real outbreaks. However, with the exception of influenza, rotavirus, adenovirus, and some foodborne illnesses, obtaining surveillance data for even 10 outbreaks is difficult at best. For some diseases of great concern as bioterrorist threats, outbreaks are non-existent. For outbreaks to be useful to the evaluator, they must have occurred in regions that collected biosurveillance data. Although the availability of suitable data is rapidly improving, the present scarcity of surveillance data is a significant barrier to research using real data. Indeed, there are few published evaluations of outbreak detection algorithms that used real data and also had a sufficient sample size to compute confidence intervals on their measures of sensitivity and timeliness. At the time of this writing, we are aware only of studies by Hogan et al. (2003), Ivanov et al. (2003), and Campbell et al. (2004).
Fortunately, much can be learned about outbreak detection algorithms using synthetic data whose characteristics resemble those of real data, at least for those characteristics relevant to the algorithm's performance. Evaluators have used three types of synthetic surveillance data in published studies: fully synthetic, semi-synthetic, and high-fidelity synthetic.