## Current Methods For Interpreting Biosurveillance Data

In current practice, most biosurveillance systems do not use Bayesian algorithms to compute the posterior probability of an outbreak (or of its characteristics). Instead, currently available systems output alerts when daily counts exceed a threshold, which is set in an ad hoc manner to limit the number of false alarms. When interpreting an alert, the decision maker has available only (1) a display of the time series data and (2) the false-alarm rate of the system (or, equivalently, its recurrence interval, which is the average time between false alarms in the absence of an outbreak). The decision maker must therefore interpret the time series knowing only the system's false-alarm rate.¹
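The threshold rule and the two summary numbers a decision maker receives can be sketched in a few lines of Python. The function names and the daily counts are hypothetical illustrations; the sketch also shows the empirical alternative raised in the footnote, which measures the false-alarm rate on a stretch of history assumed to be outbreak-free instead of deriving it from an assumed distribution.

```python
from typing import List, Sequence


def threshold_alarms(daily_counts: Sequence[int], threshold: float) -> List[bool]:
    # An alarm fires on any day whose count exceeds the threshold.
    return [count > threshold for count in daily_counts]


def empirical_false_alarm_rate(baseline_counts: Sequence[int], threshold: float) -> float:
    # Assumes baseline_counts come from a period known to be outbreak-free,
    # so every alarm raised in that period is a false alarm.
    flags = threshold_alarms(baseline_counts, threshold)
    return sum(flags) / len(flags)


def recurrence_interval_days(false_alarm_rate_per_day: float) -> float:
    # Average number of days between false alarms: the reciprocal of the daily rate.
    if false_alarm_rate_per_day == 0:
        return float("inf")
    return 1.0 / false_alarm_rate_per_day


baseline = [4, 6, 5, 12, 3, 7, 5, 4, 11, 6]  # hypothetical outbreak-free daily counts
rate = empirical_false_alarm_rate(baseline, threshold=10)
print(rate, recurrence_interval_days(rate))  # 0.2 5.0
```

By the same reciprocal relationship, a false-alarm rate of 1/365 per day corresponds to a recurrence interval of one year.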

Decision makers do not interpret surveillance data probabilistically, although they could. It is instructive to consider what can be discovered about the interpretation of biosurveillance data if they were to interpret the output of their systems probabilistically. To make this exercise concrete, we use the signal shown in Figure 30.1 and a prior probability of a Cryptosporidium outbreak equal to 0.0035 per day (we describe how we obtained this prior probability later in this chapter).

¹ In some cases, the situation is even worse. Some detection systems underestimate the false-alarm rate because they calculate it from an assumed distribution (e.g., normal or Poisson) rather than measuring it empirically. This practice makes the signal interpretation problem even more difficult when the actual distribution differs from the assumed parametric form.

Handbook of Biosurveillance 417 Elsevier Inc.

Equation 30.1 is Bayes' theorem applied to the interpretation of biosurveillance data. It allows a decision maker to compute the posterior probability of an outbreak given an alarm, provided that three quantities are known: (1) the prior probability of an outbreak, P(Outbreak); (2) the sensitivity of the detection system, P(Alarm | Outbreak); and (3) the false-alarm rate of the detection system, P(Alarm | No Outbreak). In current practice, a decision maker typically knows the false-alarm rate and can estimate the prior probability (e.g., influenza occurs about once per year, so the probability that an influenza outbreak is ongoing on any given day is roughly the duration of an influenza outbreak in days divided by 365). Typically, the decision maker does not know the sensitivity of an outbreak-detection system but can, nevertheless, compute upper and lower bounds on the posterior probability, as we now demonstrate.
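The influenza rule of thumb for the prior can be written as a one-line estimate. The function name and the 56-day outbreak duration below are hypothetical illustrations rather than values from this chapter, and the estimate assumes that outbreaks do not overlap.

```python
def daily_outbreak_prior(mean_duration_days: float, outbreaks_per_year: float = 1.0) -> float:
    """Rough probability that an outbreak is in progress on a randomly chosen day.

    Assumes non-overlapping outbreaks, so the fraction of the year spent in an
    outbreak is (outbreaks per year x mean duration in days) / 365, capped at 1.
    """
    return min(1.0, outbreaks_per_year * mean_duration_days / 365.0)


# Hypothetical: one influenza outbreak per year lasting about eight weeks.
print(round(daily_outbreak_prior(56), 4))  # 0.1534
```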

$$
P(\text{Outbreak} \mid \text{Alarm}) = \frac{P(\text{Alarm} \mid \text{Outbreak})\,P(\text{Outbreak})}{P(\text{Alarm} \mid \text{Outbreak})\,P(\text{Outbreak}) + P(\text{Alarm} \mid \text{No Outbreak})\,[1 - P(\text{Outbreak})]} \tag{30.1}
$$

A decision maker can use Equation 30.1 to compute upper and lower bounds on the posterior probability by assuming that the sensitivity of the system lies within some range. If the sensitivity of the system is unknown, the worst and best sensitivities conceivable can be used to bound the actual posterior probability. In particular, the decision maker would use the best possible sensitivity of 1.0 to compute an upper bound and the worst possible sensitivity to compute a lower bound. The worst possible sensitivity equals the false-alarm rate, as we now prove. Figure 30.2 is a receiver operating characteristic (ROC) curve (see Chapter 20) for a system with the worst possible sensitivity. The ROC curve for this system is a straight line through the origin with slope equal to one. Because the line passes through the origin, its slope at every operating point is sensitivity/false-alarm rate, so 1 = sensitivity/false-alarm rate; rearranging gives sensitivity = false-alarm rate. QED (end of proof).
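The bounding argument translates directly into code. The sketch below assumes the function names `posterior_outbreak` and `posterior_bounds` (both hypothetical) and plugs in the chapter's Cryptosporidium values: a false-alarm rate of 0.0027 per day and a prior of 0.0035.

```python
def posterior_outbreak(sensitivity: float, false_alarm_rate: float, prior: float) -> float:
    """P(Outbreak | Alarm) from Bayes' theorem (Equation 30.1)."""
    numerator = sensitivity * prior
    denominator = numerator + false_alarm_rate * (1.0 - prior)
    return numerator / denominator


def posterior_bounds(false_alarm_rate: float, prior: float) -> tuple:
    """Bounds on the posterior when the sensitivity is unknown.

    Lower bound: worst possible system, whose sensitivity equals its false-alarm rate.
    Upper bound: best possible system, with sensitivity 1.0.
    """
    lower = posterior_outbreak(false_alarm_rate, false_alarm_rate, prior)
    upper = posterior_outbreak(1.0, false_alarm_rate, prior)
    return lower, upper


lower, upper = posterior_bounds(false_alarm_rate=0.0027, prior=0.0035)
print(f"{lower:.4f} {upper:.4f}")  # 0.0035 0.5654
```

Any intermediate sensitivity, for example 0.5, yields a posterior strictly between these two bounds, because the posterior increases monotonically with sensitivity.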

Equation 30.2 shows that the posterior probability for the worst-case detection system equals the prior probability. We substitute the result just obtained (for the worst system, sensitivity equals the false-alarm rate) and simplify the expression algebraically. This result is exactly what a Bayesian would expect: a detection system that is incapable of discriminating between outbreaks and nonoutbreaks should not change our belief (prior probability) that an outbreak is present.

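$$
\begin{aligned}
P(\text{Outbreak} \mid \text{Alarm}) &= \frac{P(\text{Alarm} \mid \text{Outbreak})\,P(\text{Outbreak})}{P(\text{Alarm} \mid \text{Outbreak})\,P(\text{Outbreak}) + P(\text{Alarm} \mid \text{No Outbreak})\,[1 - P(\text{Outbreak})]} \\
&= \frac{\{\text{Sensitivity of Worst System}\}\,P(\text{Outbreak})}{\{\text{Sensitivity of Worst System}\}\,P(\text{Outbreak}) + P(\text{Alarm} \mid \text{No Outbreak})\,[1 - P(\text{Outbreak})]} \\
&= \frac{\{\text{False-Alarm Rate}\}\,P(\text{Outbreak})}{\{\text{False-Alarm Rate}\}\,P(\text{Outbreak}) + P(\text{Alarm} \mid \text{No Outbreak})\,[1 - P(\text{Outbreak})]} \\
&= \frac{P(\text{Alarm} \mid \text{No Outbreak})\,P(\text{Outbreak})}{P(\text{Alarm} \mid \text{No Outbreak})\,P(\text{Outbreak}) + P(\text{Alarm} \mid \text{No Outbreak})\,[1 - P(\text{Outbreak})]} \\
&= \frac{P(\text{Alarm} \mid \text{No Outbreak})\,P(\text{Outbreak})}{P(\text{Alarm} \mid \text{No Outbreak})} \\
&= P(\text{Outbreak})
\end{aligned}
\tag{30.2}
$$

Figure 30.2 Receiver operating characteristic (ROC) curve for the worst detection system.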
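The same conclusion can be checked by simulation. This sketch (the function name is hypothetical) draws outbreak days with probability equal to the prior and lets a chance-level detector alarm with a fixed probability that ignores whether an outbreak is present, so its sensitivity equals its false-alarm rate. The fraction of alarm days that fall during an outbreak should then approximate the prior. The prior of 0.2 and false-alarm rate of 0.1 are hypothetical values, chosen larger than the Cryptosporidium numbers so that a short run produces enough alarm days.

```python
import random


def simulate_posterior_chance_detector(prior: float, false_alarm_rate: float,
                                       n_days: int, seed: int = 0) -> float:
    """Estimate P(Outbreak | Alarm) for a detector that cannot discriminate.

    Each simulated day has an outbreak with probability `prior`. The detector
    alarms with probability `false_alarm_rate` regardless of the outbreak state.
    """
    rng = random.Random(seed)
    alarm_days = 0
    alarm_days_with_outbreak = 0
    for _ in range(n_days):
        outbreak = rng.random() < prior
        alarm = rng.random() < false_alarm_rate  # drawn independently of `outbreak`
        if alarm:
            alarm_days += 1
            if outbreak:
                alarm_days_with_outbreak += 1
    if alarm_days == 0:
        return float("nan")  # no alarms, so the ratio is undefined
    return alarm_days_with_outbreak / alarm_days


estimate = simulate_posterior_chance_detector(prior=0.2, false_alarm_rate=0.1, n_days=200_000)
print(f"estimated posterior {estimate:.3f} vs prior 0.200")
```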

Equation 30.3 shows the calculation for the best case, in which the sensitivity equals 1.0. We use a false-alarm rate of one per year, or 1/365 ≈ 0.0027 per day (from our Cryptosporidium example), and a prior probability of an outbreak of 0.0035 per day, also from our Cryptosporidium example (we describe how we obtained this probability later in this chapter).
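Written out, substituting a sensitivity of 1.0, the false-alarm rate of 0.0027 per day, and the prior of 0.0035 into Bayes' theorem gives the upper bound below. This worked substitution is a reconstruction from the stated numbers; with the unrounded rate 1/365 the same calculation gives about 0.5618, so the 0.5654 figure reflects rounding the false-alarm rate to 0.0027.

$$
P(\text{Outbreak} \mid \text{Alarm}) = \frac{1.0 \times 0.0035}{1.0 \times 0.0035 + 0.0027 \times (1 - 0.0035)} = \frac{0.0035}{0.0035 + 0.00269055} \approx 0.5654
$$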

Thus, a decision maker faced with the output of our example system (which has a false-alarm rate of one per year) and a prior probability of outbreak of 0.0035 knows only that the posterior probability of an outbreak lies between 0.0035 and 0.5654. For decision-making purposes, this is a very broad range: the optimal action to take when the probability of a Cryptosporidium outbreak is 0.5654 is quite different from the optimal action when it is 0.0035. In fact, the interpretation method that we discuss next yields an estimate of 0.0410 for the posterior probability given this anomaly in the surveillance data.