Biosurveillance Using Bayesian Networks

As an example of the use of Bayesian networks for biosurveillance, this section presents an approach to modeling and detecting outbreaks of non-contagious disease, such as disease caused by an airborne release of anthrax. In particular, the approach models each individual in the population being monitored for an outbreak. Modeling the entire population of even a single city-wide area leads to a Bayesian network model with millions of nodes. For example, the model reported here contains approximately 20 million nodes. Each individual in the population is represented by a 14-node subnetwork, which captures important syndromic information that is commonly available for health surveillance (such as emergency department (ED) chief complaints), while avoiding any information that could personally identify the individual (e.g., name, social security number, and home street address).

Given current data about individuals in the population, we use a Bayesian network to infer the posterior probabilities of outbreak diseases in the population. To provide timely detection, inference needs to be performed in real time, such that the biosurveillance system "keeps up" with the data streaming in. Once the probability of an outbreak exceeds a particular threshold, an alert is generated by the Bayesian-network-based biosurveillance system; this alert can serve to warn public health officials.
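The posterior-and-threshold logic described above can be illustrated with a minimal sketch. It is not the network reported here: it assumes a single binary outbreak variable and assumes that each person's evidence is conditionally independent given the outbreak state. The function names, the per-person likelihood inputs, and the threshold value are all hypothetical.

```python
import math

def outbreak_posterior(prior, lik_outbreak, lik_no_outbreak):
    """Posterior P(outbreak | evidence) for a binary outbreak variable.

    Assumption (illustration only): each person's evidence is conditionally
    independent given the outbreak state. lik_outbreak[i] is
    P(evidence_i | outbreak) and lik_no_outbreak[i] is
    P(evidence_i | no outbreak); all values must lie in (0, 1].
    """
    # Work in log-odds so that products over many people do not underflow.
    log_odds = math.log(prior) - math.log(1.0 - prior)
    for p1, p0 in zip(lik_outbreak, lik_no_outbreak):
        log_odds += math.log(p1) - math.log(p0)
    # Numerically stable conversion from log-odds to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

def should_alert(posterior, threshold):
    """Raise an alert once the posterior exceeds the chosen threshold."""
    return posterior > threshold
```

For instance, `should_alert(outbreak_posterior(0.01, [0.2, 0.2], [0.1, 0.1]), 0.5)` combines a hypothetical 1% prior with two people whose evidence is twice as likely under an outbreak, and compares the result with a hypothetical threshold of 0.5.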

Using such a large Bayesian network presents both modeling and inference challenges. To help make modeling more tractable in terms of computational space, we use the following approach: if some groups of people are indistinguishable, according to the data being captured, we model them with a single subpopulation subnetwork. To speed up inference, we use a method that need only update the network state based on new information about individuals in the population (such as newly available clinical information about people who have recently sought care at EDs).
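The two ideas in the preceding paragraph, grouping indistinguishable people and updating only on new information, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used in the reported system: each person is summarized by a hypothetical evidence state, every person in the same state contributes the same log-likelihood ratio, and contributions add across people. The class name, the state labels, and the numeric values are invented for the example.

```python
from collections import Counter

class GroupedEvidence:
    """Tracks counts of people per evidence state and a running total.

    Assumption (illustration only): log_lr maps each evidence state to the
    per-person log-likelihood ratio log P(state | outbreak) / P(state | none),
    and people in the same state are indistinguishable.
    """

    def __init__(self, log_lr):
        self.log_lr = dict(log_lr)
        self.counts = Counter()   # one entry per group, not per person
        self.total = 0.0          # summed log-likelihood ratio

    def add(self, state, n=1):
        """Add n people who share the same evidence state."""
        self.counts[state] += n
        self.total += n * self.log_lr[state]

    def move(self, old_state, new_state):
        """Incremental update when one person's evidence changes."""
        if self.counts[old_state] <= 0:
            raise ValueError("no person in state %r" % (old_state,))
        self.counts[old_state] -= 1
        self.counts[new_state] += 1
        # Only the difference between the two contributions is needed.
        self.total += self.log_lr[new_state] - self.log_lr[old_state]

    def recompute(self):
        """Full recomputation from group counts, for checking the running total."""
        return sum(n * self.log_lr[s] for s, n in self.counts.items())
```

With hypothetical states such as `"no_visit"` and `"ed_respiratory"`, a new ED visit becomes a single `move("no_visit", "ed_respiratory")` call, so the cost of an update does not depend on the size of the population.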

A key contribution of this section of the chapter is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications. This provides a proof of concept that Bayesian networks can serve as the foundation of a system that effectively performs Bayesian biosurveillance of disease outbreaks. With this foundation in place, many extensions are possible, and we outline several of them in the final section of the chapter.

In the remainder of this section, we first outline our general approach for using causal Bayesian networks to represent non-contagious diseases that can cause outbreaks. Next, we introduce the specific network we have constructed to monitor for an outbreak caused by the outdoor release of anthrax spores. We then describe an experiment that involves injecting simulated cases of patients with anthrax (which were generated from a separate model) into background data of real cases of patients who visited EDs during a period when no known disease outbreaks were occurring. We measure how long it takes the Bayesian network system to detect such simulated outbreaks. Finally, we discuss these results and suggest directions for future research.
