Many detection algorithms (Goldenberg et al., 2002; Zhang et al., 2003; Fawcett and Provost, 1997) assume that the observed data consist of cases from background activity, which we will refer to as the baseline, plus any cases from irregular behavior. Under this assumption, detection algorithms operate by subtracting the baseline from recent data and raising an alarm if the deviations from the baseline are significant. The challenge facing all such systems is to estimate the baseline distribution from historical data. In general, determining this distribution is extremely difficult because of the different trends present in surveillance data. Seasonal variations in weather and temperature can dramatically alter the distribution of surveillance data. For example, influenza season typically occurs during mid-winter, resulting in an increase in emergency department (ED) cases involving respiratory problems. Disease outbreak detectors intended to detect epidemics such as severe acute respiratory syndrome (SARS), West Nile virus and anthrax are not meant to detect the onset of flu season, and the seasonal rise in influenza cases could confuse them. Day-of-week variations make up another periodic trend.
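The baseline-subtraction idea described above can be illustrated with a minimal sketch. It is not the WSARE method: it assumes a single daily count, estimates the baseline as the mean of historical counts from the same day of week, and uses a hypothetical threshold of three standard deviations. The function names and the `(weekday, count)` data layout are assumptions made for illustration.

```python
from statistics import mean, stdev

def baseline_for(history, weekday):
    """Estimate a baseline (mean, standard deviation) from historical counts
    recorded on the same day of week. `history` is a list of (weekday, count)
    pairs; this layout is an assumption made for the sketch."""
    counts = [count for day, count in history if day == weekday]
    if len(counts) < 2:
        raise ValueError("need at least two historical counts for this weekday")
    return mean(counts), stdev(counts)

def raise_alarm(history, weekday, todays_count, threshold=3.0):
    """Subtract the baseline from today's count and flag a large positive deviation.
    The threshold of 3 standard deviations is an illustrative choice."""
    expected, spread = baseline_for(history, weekday)
    deviation = todays_count - expected
    if spread == 0:
        # No historical variation: any increase over the baseline counts as a deviation.
        return deviation > 0
    return deviation / spread > threshold

# Hypothetical Monday and Saturday counts of respiratory ED cases.
history = [("Mon", 10), ("Mon", 12), ("Mon", 11), ("Mon", 9),
           ("Sat", 4), ("Sat", 5), ("Sat", 6)]
print(raise_alarm(history, "Mon", 25))
print(raise_alarm(history, "Mon", 12))
```

Conditioning the baseline on the day of week is what keeps an ordinary Monday count from being compared against quieter weekend days. Seasonal effects such as influenza would need a similar adjustment, which this sketch does not attempt.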
In WSARE 2.0, we took the baseline distribution to be the raw data obtained from selected historical days. WSARE 3.0 instead learns the baseline distribution from historical data in order to better account for observed environmental factors for the current day (e.g., public holiday, day of week, weather). WSARE 3.0 takes all records prior to the past 24 hours and builds a Bayesian network from this subset.
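As a rough sketch of the data split described above, the following separates records older than 24 hours (the subset from which the baseline would be learned) from the most recent 24 hours. The record layout, the `time` field and the function name are assumptions for illustration, and the Bayesian network learning step itself is not shown.

```python
from datetime import datetime, timedelta

def split_records(records, now):
    """Split records into those prior to the past 24 hours (historical)
    and those within the past 24 hours (recent).
    Each record is assumed to be a dict with a datetime under the key "time"."""
    cutoff = now - timedelta(hours=24)
    historical = [r for r in records if r["time"] < cutoff]
    recent = [r for r in records if r["time"] >= cutoff]
    return historical, recent

# Hypothetical ED records.
now = datetime(2004, 1, 15, 12, 0)
records = [
    {"time": datetime(2004, 1, 10, 9, 30), "syndrome": "respiratory"},
    {"time": datetime(2004, 1, 14, 11, 0), "syndrome": "rash"},
    {"time": datetime(2004, 1, 14, 18, 45), "syndrome": "respiratory"},
    {"time": datetime(2004, 1, 15, 8, 15), "syndrome": "respiratory"},
]
historical, recent = split_records(records, now)
print(len(historical), len(recent))
```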
Bayesian networks are overviewed in Chapters 13 and 18, and an introductory tutorial is available at www.cs.cmu.edu/~awm/tutorials/bayesnet.html. During structure learning, we differentiate between environmental attributes, which are features such as the season and the day of week that cause trends in the data, and response attributes, which are the remaining features. The environmental attributes are specified by the user based on the user's knowledge of the problem domain.
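The distinction between environmental and response attributes can be sketched as follows. This is a simplified stand-in, not Bayesian network structure learning: it partitions the attribute names using a user-supplied set and then tabulates the relative frequency of one response attribute for each combination of environmental values. The attribute names and helper functions are hypothetical.

```python
from collections import Counter, defaultdict

def split_attributes(all_attributes, environmental):
    """Partition attribute names into environmental attributes (specified by
    the user) and response attributes (everything else)."""
    env = [a for a in all_attributes if a in environmental]
    resp = [a for a in all_attributes if a not in environmental]
    return env, resp

def conditional_frequencies(records, env_attrs, response_attr):
    """Relative frequency of each response value, grouped by the tuple of
    environmental values. A plain counting table, used here only to
    illustrate conditioning the baseline on the environment."""
    counts = defaultdict(Counter)
    for record in records:
        key = tuple(record[a] for a in env_attrs)
        counts[key][record[response_attr]] += 1
    return {
        key: {value: n / sum(counter.values()) for value, n in counter.items()}
        for key, counter in counts.items()
    }

# Hypothetical historical records.
records = [
    {"season": "winter", "day_of_week": "Mon", "syndrome": "respiratory"},
    {"season": "winter", "day_of_week": "Mon", "syndrome": "respiratory"},
    {"season": "winter", "day_of_week": "Mon", "syndrome": "rash"},
    {"season": "summer", "day_of_week": "Mon", "syndrome": "rash"},
]
env, resp = split_attributes(["season", "day_of_week", "syndrome"],
                             {"season", "day_of_week"})
table = conditional_frequencies(records, env, "syndrome")
print(env, resp)
print(table[("winter", "Mon")])
```

A counting table like this becomes sparse quickly as attributes are added, which is one reason a learned model such as a Bayesian network is a more practical way to represent the baseline.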