We insert another copy of Table 14.1, with some of the above methods added for comparison. The newly evaluated algorithms follow:
• Regression using two features: the mean count over the past week and hours of daylight. This allows the algorithm to account for seasonal variation by putting a negative coefficient in front of hours of daylight.
• Regression using the additional feature is_Monday, which is set to 1 if today is Monday and 0 otherwise. This allows the algorithm to anticipate the Monday bump in physician visits and so be less prone to false positives.
• Regression using indicator variables for all days of the week except Sunday (which would be redundant), together with hours_of_daylight and mean_count_over_previous_seven_days.
• Using sickness/availability to compensate for day-of-week effects, and then using the approach of comparing against yesterday. This method thus looks for jumps in the day-of-week-adjusted counts.
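As a rough illustration of the regression variants listed above, the following sketch fits an ordinary least-squares model with day-of-week indicators (Sunday as the baseline), hours of daylight, and the mean count over the previous seven days, and then scores a day by how far its count lies above the prediction. It is a minimal sketch under stated assumptions, not the implementation evaluated in the table: the function names, the synthetic data, and the use of a residual standard deviation to scale the alarm score are all choices made here for illustration.

```python
import numpy as np

def build_features(counts, daylight, weekday):
    """Return (X, y) for days 7..n-1.

    weekday uses 0=Monday ... 6=Sunday (a convention assumed here).
    Columns: intercept, six indicators Mon..Sat, hours of daylight,
    mean count over the previous seven days.
    """
    counts = np.asarray(counts, dtype=float)
    daylight = np.asarray(daylight, dtype=float)
    weekday = np.asarray(weekday)
    rows = []
    for t in range(7, len(counts)):
        # Sunday (6) gets no indicator: it would be redundant with the intercept.
        indicators = [1.0 if weekday[t] == d else 0.0 for d in range(6)]
        rows.append([1.0] + indicators + [daylight[t], counts[t - 7:t].mean()])
    return np.array(rows), counts[7:]

def fit(X, y):
    """Least-squares coefficients and residual standard deviation."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = resid.std(ddof=X.shape[1])
    return coef, sigma

def alarm_score(x_today, count_today, coef, sigma):
    """Number of residual standard deviations above the predicted count."""
    return (count_today - x_today @ coef) / sigma

# Synthetic, purely illustrative data: a seasonal term driven by daylight,
# a Monday bump, and Gaussian noise. None of these values come from the text.
rng = np.random.default_rng(0)
n = 364
day = np.arange(n)
weekday = day % 7
daylight = 12 + 4 * np.sin(2 * np.pi * day / 365)
counts = 50 - 2 * (daylight - 12) + 10 * (weekday == 0) + rng.normal(0, 2, n)

X, y = build_features(counts, daylight, weekday)
coef, sigma = fit(X, y)

# Score the last day as if 20 extra cases had been reported.
spike_score = alarm_score(X[-1], y[-1] + 20, coef, sigma)
print("Monday coefficient:", coef[1])
print("score for injected spike:", spike_score)
```

Dropping the indicator columns other than Monday gives the is_Monday variant, and dropping all of them gives the two-feature variant. In practice the coefficients would be fitted on past data only and the alarm threshold on the score chosen to meet a target false positive rate.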
For this data set, with its seasonal and day-of-week components, we see that sickness/availability (to cope with day-of-week effects) combined with moving average (to cope with seasonal trends) performs well. Some of the regression methods perform almost equally well. We should note that this does not mean these methods are best in general: the particular properties of each data set mean that different approaches can be stronger on different data sets. Our only general advice is that, in our experience, relatively simple methods usually work at least as well as complex approaches. A second important note is that the numbers in Table 14.2 cannot be used as an estimate of how quickly real outbreaks are expected to be detected: the numbers are a function of many things, including the simulated magnitude of the outbreaks and the simulated noise levels.