feature under consideration is the Home_Location for the ED case. Imagine (for simplicity) that there are four possible Home_Location values: NE, NW, SE, SW. We start with the rule Home_Location = NW and count the number of cases for the current day that have Home_Location = NW and those that have Home_Location =/ NW. The cases from five to eight weeks ago are subsequently examined to obtain the counts for the cases matching the rule and those not matching the rule. The four values form a 2 x 2 contingency table such as the one shown in Table 15.1.

Each contingency table is scored according to whether there is a significant increase in the fraction of records matching the rule in the recent data compared with the historical data. There many possible scoring functions: these are discussed in (Wong et al., 2003). By default, WSARE uses the Fisher exact test, described in the same paper.

WSARE searches over all possible rules that can be made out of attributes of the database records. In a typical ED data set the rules would thus include both general and specific examples, for example: HomeLocation=NW, Gender=Male, AgeGroup=Over60, Syndrome=GI, InternalBleeding=True, HomeZip=12345.

There are many anomalous scenarios that would not be detected by single-component rules. For example, there might be a slight increase in pediatric cases throughout the city and a slight increase in cases from zip code 54321, but a dramatic increase in pediatric cases from 54321. For this reason,WSARE also searches for rules made up of multiple components.

Was this article helpful?

0 0

Post a comment