The Importance Of Multiple Sources Of Data

Multiple independent sources of information can increase both sensitivity and specificity. Let us begin with a drastically oversimplified example. Suppose sensor A has a daily 90% chance of signaling an attack (event SIGA) if one occurs, and a 1% chance of signaling an attack when there is none. Suppose sensor B monitors an independent data source and likewise has a daily 90% chance of signaling an attack (event SIGB) if one occurs, and a 1% chance of signaling an attack when there is none. Writing ATT for the event of an attack and ~ for negation, we have:

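$P(\mathrm{SIGA} \mid \neg\mathrm{ATT}) = 0.01, \quad P(\mathrm{SIGA} \mid \mathrm{ATT}) = 0.9$
$P(\mathrm{SIGB} \mid \neg\mathrm{ATT}) = 0.01, \quad P(\mathrm{SIGB} \mid \mathrm{ATT}) = 0.9$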

If the two signals are also conditionally independent given whether an attack is occurring, there is a 99% chance (1 - 0.1 × 0.1) that at least one detector will signal if there is an attack, and only a 1 in 10,000 chance (0.01 × 0.01) that both detectors will signal if there is no attack. Thus, in many situations (both the [SIGA and SIGB] case and the [~SIGA and ~SIGB] case), the operational decision is much clearer. Even when the signals are inconsistent, the analysis is better informed than it would be with a single source.
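A minimal Python sketch of the arithmetic above, assuming (as the example implicitly does) that the two sensors are conditionally independent given whether an attack is under way. The names `pattern_prob` and `likelihood_ratio` are illustrative, not from the source. It reproduces the 99% and 1-in-10,000 figures and prints, for each of the four signal patterns, the likelihood ratio of attack versus no attack.

```python
from itertools import product

# Per-sensor rates from the example: P(signal | attack) and P(signal | no attack).
SENSITIVITY = 0.9
FALSE_ALARM = 0.01


def pattern_prob(signals, attack):
    """P(observed signal pattern | attack status).

    Assumes the sensors are conditionally independent given the attack
    status, so the probability of a pattern is a product of per-sensor terms.
    """
    p_signal = SENSITIVITY if attack else FALSE_ALARM
    prob = 1.0
    for fired in signals:
        prob *= p_signal if fired else 1 - p_signal
    return prob


def likelihood_ratio(signals):
    """How strongly a signal pattern favors 'attack' over 'no attack'."""
    return pattern_prob(signals, attack=True) / pattern_prob(signals, attack=False)


# At least one sensor fires during an attack = 1 - P(neither fires | attack).
p_any_given_attack = 1 - pattern_prob((False, False), attack=True)
# Both sensors fire with no attack under way.
p_both_given_no_attack = pattern_prob((True, True), attack=False)

print(f"P(at least one signal | ATT) = {p_any_given_attack:.4f}")
print(f"P(both signal | ~ATT)        = {p_both_given_no_attack:.6f}")

# Patterns are (SIGA fired, SIGB fired).
for pattern in product((True, False), repeat=2):
    print(pattern, f"LR = {likelihood_ratio(pattern):.4g}")
```

For comparison, a single sensor's alarm carries a likelihood ratio of 0.9 / 0.01 = 90. Two agreeing alarms give 0.81 / 0.0001 = 8,100, while a split verdict still gives about 9 (0.09 / 0.0099), which is one way to read the claim that even inconsistent signals remain informative.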

Another compelling example concerns time series analysis. Figure 15.1 shows two time series of daily sales for two fictional products in a fictional city. Apart from a general upward trend, no serious anomalies stand out.

If, however, we look at the same data in a different way, February 20 stands out as somewhat anomalous. The new view is a scatterplot: for each date, it plots one data point, with the x-coordinate giving sales of Product A and the y-coordinate giving sales of Product B. Sales of the two products are generally correlated, but February 20 is atypical because B's sales are high relative to what A's sales that day would suggest.
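To make the scatterplot argument concrete, here is a minimal Python sketch that regresses B's sales on A's by ordinary least squares and flags the day with the largest residual, that is, the point lying farthest from the general trend in the scatterplot. This is one simple way to operationalize "B sales are high, taking into account A's sales", not necessarily the method the chapter goes on to survey. The data are synthetic and hypothetical (not the data behind Figure 15.1): both products share a daily demand shock, and an extra bump is added to B on one chosen day. The helper names `fit_line`, `most_anomalous_day`, and `make_synthetic_sales` are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def most_anomalous_day(sales_a, sales_b):
    """Return (day index, residual) for the point farthest from the B-on-A line."""
    slope, intercept = fit_line(sales_a, sales_b)
    residuals = [b - (slope * a + intercept) for a, b in zip(sales_a, sales_b)]
    day = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
    return day, residuals[day]


def make_synthetic_sales(days=100, anomaly_day=50, bump=8.0, seed=42):
    """Hypothetical data: upward trends plus a demand shock shared by A and B."""
    rng = random.Random(seed)
    sales_a, sales_b = [], []
    for t in range(days):
        shared = rng.gauss(0, 10)  # common daily shock, moves both products
        sales_a.append(100 + t + shared + rng.gauss(0, 1))
        sales_b.append(50 + 0.5 * t + 0.5 * shared + rng.gauss(0, 1))
    # The bump is small next to B's own day-to-day swings (std about 5),
    # so it is hard to spot in B's time series alone.
    sales_b[anomaly_day] += bump
    return sales_a, sales_b


sales_a, sales_b = make_synthetic_sales()
day, resid = most_anomalous_day(sales_a, sales_b)
print(f"most anomalous day: {day}, residual: {resid:.1f}")
```

Because the shared shock moves both series together, conditioning B on A removes most of B's variability, leaving residuals with a spread of roughly 1, so the bumped day stands out clearly. A more careful version would standardize the residuals and use a robust fit so the anomaly itself does not pull the line.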

In this chapter, we briefly survey methods that can detect effects revealed by inspecting more than one time series at a time.
