This chapter completes our discussion of evaluation methods, which began in Chapter 20 ("Methods for Evaluating Algorithms") and continued in Chapter 21 ("Methods for Evaluating Surveillance Data"). Although this chapter is conceptually self-contained, readers may benefit from reading the other chapters, either before or alongside this chapter.
In the earlier chapters, we discussed experimental methods that can provide insight into algorithms and surveillance data, both fundamental elements of any biosurveillance system. Analogous to the testing conducted by an engineer, the methods described in those chapters are best understood as laboratory or "bench" tests of components that we intend to incorporate into more complex systems, once they are thoroughly debugged and optimized.
In this chapter, we discuss methods for evaluating other elements of a biosurveillance system, such as computers that collect, store, display, or transfer data, as well as methods for the overall evaluation of a completed system. The methods apply to biosurveillance systems that are completely manual or that combine manual and automatic elements. The human element in a biosurveillance system always includes epidemiologists, who may or may not find a system useful or who may be misled by information that is not presented clearly. It commonly includes information technology staff, who must maintain the system, and data entry personnel, who can make errors or find the system difficult to use. We refer, collectively, to evaluations of operational systems as "field testing." Many important characteristics of biosurveillance systems can only be determined once they are deployed, at least partially, in the field.
Similar to a consumer report about refrigerators, a field evaluation may measure multiple characteristics of a system. Ultimately, the results can be summarized in a comparison table of price, reliability, sensitivity, false-alarm rate, timeliness, and other attributes of the system. These results, if published, can then guide future consumers, who know which attributes are most important to them. Unlike a consumer product, however, biosurveillance is a societal function. Therefore, evaluators should estimate the benefit and cost of a system from the perspective of societal utility; ultimately, the aim is to understand the value of a biosurveillance system based on its cost and expected benefit to a population.
This chapter begins with examples of questions that can only be addressed through field testing. These questions clarify the role of field testing. After this initial discussion, we examine specific attributes of systems and components that can be measured and discuss briefly the methods that can be used. The list of attributes is lengthy, numbering 21 in total. But, as you will see from the example we provide of a field evaluation at the end of this chapter, the set of attributes relevant to any specific study is a small subset drawn from this list, and the goals of the evaluation will readily lead you to the appropriate attributes.
2. QUESTIONS NOT ADDRESSED BY BENCH TESTING OF DATA AND ALGORITHMS
The following questions can only be answered by field testing a biosurveillance system:
• Do epidemiologists use the system?
• How often do the components of the system fail, and how often do those failures degrade the system or cause it to fail completely?
• What is the quality of the data under field conditions?
• What are the time latencies inherent in the various processing steps of a system?
• What is the actual detection and characterization performance in the field?
• What are the benefits of a system?
• How long does it take to deploy or build a system?
• What are the costs and level of effort required to build and maintain a system?
• Are the correct decisions and actions taken in response to surveillance data?
• How well does the system interoperate with other systems?
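As one illustration, the question about time latencies could be approached by timestamping each record at every processing step and summarizing the elapsed time between consecutive steps. The following is a minimal sketch under stated assumptions: the step names, the record layout, and the sample timestamps are hypothetical and do not come from any particular biosurveillance system.

```python
from datetime import datetime
from statistics import median

# Hypothetical processing steps, in the order a record passes through them.
STEPS = ["registered", "transmitted", "received", "displayed"]

# Hypothetical field data: one timestamp per step for each record.
records = [
    {"registered": "2024-01-05T08:00:00", "transmitted": "2024-01-05T08:05:00",
     "received": "2024-01-05T08:05:30", "displayed": "2024-01-05T09:00:00"},
    {"registered": "2024-01-05T10:00:00", "transmitted": "2024-01-05T10:10:00",
     "received": "2024-01-05T10:10:20", "displayed": "2024-01-05T10:30:00"},
    {"registered": "2024-01-05T12:00:00", "transmitted": "2024-01-05T12:02:00",
     "received": "2024-01-05T12:02:40", "displayed": "2024-01-05T13:00:00"},
]


def step_latencies(record):
    """Return the seconds elapsed between each consecutive pair of steps."""
    times = [datetime.fromisoformat(record[step]) for step in STEPS]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def median_latency_by_step(all_records):
    """Return the median latency in seconds for each step transition."""
    per_record = [step_latencies(r) for r in all_records]
    labels = [f"{a}->{b}" for a, b in zip(STEPS, STEPS[1:])]
    # zip(*per_record) regroups the latencies by transition instead of by record.
    return {label: median(values)
            for label, values in zip(labels, zip(*per_record))}


if __name__ == "__main__":
    for transition, seconds in median_latency_by_step(records).items():
        print(f"{transition}: {seconds:.0f} s")
```

The median is used here rather than the mean because a few badly delayed records would otherwise dominate the summary; a real study would likely report the full distribution as well.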
A typical laboratory evaluation focuses on a single use, a single type of data, or the information needs of a specific user. A fielded biosurveillance system, by contrast, typically has a wide range of uses, data, and users. The uses may include, in addition to prevention and control of disease, policy making, needs assessments, and accountability. These observations imply that many studies may be required to fully understand the properties of a biosurveillance system.
3. GOALS OF FIELD TESTING
As with any scientific experiment, it is essential that the individual responsible for the evaluation establish a clear objective for a study. Likewise, it is important for an evaluator to ensure that the evaluation's goal is appropriate for the stage of development or maturity of a system or component being evaluated. If a system is complete and its component parts well studied, the appropriate goal might be measuring its detection performance and an appropriate method might be a prospective trial. This goal and method, however, would be extremely premature for a recently installed surveillance system that was receiving data from one hospital or a small fraction of hospitals in the surveillance region.
In general, a system (or any of its components) has a development life cycle. Evaluators may subject the system to many studies during that life cycle, perhaps with each study having a different goal. For systems that reach maturity and have been subjected to many evaluations along the way, the evaluation goal ultimately becomes the following: "What is the value of the system, and what is its cost?" This is the highest (and most expensive) level of evaluation.
Professional evaluators use the concepts of formative study and summative study to help them achieve clarity about their goals in designing and conducting a study. A formative study is one with the goal to improve a system or a system component. A formative study produces insight about whether a component or system is working as expected and whether we can improve it. A formative study might simply determine whether a biosurveillance system receives all data from hospitals that are transmitted and, if not, what is causing data loss.
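The data-loss example above can be sketched as a simple reconciliation between what hospitals report sending and what the system actually stored. This is only an illustrative sketch: the function name `reconcile`, the message fields `id` and `hospital`, and the sample data are assumptions made for the example, not part of any described system.

```python
from collections import Counter


def reconcile(sent_log, received_ids):
    """Compare messages that hospitals report sending with those the system stored.

    sent_log: list of dicts with hypothetical keys "id" and "hospital".
    received_ids: iterable of message ids found in the surveillance database.
    """
    received = set(received_ids)
    missing = [msg for msg in sent_log if msg["id"] not in received]
    loss_rate = len(missing) / len(sent_log) if sent_log else 0.0
    # Grouping the losses by source is a first step toward finding the cause.
    missing_by_hospital = Counter(msg["hospital"] for msg in missing)
    return {
        "loss_rate": loss_rate,
        "missing_ids": [msg["id"] for msg in missing],
        "missing_by_hospital": dict(missing_by_hospital),
    }


if __name__ == "__main__":
    # Hypothetical sender-side log and receiver-side ids.
    sent = [
        {"id": "m1", "hospital": "A"},
        {"id": "m2", "hospital": "A"},
        {"id": "m3", "hospital": "B"},
        {"id": "m4", "hospital": "B"},
    ]
    stored = ["m1", "m2", "m4"]
    print(reconcile(sent, stored))
```

If the losses cluster at one hospital, the interface to that hospital is a natural place to look first; losses spread evenly across sources would point instead toward the receiving side.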
Formative studies are most useful to the organization that is using or developing the system. Formative studies produce information that the organization can use to improve the system, and typically include an analysis of errors or failures. A formative study may or may not be suitable for publication, depending on the extent to which the results would be useful to other organizations.
The goal of a summative study is to establish the value of a system. An evaluator most often measures value relative to some other system or approach, but value can also be measured in more absolute terms (e.g., cost, benefit, or both). For example, a study may focus on how well a system detects cases of disease under field conditions compared with an alternative approach. Such a study would use measurement techniques identical to the techniques described in Chapter 20 ("Methods for Evaluating Algorithms").
The results of summative studies are generally useful not only to the organization conducting the study but also to other organizations who are grappling with questions such as what type of biosurveillance system to acquire, or how to further develop their existing systems. For this reason, evaluators generally publish the results of summative studies. Summative evaluations require considerable effort and should not be undertaken prematurely on a system that, for example, does not have a mature data collection subsystem or has not been subject to extensive formative testing. One would like the system to be as good as it can be before going to the effort to benchmark its performance.
These distinctions between formative and summative are not absolute. A study may have both formative and summative features. A summative study may have a strong formative influence on a system. The intense scrutiny that a system receives during the evaluation may reveal areas of improvement. In some sense, the concept "formative" is simply a device to remind and encourage evaluators to conduct the simplest study that meets the need.
Because the experiments described in previous chapters on evaluation were obviously scientific experiments, we did not point out that evaluations, in general, are scientific experiments. It is easy to lose sight of this fact when field testing biosurveillance systems because of the complexity of the systems and the logistics of an evaluation. But, please keep in mind that any evaluation is a scientific experiment—nothing more and nothing less—and should be conducted like one, using appropriate methods, clarity, and scientific rigor.