We use the term attribute to refer to any property of a biosurveillance system (or system component) that we can measure. We can measure something as simple as the time lag between when a datum is collected about a patient to its receipt by a health department, or something as complex as the overall cost of a biosurveillance system.
There are several well-known guidelines for evaluation of biosurveillance systems (CDC, 2001; Buehler et al., 2004; Bravata et al., 2004). These guidelines include many examples of what we would term attributes, but they also include other characteristics that are not easily measured and therefore do not qualify as attributes in the sense we are using the term.1
Table 37.1 summarizes the attributes we will discuss and their relationships to characteristics discussed in the published guidelines. The list in Table 37.1 is ordered roughly from formative to summative.
We included attributes with a "looking-to-buy-a-refrigerator" mind-set, based on our experience building, deploying, and operating surveillance systems. We discuss our rationale for including each attribute.
4.1. Data Quality
Data quality refers to the completeness and accuracy of the data recorded in an information system (Hogan and Wagner, 1997).
1 The guidelines also provide advice on how to plan an evaluation and suggest information to include in a publication, such as a system description; a list of stakeholders; a chart describing the flow of data and the lines of response in a surveillance system; and a timeline for surveillance data. These aspects are outside the scope of this chapter, and the interested reader should consult the references.
TABLE 37.1 Measurable Attributes of Biosurveillance Systems

| Attribute | Examples of What Can Be Measured |
|---|---|
| Data quality | Completeness and accuracy of data |
| Sampling bias | Reporting by zip code, sociodemographic group |
| Disease coverage | Number of diseases in system database |
| Reliability | Number of errors by people or computers |
| Sensitivity, specificity, timeliness of case/outbreak detection | Sensitivity, specificity, timeliness of case and outbreak detection |
| Diagnostic precision for case and outbreak detection | Sensitivity, specificity, and timeliness at different levels of diagnostic precision |
| Support for outbreak characterization | Relevant data collected by the system; its usage during outbreak investigations |
| Time latencies | Delays between collection and receipt of surveillance data |
| Meets functional requirements | The difference between prespecified requirements and actual functions |
| Acceptability of system or components | Subject or agency participation rates; interview completion rates and question refusal rates; log-ins |
| Compliance with standards | Results of conformance testing of user interface, data format, data coding |
| Portability | Cost to install in a different location |
| Privacy and confidentiality | Compliance with local and state regulations; ability to reidentify individuals |
| Security | System vulnerabilities or security failures as identified by security audits |
| Benefits (morbidity and mortality reduction) | Expected reductions in mortality and morbidity through earlier detection |
| Benefits (other) | Expected reduction in operational costs owing to policy improvements or workflow efficiency |
| Cost to build or acquire | Actual cost to build or purchase and install |
| Cost to operate | Salaries and overhead; hardware and licenses |
| Cost to add functionality | Actual cost |
| Cost to integrate with other systems | Actual cost |
| Cost of false alarms | Staff time, costs of treatments or other measures taken in response to a false alarm |

Corresponding characteristics named by the CDC working groups include sensitivity, PPV, and timeliness; usefulness; simplicity; representativeness; flexibility; and comparable hardware and software; some attributes are only alluded to in the guidelines. CDC indicates Centers for Disease Control and Prevention.
Completeness is the proportion of data that we expect to find in the system that are actually in the system. An evaluator can measure completeness (or speaking more precisely, incompleteness) by counting the number of missing records or the number of missing fields within records. Accuracy refers to how closely the received data reflect the truth, as defined by some gold standard. An evaluator measures data accuracy with the same methods used to measure a classifier's accuracy (described in Chapter 20). The evaluator would compare data values recorded in the surveillance system to "true" values established by, for example, expert review of patient charts. Alternatively, the evaluator might use examination of other records or even direct observation of the actual patient encounter, as was done by Payne et al. (1993) in an evaluation of an immunization registry. From these comparisons, evaluators compute the sensitivity and specificity of the data received by the system, relative to the truth established by the gold standard determination.
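The completeness and accuracy measures described above can be sketched in code. This is a minimal illustration with hypothetical field names ("chief_complaint", "zip"); the accuracy function treats the recorded value as a binary classifier output judged against a gold standard such as expert chart review:

```python
# Sketch of data-quality measures using hypothetical field names.
# Completeness: fraction of expected fields actually present.
# Accuracy: sensitivity/specificity of recorded binary values versus
# a gold standard (e.g., expert review of patient charts).

def completeness(records, required_fields):
    """Fraction of required fields that are non-missing across all records."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    present = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    return present / total

def accuracy(recorded, gold):
    """Sensitivity and specificity of recorded binary values vs. gold standard."""
    tp = sum(1 for r, g in zip(recorded, gold) if r and g)
    fn = sum(1 for r, g in zip(recorded, gold) if not r and g)
    tn = sum(1 for r, g in zip(recorded, gold) if not r and not g)
    fp = sum(1 for r, g in zip(recorded, gold) if r and not g)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec

records = [
    {"chief_complaint": "fever", "zip": "15213"},
    {"chief_complaint": "", "zip": "15232"},  # missing field
]
print(completeness(records, ["chief_complaint", "zip"]))  # 0.75
print(accuracy([1, 1, 0, 0], [1, 0, 0, 1]))               # (0.5, 0.5)
```

In practice an evaluator would report sensitivity and specificity per field, since different data elements (diagnosis codes, demographics) can have very different error rates.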
In an operational biosurveillance system, surveillance data may arrive late. A network problem or human error may delay a transmission for hours, days, or weeks. Data may be subject to revisions and corrections by the sender after the initial submission. As a result, the completeness and accuracy of data in the databases of biosurveillance systems may improve as a function of time. An evaluator must be careful to measure the quality of data by using a snapshot of the data that reflects the state of the database at the time that an epidemiologist or other user would be conducting analyses and making decisions based on the data. The evaluator must be careful that studies of data accuracy do not include subsequent corrections and additions. This concern is especially important if the evaluator is conducting a retrospective study in which data in a database will clearly have been subject to such post hoc improvements. Unless the surveillance system timestamps all incoming data and an evaluator has used these elements to recreate an image of the data at the time of real-world use, a study may overestimate data quality.
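The snapshot reconstruction described above can be sketched as follows, assuming (as the text requires) that the system timestamps every incoming update; the tuple layout `(record_id, field, value, received_at)` is hypothetical:

```python
from datetime import datetime

# Sketch: reconstruct the database image as of a given analysis time,
# assuming each stored update carries a receipt timestamp. Corrections
# received after `as_of` are excluded so that a retrospective study
# does not overestimate data quality.

def snapshot(updates, as_of):
    """Return {record_id: {field: value}} as the data stood at `as_of`."""
    image = {}
    for rec_id, field, value, received_at in sorted(updates, key=lambda u: u[3]):
        if received_at <= as_of:
            image.setdefault(rec_id, {})[field] = value
    return image

updates = [
    ("p1", "diagnosis", "viral syndrome", datetime(2005, 3, 1, 9, 0)),
    ("p1", "diagnosis", "anthrax", datetime(2005, 3, 4, 14, 0)),  # later correction
]
print(snapshot(updates, datetime(2005, 3, 2)))
# {'p1': {'diagnosis': 'viral syndrome'}}
```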
For biosurveillance systems that receive data in batches, the unit of analysis can be the file rather than the individual patient record. Data completeness can be measured as the number of files received of the total sent. We discuss an evaluation of data completeness of batch and real-time data feeds in the final section of this chapter.
A study of data quality is an appropriate early study of a biosurveillance system. It can exert a strong formative influence on a system. The evaluation should include an error analysis of those records that are missing and those that are inaccurate to establish the causes of these errors. The evaluation should attempt to classify the causes as "potentially correctable" or "not correctable." Potentially correctable causes may include poorly crafted surveillance forms or computer-user interfaces, insufficient training and supervision of persons who complete these surveillance forms, or careless management of data.
An evaluator can reanalyze the study data set after manually correcting the potentially correctable errors to estimate an upper bound of data quality that can be expected from the system if the correctable causes of errors can be corrected (for a study of data quality in an electronic medical record system, see Wagner and Hogan, 1996).
4.2. Sampling Bias

Unless a biosurveillance system collects data about every individual in a population, it is subject to possible biases in its pattern of sampling that can affect the utility of the data. A system may sample poorer neighborhoods more or less often than affluent ones, children more or less often than adults, and institutionalized people more or less often than noninstitutionalized people. The differences can be great, as would be the case early in deployment of a biosurveillance system when hospital participation is spotty. Sampling bias is referred to by the Centers for Disease Control and Prevention (CDC) working group as "representativeness."
An evaluator can crudely measure sampling bias as the percentage of potential reporting entities from different zip codes or other geographic regions that participate in a system. As a more fine-grained measure, the evaluator can measure the fraction of received reports of a health-event with a known prevalence. For example, to estimate the sampling bias in a biosurveillance system that receives data from hospitals, published emergency department utilization data or hospital discharge data sets (described in Chapter 21) may be used to define the total number of individuals seeking care in a region, and those numbers may be compared with the number of reports received by the biosurveillance system. Hospital discharge data sets include sex, age, and zip code, enabling detailed analysis of sampling biases in those sociodemographic and geographic strata.
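The stratum-level comparison described above reduces to a capture fraction per stratum. This is a minimal sketch with hypothetical counts; the totals would come from a source such as a hospital discharge data set:

```python
# Sketch: crude sampling-bias estimate per stratum (e.g., zip code),
# comparing reports received by the biosurveillance system with total
# regional encounters from a reference source such as discharge data.

def capture_fraction(received, totals):
    """Fraction of regional encounters captured, per stratum."""
    return {k: received.get(k, 0) / totals[k] for k in totals}

received = {"15213": 800, "15232": 150}                 # reports in the system
totals   = {"15213": 1000, "15232": 600, "15217": 400}  # discharge data set
print(capture_fraction(received, totals))
# {'15213': 0.8, '15232': 0.25, '15217': 0.0}
```

Large differences between strata (here, 80% versus 0%) would signal sampling bias that an incidence estimate must correct for.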
The decision between using a crude or a fine-grained measure of bias depends on how the surveillance data are used by epidemiologists and other end users. If an epidemiologist or a detection algorithm uses the surveillance data to estimate the incidence of disease in a stratum of the population, it is essential to understand the sampling bias at that level. If, on the other hand, the analysis is not affected by biases (e.g., as in the National Retail Data Monitor) (Wagner et al., 2003), then it may not even be necessary to measure sampling bias.
4.3. Disease Coverage

The utility of a biosurveillance system is a function of the number of diseases that it covers. A special-purpose system for a single disease is less valuable than a system that covers 100 diseases if all other factors, such as cost, are equal. For a manual biosurveillance system, an evaluator can determine disease coverage by inspection of reporting forms and procedures. For an automatic system, the evaluator can inspect the structure and content of the database of the system.
4.4. Reliability

A biosurveillance system, once the human element is included, may involve thousands, if not hundreds of thousands, of "components," all of which can potentially malfunction or fail.
Reliability of the computer elements of a biosurveillance system can be measured by how often the components are in a usable state when users or other computer systems attempt to access them. An evaluation can record system downtime and response delay characteristics experienced by users and other systems. Specific methods include automatic logging of unavailability of systems and functions accompanied by error analyses designed to identify correctable causes. Results can be reported as percentage of time that a system or feature was available.
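The availability percentage described above can be computed directly from an outage log. This is a sketch under the assumption that outages are recorded as (start, end) pairs, here expressed in hours within a monitoring window:

```python
# Sketch: percent availability of a component from a log of outage
# intervals. Interval units are hours; the (start, end) pairs and the
# 30-day window below are hypothetical.

def availability(outages, window_hours):
    """Percentage of the window during which the component was usable."""
    down = sum(end - start for start, end in outages)
    return 100.0 * (window_hours - down) / window_hours

# Two outages (3 h and 1 h) over a 30-day (720 h) window:
print(round(availability([(10, 13), (500, 501)], 720), 2))  # 99.44
```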
Computer system reliability is relatively easy to measure, and the results of a study can be used to improve component reliability or to contribute to an estimation of overall system reliability. Information technology professionals can also audit the design and implementation of a system to determine its fault tolerance.
"Component" reliability of humans is more time-consuming to measure. In general, an evaluator would first identify the role that the person plays in the overall system and then measure the person's error rate. For example, if the person was the on-call epidemiologist and one of her roles was to field phone calls from emergency departments, a study could measure the number of phone calls missed and the reasons why.
Although an engineer can estimate overall system reliability from data about failure rates for individual components, it is also possible to test or measure overall system reliability directly. An evaluator can manipulate the inputs to a component or the overall system by injecting test data into the system at some point in the processing chain and following its appearance at various downstream points to measure time latencies or data loss and even whether and when a human notices the data or takes action. A test might involve buying, for example, 50 thermometers from a retail store and tracking their appearance in the biosurveillance system and effects on the human components. We refer to this test as the Moore test in honor of the first individual to challenge a biosurveillance system in this manner (Andrew Moore, personal communication).
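A Moore-style end-to-end test reduces to bookkeeping: which injected records appeared downstream, and after what delay. This sketch uses hypothetical record identifiers and times in minutes:

```python
# Sketch of analyzing a "Moore test": inject labeled test records
# upstream and record when each appears at a downstream observation
# point; compute the loss rate and per-record delays.

def trace_injections(injected, observed):
    """injected: {id: time_sent}; observed: {id: time_seen_downstream}.
    Returns (loss_rate, delays) for records that arrived."""
    lost = [i for i in injected if i not in observed]
    delays = {i: observed[i] - t for i, t in injected.items() if i in observed}
    return len(lost) / len(injected), delays

injected = {"t1": 0.0, "t2": 5.0, "t3": 9.0}  # minutes
observed = {"t1": 12.0, "t3": 30.0}           # t2 never arrived
loss, delays = trace_injections(injected, observed)
print(delays)  # {'t1': 12.0, 't3': 21.0}
```

The same bookkeeping applies whether the downstream observation point is a database table or a human analyst noticing the data.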
4.5. Sensitivity, Specificity, and Timeliness of Case and Outbreak Detection
We discussed methods for measuring sensitivity, specificity (or false-alarm rate), and timeliness in detail in Chapter 20. We note that published guidelines recommend that evaluators measure predictive value positive and sensitivity (CDC, 2001; Buehler et al., 2004; Bravata et al., 2004). We recommend instead that, whenever possible, evaluators measure the sensitivity and specificity of a biosurveillance system. Appendix B provides a rationale for this recommendation.
Evaluators measure sensitivity, specificity, and timeliness in the field because these key properties of a biosurveillance system may differ from those measured in laboratory evaluations. In the field, a biosurveillance system is subject to the possibility of failure of many system components, whether they be human, network, or computer. If a component or the entire system fails often enough, receiver operating characteristic (ROC) and activity monitoring operating characteristic (AMOC) curves derived under more ideal laboratory conditions will overestimate the expected detection performance of a system. We note that laboratory evaluations may use carefully prepared data from which obvious errors in data files, such as duplicate records, have been removed.
Measuring field performance of case-detection algorithms is not difficult for common diseases because there will be sufficient numbers of cases. For this reason, an evaluator can conduct a field evaluation of a case-detection system almost immediately. Field evaluation of an outbreak-detection system, in contrast, is quite challenging because of the need for a large sample of outbreaks from which to measure sensitivity and timeliness.
Field evaluations of outbreak-detection algorithms are appropriate for large biosurveillance systems: a single system will not encounter a sufficient number of outbreaks unless it is regional or nationwide, or is deployed in several states. Field evaluations of outbreak-detection systems are also appropriate only for mature systems. It is hard to imagine freezing the development of a rapidly evolving biosurveillance system for a period of a year or more to accumulate an adequate sample.
One strategy for field testing of outbreak-detection algorithms is a multicenter trial in which essentially identical surveillance strategies for a common type of outbreak would be implemented in biosurveillance systems in multiple cities (or hospitals, in the case of nosocomial infections). A second approach would use a standard case-study format, such as the one developed by the RODS Laboratory (http://rods.health.pitt.edu/) (Rizzo et al., 2005), which encourages a uniform method for measuring time of detection, false-alarm rates, and cost of false alarms, and for reporting sufficient detail about any differences in surveillance data or systems to allow meta-analysis across studies of individual outbreaks or small numbers of outbreaks.
Sensitive, specific, and timely detection of outbreaks is, of course, a raison d'être for a biosurveillance system; if its performance is not satisfactory along these dimensions, the other attributes we discuss are somewhat moot. Given that field evaluations will likely not be widely feasible until the current generation of biosurveillance systems matures, evaluators should conduct laboratory evaluations of the outbreak-detection algorithms planned for use in a system even before system development to provide insights into the expected field performance of the system.
4.6. Diagnostic Precision of Case and Outbreak Detection
As we discussed in Chapter 3, evaluators can measure sensitivity, specificity, and timeliness for health-events that vary in their level of diagnostic precision (ranging from "cow is sick" to "cow died from foot-and-mouth disease"). Thus, evaluations, and any publications resulting from the evaluations, should clearly state the level of diagnostic precision and discuss its implications for the value of the system to users.
An evaluation can study the increase in diagnostic precision that a system achieves as a function of time. Biosurveillance systems typically produce increasing levels of diagnostic precision over time because data with higher diagnostic value accumulate as a result of testing of individuals. An evaluator can study how the uncertainty about these measurements changes over time by using methods identical to those described in Chapter 20. Although this type of study has not yet been done, the analysis would involve the use of ROC curve analysis generalized to four dimensions (sensitivity, specificity, time, and diagnostic precision).
4.7. Support for Outbreak Characterization

The purpose of a biosurveillance system is not limited to detection of cases and outbreaks. Outbreak characterization is an equally important function and involves identifying the set of affected individuals, the geographic scope, the biologic agent, and other characteristics of an outbreak.
As suggested by the riveting descriptions of outbreak investigations in Chapter 2, biosurveillance systems have been collecting and analyzing data to elucidate all of the aforementioned characteristics for many decades, albeit in a highly labor-intensive manner involving paper forms and "shoe-leather" epidemiology. Despite the informality of many current systems, evaluators can measure the time during the outbreak when each of these characteristics was elucidated (Dato et al., 2001, 2004; Ashford et al., 2003).
As automation results in biosurveillance systems that are more formal and, therefore, more amenable to modeling, evaluators will be able to apply the methods described in Chapter 20 to measure the sensitivity, specificity, and timeliness at which these characteristics are elucidated.
4.8. Time Latencies

We use the term time latencies to refer to the time delays between individual steps in the processing of information by a biosurveillance system. For computerized functions, an evaluator can use the timestamps associated with data (when they were recorded, received, or transmitted) to measure time latencies between steps. For example, the timestamp of an emergency department registration and the timestamp of receipt by the biosurveillance system of that registration can be used to measure the delay from when information first became available about a patient in electronic form to its receipt by a biosurveillance organization. For manual functions of the system, time-motion studies may be required, unless the processing of information involves routine time or date stamping.
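The registration-to-receipt latency described above can be computed from paired timestamps. This is a minimal sketch with hypothetical times; a real study would summarize the distribution (median, percentiles) rather than individual values:

```python
from datetime import datetime
from statistics import median

# Sketch: time latencies between two processing steps, assuming each
# record carries a registration timestamp and a receipt timestamp.

def latencies_minutes(pairs):
    """pairs: list of (registered_at, received_at) datetimes -> minutes."""
    return [(recv - reg).total_seconds() / 60.0 for reg, recv in pairs]

pairs = [
    (datetime(2005, 3, 1, 9, 0), datetime(2005, 3, 1, 9, 7)),
    (datetime(2005, 3, 1, 9, 30), datetime(2005, 3, 1, 10, 15)),
]
lat = latencies_minutes(pairs)
print(lat)          # [7.0, 45.0]
print(median(lat))  # 26.0
```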
Note that the earlier CDC workgroup defined timeliness as "the speed between steps in a public health surveillance system" (CDC, 2001), whereas the other workgroup defined timeliness of outbreak detection as "the time lapse from the exposure to disease agent to the initiation of a public health intervention" (Buehler et al., 2004).
4.9. Meets Functional Requirements

The design of modern information systems involves a process of functional requirements definition that precedes implementation (discussed in Chapter 4). During this process, a system analyst works with future users of the system to define their needs and then develops system specifications. An evaluator can compare the actual functions of a system against the prespecified requirements.
4.10. Acceptability

Acceptability refers to the willingness of people and organizations to participate in a surveillance system or to use it (CDC, 2001). In particular, it refers to users' willingness to interact with a system and the willingness of hospitals, physicians, clinicians, industry, and government agencies to provide data as input.
Evaluators can use surveys or rates of participation to measure how willing these various entities are to provide data, modify their information systems to provide data (if necessary), answer questions about the data, modify the format of data (when requested), and continue providing data. They can measure acceptability by using subject or agency participation rates; interview completion rates and question refusal rates (if the system involves interviews); completeness of report forms; physician, laboratory, or hospital/facility reporting rates; and timeliness of data reporting (CDC, 2001).
A survey of users could ascertain whether usage was influenced by the health or economic importance of the health-related event, acknowledgment by the system of the person's contribution, dissemination of aggregate data back to reporting sources and interested parties, responsiveness of the system to suggestions or comments, burden on time relative to available time, ease and cost of data reporting, federal and state statutory assurance of privacy and confidentiality, the ability of the system to protect privacy and confidentiality, federal and state statute requirements for data collection and case reporting, and participation from the community in which the system operates (CDC, 2001).
4.11. Compliance with Standards

Compliance with standards means the extent to which a system makes use of standards for data representation, message format, code sets, and case definitions (CDC, 2001). A rigorous evaluation would involve conformance testing. A less rigorous evaluation might involve self-reporting by the system's developers.
The standards themselves could also be evaluated, although this type of study is outside of the scope of this chapter. As biosurveillance systems begin to incorporate automatic data feeds from other computer systems, it is becoming possible to study the availability and adequacy of standards for various types of surveillance data. Methods for evaluating standards could include surveys to determine market penetration of standards, measures of costs and effort required for implementation, and measures of the degree to which the standards facilitate data exchange.
4.12. Portability

Biosurveillance systems are expensive to develop and debug. From a societal utility perspective, the value of a system (often supported by tax dollars) includes its potential to be used in another location.
Portability refers to the effort required to install a system in a second location. There may be many differences between locations in the health events of interest (e.g., malaria), the prior probabilities of disease, and the types of surveillance data available. The organization of the human and animal healthcare systems (e.g., high or low penetration of managed care, existence of call centers) may vary. There may be differences in workflow in the participating organizations, the legal environment, and even the native language. Portability can perhaps best be measured by the cost and effort to install a system in another location.
4.13. Privacy and Confidentiality

Confidentiality is the expectation that information shared with another individual or organization will not be divulged. Privacy, although sometimes confused with confidentiality, is the right of an individual or organization not to divulge information. In the United States, these rights are relative, not absolute, in the area of biosurveillance because governmental public health is granted broad power to collect data for purposes of disease control (see Chapter 5). However, many consider it a violation of an individual's right to confidentiality if governmental public health collects data that are not essential for its function. The delineation between what governmental public health should and should not collect is dictated by the ethical principle of "need-to-know," which asserts that the intrusion into a person's privacy should be the minimum required to accomplish a function. For example, a health department does not need to know the names and addresses of every patient who visits an emergency department, although some health departments in the United States currently receive such information.
Privacy and confidentiality in electronic environments are addressed and protected in three ways: public policy, including state and federal law; technology itself, including adequate security, audit trails, and various forms of access restriction; and education and training (Alpert, 1998).
An evaluation of confidentiality could measure the rate of confidentiality violations by using surveys, anthropological observational methods, or incident reports. An evaluator could also examine the data access privileges granted to various users of the system against the data required to do their jobs. This relationship is an example of the need-to-know principle.
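An audit of access privileges against the need-to-know principle amounts to a set difference between what each user can access and what their role requires. This sketch uses hypothetical roles and field names:

```python
# Sketch of a need-to-know audit: for each user, list the data elements
# they can access but do not need for their job. Roles and fields are
# hypothetical examples.

def excess_privileges(granted, required):
    """Per user, the fields they can access but do not need."""
    return {u: sorted(granted[u] - required.get(u, set())) for u in granted}

granted = {
    "epi_oncall": {"name", "address", "chief_complaint", "zip"},
    "analyst":    {"chief_complaint", "zip"},
}
required = {
    "epi_oncall": {"name", "chief_complaint", "zip"},
    "analyst":    {"chief_complaint", "zip"},
}
print(excess_privileges(granted, required))
# {'epi_oncall': ['address'], 'analyst': []}
```

Any nonempty entry flags an access grant that the evaluation should question.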
An evaluation of privacy and confidentiality might involve convening a panel comprising ethicists, legal experts, community representatives, and outside experts on biosurveillance to examine the data collection and management practices of a biosurveillance organization (Yasnoff et al., 2001).
4.14. Security

Security is the degree to which a biosurveillance system is vulnerable to corruption through physical damage, corruption or data theft by hackers, theft of passwords, or denial-of-service attacks. Security is typically measured by a comprehensive audit, conducted by experts on computer and system security, of physical security of the server site, firewalls, monitoring procedures, documentation, and account authorization. Results of security evaluations are rarely if ever published because they identify vulnerabilities that could be exploited.
4.15. Benefits (Mortality and Morbidity Reduction)
The purpose of biosurveillance is to improve health. Therefore, a focus of evaluation should be on assessing a system's contribution to this goal.
The improvement in mortality and morbidity owing to a system would be the most direct measure of benefit, but it is difficult to quantify. To measure it directly, evaluators need models of the expected effect with and without the system. For a further discussion of this topic, see Chapter 31, "Economic Studies in Biosurveillance."
Evaluators often use less direct measures of benefit that correspond to different points in the chain linking information to benefit depicted in Figure 1.3 (see Chapter 1). An evaluator can measure the improvement that a system achieves in the quality of information or decision making at many points in this chain and, assuming that all of the downstream steps in the chain function optimally, project the expected benefit in terms of improvement in health or reduction in mortality and morbidity.
Instances in which use of the system led to early detection of cases or outbreaks also provide evidence of system value. Examples of case reports of this type have been compiled and can be accessed by authorized public health officials at https://www.rods.pitt.edu/cases/.
The CDC guidelines discuss benefit mainly under "usefulness," which includes both benefit to policy making and benefit to disease-control programs (CDC, 2001).
4.16. Benefits (Other)
Biosurveillance systems can improve the efficiency of an organization, thus lowering its operational costs. The systems can also provide information that facilitates assessment of the effect of prevention and control programs; leads to improved clinical, behavioral, social, policy, or environmental practices; or stimulates research intended to lead to prevention or control (CDC, 2001). Evaluators can measure improvements in efficiency by examining staffing before and after system deployment or modification and, at a minimum, can enumerate the additional uses of a system.
We discuss the costs to build, operate, expand, and integrate biosurveillance systems in separate sections because they correspond to different phases of biosurveillance-system development, and the costs of the earlier phases will be known and available for publication before the costs of later phases.
However, some general principles apply to estimating costs in all phases. When estimating costs, a first step is to identify all the cost items, which can be derived from (1) lists of personnel/consultants, their specific roles, skills, and costs; and (2) descriptions of hardware and software required and their costs (CDC, 2001). More detailed cost-related factors are listed under "simplicity" in the CDC guidelines: for example, amount and type of data necessary to establish that a health-related event has occurred; time spent on collecting data; amount of follow-up that is necessary to update data on a case; time spent on transferring, entering, editing, storing, and backing up data; time spent on analyzing and preparing the data for dissemination; staff training requirements; and time spent on maintaining the system.
4.17. Cost to Build or Acquire

The costs of the build-or-acquire phase include the costs of planning; acquisition of software, hardware, and licenses; the fees charged by contractors; and the labor and facility costs of the organization. The total is usually known by the organization through its normal accounting procedures.
4.18. Cost to Operate

The cost to operate a system is also usually known by the organization. It comprises costs of personnel, license fees, hardware upgrades, and costs involved with any repair of the system's computers, including parts and service.
4.19. Cost to Add Functionality

If a system has been in field operation for a period of time in excess of five days, resources have already likely been expended to plan to increase the functionality of the system (at least person-hours have been expended complaining about the system around the water cooler). If the system has been in field operation for several years, the developers have no doubt made substantial extensions, and there is experience with the cost and effort required to modify the system. The CDC guidelines use the term flexibility to refer to the cost and effort to modify a system in response to changing information needs or operating conditions, such as new health-related events, changes in case definitions or technology, and variations in funding or reporting sources. The change could be as simple as adding a new question to a form or as complex as fundamentally changing some aspect of the system.
4.20. Cost to Integrate with Other Systems

Because disease outbreaks do not respect the jurisdictional boundaries of hospitals or of health departments, biosurveillance systems operated by these entities must exchange data and, more generally, interoperate with each other. Integration consumes personnel time and financial resources of both organizations, and evaluators can study the costs and effort to develop this functionality and operate it through examination of project records and budgets of the two organizations.
4.21. Cost of False Alarms

Because of the inherent uncertainty of case and outbreak detection and characterization, users of biosurveillance systems may take actions that, in retrospect, were unnecessary or incorrect. The wasted effort, costs, or potentially more dire consequences of such actions are of great interest to users of these systems and their managers. These effects influence decisions about whether to use a system and how to configure its alarm thresholds and other parameters.
There have been no published studies that comprehensively investigate this effect.
5. AN EXAMPLE EVALUATION: FIELD TESTING