Chief complaints and ICD-coded diagnoses are among the most readily available types of biosurveillance data discussed in this section of the book, and many biosurveillance organizations already monitor them.
Chief complaints are routinely collected by the healthcare system. They are available early in the course of clinical care, and they contain information about a patient's symptoms. Research to date has demonstrated that they can be readily obtained from emergency departments and hospitals, more often than not in real time as HL7 messages. They are less accessible in the outpatient setting. Chief complaints must be subjected to natural language processing (NLP) to extract information for biosurveillance. The two basic approaches are keyword matching and Bayesian text processing. The NLP either assigns a chief complaint directly to a syndrome category or extracts symptoms that are then processed in a second step to determine whether the patient matches a Boolean case definition. Research on the accuracy of automatic syndrome assignment shows that chief complaints can be classified into syndromes such as respiratory and gastrointestinal with moderately good sensitivity and specificity. This accuracy is not sufficient to support detection of small outbreaks. When researchers have attempted to automatically assign patients to more diagnostically precise syndromes such as febrile respiratory, sensitivity declines quickly: chief complaints simply do not contain sufficient information to support such automatic assignments. Research has also demonstrated that detection algorithms can use daily counts of syndromes derived in this manner to detect large respiratory or diarrheal outbreaks such as those due to influenza and rotavirus. It is likely that large outbreaks of diseases that initially present with the other monitored syndromes would also be detected, although studies of such outbreaks do not yet exist, so this point remains a matter of opinion.
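The two-step keyword approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists, the symptom names, and the function names `extract_symptoms`, `is_respiratory`, and `is_febrile_respiratory` are all hypothetical and are not taken from any fielded system. A real classifier would need a curated vocabulary plus handling of misspellings, abbreviations, and negation.

```python
import re

# Hypothetical keyword lists, for illustration only.
SYMPTOM_KEYWORDS = {
    "cough": ["cough", "coughing"],
    "shortness_of_breath": ["sob", "short of breath", "shortness of breath"],
    "fever": ["fever", "febrile"],
    "vomiting": ["vomiting", "n/v"],
    "diarrhea": ["diarrhea", "loose stool"],
}


def extract_symptoms(chief_complaint):
    """Step 1: return the set of symptoms whose keywords appear in the text."""
    text = chief_complaint.lower()
    found = set()
    for symptom, keywords in SYMPTOM_KEYWORDS.items():
        for keyword in keywords:
            # Require that the keyword is not embedded inside a longer word.
            pattern = r"(?<![a-z])" + re.escape(keyword) + r"(?![a-z])"
            if re.search(pattern, text):
                found.add(symptom)
                break
    return found


def is_respiratory(symptoms):
    """Step 2a: a broad Boolean case definition (cough OR shortness of breath)."""
    return bool(symptoms & {"cough", "shortness_of_breath"})


def is_febrile_respiratory(symptoms):
    """Step 2b: a narrower definition (fever AND a respiratory symptom)."""
    return "fever" in symptoms and is_respiratory(symptoms)


if __name__ == "__main__":
    for complaint in ["Cough and fever x3 days", "SOB", "ankle injury"]:
        symptoms = extract_symptoms(complaint)
        print(complaint, sorted(symptoms),
              is_respiratory(symptoms), is_febrile_respiratory(symptoms))
```

The narrower definition can only fire when the complaint happens to mention fever, which is one way to see why sensitivity falls for more diagnostically precise syndromes: a short free-text complaint often omits the second symptom even when the patient has it.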
ICD codes are also routinely collected by the healthcare system. They are far more heterogeneous in meaning than chief complaints, because the ICD coding system contains codes for symptoms, syndromes, and diagnoses at different levels of diagnostic precision. Additionally, codes are collected at multiple points during the course of a patient's illness, and the accuracy and diagnostic precision of coding may vary across these points. ICD codes become available later in the course of clinical care than chief complaints. From the perspective of biosurveillance, they contain indirect information about a patient's symptoms: the reasoning process that developers use when they include an ICD code in a "code set" is "would a patient with disease X have respiratory symptoms? If so, let's include the code for disease X in a code set for respiratory." Research to date has demonstrated that ICD codes can be readily obtained from the healthcare system. The military fortuitously uses ICD to encode all services (and doctors do the encoding). There are several examples of ICD code sets developed for biosurveillance. A surprisingly small amount of work has been done on developing very specific ICD code sets to automate surveillance for conditions such as pneumonia. Research on the accuracy of automatic syndrome assignment from ICD codes shows results similar to those for chief complaints. As with chief complaints, this accuracy is not sufficient to support detection of small outbreaks. Research on the accuracy of automatic disease assignment using hospital discharge diagnoses shows higher accuracy. Research has also demonstrated that detection algorithms can use daily counts of syndromes derived in this manner to detect large respiratory or diarrheal outbreaks such as those due to influenza and rotavirus. It is likely that large outbreaks of diseases that initially present with the other monitored syndromes would also be detected, although there are no naturally occurring examples, so this point remains a matter of opinion.
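The code-set idea and the daily syndrome counts that detection algorithms consume can be illustrated with a short Python sketch. The code sets below are a handful of ICD-9-CM codes picked for illustration (for example, 486 for pneumonia, organism unspecified, and 008.61 for rotavirus enteritis); they are an assumption of this example and not one of the published biosurveillance code sets. The names `CODE_SETS`, `syndromes_for_visit`, and `daily_syndrome_counts` are likewise hypothetical.

```python
from collections import Counter
from datetime import date

# Illustrative code sets only; published sets are far larger and were
# assembled by asking, code by code, whether the disease would produce
# the syndrome's symptoms.
CODE_SETS = {
    "respiratory": {"466.0", "486", "487.1", "786.2"},
    "gastrointestinal": {"008.61", "009.2", "787.91"},
}


def syndromes_for_visit(icd_codes):
    """Return every syndrome whose code set contains at least one visit code."""
    codes = set(icd_codes)
    return {name for name, code_set in CODE_SETS.items() if codes & code_set}


def daily_syndrome_counts(visits):
    """Count visits per (date, syndrome); a visit counts once per syndrome."""
    counts = Counter()
    for visit_date, icd_codes in visits:
        for syndrome in syndromes_for_visit(icd_codes):
            counts[(visit_date, syndrome)] += 1
    return counts


if __name__ == "__main__":
    visits = [
        (date(2005, 1, 3), ["486"]),
        (date(2005, 1, 3), ["487.1", "786.2"]),   # two codes, one visit
        (date(2005, 1, 3), ["787.91"]),
        (date(2005, 1, 4), ["008.61", "786.2"]),  # matches both syndromes
        (date(2005, 1, 4), ["845.00"]),           # in neither code set
    ]
    for key, n in sorted(daily_syndrome_counts(visits).items()):
        print(key, n)
```

Because a code set mixes symptom codes with diagnosis codes of very different precision, the resulting counts inherit the heterogeneity described above; a narrower set restricted to specific diagnoses trades volume for precision.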
The results suggest a few immediate applications, such as influenza monitoring and early warning of cohort exposures. They also carry an important implication: users of these systems should not be overly reassured by the absence of a spike in activity, because these data, when used alone, are unlikely to detect a small outbreak. Monitoring of chief complaints and ICD-coded diagnoses alone simply does not have high sensitivity for small outbreaks (unless the cases are tightly clustered in space and time, and possibly in demographic strata, and the detection system is capable of exploring those dimensions).
It is unfortunate that some authors over-interpret these limited results to conclude that "syndromic surveillance" does not work. A more accurate summary of the state of the art might be that surveillance of diagnostically imprecise syndrome categories cannot detect small outbreaks, because the few outbreak cases are obscured by the background level of patients who satisfy the broad syndrome definition. When the availability of additional clinical data (e.g., temperatures and laboratory test orders or results) allows monitoring of more diagnostically precise syndromes, we expect that smaller outbreaks will be detectable and that larger outbreaks will be detected earlier. This type of syndromic surveillance has been practiced for quite some time (e.g., polio, AIDS), albeit manually because of the lack of automatic access to the required surveillance data.
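The background-level argument can be made concrete with a small numerical sketch. Everything here is an assumption for illustration: the baseline counts are invented, the rule (flag a day whose count lies more than three sample standard deviations above the baseline mean) is a simple control-chart style threshold rather than any of the detection algorithms evaluated in the research summarized above, and the names `z_score` and `exceeds_threshold` are hypothetical.

```python
from statistics import mean, stdev

# Invented daily counts for one week of baseline data.
BROAD_BASELINE = [100, 110, 90, 105, 95, 100, 100]  # e.g., "respiratory"
PRECISE_BASELINE = [2, 3, 1, 2, 2, 3, 1]            # a narrower syndrome
EXTRA_CASES = 5                                      # a small outbreak


def z_score(observed, baseline):
    """Distance of today's count from the baseline mean, in standard deviations."""
    return (observed - mean(baseline)) / stdev(baseline)


def exceeds_threshold(observed, baseline, k=3.0):
    """Flag the day if its count is more than k standard deviations above the mean."""
    return z_score(observed, baseline) > k


if __name__ == "__main__":
    for name, baseline in [("broad", BROAD_BASELINE),
                           ("precise", PRECISE_BASELINE)]:
        observed = round(mean(baseline)) + EXTRA_CASES
        print(name, observed,
              round(z_score(observed, baseline), 2),
              exceeds_threshold(observed, baseline))
```

With these invented numbers, the same five extra cases move the broad syndrome count by less than one standard deviation, while they move the narrow syndrome count by about six. The outbreak is identical in both cases; only the background it has to stand out against differs.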