Human review of ED reports

0.61 (0.51-0.69)

1.0 (0.96-1.0)


0.39 (0.31-0.49)

aFrom Wagner, M., Espino, J., Tsui, F.-C., et al. (2004). Syndrome and outbreak detection from chief complaints: the experience of the Real-Time Outbreak and Disease Surveillance Project. MMWR Morb Mortal Wkly Rep 53(Suppl):28-31, with permission.

bFrom Chapman, W. W., Espino, J. U., Dowling, J. N., et al. (2003). Detection of Acute Lower Respiratory Syndrome from Chief Complaints and ICD-9 Codes. Technical Report, CBMI Report Series 2003. Pittsburgh, PA: Center for Biomedical Informatics, University of Pittsburgh, with permission.

cFrom Beitel, A. J., Olson, K. L., Reis, B. Y., et al. (2004). Use of emergency department chief complaint and diagnostic codes for identifying respiratory illness in a pediatric population. Pediatr Emerg Care 20:355-60, with permission.

dFrom Chapman, W. W., Dowling, J. N., and Wagner, M. M. (2005). Classification of emergency department chief complaints into seven syndromes: a retrospective analysis of 527,228 patients. Ann Emerg Med 46(5):445-455.

eFrom Chapman, W. W., unpublished results.

fFrom Ivanov, O., Wagner, M. M., Chapman, W. W., et al. (2002). Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. In: Proceedings of American Medical Informatics Association Symposium, 345-9, with permission.

gFrom Chapman, W. W., Dowling, J. N., Wagner, M. M. (2004). Fever detection from free-text clinical records for biosurveillance. J Biomed Inform 37:120-7, with permission.

gLarge positive likelihood ratio due to specificity of 0.9999.

hNot able to calculate (denominator is zero).

CCBC, chief complaint Bayesian classifier; CI, confidence interval.

positive and negative. The likelihood ratio positive is the purest measure of the informational content of a chief complaint for detecting a syndrome (i.e., its ability to discriminate between a person with the syndrome and one without the syndrome). In a Bayesian analysis, it is a number that indicates the degree to which a system should update its belief that a patient has the syndrome, given the chief complaint (see Chapter 13).
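The relationship between sensitivity, specificity, the two likelihood ratios, and Bayesian updating can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the prior probability of 0.05 is an assumed value chosen for the example, not a figure from the studies. The sensitivity and specificity values used are the approximate figures (0.60 and 0.95) quoted in the summary of this section.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-). An entry is None when its denominator is zero."""
    # LR+ = P(positive | syndrome) / P(positive | no syndrome)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else None
    # LR- = P(negative | syndrome) / P(negative | no syndrome)
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else None
    return lr_pos, lr_neg


def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio, working in odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Approximate values from the summary: sensitivity 0.60, specificity 0.95.
lr_pos, lr_neg = likelihood_ratios(0.60, 0.95)

# Assumed prior of 0.05 (hypothetical, for illustration only).
posterior = posterior_probability(0.05, lr_pos)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, posterior = {posterior:.2f}")
```

With these inputs the likelihood ratio positive is about 12, so a matching chief complaint raises an assumed 5% prior to a posterior of roughly 39%. When specificity is exactly 1.0, the likelihood ratio positive has a zero denominator and the function returns `None`, which corresponds to the "not able to calculate" note in the table.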

The gold standard used in these studies varied. The most valid standard used was classification based on review of patients' ED reports using random selection of patients. The earliest studies evaluating the ability of chief complaints to identify syndromes were able to use this gold standard because they studied common syndromes, such as respiratory (Chapman et al., 2003, Beitel et al., 2004) or gastrointestinal (Ivanov et al., 2002). When a syndrome is common, a pool of randomly selected patients will contain enough actual cases of that syndrome to measure classification accuracy.

Later studies examined less common syndromes. To obtain a sufficient sample of patients with uncommon syndromes, researchers searched ICD-9 discharge diagnoses to find cases (Wagner et al., 2004, Chapman et al., 2005b, Mundorff et al., 2004). Using a patient's discharge diagnosis as the gold standard enabled these studies to acquire large numbers of patients, even for rare syndromes such as botulinic. Chart review, however, probably provides more accurate gold standard classifications than ICD-9 codes (Chang et al., 2005).
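The sampling problem behind this shift can be made concrete with a small calculation. The sketch below is illustrative only: the function name is hypothetical and the prevalence figures are assumed values, not numbers reported by the cited studies. It estimates how many randomly selected charts would have to be reviewed, on average, to find a target number of true cases.

```python
def charts_needed(target_cases, cases_per_10000_visits):
    """Expected number of randomly sampled charts needed to find target_cases.

    Prevalence is given as cases per 10,000 ED visits so that the
    arithmetic stays in integers. The result is rounded up.
    """
    if cases_per_10000_visits <= 0:
        raise ValueError("prevalence must be positive")
    numerator = target_cases * 10000
    # Ceiling division using integer arithmetic.
    return -(-numerator // cases_per_10000_visits)


# Assumed prevalences (hypothetical): a common syndrome at 1,000 per 10,000
# visits and a rare syndrome at 1 per 10,000 visits.
print(charts_needed(100, 1000))
print(charts_needed(100, 1))
```

Under these assumed prevalences, finding 100 cases of the common syndrome takes on the order of a thousand charts, while the rare syndrome would take on the order of a million, which is why pre-selecting candidates by discharge diagnosis becomes attractive.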

A few recent studies have used chart review as the gold standard for evaluating a variety of syndromes, including syndromes of low prevalence (Chang et al., 2005, Chapman et al., 2005c). One study compared chief complaint classification during the 2002 Winter Olympic Games against gold-standard classification of potentially positive cases selected by Utah Department of Health employees who performed drop-in surveillance (Wagner et al., 2004).

Chapman et al. (2005c) used ICD-9 searching to find a set of patients with discharge diagnoses of concern in biosurveillance. Physicians then reviewed ED reports for each of the cases to finalize a reference syndrome assignment. Using ICD-9 codes to select patients made it possible to use chart review on a fairly small sample of patients while still acquiring a reasonably sized set of patients for seven different syndromes.

An important issue is whether the same classification accuracy observed in a study of chief complaints from hospital X will be observed for chief complaints from hospital Y. Levy et al. (2005) showed that classification accuracy of a keyword-based parser differed from hospital to hospital for gastrointestinal syndrome. Chapman et al. (2005b), however, showed that the classification accuracy of a Bayesian chief complaint classifier was no different when it was used on a set of chief complaints from a geographic region other than the one that it had been trained on.

Several published studies are not included in Table 23.6 because they measured the sensitivity and specificity of an NLP program's syndrome assignment relative to a physician classifying the patient from the chief complaint alone (Chapman et al., 2005a, Olszewski, 2003a, Sniegoski, 2004). These formative studies of NLP algorithms report much higher sensitivities and specificities than those in Table 23.6. When accepting or rejecting Hypothesis 1 for a syndrome under study, the accuracy of syndrome classification should always be measured relative to the patient's actual syndrome, as determined by a method at least as rigorous as medical record review or discharge diagnoses.

In summary, the experiments in Table 23.6, although somewhat heterogeneous methodologically, are similar enough to be considered meta-analytically. They made the same measurements (sensitivity and specificity), studied similar syndromes, used simple techniques for classifying chief complaints into syndromic categories, and used similar gold standards.
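The two shared measurements can be computed from a 2x2 comparison of classifier output against the gold standard. The following is a minimal sketch with hypothetical counts. It uses the Wilson score interval for the 95% confidence intervals, which is an assumption on our part: the studies summarized here do not state in this section which interval method they used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def accuracy_measures(tp, fn, tn, fp):
    """Sensitivity and specificity, each paired with its Wilson interval.

    tp/fn: gold-standard positives the classifier caught / missed.
    tn/fp: gold-standard negatives the classifier rejected / flagged.
    """
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts: 100 gold-standard positives, 1,000 negatives.
for name, (value, (low, high)) in accuracy_measures(60, 40, 950, 50).items():
    print(f"{name}: {value:.2f} ({low:.2f}-{high:.2f})")
```

The key design point is the denominators: sensitivity is computed only over gold-standard positives and specificity only over gold-standard negatives, so the choice of gold standard directly determines both numbers.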

With respect to Hypothesis 1, these experiments demonstrate that:

1. Chief complaint data contain information about syndromic presentations of patients, and various NLP techniques, including a naive Bayesian classifier and keyword methods, can extract that information.

2. For syndromes at the level of diagnostic precision of respiratory or gastrointestinal, it is possible to automatically classify ED patients (both pediatric and adult) from chief complaints with a sensitivity of approximately 0.60 and a specificity of approximately 0.95.

3. Sensitivity of classification is better for some syndromes than for others.

4. When syndromes are more diagnostically precise (e.g., respiratory with fever), the discrimination ability declines quickly.

5. The specificity of syndrome classification from chief complaints is less than 100%, meaning that daily aggregate counts will have false positives among them due to falsely classified patients.6
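The effect of imperfect sensitivity and specificity on a daily aggregate count can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, and the daily volumes (10 true cases among 200 ED visits) are invented for illustration. Only the sensitivity and specificity values (approximately 0.60 and 0.95) come from the findings above.

```python
def expected_daily_count(true_cases, other_visits, sensitivity, specificity):
    """Expected true-positive and false-positive contributions to a daily count.

    true_cases:   visits that actually have the syndrome
    other_visits: visits that do not have the syndrome
    """
    true_positives = sensitivity * true_cases
    false_positives = (1 - specificity) * other_visits
    return true_positives, false_positives


# Assumed day (hypothetical): 200 ED visits, 10 of them true cases.
tp, fp = expected_daily_count(10, 190, 0.60, 0.95)
total = tp + fp
print(f"expected count: {total:.1f} "
      f"({tp:.1f} true positives, {fp:.1f} false positives)")
print(f"fraction of counted patients who are true cases: {tp / total:.2f}")
```

Under these assumed volumes, the expected count is about 15.5, of which about 9.5 are false positives. Even a specificity of 0.95 lets misclassified patients outnumber true cases when the syndrome is uncommon among the day's visits.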
