Discourse Relationships Among Sentences

Sentences in a patient report are not meant to stand alone; together they often convey a story about the differential diagnosis and treatment process for a patient. Some of the variables our example SARS expert system would need cannot be obtained without integrating and disambiguating information from the entire report. Once the individual variables have been located in a report, some type of discourse processing must integrate values for the variables to answer questions such as: (1) Were the relevant findings reported for the patient or for someone else (e.g., a family member, as in "patient's mother died at the age of 48 with an MI")? (2) Did the relevant findings occur at the current hospital visit (versus past history or hypothetical findings)? (3) Is it likely the patient has a respiratory disease or disorder? Three discourse techniques that may help answer these questions are section identification, co-reference resolution, and diagnostic modeling.

Figure 17.3 Partial Bayesian network for radiological findings. Words in leaf nodes come directly from text; concepts in other nodes (shown with asterisks) are inferred based on training examples. Two sentences are slotted in the network: (a) There is a hazy opacity in the left lower lobe. (b) Both upper lobes show ill-defined densities.

Patient reports are semistructured, depending on the type of report and the institution from which the report is generated. For instance, ED reports may contain sections for chief complaint, past history, history of present illness, physical exam, radiologic or lab findings, hospital course, discharge diagnosis, and plan. The section in which a finding is described can provide information important to understanding the meaning of the report. For example, our SARS detector may have a variable for pneumonia history, a variable for radiological evidence of pneumonia, and a variable for a pneumonia diagnosis. An instance of pneumonia described in the radiological findings section of a report is likely to provide radiological evidence of pneumonia, whereas an instance of pneumonia in the discharge diagnosis section probably indicates a diagnosis. Report section identification can also assist in understanding whether the finding occurred in the past history, the current visit, or as a hypothetical finding (e.g., a finding described in the plan section is more likely to be a hypothetical finding), can identify findings for family members (e.g., a finding in the social history section may not be the patient's finding), and can provide insight regarding the anatomic location of an ambiguous finding (e.g., a mass described in the radiology finding section is probably a pulmonary mass).
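
The section-based interpretation described above can be illustrated with a minimal Python sketch. It assumes that each section begins with a header line ending in a colon (for example "PAST HISTORY:"), which real reports do not guarantee. The names `KNOWN_SECTIONS`, `SECTION_TO_VARIABLE`, `split_sections`, and `pneumonia_variables` are hypothetical, and the sketch matches the word "pneumonia" literally without handling negation or synonyms.

```python
import re

# Section names taken from the ED report example in the text; real
# institutions use different and less regular headers.
KNOWN_SECTIONS = {
    "chief complaint", "past history", "history of present illness",
    "physical exam", "radiologic findings", "lab findings",
    "hospital course", "discharge diagnosis", "plan",
}

# Hypothetical mapping from a section to the variable that a pneumonia
# mention in that section most likely informs.
SECTION_TO_VARIABLE = {
    "past history": "pneumonia_history",
    "radiologic findings": "radiological_evidence_of_pneumonia",
    "discharge diagnosis": "pneumonia_diagnosis",
}

# Assumption: a header is a run of letters and spaces followed by a colon.
HEADER = re.compile(r"^([A-Za-z ]+):\s*(.*)$")


def split_sections(report):
    """Return (section_name, text) pairs in the order they appear."""
    sections = []
    name, lines = "unknown", []
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        header = match.group(1).strip().lower() if match else None
        if header in KNOWN_SECTIONS:
            # A new section starts: store the text collected so far.
            if lines:
                sections.append((name, " ".join(lines)))
            name = header
            lines = [match.group(2)] if match.group(2) else []
        else:
            lines.append(line)
    if lines:
        sections.append((name, " ".join(lines)))
    return sections


def pneumonia_variables(report):
    """Return the variables suggested by where 'pneumonia' is mentioned."""
    found = set()
    for name, text in split_sections(report):
        if "pneumonia" in text.lower() and name in SECTION_TO_VARIABLE:
            found.add(SECTION_TO_VARIABLE[name])
    return found
```

The point of the design is only that the same word yields a different variable depending on its section; a practical system would combine this with the negation and semantic techniques discussed earlier.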

Patient reports tell a story involving various findings, physicians, patients, family members, medications, and treatments that are often referred to more than once in the text. Identifying which expressions refer to the same entity is important in integrating information about that entity. Useful discourse clues for identifying coreferring expressions include how close the expressions are within the text (e.g., a referring expression is more likely to refer to a referent in the previous sentence than to a referent five sentences back), overlapping words (e.g., "the pain" is more likely to refer to "chest pain" than to "atelectasis"), and the semantic type of the entities (e.g., "she" can only refer to a human entity, not to a finding or disease).
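
As a rough illustration of how the three clues above (distance, word overlap, and semantic type) might be combined, the following Python sketch scores candidate referents for a referring expression. The `Mention` class, the `score` and `resolve` functions, the pronoun table, and the weights 2.0 and 0.5 are all assumptions made for this example; they are not taken from any published co-reference system.

```python
from dataclasses import dataclass

NEG_INF = float("-inf")
STOP_WORDS = {"the", "a", "an"}
# Assumption: only these pronouns constrain the semantic type.
PRONOUN_TYPES = {"she": "human", "he": "human"}


@dataclass
class Mention:
    text: str
    sentence_index: int
    semantic_type: str  # e.g. "human" or "finding"


def content_words(text):
    """Lowercased words of a mention, without articles."""
    return set(text.lower().split()) - STOP_WORDS


def score(anaphor, candidate):
    """Higher is better; -inf means the candidate is ruled out."""
    distance = anaphor.sentence_index - candidate.sentence_index
    if distance < 0:
        return NEG_INF  # a referent must not follow the expression
    required = PRONOUN_TYPES.get(anaphor.text.lower())
    if required is not None and candidate.semantic_type != required:
        return NEG_INF  # semantic type clue: "she" needs a human
    overlap = len(content_words(anaphor.text) & content_words(candidate.text))
    # Illustrative weights: reward shared words, penalize distance.
    return 2.0 * overlap - 0.5 * distance


def resolve(anaphor, candidates):
    """Return the best-scoring candidate, or None if all are ruled out."""
    best = max(candidates, key=lambda c: score(anaphor, c), default=None)
    if best is None or score(anaphor, best) == NEG_INF:
        return None
    return best
```

With candidates "chest pain" (sentence 0), "atelectasis" (sentence 1), and "the patient's mother" (sentence 1), the expression "the pain" in sentence 2 resolves to "chest pain" because the shared word outweighs the extra distance, and "she" resolves to the mother because the two findings are excluded by type.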

Integrating the clinical information within a report to determine the clinical state of the patient (e.g., the likelihood the patient has SARS) requires a diagnostic model relating the individual variables or findings to the diagnosis. Many diagnostic models have been used in medicine, including rule sets, decision trees, neural networks, and Bayesian networks. Diagnostic models are also helpful for determining the values of individual variables. For example, a Bayesian network can model which radiological findings occur with which diseases. With this type of semantic model, even if a report did not mention pneumonia, for example, the model could infer that acute bacterial pneumonia is probable given the radiologic finding of a localized infiltrate (Chapman et al., 2001c).
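
The kind of inference described above can be sketched with the smallest possible Bayesian network: one disease node with one finding node as its child. The function name `posterior` is hypothetical, and the probabilities in the usage note are invented placeholders chosen only to make the arithmetic visible; they are not estimates from Chapman et al. or from any data set.

```python
def posterior(prior, p_finding_given_disease, p_finding_given_no_disease):
    """Bayes' rule for a two-node network (disease -> finding).

    Returns P(disease | finding present).
    """
    joint_disease = prior * p_finding_given_disease
    joint_no_disease = (1.0 - prior) * p_finding_given_no_disease
    return joint_disease / (joint_disease + joint_no_disease)


# Placeholder numbers for illustration only (not clinical estimates):
# P(pneumonia) = 0.10,
# P(localized infiltrate | pneumonia) = 0.80,
# P(localized infiltrate | no pneumonia) = 0.05.
p_pneumonia = posterior(0.10, 0.80, 0.05)
```

Under these placeholder values the posterior is 0.08 / (0.08 + 0.045) = 0.64, so a report that mentions only a localized infiltrate still raises the probability of pneumonia well above its prior. A network such as the one in Figure 17.3 has many more nodes, but each inference step rests on the same rule.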

None of the NLP techniques we have described performs perfectly, but some of the problems described in this section are easier to address than others. For instance, automatic part-of-speech taggers perform similarly to human taggers. The ability to perform inference on information in a report as a physician does is more complex, entailing both semantic and discourse modeling.

Although the task is difficult, developing NLP techniques for classifying, extracting, and encoding individual variables from patient medical reports is feasible and has been accomplished to different extents by many groups. Successful extraction of variables in spite of imperfect syntactic and semantic techniques can occur for many reasons, including access to the UMLS databases and tools, structure and repetition within reports, and modeling a limited domain. NLP research over the years has revealed that NLP techniques perform better in narrower domains. For instance, modeling the lexical semantics of the biomedical domain is easier than modeling the lexical semantics of all scientific domains, and modeling the lexical semantics of patient reports related to SARS would be easier than modeling all clinical findings in patient reports.

Most of the studies in NLP have focused on the ability of the technology to extract and encode individual variables from the reports. Fewer studies have integrated NLP variables from an entire report to diagnose patients or have evaluated whether an NLP-based expert system can improve patient care. Below we discuss different levels of evaluation of NLP technology related to biosurveillance.
