Handbook of Biosurveillance ISBN 0-12-369378-0
Elsevier Inc. All rights reserved.
NLP can be relatively easy or difficult, depending on how complex the text is and on what variables you want to extract. For example, it is relatively easy to extract symptoms from free-text chief complaints using simple methods, because chief complaints are short phrases describing why the patient came to the ED. It is not possible to extract diagnoses from chief complaints, because the information in a chief complaint is recorded before the patient even sees a physician. Once a patient is examined by a physician, the patient's diagnosis may be recorded in a dictated report. Extracting information from dictated reports is much more difficult, because a report tells a complex story about the patient that involves references to time and negation of symptoms, features that are not present in chief complaints.
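The contrast between chief complaints and dictated reports can be illustrated with a minimal sketch of a "simple method": whole-token keyword lookup. The lexicon `SYMPTOM_KEYWORDS`, the function `extract_symptoms`, and both sample texts are hypothetical and invented for illustration; they are not taken from any system described in this chapter.

```python
import re

# Hypothetical mini-lexicon mapping surface keywords to symptom concepts.
# These entries are illustrative assumptions, not a validated vocabulary.
SYMPTOM_KEYWORDS = {
    "cough": "cough",
    "fever": "fever",
    "sob": "shortness of breath",
    "vomiting": "vomiting",
    "diarrhea": "diarrhea",
}


def extract_symptoms(text):
    """Return the sorted symptom concepts whose keyword appears as a whole token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted({SYMPTOM_KEYWORDS[t] for t in tokens if t in SYMPTOM_KEYWORDS})


# Short chief complaint: keyword lookup is adequate.
print(extract_symptoms("fever / cough x 3 days"))
# prints ['cough', 'fever']

# Dictated-report sentence: the same lookup ignores negation and time.
print(extract_symptoms("Patient denies fever. Cough resolved two weeks ago."))
# prints ['cough', 'fever']
```

Both calls return the same symptoms, even though the second text says the fever is absent and the cough is in the past. Handling that kind of text would require additional processing for negation and temporal references, which this sketch deliberately omits.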
NLP draws on many types of technologies. In general, the selection of technology depends on the linguistic characteristics of the text. Some linguistic characteristics are so difficult to process that no effective NLP methods exist for them. For example, few NLP systems can accurately extract information conveyed through metaphor. Fortunately, metaphor is not a frequent characteristic of the data sources of potential value in biosurveillance.
In the remainder of this chapter we will discuss (1) the linguistic characteristics of clinical texts that should be considered when implementing NLP for biosurveillance, (2) the types of NLP technologies researchers are using to successfully model information in text, (3) evaluation methods for determining how successful an NLP application is in the domain of outbreak and disease surveillance, and (4) the feasibility of using NLP to encode information for biosurveillance expert systems.