Linguistic Characteristics of Clinical Text: What Makes NLP Hard

According to Zelig Harris (Friedman et al., 2002), the informational content and structure of a domain form a specialized language called a sublanguage. The sublanguage of patient medical records exhibits linguistic characteristics that influence an NLP system's ability to extract information from the text. When a physician reads a patient's medical reports, she understands these characteristics and can make reasonable inferences from the record. For instance, a physician will not assign the variable Respiratory Fx a value of yes if the respiratory finding mentioned in the report is described as part of the patient's past history. For an NLP application to determine the values of clinical variables from patient records the same way a physician would, the application must account for, or explicitly model, the linguistic characteristics of clinical text. Some important linguistic characteristics of the sublanguage of patient reports are (1) linguistic variation, (2) polysemy, (3) negation, (4) contextual information, (5) finding validation, (6) implication, and (7) co-reference.
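
The Respiratory Fx example can be made concrete with a small sketch. It assumes that findings have already been extracted from a report and annotated with two of the characteristics listed above, negation and contextual information (here, whether the finding belongs to the past history). The `Finding` structure, its field names, and the `respiratory_fx` function are hypothetical and serve only as illustration; they are not taken from any particular NLP system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Finding:
    """One finding already extracted from a report (hypothetical structure)."""
    text: str
    category: str             # e.g. "respiratory"
    negated: bool = False     # e.g. "denies cough"
    historical: bool = False  # e.g. "history of pneumonia"


def respiratory_fx(findings: Iterable[Finding]) -> str:
    """Return "yes" only if a respiratory finding is asserted for the current visit."""
    for f in findings:
        if f.category == "respiratory" and not f.negated and not f.historical:
            return "yes"
    return "no"


# A report that mentions respiratory findings only as history or as negated.
report = [
    Finding("pneumonia", "respiratory", historical=True),
    Finding("cough", "respiratory", negated=True),
]
print(respiratory_fx(report))  # prints "no"
```

A system that matched only the words "pneumonia" and "cough" would answer yes for this report; the negation and history flags are what bring the result in line with the physician's reading.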

