Every word in a language has at least one part of speech. The most common parts of speech in English are noun (e.g., "tuberculosis," "heart"), verb (e.g., "see," "prescribe"), adjective (e.g., "severe," "red"), adverb (e.g., "quickly," "carefully"), determiner (e.g., "the," "some"), preposition (e.g., "of," "in"), particle (e.g., "up," "out"), and conjunction (e.g., "and," "but"). The difficulty in automatically assigning a part of speech to each word in a sentence is that some words can have more than one part of speech. For example, the word "discharge" can be a verb or a noun. Automated part-of-speech taggers assign parts of speech using either rules or probability distributions learned from hand-tagged training sets. They perform with an accuracy of 96-97% on general English texts, such as newspaper articles, scientific journals, and books. The part-of-speech distribution in patient reports differs from that of nonclinical texts. For example, discharge summaries contain more nouns and past tense verbs and fewer proper nouns (e.g., people and company names) and present tense verbs (Campbell and Johnson, 2001). Not surprisingly, training a part-of-speech tagger on medical texts improves its accuracy when assigning parts of speech to patient reports (Campbell and Johnson, 2001; Coden et al., 2005). Part-of-speech taggers trained on medical documents are only beginning to become publicly available (Smith et al., 2004).
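As a rough illustration of learning tags from a hand-tagged training set, the sketch below implements a most-frequent-tag baseline. It is not the method of any tagger cited above. The training sentences, the tag names, and the function names `train`, `ambiguous_words`, and `tag` are all invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences (toy data, not drawn from a real corpus).
TRAINING = [
    [("the", "det"), ("patient", "noun"), ("had", "verb"),
     ("a", "det"), ("discharge", "noun")],
    [("we", "pron"), ("discharge", "verb"), ("the", "det"), ("patient", "noun")],
    [("the", "det"), ("discharge", "noun"), ("was", "verb"), ("severe", "adj")],
]


def train(tagged_sentences):
    """Count how often each word carries each tag in the training set."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return counts


def ambiguous_words(counts):
    """Return the words seen with more than one part of speech."""
    return sorted(word for word, tags in counts.items() if len(tags) > 1)


def tag(words, counts, default="noun"):
    """Give each word its most frequent training tag; unseen words get `default`."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, best))
    return tagged


counts = train(TRAINING)
print(ambiguous_words(counts))
print(tag(["We", "discharge", "the", "patient"], counts))
```

Because this baseline ignores context, it labels "discharge" as a noun even in "We discharge the patient." Rule-based and probabilistic taggers address exactly this kind of error by taking neighboring words into account.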
A word's part of speech can sometimes be helpful in understanding which word sense is being used in a sentence. Returning to the example of the word "discharge," a statistical analysis of the distribution of "discharge" in patient reports may show that if "discharge" is being used as a verb, the word sense is more likely Disch1 (release from hospital).
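The kind of statistical analysis described above can be sketched as a conditional relative frequency: the proportion of each sense among occurrences with a given part of speech. The annotated observations below are invented purely for illustration, and "OtherSense" is a placeholder label for any sense other than Disch1. Neither reflects real counts from patient reports.

```python
from collections import Counter, defaultdict

# Hypothetical (part of speech, sense) annotations for occurrences of "discharge".
# The counts are made up for illustration only.
OBSERVATIONS = (
    [("verb", "Disch1")] * 8
    + [("verb", "OtherSense")] * 2
    + [("noun", "Disch1")] * 5
    + [("noun", "OtherSense")] * 5
)


def sense_given_pos(observations):
    """Estimate P(sense | part of speech) by relative frequency."""
    counts = defaultdict(Counter)
    for pos, sense in observations:
        counts[pos][sense] += 1
    return {
        pos: {sense: n / sum(senses.values()) for sense, n in senses.items()}
        for pos, senses in counts.items()
    }


probabilities = sense_given_pos(OBSERVATIONS)
print(probabilities["verb"])
print(probabilities["noun"])
```

With these toy counts, the verb reading favors Disch1 while the noun reading is evenly split, which is the pattern the paragraph describes as a possibility.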
Syntactic rules use the part of speech to combine words into phrases and phrases into sentences. For instance, an adjective followed by a noun is a noun phrase, an auxiliary verb followed by a verb is a verb phrase, and a preposition followed by a noun phrase is a prepositional phrase. Phrases can be combined so that a noun phrase followed by a prepositional phrase creates another noun phrase and a noun phrase followed by a verb phrase creates a sentence. This process of breaking down a sentence into its constituent parts is called parsing. Automated parsers employ a grammar consisting of rules or probability distributions for generating combinations of words and a lexicon listing the possible parts of speech for the words. Automated parsers may attempt to produce a deep parse that connects all the words and phrases together into a sentence (Figure 17.2[a]) or a partial parse (also called a shallow parse), which combines words into noun phrases, verb phrases, and prepositional phrases but does not attempt to link the phrases together (Figure 17.2[b]). A deep parse gives you more information about the relationships among the phrases in the sentence but is more prone to error. A partial parse is easier to compute without errors and may be sufficient for some tasks.
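A partial parse built from rules like those above can be sketched in a few lines. This is a minimal sketch that assumes the input is already tagged and that uses a small, invented rule set: a noun phrase is an optional determiner, any number of adjectives, and one or more nouns; a verb phrase is a verb with an optional preceding auxiliary; a prepositional phrase is a preposition followed by a noun phrase. The tag names and the function `chunk` are hypothetical.

```python
def chunk(tagged):
    """Group (word, tag) pairs into NP, VP, and PP chunks without linking them."""
    n = len(tagged)

    def noun_phrase_end(start):
        """Return the index just past a noun phrase starting at `start`, or None."""
        j = start
        if j < n and tagged[j][1] == "det":
            j += 1
        while j < n and tagged[j][1] == "adj":
            j += 1
        first_noun = j
        while j < n and tagged[j][1] == "noun":
            j += 1
        return j if j > first_noun else None  # at least one noun is required

    def verb_phrase_end(start):
        """Return the index just past a verb phrase starting at `start`, or None."""
        pos = tagged[start][1]
        if pos == "aux" and start + 1 < n and tagged[start + 1][1] == "verb":
            return start + 2
        if pos == "verb":
            return start + 1
        return None

    chunks, i = [], 0
    while i < n:
        pos = tagged[i][1]
        label, end = pos, i + 1  # fallback: keep the token as its own unit
        if pos == "prep" and noun_phrase_end(i + 1) is not None:
            label, end = "PP", noun_phrase_end(i + 1)
        elif verb_phrase_end(i) is not None:
            label, end = "VP", verb_phrase_end(i)
        elif noun_phrase_end(i) is not None:
            label, end = "NP", noun_phrase_end(i)
        chunks.append((label, " ".join(word for word, _ in tagged[i:end])))
        i = end
    return chunks


tagged_sentence = [
    ("The", "det"), ("patient", "noun"), ("denies", "verb"),
    ("shortness", "noun"), ("of", "prep"), ("breath", "noun"),
    ("but", "conj"), ("reports", "verb"), ("headache", "noun"),
]
for label, text in chunk(tagged_sentence):
    print(f"{label}[{text}]")
```

The chunker labels phrases but makes no attempt to decide, for example, that "of breath" modifies "shortness." Making that link is the extra work a deep parse takes on.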
(b) NP[The patient] VP[denies] NP[shortness] PP[of breath], NP[chest pain], NP[nausea], NP[vomiting], conj[but] VP[reports] NP[headache].
Figure 17.2 (a) The tree structure of a deep parse in which words are combined into phrases and phrases are combined into a sentence. det, determiner; N, noun; prep, preposition; adj, adjective; aux, auxiliary verb; v, verb. (b) A partial parse that only labels simple phrases and conjunctions (conj) without linking the phrases together.
As with part-of-speech tagging, the syntactic characteristics of patient reports differ from those of nonclinical texts (Campbell and Johnson, 2001). A publicly available parser trained on medical texts does not yet exist. Szolovits (2003) showed that the Link Grammar Parser (available at www.link.cs.cmu.edu/link/) recognized only 38% of the words in a large sample of ED reports. For this reason, he adapted the SPECIALIST Lexicon distributed by the National Library of Medicine to the format required for the Link Grammar Parser and provided over 200,000 new entries for the Link Grammar Lexicon, quintupling the size of the original Lexicon (available at www.medg.lcs.mit.edu/projects/text/).
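The effect of extending a parser's lexicon can be illustrated by measuring lexical coverage, the fraction of words in a text that the lexicon recognizes. The sketch below assumes coverage is counted per token with case ignored; the word lists are invented toy data and have no connection to the Link Grammar or SPECIALIST lexicons.

```python
def coverage(tokens, lexicon):
    """Return the fraction of tokens found in `lexicon` (case-insensitive)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in lexicon)
    return known / len(tokens)


# Hypothetical report fragment and lexicons (toy data).
report_tokens = ["Pt", "denies", "SOB", "and", "chest", "pain"]
general_lexicon = {"denies", "and", "chest", "pain"}
medical_entries = {"pt", "sob"}

print(coverage(report_tokens, general_lexicon))
print(coverage(report_tokens, general_lexicon | medical_entries))
```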
The syntactic structure of a sentence can provide information about the semantic relationships among the words. For example, in Figure 17.2(a) a relationship between the mass and the right upper lobe is indicated by the fact that the prepositional phrase "in the right upper lobe" is attached to the noun phrase "the mass." Statistical methods that rely on whether or not a word or phrase occurs in the sentence without requiring a syntactic relation between the constituents may mistakenly infer a location relation between a noun phrase and prepositional phrase. For instance, in sentence 22, the noun "mass" and the prepositional phrase "in the right upper lobe" both occur in the sentence, but without syntactic knowledge there is no way to know the phrases are actually unrelated.
(22) There is no change in the mass, but the infiltrate in the right upper lobe has increased.
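The contrast between co-occurrence and syntactic attachment in sentence 22 can be sketched as follows. The chunk list is a hand-built partial parse supplied for illustration, not the output of any parser, and the attachment rule (a prepositional phrase modifies the noun phrase immediately before it) is a deliberate simplification. The function names are hypothetical.

```python
SENTENCE_22 = (
    "There is no change in the mass, but the infiltrate "
    "in the right upper lobe has increased."
)

# Hand-built partial parse of sentence 22 (an assumption for this example).
CHUNKS = [
    ("expl", "There"),
    ("VP", "is"),
    ("NP", "no change"),
    ("PP", "in the mass"),
    ("conj", "but"),
    ("NP", "the infiltrate"),
    ("PP", "in the right upper lobe"),
    ("VP", "has increased"),
]


def cooccur(sentence, finding, location):
    """Co-occurrence test: do both strings appear anywhere in the sentence?"""
    text = sentence.lower()
    return finding in text and location in text


def located_findings(chunks, location):
    """Return noun phrases directly followed by a PP that mentions `location`."""
    return [
        phrase
        for (label, phrase), (next_label, next_phrase) in zip(chunks, chunks[1:])
        if label == "NP" and next_label == "PP" and location in next_phrase
    ]


print(cooccur(SENTENCE_22, "mass", "right upper lobe"))
print(located_findings(CHUNKS, "right upper lobe"))
```

The co-occurrence test reports a match for "mass" and "right upper lobe," whereas the attachment rule links the location only to "the infiltrate."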