Extracting Word Patterns: Latent Semantic Analysis

Latent Semantic Analysis (LSA; Foltz et al., 1998; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998) is a semantic text analysis strategy concerned with the use of words in their context. Unlike most existing semantic text analysis programs, however, LSA does not adopt the top-down strategy of specifying a semantic grammar and looking for occurrences of S-V-O (subject-verb-object) constellations. Instead, in a bottom-up manner, it distills information about the semantic similarity of words by analyzing their usage across a large body of text.

Applying singular value decomposition, a mathematical data reduction technique akin to factor analysis, LSA creates a multidimensional semantic space that allows one to calculate the similarity between any two words used in a given body of text by comparing their coordinates in the semantic space. If, for example, the words patient and physician consistently co-occur in a sentence across a large amount of text, LSA assigns them similar factor weights. Ignoring syntactical information, LSA infers similarity in meaning from patterns of word co-occurrences. LSA was initially developed as a search engine with a focus on words that carry content (i.e., nouns, verbs, adjectives). This has led to its application as a tool to measure textual coherence (e.g., Foltz et al., 1998) and to provide computerized tutoring (e.g., Graesser et al., 1999).
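
The idea of comparing word coordinates in a reduced space can be illustrated with a minimal sketch. It is not the procedure of any of the cited studies: the five-sentence corpus, the one-word stop list, the choice of two dimensions, and the helper name `similarity` are all assumptions made for illustration, and real applications use much larger corpora, weighted counts, and many more dimensions.

```python
import numpy as np

# Toy corpus (hypothetical): each string stands in for one sentence.
docs = [
    "the patient saw the physician",
    "the physician examined the patient",
    "the physician treated the patient",
    "the banker approved the loan",
    "the banker reviewed the loan",
]

# Keep content words only; the stop list here is an illustrative assumption.
stop = {"the"}
vocab = sorted({w for d in docs for w in d.split() if w not in stop})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-sentence count matrix.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        if w in index:
            X[index[w], j] += 1

# Singular value decomposition, keeping only the k largest dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # assumed dimensionality for this toy example
word_vecs = U[:, :k] * s[:k]  # word coordinates in the reduced space

def similarity(a, b):
    """Cosine similarity between two words in the reduced space."""
    va, vb = word_vecs[index[a]], word_vecs[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(similarity("patient", "physician"))
print(similarity("patient", "banker"))
```

In this toy corpus, patient and physician always appear in the same sentences and never alongside banker, so the first similarity comes out close to 1 and the second close to 0. Word order plays no role in the computation, which mirrors the point that LSA ignores syntactical information.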

More recently, LSA has been adapted to analyze textual style. For this, LSA ignores low-frequency content words and focuses on high-frequency words that have minimal semantic function (i.e., pronouns, articles, prepositions). In a reanalysis of three studies on the salutary effects of emotional writing, Campbell and Pennebaker (2003) linked an LSA measure of similarity in people's essays across 3 days of writing to their subsequent health. They found that similarity in the use of common words, especially personal pronouns, was negatively related to health benefits. This study underscores that LSA is not an esoteric tool for cognitive scientists, but can offer a fresh perspective on persistent problems in social psychology.
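
A style-oriented variant can be sketched in the same way by restricting the vocabulary to function words and comparing essays rather than words. This is an illustrative sketch only and does not reproduce Campbell and Pennebaker's (2003) procedure: the three essays, the short function-word list, the two retained dimensions, and the helper name `essay_similarity` are hypothetical.

```python
import numpy as np

# Hypothetical essays from one writer on three consecutive days.
essays = [
    "i went to the clinic and i told my doctor about my fear",
    "i walked to a lake and i called my sister about the job",
    "they said that we should leave because our house was sold",
]

# Hypothetical, deliberately short list of high-frequency function words.
function_words = ["i", "my", "we", "our", "they", "a", "the", "to",
                  "and", "about", "that", "because", "was"]
index = {w: i for i, w in enumerate(function_words)}

# Function-word-by-essay count matrix; content words are skipped.
X = np.zeros((len(function_words), len(essays)))
for j, essay in enumerate(essays):
    for w in essay.split():
        if w in index:
            X[index[w], j] += 1

# Reduce with singular value decomposition and keep k dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # assumed dimensionality for this toy example
essay_vecs = Vt[:k].T * s[:k]  # essay coordinates in the reduced space

def essay_similarity(a, b):
    """Cosine similarity between essays a and b (0-based positions)."""
    va, vb = essay_vecs[a], essay_vecs[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(essay_similarity(0, 1))  # day 1 vs. day 2
print(essay_similarity(0, 2))  # day 1 vs. day 3
```

The first two essays rely on the same first-person singular pronouns and come out as highly similar, whereas the third shifts to we, our, and they and shares no function words with the first, so its similarity to day 1 is close to 0. The topics of the essays do not enter the comparison at all.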

Clearly, LSA's word pattern analysis has limitations (Perfetti, 1998). Its inability to consider syntactic structure or to make use of acquired word knowledge certainly distinguishes it from human coders. However, Landauer et al. (1998) argued that "one might consider LSA's maximal knowledge of the world to be analogous to a well-read nun's knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young" (p. 261). LSA is representational in its aim and semantic in its approach. As explained earlier, it can focus on low-frequency words that carry content or on high-frequency words that convey linguistic style.
