Quantitative text analysis approaches vary along several dimensions (Popping, 2000; Roberts, 1997; Smith, 1992). This section introduces four conceptual distinctions that provide a framework for organizing the existing approaches in psychology.
Aim: representational versus instrumental. On the broadest level, text analysis methods differ with regard to whether they are representational or instrumental in aim (Popping, 2000; Roberts, 1997). The role of the receiver in normal communication is to decode as accurately as possible the intended meaning of a message. This is what representational text analysis seeks to achieve. Its goal is to develop a representation of the sender's original intention of a message. In doing so, representational analysis is interested in the manifest content of a text.
Instrumental analyses focus mainly on latent content. Independent of the author's intention, a message is analyzed for occurrences of a set of themes (e.g., hostility, anxiety, need for power). The linguistic analysis at the beginning of the chapter, for instance, was instrumental because—rather than representing what the students intended to say—it focused on selected psychological aspects of language use (e.g., words hinting at emotional and social functioning).
So far, most existing text analysis applications in psychology have been instrumental. Compared to other sciences, psychology is highly deductive in its research. Instrumental analyses allow the specification of linguistic variables as the operationalizations of theoretical constructs and thus facilitate hypothesis testing. Also, psychology has a history of going beyond manifest content by reading between the lines to unravel the "unspoken" yet psychologically existing meaning—a task that only instrumental analyses accomplish. Finally, instrumental analyses can be performed on any desktop computer; a representational analysis' mimicking of natural syntax is computationally intensive and generally requires specialized machines (as well as users).
Approach: thematic versus semantic. The second conceptual distinction concerns the extent to which text analysis exclusively identifies themes or also models the relationships among them (Popping, 2000; Roberts, 1997). Until the 1980s, virtually all text analysis was thematic in nature. Thematic text analysis maps the occurrence of a set of concepts in a text and thus can technically be solved by counting the frequency of particular target words or phrases.
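The counting step behind thematic analysis can be illustrated with a minimal sketch. The theme dictionary, its target words, and the function names below are hypothetical and chosen only for illustration; they are not taken from any published coding scheme.

```python
import re
from collections import Counter

# Hypothetical theme dictionary: each theme maps to a few illustrative target words.
THEMES = {
    "anxiety": {"worried", "nervous", "afraid"},
    "hostility": {"hate", "angry", "attack"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def theme_counts(text, themes=THEMES):
    """Return, for each theme, how many tokens match one of its target words."""
    tokens = tokenize(text)
    counts = Counter()
    for theme, words in themes.items():
        counts[theme] = sum(1 for tok in tokens if tok in words)
    return dict(counts)

sample = "I was nervous and afraid before the exam, and I hate waiting."
result = theme_counts(sample)
print(result)
```

Because the count ignores word order and context, it records that a theme occurs but not how themes relate to one another, which is the limitation the semantic approach addresses.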
Semantic text analysis seeks to extract information on the conversational meaning of a theme. For example, it can be crucial to know not only that the theme "killing" is mentioned in a text but also whether it occurred in the context of "self" or "other people." Semantic text analysis solves this problem by specifying the concrete nature of relations among themes. Hence, the level of analysis in the semantic approach is typically the clause. Semantic text analysis first specifies a semantic grammar, a subject-verb-object (S-V-O) template, in which the concepts of interest are arranged like pull-down menus (e.g., [I/we], [he/she/they], or [an object] as subject [S]; "killed" as verb [V]; "the dog" as object [O]). It then determines the frequency with which certain concept constellations occur. In the example at the beginning of the chapter, semantic analysis could determine how often students call their mother and go to the doctor on her recommendation, as compared to the mother calling the student or the student calling the mother after returning from the doctor.
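The frequency step of a semantic analysis can be sketched as follows, assuming the clauses have already been coded into S-V-O triples by a human coder. The triples, labels, and function name are hypothetical illustrations, not data from the chapter's example.

```python
# Hypothetical hand-coded clauses: (subject, verb, object) triples from a human coder.
coded_clauses = [
    ("student", "call", "mother"),
    ("mother", "call", "student"),
    ("student", "call", "mother"),
    ("student", "visit", "doctor"),
]

def count_pattern(clauses, subject=None, verb=None, obj=None):
    """Count clauses that match an S-V-O template; None acts as a wildcard slot."""
    return sum(
        1
        for s, v, o in clauses
        if (subject is None or s == subject)
        and (verb is None or v == verb)
        and (obj is None or o == obj)
    )

student_calls_mother = count_pattern(coded_clauses, "student", "call", "mother")
mother_calls_student = count_pattern(coded_clauses, "mother", "call", "student")
any_call = count_pattern(coded_clauses, verb="call")
print(student_calls_mother, mother_calls_student, any_call)
```

A purely thematic count would treat the first two clauses as identical, since both contain "student," "call," and "mother"; the template keeps the direction of the relation.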
Recently, a new development in the field, latent semantic analysis (LSA), has received increasing attention (Foltz et al., 1998; Landauer & Dumais, 1997). Whereas in traditional semantic approaches an investigator defines the context in a "top-down" manner, LSA constitutes a "bottom-up" approach, in which information about the semantic similarity of words is extracted by analyzing their usage across a large body of texts. Because of its flexibility, computational power, and conceptual similarity to human cognition, it is a tool with great potential for psychology (Campbell & Pennebaker, 2003).
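The "bottom-up" idea can be illustrated with a deliberately simplified sketch: word similarity is computed from raw word-by-document counts using cosine similarity. This is not full LSA, which additionally applies a singular value decomposition to reduce the dimensionality of the count matrix; that step is omitted here. The three toy documents are invented for illustration.

```python
import math
from collections import Counter

# Invented toy corpus; a real analysis would use a large body of texts.
docs = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the student wrote the exam",
]

def word_vectors(documents):
    """Map each word to its vector of occurrence counts across the documents."""
    counts = [Counter(doc.split()) for doc in documents]
    vocab = sorted({word for c in counts for word in c})
    return {word: [c[word] for c in counts] for word in vocab}

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = word_vectors(docs)
sim_doctor_patient = cosine(vectors["doctor"], vectors["patient"])
sim_doctor_exam = cosine(vectors["doctor"], vectors["exam"])
sim_doctor_nurse = cosine(vectors["doctor"], vectors["nurse"])
print(sim_doctor_patient, sim_doctor_exam, sim_doctor_nurse)
```

In this raw form, "doctor" and "nurse" receive a similarity of zero because they never appear in the same document, even though both occur alongside "treated" and "patient." Recovering such indirect similarity is what the dimension reduction in full LSA is intended to achieve.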
In allowing the identification of both themes and the relations that exist among them, semantic text analysis provides an additional degree of freedom. In evaluating its overall effectiveness, however, it is important to keep in mind that the meaning of a sentence is rarely revealed in its surface grammar. A powerful semantic analysis would thus need to identify the underlying deep structure, a task that cannot yet be delegated entirely to a computer. Consequently, most semantic text analysis relies on human coders to parse large amounts of text (Popping, 2000).
Bandwidth: broad versus specific. Text analysis approaches also differ in their bandwidth (Pennebaker et al., 2003). Some approaches focus on fewer than a handful of specific linguistic variables. Mergenthaler (1996), for example, analyzes therapy protocols exclusively for a client's use of emotion words and cognitive words and ignores other potentially relevant information, such as the content of the therapy session or a client's linguistic style. Other approaches aim to provide a broad linguistic profile of a text. LIWC, the text analysis program from our initial example, measures up to 82 grammatical and psychological language parameters.
Whereas specific approaches tend to have a stronger theoretical background, broad approaches usually are more inductive and phenomenon oriented. Researchers who find a text analysis program that captures exactly what they are interested in might prefer it to an "all-rounder" type of software because of its presumably greater power. However, in those cases where a compromise needs to be made between what one is interested in and what is "out there," applications with broader bandwidth offer more flexibility.
Focus: content versus style. The fourth distinction concerns the "what" versus the "how" in text analysis (Groom & Pennebaker, 2002). Conceptually, it dates back to Allport's (1961) distinction between adaptive and stylistic aspects of behavior. Whereas the adaptive components of a behavior are intended and purposeful in a given context (e.g., initiating a conversation), its stylistic aspects are mostly unintended, automatic, and serve expressive rather than instrumental functions (e.g., nervous gestures while initiating the conversation). Applied to verbal behavior, this distinction captures the difference between what a person is saying, that is, the content of a statement (e.g., "When does the next number 5 bus pass by?"), and how the person is saying it (e.g., "Excuse me, would you possibly know when the next number 5 bus is supposed to pass by here, please?"). Looking "behind" a message for verbal mannerisms (Weintraub, 1981) or linguistic styles (Pennebaker & King, 1999) reveals more subtle aspects of a communication.
Historically, both strategies have been successful in psychology (Pennebaker et al., 2003; Smith, 1992). What makes stylistic language analyses particularly intriguing is that humans naturally attend to what people are saying or writing. It is cognitively quite demanding to tune out the meaning of a message for the sake of attending to particularities in word choice (cf. Hart, 2001). Consequently, for human judges linguistic styles are hard to detect and thus constitute the perfect target for computerized word count programs that are blind to meaning.
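As a minimal sketch of how a meaning-blind word count can pick up style, the two bus questions quoted earlier can be compared on the rate of a small set of style markers. The marker list and function names are hypothetical and chosen only for this illustration; they are not a category from any existing program.

```python
import re

# Hypothetical style markers (politeness and hedging words), for illustration only.
STYLE_MARKERS = {"excuse", "please", "possibly", "would"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())

def style_rate(text, markers=STYLE_MARKERS):
    """Return the number of style markers per 100 words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in markers)
    return 100.0 * hits / len(tokens)

direct = "When does the next number 5 bus pass by?"
polite = ("Excuse me, would you possibly know when the next number 5 bus "
          "is supposed to pass by here, please?")

direct_rate = style_rate(direct)
polite_rate = style_rate(polite)
print(direct_rate, round(polite_rate, 1))
```

Both questions ask for the same information, so a content-oriented reading treats them alike, while the marker rate separates them without any reference to what is being asked.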
Summary. Conceptually, text analysis applications can be organized according to whether they are representational or instrumental in their aim, thematic or semantic in their approach, broad or specific in bandwidth, and focused on language content or style. Although these distinctions may not always be clear in practice, they offer a heuristic framework for deciding which text analysis strategy to use for a certain kind of research question. The following section provides a more concrete picture of how text analysis has been applied in psychology.