How Far Can We Go With Counting Words

Given that word count-based measures possess rather good psychometric properties, how far can we go with counting words? Researchers frequently voice their scientific disdain for text analysis programs that are unable to distinguish between sentences as simple as "the dog bit the man" and "the man bit the dog" (Hart, 2001, p. 53). Their blindness to context sometimes makes word-count approaches appear painfully dumb. Not only are they unable to pick up irony or sarcasm (e.g., "Thanks a lot," accompanied by a roll of the eyes) and metaphoric language use (e.g., "He had the key to her heart"), but they also confuse words that have different meanings in different contexts (e.g., "What he did made me mad" vs. "I'm mad about the cute person in my class"). In a discussion of the shortcomings of a program such as the General Inquirer, Zeldow and McAdams (1993) went as far as to question the value of lower-level word counts entirely.
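The word-order problem can be made concrete with a minimal sketch in Python. The helper name `word_counts` and the simple whitespace tokenization are illustrative assumptions, not a description of how the General Inquirer or any other program works.

```python
from collections import Counter

def word_counts(text):
    """Count lowercase word tokens, discarding word order (hypothetical helper)."""
    return Counter(text.lower().split())

a = word_counts("the dog bit the man")
b = word_counts("the man bit the dog")

# Both sentences contain the same words the same number of times,
# so a pure word count cannot tell who bit whom.
same_profile = (a == b)
```

Here `same_profile` is true: once order is discarded, the two sentences are indistinguishable, which is exactly the limitation critics point to.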

Over the last five decades, however, word-count approaches have repeatedly demonstrated their potential in virtually all domains of psychology (e.g., Gottschalk, 1995; Hart, 1984; Martindale, 1990; Pennebaker et al., 2003; Stone et al., 1966; Weintraub, 1981). Often, to test psychological hypotheses, it is not necessary to specify grammatical relationships between themes; instead, it is sufficient to know that certain themes (co-)occur in a text. In fact, Hart (2001) even construed the blindness of thematic text analysis toward context as its biggest advantage. Because humans so readily understand the communicative meaning of words, having a computer that counts themes under full neglect of their semantic surroundings provides researchers with information that is largely inaccessible to self-report or observational methods.

If one accepts that the study of words can be psychologically meaningful, which words should researchers focus on? It is interesting that virtually every text analysis approach has started from the assumption that emotional states can be detected by studying the use of emotion words (cf. Bestgen, 1994). The reality is that in daily speech, emotional writing, and even affect-laden poetry, less than 5% of the words can be classified as emotional (Mehl & Pennebaker, 2003; Pennebaker & King, 1999). From an evolutionary perspective, it is unlikely that language has evolved as a vehicle to express emotion. Instead, humans use intonation, facial expression, or other nonverbal cues to convey feelings. Emotional tone is also expressed through metaphor and other means not related to emotion words. Taken together, these observations suggest that relying on emotion words to study human emotions has not emerged as a particularly promising strategy (Pennebaker et al., 2003).

Content-based dictionaries are generally composed of word categories that the researcher created based on more or less empirically supported intuitions about which words are indicative of certain themes (e.g., the word football is indicative of the theme sport). Hence, content dictionaries always have a subjective and culture-bound component (Shapiro, 1997). Markers of linguistic style, however, are generally associated with relatively common "content-free" words, such as pronouns, articles, prepositions, conjunctions, and auxiliary words, also referred to as particles (Miller, 1995). Particles are easier to handle because their meaning is less ambiguous, less context bound, and more determined by grammatical rules. In the English language, there are fewer than 200 commonly used particles, yet they account for over half the words we use.

From a psychological perspective, not all particles are equal; personal pronouns have emerged as particularly revealing (Pennebaker et al., 2003). Whereas the use of the first-person singular ("I"), for example, indicates an explicit distinction that speakers make between themselves and their social world, the use of the first-person plural ("we") suggests that speakers experience themselves as part of a larger social unit. Empirically, the use of the first-person singular is associated with age, sex, neuroticism, depression, illness, and more broadly, attention focused on the self (Pennebaker et al., 2003). The use of second-person ("you") and third-person ("he," "she") pronouns, by definition, shows that the speaker is socially engaged or aware. It thus becomes clear that in the conversational context, pronouns have important social implications. The empirical evidence to date underlines this by pointing to their role as powerful markers of psychological processes and predictors of mental and physical health (Pennebaker et al., 2003).
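As a rough illustration of how pronoun use might be quantified, the following Python sketch computes the percentage of words in a text that fall into each pronoun category. The category word lists, the name `pronoun_rates`, and the tokenization rule are all simplifying assumptions made for this example; they are deliberately incomplete and do not reproduce the dictionary of any published program.

```python
import re

# Illustrative, incomplete pronoun lists (an assumption for this sketch,
# not the categories of any published dictionary).
PRONOUN_CATEGORIES = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself"},
    "third_person": {"he", "she", "him", "her", "his", "hers",
                     "they", "them", "their"},
}

def pronoun_rates(text):
    """Return each category's share of all word tokens, in percent."""
    # Crude tokenization: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in PRONOUN_CATEGORIES}
    return {
        name: 100.0 * sum(token in words for token in tokens) / total
        for name, words in PRONOUN_CATEGORIES.items()
    }

rates = pronoun_rates("I think we should tell her what you told me")
```

For this ten-word sentence, two tokens ("I," "me") are first-person singular, so that category accounts for 20% of the words, and each of the other three categories accounts for 10%. Note that this crude tokenizer treats contractions such as "I'm" as a single token that matches no category; a real dictionary would need to handle such cases explicitly.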

