Self-report evaluations of emotion. Self-report measures, in which participants provide an evaluation of their emotional experience, form the most diverse set of assessment tools for measuring emotion (Larsen & Fredrickson, 1999). Measures range from rating scales and adjective checklists to analog scales and real-time rating dials. Proponents of self-report measures (e.g., Baldwin, 2000) assume that participants are in a privileged position to monitor, assess, and integrate information about their own emotions, and that self-report measures therefore should not be thought of as second-rate proxies for better measures. Critics of self-report measures (e.g., Schaeffer, 2000), on the other hand, argue that so many biases, distortions, and methodological limitations afflict self-report that reports of anything, even one's home address, are fraught with error and misinformation. Self-report measures are nevertheless the most widely used assessment tools in emotion research.
Although there are a great many self-report instruments, considerable similarities can be found among them. Here we present a few exemplars and highlight themes and issues common to self-report measures. Reviews of specific self-report instruments can be found in Larsen and Fredrickson (1999), MacKay (1980), and Stone (1995; Stone, Turkkan, Bachrach, Jobe, Kurtzman, & Cain, 2000).
An assessment strategy with a good deal of face validity is simply to ask participants to rate how they are or were feeling on a single emotion dimension. That dimension might be a global affective evaluation (e.g., How unpleasant are you feeling?) or a specific emotion (How angry do you feel?). And the response scale might be unipolar (not at all angry to extremely angry) or bipolar (unpleasant to pleasant), with response options that are Likert-type scales (e.g., 5-, 7-, or 9-point formats). Or the response might be in a checklist format, where the respondent indicates whether or not a specific emotion was experienced. Such measures are simple to construct, easily understood by participants, and brief to administer. Virtually any emotion term can anchor a scale or be put onto a checklist, making self-report indispensable for researchers targeting specific, discrete emotions, as well as those researchers using multiple items to reflect global dimensions of emotion.
A variation on self-report is the experience sampling method, where participants make frequent reports over an extended time period. Although this method allows researchers to ask unique theoretical questions about emotion (e.g., Larsen, 1987), the measurement concerns remain mainly those associated with simple self-report. See Bolger, Davis, and Rafaeli (2003) for a review of this method.
A variation on rating scales replaces the numbered response with a visual analog: the participant marks a point on a horizontal line anchored by two opposing adjectives, which lessens stereotyped responding. A related technique is to make the question itself an analog of the emotion being assessed. For example, the participant might be presented with a series of five cartoon faces, ranging from a neutral expression on one face to an extreme frown on another. This approach has the advantage of being useful with participants for whom adjectives might not be meaningful, such as very young children or members of different language groups.
Another useful strategy in self-report is to have the participants indicate, in real time, how they are feeling by turning a dial, moving a mouse, adjusting a computer display, or in some way modifying an analog display of the emotion on which they are reporting. The general strategy across these techniques is to collect self-reports of subjective experience on a moment-by-moment basis, either online as the emotion is experienced, or retrospectively as the original episode is "replayed."
Conceptually, the most basic real-time self-report measure can be viewed as a single-item measure with a temporal dimension added. Using some mechanical input device (e.g., a mouse or joystick), respondents adjust a computer display as often as necessary so that it always reflects how they are feeling each moment throughout an extended episode (e.g., Schuldberg & Gottlieb, 2002, used such a device to obtain 1,400 affect readings over 2.5 minutes for each subject). Several researchers have described continuous "rating dials" of this sort (Bradley & Lang, 2000; Bunce, Larsen, & Cruz, 1993; Fredrickson & Kahneman, 1993; Gottman & Levenson, 1985). Like rating scales more generally, rating dials may use either bipolar (very negative to very positive) or unipolar verbal anchors (no sadness at all to extreme sadness) and either Likert-type or visual analog scales.
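In software terms, such a rating dial amounts to polling an input device at a fixed rate and timestamping each reading. The sketch below illustrates the idea; `read_dial` is a hypothetical stand-in for whatever hardware or GUI call returns the current dial position (it is not a real API), and the roughly 9 Hz default rate is simply what 1,400 readings in 2.5 minutes implies.

```python
import time

def read_dial():
    """Hypothetical placeholder for the actual device or GUI read."""
    return 0.0

def log_ratings(duration_s=150.0, rate_hz=9.3):
    """Poll the dial for duration_s seconds, returning (elapsed, value) pairs.

    Defaults approximate Schuldberg and Gottlieb (2002): ~9 Hz over
    2.5 minutes yields roughly 1,400 readings per subject.
    """
    readings = []
    interval = 1.0 / rate_hz
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        readings.append((time.monotonic() - start, read_dial()))
        time.sleep(interval)
    return readings
```

In a real session, each timestamped reading would be stored alongside synchronized physiological or video records so that the self-report stream can be aligned with other measures.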
Advantages to these procedures include automating self-report data collection, the ability to calibrate self-reports with other emotion measures (e.g., physiology, facial expressions) in the temporal stream, and the ability to use the technique "off line" to have participants continuously, though retrospectively, report on the emotions they were experiencing (e.g., Gottman & Levenson, 1985; Levenson & Gottman, 1983). The major disadvantage is the need for specialized equipment and the fact that the participant's attention is partially focused on the rating device. Moreover, it seems likely that continuously monitoring one's emotions may lead to a form of fatigue or may be so intrusive that it actually alters the respondent's emotions. Another drawback of this assessment strategy is that the techniques are limited to the self-report of just one or two dimensions. Although it is technically feasible, for example, to create a whole bank of rating dials (e.g., anger, fear, sadness, disgust, attraction, enjoyment, contentment), a limiting factor would be the respondent's ability to track the ebb and flow of multiple discrete emotions simultaneously.
Another category of self-report measures consists of the many standardized multi-item emotion inventories. Some of these inventories are checklists, whereas others are rating scales. These instruments are essentially variations on the self-report themes mentioned earlier, with differences having to do primarily with the response scales, the number and nature of the emotion adjectives, the scoring and scale names, and the instructions that accompany the self-report tasks. The advantages of these inventories include their theory- or statistically guided development, empirical refinement and standardization, the development of norms (which allow cross-study comparisons and even meta-analysis; e.g., Larsen & Sinnett, 1991), and the accrual of research findings on specific measures and specific construct-measure combinations.
One of the first formally constructed self-report emotion inventories was the 130-item Mood Adjective Checklist (MACL; Nowlis & Green, 1957). Despite its name, the MACL is not literally a checklist: the instructions ask participants to rate how they feel on a Likert scale. Scoring yields 12 factor scores: aggression, anxiety, urgency, elation, concentration, fatigue, social affection, sadness, skepticism, egotism, vigor, and nonchalance. Other researchers have proposed a simpler positive-negative valence scoring scheme (Stone, 1981). The MACL has not become widely used, most likely because it was never formally published (the original version appeared in an unpublished Naval Technical Report; Nowlis & Green, 1957).
A self-report emotion measure that eclipsed the MACL is Zuckerman and Lubin's (1965) Multiple Affect Adjective Checklist (MAACL). It is very similar to the MACL in length, with the MAACL having 132 items, and the majority of the items overlap between the two inventories. The MAACL has become the most widely used self-report emotion assessment instrument in the psychological literature (Larsen & Sinnett, 1991). The MAACL's success is likely due to the fact that it is distributed by a professional test publisher and comes with a user manual, annotated references, a developmental history, and psychometric properties, along with scoring keys and answer sheets. Another reason for its popularity might be its checklist format, which makes the MAACL much faster to administer than the MACL. Finally, the MAACL has only 3 subscales (depression, anxiety, and hostility), compared to 12 on the MACL. In 1985 Zuckerman and Lubin published a revised version of the Multiple Affect Adjective Checklist (MAACL-R). The revision mainly concerns the scoring format, which now allows for several pleasant emotion scores as well as global positive and negative affect and sensation seeking.
This is a good point to mention the issue of response formats. The MAACL and its revision are in the form of checklists, in which the subject merely indicates the presence or absence of a particular emotion by checking a box. Some researchers have argued that checklists are particularly susceptible to response styles and other forms of nonrandom error. Bentler (1969) argued against using checklists in psychometric assessment. Green, Goldman, and Salovey (1993) demonstrated that checklist emotion assessments contain significant nonrandom error, and they advised caution when analyzing or interpreting checklist data. However, more recently, Schimmack, Böckenholt, and Reisenzein (2002) demonstrated that checklist and Likert-scale affect self-reports yield very similar covariance structures. The question of the impact of response format on affect ratings remains open.
Although several rating scales are available (see Stone, 1995), one of the more recent introductions is the Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988). The PANAS is based on a dimensional model of emotion, in particular the circumplex model (Russell, 1980; Watson & Tellegen, 1985). Of the eight potential scores derivable from the circumplex model (Larsen & Diener, 1992), the PANAS focuses on two: Positive Affect (PA; high-arousal pleasant) and Negative Affect (NA; high-arousal unpleasant). The PANAS contains 10 items on each of the two scales. The items are mood adjectives rated on a 5-point scale labeled "very slightly or not at all," "a little," "moderately," "quite a bit," and "extremely." The PA and NA scales were constructed to be uncorrelated, and they generally are (though see Zautra, Berkhof, & Nicolson, 2002, for exceptions).
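Scoring the PANAS is simple enough to sketch: each scale is the sum of its ten 1-5 item ratings, so scale scores range from 10 to 50. The item lists below follow the published scale, but the dictionary-based interface is purely illustrative, not part of the instrument itself.

```python
# Illustrative PANAS scoring: each scale is the sum of ten 1-5 ratings.
# Item lists follow Watson, Clark, & Tellegen (1988).

PA_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NA_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def score_panas(ratings):
    """ratings: dict mapping each adjective to a 1-5 Likert rating."""
    for item in PA_ITEMS + NA_ITEMS:
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item!r} out of range")
    pa = sum(ratings[item] for item in PA_ITEMS)
    na = sum(ratings[item] for item in NA_ITEMS)
    return pa, na

# A respondent rating every item "moderately" (3) scores 30 on both scales.
example = {item: 3 for item in PA_ITEMS + NA_ITEMS}
print(score_panas(example))  # (30, 30)
```

Because the scales share no items and are summed independently, a respondent's PA score places no arithmetic constraint on their NA score, which is what allows the two dimensions to be empirically uncorrelated.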
Like most self-report measures, research on the validity of the PANAS has been primarily correlational. For example, extraversion correlates with frequent reports of PA, and neuroticism correlates with frequent reports of NA. In one of the first experimental studies of the PANAS, Larsen and Ketelaar (1991) induced emotions in the laboratory using guided imagery. They found that the positive induction increased PA but did not lower NA, and the negative induction increased NA but did not lower PA. Similar experimental findings on the independence of PA and NA under different inductions, using naturalistic success and failure feedback on exam performance in college students, were found by Goldstein and Strube (1994). This differential sensitivity to positive and negative emotion inductions supports the construct validity of the PANAS. Nevertheless, researchers should be very clear that the PANAS does not measure discrete emotions, which other scales do. The PANAS has its greatest utility in the assessment of the broad emotion dimensions of PA and NA.
Evaluation of self-report methods. Self-report methods are perhaps the most efficient techniques for measuring emotions. Nevertheless, they rely on assumptions that research participants are both able and willing to observe and report on their own emotions. Some issues concern a person's ability to self-report their emotions. Self-report requires memory, either working memory or longer-term memory, and so a variety of memory distortions may compromise a report (Feldman Barrett, 1997). Self-report also requires the perception of something on which to report. It is possible that a person may "have" an emotion in a nonverbal channel (e.g., autonomic activation or action tendency) yet never label that experience and hence not perceive it as an emotion at all (Tranel & Damasio, 1985). Moreover, some persons may repress emotional experiences, particularly negative or inappropriate emotional experiences, resulting in biased or incomplete reports of emotions (Cutler, Bunce, & Larsen, 1996). Certain populations, for various reasons, may have meager or inaccurate comprehension of verbal information. For example, cultural psychologists have argued that some cultures have emotions, or emotion terms, that are not identifiable in other cultures (e.g., Mesquita & Frijda, 1992).
Regarding the second assumption—that participants must be willing to report on their emotions— the issue here is mainly one of response sets, where responses to items might be based, not on the emotion content of the items, but on some other factor, such as their social desirability. Here the participant is responding to the items in a manner that creates a positive impression. A different response set is extreme responding, where a participant may be motivated to use scale endpoints or large numbers in describing their emotions, a response set that can greatly distort the covariance structure of a set of ratings (Bentler, 1969).
Another potential problem with self-report is measurement reactivity, where the actual process of measurement alters the psychological construct being measured. Administering an emotion self-report may, in fact, influence the emotional state of interest. Another issue arises when researchers want to assess emotion two or more times, as in within-subject experimental designs or in experience sampling studies of emotion. One potential effect of repeated emotion measurement is stereotypic responding (Stone, 1995), where participants settle into a response profile that does not change much across the assessment occasions.
Self-report emotion measures require that subjects engage in a number of psychological processes to arrive at a rating. Understanding these processes has both theoretical and measurement implications. For instance, providing a global self-report implicates memory processes, as respondents recall the targeted episode, as well as aggregation processes, as respondents in some manner integrate their multiple and often varied momentary experiences into an overall rating. Both of these mental processes may obscure or misrepresent dynamic changes in emotion as experienced over time. For example, Kahneman and his colleagues have documented that people's global reports of pain episodes draw heavily on the momentary affect experienced at the most intense point during the episode, as well as the final moments of the episode, with the duration of the emotional experience largely neglected in the global self-report (Fredrickson & Kahneman, 1993; Kahneman, 1999; Kahneman, Fredrickson, Schreiber, & Redelmeier, 1993; see also Thomas & Diener, 1990, for related issues).
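This peak-end pattern can be made concrete with a small numeric sketch: a retrospective global report is approximated by averaging the most intense moment and the final moment, while duration drops out of the calculation entirely. The function names and ratings below are illustrative, not data from the studies cited.

```python
# Illustrative sketch of the peak-end rule for a negative episode:
# the remembered global rating tracks the average of the most intense
# momentary rating and the final rating, neglecting duration.

def peak_end(momentary):
    """momentary: list of moment-by-moment intensity ratings (higher = worse)."""
    return (max(momentary) + momentary[-1]) / 2

def duration_weighted_mean(momentary):
    """What an 'objective' duration-sensitive summary would report."""
    return sum(momentary) / len(momentary)

short_episode = [2, 8, 6]             # brief, ends while still intense
long_episode = [2, 8, 6, 3, 2, 1]     # same peak, longer but milder ending

# The longer episode contains strictly more total discomfort...
assert sum(long_episode) > sum(short_episode)
# ...yet peak-end predicts it is remembered as LESS aversive,
# because it ends on a mild note (duration neglect).
print(peak_end(short_episode))  # 7.0
print(peak_end(long_episode))   # 4.5
```

The divergence between the two summaries is exactly the measurement problem the text describes: a global self-report and an aggregate of momentary reports can rank the same two episodes in opposite orders.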
Another language-related channel with potential as a measure of emotion is the voice. Vocalization may be sensitive to emotion-related changes in the body (e.g., muscle tension, respiration rate, and blood pressure). Vocal analysis for emotion has traditionally followed one of two possible strategies. The simplest strategy is to have humans listen to audiotaped speech and evaluate the speaker's affective state. A more technologically advanced strategy is to have audiotapes digitized and analyzed by computer.
The ability of untrained listeners to correctly recognize or infer speakers' emotional states has been evaluated in several studies (e.g., Scherer, 1986; Scherer, Banse, Wallbott, & Goldbeck, 1991; van Bezooijen, 1984). In these studies, actors produce sentences in a way that imparts a specific emotional tone (e.g., anger, fear, disgust, joy, sadness). The speech samples are then stripped of verbal content and played for naïve listeners, who judge which emotion they perceive in the vocalization. Correct selection rates across these studies average around 55%, a rate four to five times what would be expected by chance (Pittam & Scherer, 1993). Some emotions are more easily recognized by naïve raters than others: Sadness and anger are best recognized, whereas disgust, contempt, and joy are least recognized in vocalization samples (Pittam & Scherer, 1993; van Bezooijen, Otto, & Heenan, 1983).
Studies also suggest that arousal level may be better transmitted by vocal cues than specific hedonic content (e.g., Apple & Hecht, 1982; van Bezooijen et al., 1983). Reviews of recent research suggest that although perceivers are more accurate in judging nonspecific arousal from vocal parameters, they are nevertheless well above chance in judging pleasantness and specific emotions from speech samples that have had the verbal content removed (Bachorowski, 1999). A particularly impressive set of results is reported by Scherer, Banse, and Wallbott (2001). These researchers used professional German actors to produce vocal samples spoken in fear, anger, sadness, joy, and neutral vocal tones. The actual verbal content was then stripped away, leaving only vocalization. The samples were then taken to nine different countries in North America, Asia, and Europe, where participants from different language groups listened to the vocalizations and rated the likely emotions. Overall accuracy averaged 66%, a figure well above chance.
Researchers studying digital voice analysis are still searching for the parameters that best reflect emotion. Parameters typically assessed are (a) fundamental frequency, perceived as overall voice pitch; (b) small perturbations in the fundamental frequency; (c) intensity, indexed in decibels; and (d) speech rate or tempo (Scherer, 1986). Although acoustical analysis of speech accurately reflects the nonspecific arousal of the speaker (Bachorowski & Owren, 1995), it falls far short of identifying specific emotions. For example, positive and negative emotional states are often not reliably distinguished with acoustical parameters (Scherer, 1986; see also Pittam & Scherer, 1993). Because untrained listeners can distinguish specific emotions from voice samples, there must be some acoustical cues for affect. However, at this time, researchers are still searching for those cues. See Russell, Bachorowski, and Fernandez-Dols (2003) for a recent review of vocal measures of emotion.
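To make two of these parameters concrete, the sketch below estimates fundamental frequency and intensity from a raw waveform. The autocorrelation pitch estimator is a textbook simplification applied here to a synthetic tone; it is not the method used in the studies cited, and real vocal-affect work uses far more careful estimators.

```python
import numpy as np

def fundamental_frequency(signal, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz from the peak of the autocorrelation function."""
    signal = signal - signal.mean()
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / best_lag

def intensity_db(signal, ref=1.0):
    """RMS intensity in decibels relative to a reference amplitude."""
    rms = np.sqrt(np.mean(signal ** 2))
    return 20 * np.log10(rms / ref)

# Synthetic 200 Hz "voiced" tone, 0.2 seconds at 16 kHz.
sr = 16_000
t = np.arange(int(0.2 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)

print(round(fundamental_frequency(tone, sr)))  # 200
print(round(intensity_db(tone), 1))            # -9.0
```

Jitter (small perturbations in F0) and speech rate would be computed on top of estimates like these, by tracking F0 frame by frame and segmenting the signal into voiced units, which is where the harder engineering lies.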