When an individual responds to a self-report measure, he or she must first make sense of the question being asked (Schwarz, 1999; Tourangeau et al.,
2000). To do this, the respondent must understand the literal meaning of the question, and anything that impedes this understanding (e.g., vague or unfamiliar words, complicated sentence structure) will undermine the quality of the self-report measure. Psychological assessment and survey methodology textbooks suggest that to avoid misunderstandings, question writers should keep items simple and avoid potentially unfamiliar words (see Tourangeau et al., 2000, and Schmitt, this volume, chap. 2, for more detailed recommendations). Careful pretesting of items can prevent misunderstandings from occurring (see Schwarz & Sudman, 1996, for discussions of these pretesting techniques).
Yet understanding the words themselves gets the respondent only so far. Respondents must then discern the pragmatic meaning of a question. Often, a question that is clear in a literal sense can be interpreted in many different ways. When interpreting questions, respondents may try to infer what the experimenter had in mind. As Schwarz (1996) and others (e.g., Clark & Schober, 1992; Tourangeau et al., 2000) have noted, these inferences are often based on norms regarding how a conversation should progress (see Grice, 1975, 1989, for a detailed discussion of these principles). For instance, conversation participants implicitly expect that their counterparts will neither provide nor expect redundant information. Thus, respondents who come across two similar questions in the same questionnaire may assume that the experimenter meant something different with each question unless there is some plausible explanation for the repetition.
Strack, Schwarz, and Wänke (1991) demonstrated that this nonredundancy norm affects how individuals respond to questionnaire items. In their study, experimenters asked participants two questions about their subjective well-being. First, the experimenters asked participants how "happy" they were and then how "satisfied" they were. Strack et al. also varied the manner in which these questions were presented. In one condition, the happiness and satisfaction questions were presented as two questions within the same questionnaire. In a second condition, experimenters presented the two questions as the last question of one questionnaire and the first question of a separate, unrelated questionnaire. Responses to the two questions were less strongly correlated when presented as part of the same questionnaire than when the two questions were presented as the last question on one questionnaire and as the first question on a separate questionnaire. Presumably, respondents who were asked the two questions within the same questionnaire assumed that the experimenter believed that happiness and satisfaction constituted two distinct constructs, and therefore these respondents exaggerated the subtle difference in meaning when responding to the questions.
Strack et al.'s (1991) study provides important insight into the processes that occur when respondents interpret and answer survey questions. Yet it is unclear whether these processes are likely to affect the validity of most self-report items—the same conversational norms that guide respondents' interpretation of questions may also guide questionnaire construction. It may seem unlikely that researchers would put two questions with nearly identical content side by side in a questionnaire unless the experimenter was actually interested in the subtle distinctions among similar items. However, there are a number of reasons why Strack et al.'s findings are important for researchers interested in self-report methods. First, researchers may include very similar questions in different parts of a questionnaire to check for careless responding, and the same conversational norms may still apply when the questions are not presented side by side.
But more important, Schwarz, Strack, and colleagues have demonstrated that these principles also apply in more subtle situations. For example, Schwarz, Strack, and Mai (1991) found similar effects with a more realistic example of questions that might be asked in a questionnaire. Specifically, they asked respondents two different questions about their life satisfaction, again varying the presentation of the questions. In one condition, respondents were first asked about their satisfaction with their marriage and then asked about their satisfaction with life. In a second condition, these two questions were preceded by a joint lead-in that informed participants that they would be asked two questions about their subjective well-being. Without the joint lead-in, responses to the two questions correlated .67; with the lead-in, responses correlated .18. Presumably, the joint lead-in activated the norm of nonredundancy, and participants interpreted the life satisfaction question in such a way that they excluded satisfaction with marriage from the overall life satisfaction judgment. Conclusions about the role of marital satisfaction in life satisfaction will vary depending on this subtle difference in question presentation.
Respondents use a variety of contextual features to interpret the meaning of questions (see Schwarz, 1996, for a more comprehensive review). For instance, Winkielman, Knäuper, and Schwarz (1998) manipulated the time frame of a survey question about the experience of anger. They found that people interpreted the question differently depending on the time frame that was used. Specifically, Winkielman et al. found that when respondents were asked about episodes in which they were angry "during the past week," they described less severe anger episodes than when the question asked about episodes occurring "during the past year."
Schwarz, Knäuper, Hippler, Noelle-Neumann, and Clark (1991) also showed that the response options provided with a scale could influence interpretation of the question. In their studies, Schwarz et al. asked participants to rate how successful they had been in life. Some participants were presented with a response scale that ranged from 0 ("not at all successful") to 10 ("extremely successful"), whereas other participants were presented with a response scale that ranged from -5 to +5 and used the same anchors. Although the anchors were identical, fewer participants responded with values between -5 and 0 on the -5 to +5 scale than with values between 0 and 5 on the 0 to 10 scale. Although researchers might treat these two scales as being identical (because both use 11 points), the specific numbers on the scale may influence the interpretation of the item.
Schimmack, Böckenholt, and Reisenzein (2002) demonstrated a similar phenomenon using affect ratings. However, in their study, Schimmack et al. showed that response scales do not just affect the number of participants who choose a particular response option; these subtle differences can also affect correlations with other variables. Specifically, Schimmack et al. sought to determine whether positive affect (which consists of positive emotions, e.g., joy, happiness, and excitement) and negative affect (which consists of negative emotions and moods, e.g., unhappiness, fear, and depression) formed a single bipolar dimension or two unipolar dimensions. The researchers investigated the correlations between positive affect and negative affect when various response options were used (e.g., "strongly disagree" to "strongly agree," "does not describe me at all" to "describes me perfectly," and "not at all" to "with maximum intensity"). In addition, they asked participants to indicate where on the response scale a person in a neutral mood would score.
In accordance with their hypotheses, Schimmack et al. (2002) found that when respondents were asked whether they experienced a particular emotion (e.g., cheerful) using a scale that ranged from "strongly disagree" to "strongly agree," most participants indicated that the neutral point was in the middle of the scale at the point labeled "neither agree nor disagree." When participants were asked to indicate where the neutral point was on an intensity scale that ranged from "not at all" to "with maximum intensity," a large minority indicated that the lowest score on the scale should reflect a neutral response. Schimmack et al. argued that when given an agree/disagree response scale, respondents infer that the experimenter is asking about a bipolar dimension that ranges from extremely happy to extremely unhappy. When given an intensity scale, on the other hand, respondents are more likely to infer that the experimenter is asking about a unipolar dimension that ranges from extremely happy to neutral. In accordance with this interpretation, positive and negative affect items correlated more strongly when an agree/disagree response scale was used than when an intensity scale was used. The Schimmack et al. study is important because it demonstrates that differences in conclusions about bipolarity that have been found across studies may be due to the subtle contextual information that respondents use to understand the content of self-report items.
The research reviewed above demonstrates that contextual factors play an important role in question comprehension. Subtle changes in question wording, question order, question presentation, and response options can influence the responses that respondents give. In discussing the effects of contextual variables on self-reports of well-being, Schwarz and Strack (1999) argued that after seeing this evidence most people would conclude, "there is little to be learned from global self-reports of well-being" (p. 80). They went on to argue that "although these reports do reflect subjectively meaningful assessments, what is being assessed, and how, seems too context dependent to provide reliable information about a population's well-being" (p. 81). Although it is clear that in carefully controlled experimental settings, respondents' answers to self-report questions can be affected by these contextual manipulations, very little research has examined how pervasive these effects are. It is possible that these subtle manipulations may add only a small amount of unwanted variance relative to the amount of true variance that these scales capture. For instance, when conducting multimethod research, researchers may find that contextual factors do not substantially change the correlation between self-reports and other indicators (e.g., informant reports). Contextual factors may influence self-reported assessments, but more work is needed in this area to determine the impact these factors have on existing self-report methods.