Items and scales on an instrument must be evaluated to determine measurement equivalency (Drasgow & Hulin, 1987). That is, do the items and scales on the measure represent similar distributions across linguistic and cultural subgroups within the target population? Examination of this type of equivalency proceeds in two stages (Drasgow, 1984; Drasgow & Hulin, 1987). In the first stage, items that display biased properties are removed because they exhibit measurement properties different from those observed in the original language. It is also important to determine whether individuals from the different language groups (original and translated) show comparable expected total scores on the target scale.
Statistically, the method of choice for examining measurement and relational equivalencies is based on Item Response Theory (IRT), which typically provides an accurate means of detecting item bias, though not of explaining it (Drasgow & Hulin, 1987). IRT posits a relationship between an individual's response to a particular item on a translated test and the trait the test measures (Bontempo, 1993; Ellis, 1989). Item bias, or differential item functioning (DIF), is said to exist when individuals with similar standing on the measured construct nonetheless score differently on items contributing to that construct (Hulin et al., 1982). Statistical techniques are employed to detect and remove biased items from translated instruments (Van de Vijver & Hambleton, 1996).
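To make the idea of differential item functioning concrete, the sketch below applies the Mantel-Haenszel common odds ratio, one standard DIF-screening technique (not one the passage itself names). Examinees from the original-language (reference) and translated-language (focal) groups are first matched on total score; within each score stratum, a 2×2 table counts correct and incorrect responses to the item under scrutiny. A common odds ratio near 1 suggests the item behaves similarly in both groups; a value far from 1 flags possible bias. All counts here are hypothetical, for illustration only.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across total-score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference-group correct,  b = reference-group incorrect,
      c = focal-group correct,      d = focal-group incorrect.
    A value near 1.0 indicates no DIF signal for the item.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical item-response counts, matched on total score
unbiased_item = [(30, 10, 30, 10), (20, 20, 20, 20)]
biased_item = [(30, 10, 15, 25), (35, 5, 20, 20)]

print(mantel_haenszel_odds_ratio(unbiased_item))  # close to 1.0
print(mantel_haenszel_odds_ratio(biased_item))    # well above 1.0
```

In practice the odds ratio is accompanied by a chi-square test and an effect-size classification before an item is dropped; this fragment only illustrates the matching-and-comparison logic.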
After biased items have been removed during stage one, the second stage focuses on the relationship between performance on the scale and other external variables of importance. Differences between linguistic groups in how the translated scale relates to identical external variables would suggest the scale may be measuring different characteristics across cultures. This culturally based difference implies a relationship between variables within different cultures that the translated instrument cannot adequately represent (Drasgow & Hulin, 1987), even though the translation itself may be a good one. That is, scale inequivalency would exist because the scale's concepts are not equally familiar across cultural subgroups.

In some cultures, particularly non-Western ones, respondents are unfamiliar with Western-style interview and survey formats (Marsella & Kameoka, 1989). Test takers from these cultures may not be familiar with Likert-type scales, Thurstone scales, or even true-false scales. Marsella and Kameoka (1989) provided an example of this type of bias: a Filipino man was asked to rate his satisfaction with his living conditions on a five-point scale illustrated as a set of stairs, with response options ranging from "very dissatisfied" at the bottom step to "very satisfied" at the top. The man placed himself on the bottom step. The examiner interpreted this response as reflecting poor living conditions; when queried about his rationale, however, the man replied that he did not want to fall down the stairs and get hurt. For a test to display scale equivalence across cultures, it must be relevant to both cultures being examined.
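One common way to operationalize this second stage, sketched below under assumed data, is to compare the scale-criterion correlation observed in each cultural group using Fisher's r-to-z test for two independent correlations. If the scale relates to the external variable much more strongly in one group than the other, the scale may not be capturing the same construct in both cultures. The correlations and sample sizes here are hypothetical, chosen only to illustrate the computation.

```python
import math


def correlation_difference_z(r1, n1, r2, n2):
    """Fisher r-to-z test statistic for the difference between two
    independent correlations (e.g., the scale-criterion correlation
    in the original-language vs. translated-language group).

    |z| > 1.96 suggests the correlations differ at the .05 level.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# Hypothetical scale-criterion correlations in two cultural groups
z_stat = correlation_difference_z(0.60, 200, 0.25, 180)
print(abs(z_stat) > 1.96)  # True: relationship differs, equivalence in doubt
```

A large discrepancy like this would not by itself prove inequivalence, but it would prompt the kind of follow-up inquiry the stair-scale example illustrates.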