One of the primary ways that cultures differ is through language. In fact, in considering the level of acculturation of individuals, their language skills are often given dominant (and sometimes mistaken) importance. Even within countries and regions of the world where ostensibly the same language is spoken, accents can make oral communication difficult. Language skill is, ultimately, the ability to communicate, and there are different types of language skills. Whereas some scholars consider competencies in areas such as grammar, discourse, language strategy, and sociolinguistic facility (see Duran, 1988), many others have focused on more holistic approaches to language. Generally, language skills may be considered along two dimensions: an oral versus written dimension, and an understanding versus expression dimension.
Depending upon the nature of the assessment to be performed, different kinds and qualities of language skills may be needed. Academically oriented, largely written language skills may require 6 to 8 years of instruction and use to develop, whereas the development of the spoken word for everyday situations is much faster. These issues are of critical importance for the assessment of immigrants and their children (Geisinger, 2002; Sandoval, 1998). Some cross-cultural comparisons are made using one language for both groups, even though the language may be the second language for one of the groups. In such cases, language skills may be a confounding variable. In the United States, the issue of language often obscures comparisons of Anglos and Hispanics. Pennock-Roman (1992) demonstrated that English-language tests for admissions to higher education may not be valid for language minorities when their English-language skills are not strong.
Culture and language may be very closely wedded. However, not all individuals who speak the same language come from the same culture or can take the same test validly. Within the United States, the heterogeneity of individuals who might be classified as Hispanic Americans is underscored by the need for multiple Spanish-language translations and adaptations of tests (Handel & Ben-Porath, 2000). For example, the same Spanish-language measure may not be appropriate for Mexicans, individuals from the Caribbean, and individuals from the Iberian Peninsula. Individuals from different Latin American countries may also need different instruments.
Before making many assessments, we need to establish whether the individuals being tested have the requisite level of skill in the language used on the examination. We also need to develop better measures of language skills (Duran, 1988). In American schools, the English-language skills of a child whose home language is not English should be evaluated early in the child's schooling to determine whether that child can profit from English-language instruction (Duran). Handel and Ben-Porath (2000) argue that a similar assessment should be made prior to administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), because studies have shown that the instrument does not work as effectively with individuals whose language skills are not robust and that the validity of at least some of its scales is compromised if the respondent lacks adequate English reading ability. Many school systems and other agencies have developed tests of language skills, sometimes equating tests of English- and Spanish-language skills so that the scores are comparable (e.g., O'Brien, 1992; Yansen & Shulman, 1996).
Language is among the primary and most evident ways in which cultures differ, but cultures also differ, of course, in many other ways. Individuals from two different cultures who take part in a cross-cultural investigation are likely to differ on other variables that can influence testing and the study. Among these are speed of responding, the amount of education they have received, the nature and content of that schooling, and their levels of motivation. All of these factors may influence the findings of cross-cultural studies.
Culture may be envisioned as an antecedent of behavior. Culture is defined as a set of contextual variables that have either a direct or an indirect effect on thought and behavior (Cuéllar, 2000). Culture provides a context for all human behavior, thought, and other mediating variables, and it is so pervasive that it is difficult to imagine human behavior that might occur without reference to culture. As noted at the beginning of this chapter, one of the goals of cross-cultural psychology is to investigate and differentiate those behaviors, behavior patterns, personalities, attitudes, worldviews, and so on that are universal from those that are culture specific (see van de Vijver & Poortinga, 1982). "The current field of cultural psychology represents a recognition of the cultural specificity of all human behavior, whereby basic psychological processes may result in highly diverse performance, attitude, self-concepts, and world views in members of different cultural populations" (Anastasi & Urbina, 1997, p. 341). Personality has sometimes been defined as an all-inclusive characteristic describing one's patterns of responding to others and to the world. It is not surprising, then, that so many anthropologists and cross-cultural psychologists have studied the influence of culture on personality: the effects of the pervasive environment on the superordinate organization that mediates behavior and one's interaction with the world.
For much of the history of Western psychology, investigators and theorists believed that many basic psychological processes were universal, that is, that they transcended individual cultures (e.g., Moreland, 1996; Padilla & Medina, 1996). More psychologists now recognize that culture has an omnipresent impact. For example, the American Psychiatric Association's current Diagnostic and Statistical Manual of Mental Disorders (fourth edition, or DSM-IV) was improved over its predecessor, DSM-III-R, by "including an appendix that gives instructions on how to understand culture and descriptions of culture-bound syndromes" (Keitel, Kopala, & Adamson, 1996, p. 35).
Malgady (1996) has extended this argument, stating that we should actually assume cultural nonequivalence rather than cultural equivalence. Of course, much of the work of cross-cultural psychologists and other researchers consists of determining whether our measures are equivalent and appropriate (not biased). Clearly, if we are unable to construct parallel tests that are usable in more than one culture and whose scores are comparable across those cultures, then we cannot compare cultures. Helms (1992) has raised this concern with particular reference to intelligence testing, arguing that intelligence tests are oriented to middle-class, White Americans.
What about working with individuals with similar-appearing psychological concerns in different cultures, especially where a useful measure has been identified in one culture? Can we use comparable, but not identical, psychological measures in different cultures? Indeed, we should probably use culture-specific measures, even though these measures cannot be used for cross-cultural comparisons (van de Vijver, 2000). If we use measures that have been translated or adapted from other cultures, we need to revalidate them in the new culture. In some circumstances, we may also need to use assessments of acculturation as well as tests of language proficiency before we use tests with clients requiring assessment. (See Geisinger, 2002, for a description of such an assessment paradigm.)
There are problems inherent in even the best translations of tests. For example, even when professional translators, content or psychological specialists, and test designers are involved, "direct translations of the tests are often not possible as psychological constructs may have relevance in one culture and not in another. Just because the content of the items is preserved does not automatically insure that the item taps the same ability within the cultural context of the individual being tested" (Suzuki, Vraniak, & Kugler, 1996). This lack of parallelism is, of course, what was described earlier as construct bias and a lack of conceptual equivalence. Brislin (1980) has referred to this issue as translatability: a test has poor translatability when salient characteristics of the construct to be assessed cannot be translated. The translation and adaptation of tests is considered later in this section.
Intelligence tests are among the most commonly administered types of tests. Kamin (1974) demonstrated in dramatic form how tests of intelligence were once used in U.S. governmental decision making concerning the status of immigrants. In general, immigration policies were affected by analyses of the intelligence of immigrants from various countries and regions of the world. Average IQs were computed by country and region of origin; these data were shared widely and were generally believed to reflect innate ability, even though one of the strongest findings of the time was that the longer immigrants had lived in the United States, the higher their measured intelligence (Kamin). The tests used for many of these analyses were the famous Army Alpha and Army Beta, early verbal and nonverbal tests of intelligence, respectively. It was obvious even then that language could cause validity problems in the intelligence testing of those whose English was not proficient. Leaders in the intelligence-testing community have also attempted to develop tests that could be used cross-culturally without translation. These measures have often been called culture-free or culture-fair tests of intelligence.
Culture-Free and Culture-Fair Assessments of Intelligence
Some psychologists initially attempted to develop so-called culture-free measures of intelligence. In the 1940s, for example, Cattell (1940) attempted to use very simple geometric forms that were not reliant upon language to construct what he termed culture-free tests. These tests were based on a notion that Cattell (1963) conceptualized and later developed into a theory that intelligence could be decomposed into two types of ability: fluid and crystallized mental abilities. Fluid abilities were nonverbal and involved in adaptation and learning capabilities. Crystallized abilities were developed as a result of the use of fluid abilities and were based upon cultural assimilation (Sattler, 1992). These tests, then, were intended to measure only fluid abilities and, according to this theory, would hence be culture-free: that is, implicitly conceptually equivalent.
It was soon realized that it was not possible to eliminate the effects of culture from even these geometric-stimulus-based, nonverbal tests. "Even those designated as 'culture-free' do not eliminate the effects of previous cultural experiences, both of impoverishment and enrichment. Language factors greatly affect performance, and some of the tasks used to measure intelligence have little or no relevance for cultures very different from the Anglo-European" (Ritzler, 1996, p. 125). Nonlanguage tests may even be more culturally loaded than language-based tests. Larger group differences with nonverbal tests than with verbal ones have often been found. "Nonverbal, spatial-perceptual tests frequently require relatively abstract thinking processes and analytic cognitive styles characteristic of middle-class Western cultures" (Anastasi & Urbina, 1997, p. 344). In retrospect, "cultural influences will and should be reflected in test performance. It is therefore futile to try to devise a test that is free from cultural influences" (Anastasi & Urbina, p. 342).
Noting these and other reactions, Cattell (Cattell & Cattell, 1963) tried to balance cultural influences and build what he termed culture-fair tests. These tests also tend to use geometric forms of various types. The items frequently are complex patterns, classification tasks, or printed mazes to be solved; although the tests can be paper-and-pencil, they can also be based on performance tasks and thus avoid language-based verbal questions. They may involve pictures rather than verbal stimuli. Even such tests were not seen as fully viable:
It is unlikely, moreover, that any test can be equally "fair" to more than one cultural group, especially if the cultures are quite dissimilar. While reducing cultural differentials in test performance, cross-cultural tests cannot completely eliminate such differentials. Every test tends to favor persons from the culture in which it was developed. (Anastasi & Urbina, 1997, p. 342)
Some cultures place greater or lesser emphases upon abstractions, and some cultures value the understanding of contexts and situations more than Western cultures (Cole & Bruner, 1971).
On the other hand, there is a substantial literature that suggests culture-fair tests like the Cattell fulfill not only theoretical and social concerns but practical needs as well. . . . Smith, Hays, and Solway (1977) compared the Cattell Culture-Fair Test and the WISC-R in a sample of juvenile delinquents, 53% of whom were black or Mexican-Americans. . . . The authors concluded that the Cattell is a better measure of intelligence for minority groups than the WISC-R, as it lessens the effect of cultural bias and presents a "more accurate" picture of their intellectual capacity. (Domino, 2000, p. 300)
Some of our top test developers continue to develop tests intended to be culture-fair (Bracken et al., 1999). Although such measures may not be so culture-fair that they permit cross-cultural comparisons free of cultural bias, they have nevertheless been used effectively in a variety of cultures and may be transported from culture to culture without many of the translation issues that beset most tests of ability used in more than one culture. Such tests should, however, be evaluated carefully for what some have seen as their middle-class, Anglo-European orientation.
Cuéllar (2000) has described acculturation as a moderator variable between personality and behavior, and culture as "learned behavior transmitted from one generation to the next" (p. 115). When an individual leaves one culture and joins a second, a transition is generally needed. This transition is, at least in part, acculturation. "Most psychological research defines the construct of acculturation as the product of learning due to contacts between the members of two or more groups" (Marín, 1992, p. 345). Learning the language of the new culture is only one of the more obvious aspects of acculturation. It also involves learning the history and traditions of the new culture, changing personally meaningful behaviors in one's life (including the use of one's language), and changing norms, values, worldview, and interaction patterns (Marín).
In practice settings, when considering the test performance of an individual who is not from the culture in which the assessment instrument was developed, one needs to consider the individual's level of acculturation. Many variables have been shown to be influenced by the level of acculturation in the individual being assessed. Sue, Keefe, Enomoto, Durvasula, and Chao (1996), for example, found that acculturation affected scales on the MMPI-2. It has also been shown that one's level of acculturation affects personality scales to the extent that these differences could lead to different diagnoses and, perhaps, hospitalization decisions (Cuéllar, 2000).
Keitel et al. (1996) have provided guidelines for conducting ethical multicultural assessments. Included among these guidelines are assessing acculturation level, selecting tests appropriate for the culture of the test taker, administering tests in an unbiased fashion, and interpreting results appropriately and in a manner that a client can understand. Dana (1993) and Moreland (1996) concur that acculturation should be assessed as part of an in-depth evaluation. They suggest, as well, that the psychologist first assess an individual's acculturation and then use instruments appropriate for the individual's dominant culture. Too often, they fear, psychologists use instruments from the dominant culture, with which the psychologist is more likely to be familiar. They also propose that a psychologist dealing with a client who is not fully acculturated should consider test results with respect to the individual's test sophistication, motivation, and other psychological factors that may be influenced by his or her level of acculturation. Because of the importance of learning to deal with clients who are not from dominant cultures, it has been argued that in the training of psychologists and other human-service professionals, practicums should give students access to clients from different cultures (Geisinger & Carlson, 1998; Keitel et al., 1996).
There are many measures of acculturation. Measurement is complex, in part because acculturation is not a unidimensional characteristic (even though many measures treat it as such). Discussion of this topic is beyond the scope of the present chapter; however, the interested reader is referred to Cuéllar (2000), Marín (1992), or Olmedo (1979).
The translation and adaptation of tests was one of the most discussed testing issues of the 1990s. The decade ended with a major conference held in Washington, DC, in 1999, the "International Conference on Test Adaptation: Adapting Tests for Use in Multiple Languages and Cultures," which brought together many of the leaders in this area of study for an exchange of ideas. In a decade during which many tests had been translated and adapted, and some examples of poor testing practice had been noted, one of the significant developments was the publication of the International Test Commission guidelines on the adaptation of tests. These guidelines, which appear as the appendix to this chapter, summarize some of the best thinking on test adaptation. They may also be found in annotated form in Hambleton (1999) and van de Vijver and Leung (1997). The term test adaptation also rose to prominence during the last decade of the twentieth century; previously, the term test translation had been dominant. This change reflected a more widespread recognition that tests must be changed to reflect cultural differences as well as language differences. These issues had probably long been known in cross-cultural psychology, but less so in the field of testing. (For excellent treatments of the translation of research materials, see Brislin, 1980, 1986.)
There are a variety of qualitatively different approaches to test adaptation. Of course, for some cross-cultural testing projects, one might develop a new measure altogether to meet one's needs. Such an approach is not test adaptation per se, but nonetheless would need to follow many of the principles of this process. Before building a test for use in more than one culture, one would need to ask how universal the constructs to be tested are (Anastasi & Urbina, 1997; van de Vijver & Poortinga, 1982). One would also have to decide how to validate the measure in the varying cultures. If one imagines a simple approach to validation (e.g., the criterion-related approach), one would need equivalent criteria in each culture. This requirement is often formidable. A more common model is to take an existing and generally much-used measure from one culture and language to attempt to translate it to a second culture and language.
van de Vijver and Leung (1997) have identified three general approaches to adapting tests: back-translation, decentering, and the committee approach. Each is described in turn below. Before any of these general approaches was developed, however, some researchers and test developers simply translated tests from one language to a second. For purposes of this discussion, this unadulterated technique is called direct translation; it has sometimes been called forward translation, but this writer does not prefer that name because the process is not the opposite of the back-translation procedure. The techniques embodied by the three general approaches are improvements over the direct translation of tests.
The back-translation technique, sometimes called the translation/back-translation technique, was an initial attempt to advance the test adaptation process beyond direct test translation (Brislin, 1970; Werner & Campbell, 1970). In this approach, an initial translator or team of translators renders the test materials from the original language into the target language. Then a second translator or team, without having seen the original test, begins with the target-language version and translates it back into the original language. At this point, the original test developer (or the individuals who plan to use the translated test, or their representatives) compares the original test with the back-translated version, both of which are in the original language. The quality of the translation is evaluated in terms of how closely the back-translated version agrees with the original test. This technique was widely cited as the procedure of choice (e.g., Butcher & Pancheri, 1976) for several decades, and it has been very useful in remedying certain translation problems (van de Vijver & Leung, 1997). It may be especially useful if the test user or developer lacks facility in the target language, and it provides a means of evaluating the quality of the translation. However, it also has disadvantages. The orientation is on a language-only translation; there is no opportunity to change the test to accommodate cultural differences. Thus, if the test has culture-specific aspects, this technique should generally not be used. In fact, it can create special problems if the translators know that their work will be evaluated through back-translation. In such an instance, they may use stilted language or wording to ensure an accurate back-translation rather than a properly worded translation that test takers in the target language would understand best.
In short, "a translation-back translation procedure pays more attention to the semantics and less to connotations, naturalness, and comprehensibility" (van de Vijver & Leung, 1997, p. 39).
The process of culturally decentering test materials is somewhat more complex than either the direct translation or translation/back-translation processes (Werner & Campbell, 1970). Cultural decentering does involve translation of an instrument from an original language to a target language. However, unlike direct translation, the original measure is changed prior to being adapted (or translated) to improve its translatability; those components of the test that are likely to be specific to the original culture are removed or altered. Thus, the cultural biases, both construct and method, are reduced. In addition, the wording of the original measure may be changed in a way that will enhance its translatability. The process is usually performed by a team composed of multilingual, multicultural individuals who have knowledge of the construct to be measured and, perhaps, of the original measure (van de Vijver & Leung, 1997). This team then changes the original measure so that "there will be a smooth, natural-sounding version in the second language" (Brislin, 1980, p. 433). If decentering is successful, the two assessment instruments that result, one in each language, are both generally free of culture-specific language and content. "Tanzer, Gittler, and Ellis (1995) developed a test of spatial ability that was used in Austria and the United States. The instructions and stimuli were simultaneously in German and English" (van de Vijver & Leung, 1997, pp. 39-40).
There are several reasons, however, that cultural decentering is not frequently performed. First, the process is time consuming and expensive. Second, data collected using the original instrument in the first language cannot be used in cross-cultural comparisons; only data from the two decentered measures may be used. Consequently, the rich history of validation and normative data that may be available for the original measure is likely to be of little use, and the decentered measure in the original language must be used to regather such information for comparative purposes. For this reason, the process is most likely to be used in comparative cross-cultural research when there are not plentiful supporting data on the original measure. When the technique is used, it essentially entails two test-construction processes.
The committee approach was probably first described by Brislin (1980), has been summarized by van de Vijver and Leung (1997), and is explained in some detail by Geisinger (1994). In this method, a group of bilingual individuals translates the test from the original language to the target language. The members of the committee need to be not only bilingual but also thoroughly familiar with both cultures, with the construct(s) measured by the test, and with general testing principles. Like most committee processes, this procedure has advantages and disadvantages. A committee will be more expensive than a single translator. A committee may not work well together or may be dominated by one or more persons. Some members may not contribute fully or may be reluctant to participate for personal reasons. On the other hand, committee members are likely to catch one another's mistakes (Brislin, 1980). It is also possible that the committee members can cooperate and help each other, especially if their expertise is complementary (van de Vijver & Leung, 1997). This method, however, like the decentering method, does not include an independent evaluation of its effectiveness. Therefore, it is useful to couple the work of a committee with a back-translation.
Brislin (1980, p. 432) provided a list of general rules for developing research documents and instruments that are to be translated. Following these rules produces English-language documents that are likely to be translated or adapted successfully, an aim similar to that of decentering. Most of the rules read as guidance for good writing and effective communication, and they have considerable applicability. These 12 rules have been edited slightly for use here.
1. Use short, simple sentences of fewer than 16 words.
2. Employ the active rather than the passive voice.
3. Repeat nouns instead of using pronouns.
4. Avoid metaphors and colloquialisms. Such phrases are least likely to have equivalents in the target language.
5. Avoid the subjunctive mood (e.g., verb forms with could or would).
6. Add sentences that provide context for key ideas. Reword key phrases to provide redundancy. This rule suggests that longer items and questions be used only in single-country research.
7. Avoid adverbs and prepositions telling where or when (e.g., frequently, beyond, around).
8. Avoid possessive forms where possible.
9. Use specific rather than general terms (e.g., the specific animal name, such as cows, chickens, or pigs, rather than the general term livestock).
10. Avoid words indicating vagueness regarding some event or thing (e.g., probably, frequently).
11. Use wording familiar to the translators where possible.
12. Avoid sentences with two different verbs if the verbs suggest two different actions.
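Several of these rules are mechanical enough that a draft item pool can be screened automatically before it goes to translators. The sketch below is a hypothetical helper, not part of Brislin's procedure: it flags sentences of 16 or more words (rule 1), personal pronouns (rule 3), the auxiliaries could and would (rule 5), possessive forms (rule 8), and a few vague terms (rule 10). The word lists are illustrative assumptions, and the output is meant only to direct a human reviewer's attention, since many rules (metaphor, context, familiarity to translators) require judgment.

```python
import re

# Illustrative word lists; these are assumptions, not Brislin's own lists.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their", "theirs"}        # rule 3
SUBJUNCTIVE = {"could", "would"}                            # rule 5
VAGUE = {"probably", "frequently", "sometimes", "perhaps"}  # rule 10


def screen_item(text, max_words=15):
    """Return (rule_number, message) warnings for one draft item.

    A heuristic aid only: it cannot judge meaning, metaphor, or context.
    """
    warnings = []
    # Split after sentence-final punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        words = [w.lower() for w in re.findall(r"[A-Za-z']+", sentence)]
        if len(words) > max_words:  # rule 1: fewer than 16 words
            warnings.append((1, f"{len(words)} words in one sentence"))
        checks = [
            (3, "pronouns", [w for w in words if w in PRONOUNS]),
            (5, "subjunctive markers", [w for w in words if w in SUBJUNCTIVE]),
            (8, "possessives", [w for w in words
                                if w.endswith("'s") and w != "it's"]),
            (10, "vague terms", [w for w in words if w in VAGUE]),
        ]
        for rule, label, hits in checks:
            if hits:
                warnings.append((rule, f"{label}: {', '.join(hits)}"))
    return warnings
```

For instance, `screen_item("Would you probably visit your doctor's office?")` flags rules 5, 8, and 10, suggesting a rewording such as "Do you visit a doctor when you are sick?" before translation begins.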
Geisinger (1994) elaborated 10 steps that should be involved in any test-adaptation process. In general, these steps are themselves an adaptation of the steps in any test-development project. Other writers have altered these procedural steps to some extent, but most modifications are quite minor. Each step is listed and annotated briefly below.
1. Translate and adapt the measure. "Sometimes an instrument can be translated or adapted on a question-by-question basis. At other times, it must be adapted and translated only in concept" (Geisinger, 1994, p. 306). This decision must be based on whether the content and constructs measured by the test are free from construct bias. The selection of translators is a major factor in the success of this stage, and Hambleton (1999) provides good suggestions in this regard. Translators must be knowledgeable about the content covered on the test, completely bilingual, expert about both cultures, and often able to work as part of a team.
2. Review the translated or adapted version of the instrument. Once the measure has been adapted, the quality of the new document must be judged. Back-translation can be employed at this stage, but it may be more effective to convene a panel to review the changed document, individually or as a group. Geisinger (1994) suggested that members of the panel review the test individually in writing, share their comments with one another, and then meet to resolve differences of opinion and, perhaps, to rewrite portions of the draft test. The individual participants in this process must meet a number of criteria. They must be fluent in both languages and knowledgeable about both cultures. They must also understand the characteristics measured with the instrument and the likely uses to which the test is to be put. If they fail to meet any one of these criteria, their assessment may be flawed.
3. Adapt the draft instrument on the basis of the comments of the reviewers. The individuals involved in the translation or adaptation process need to receive the feedback that arose in Step 2 and consider the comments. There may be reasons not to follow some of the suggestions of the review panel (e.g., reasons related to the validity of the instrument), and the original test author, test users, and the translator should consider these comments.
4. Pilot-test the instrument. It is frequently useful to have a very small number of individuals take the test and share any concerns and reactions they may have. They should be as similar as possible to the eventual test takers, and they should be interviewed (or should complete a questionnaire) after taking the test. They may be able to identify problems or ambiguities in wording, in the instructions, in timing, and so on. Any changes that are needed after the pilot test should be made, and if these alterations are extensive, the test may need to be pilot-tested once again.
5. Field-test the instrument. This step differs from the pilot test in that it involves a large and representative sample. If the population taking the test in the target language is diverse, all elements of that diversity should be represented and perhaps overrepresented. After collection of these data, the reliability of the test should be assessed and item analyses performed. Included in the item analyses should be analyses for item bias (both as compared to the original-language version and, perhaps, across elements of diversity within the target field-testing sample). van de Vijver and Poortinga (1991) describe some of the analyses that should be performed on an item-analysis basis.
6. Standardize the scores. If desirable and appropriate, equate them with scores on the original version. If the sample size is large enough, it would be useful (and necessary for tests to be used in practice) to establish norms. If the field-test sample is not large enough, and the test is to be used for more than cross-cultural research in the target language, then collection of norm data is necessary. Scores may be equated back to the score scale of the original instrument, just as may be performed for any new form of a test. These procedures are beyond the scope of the present chapter, but may be found in Angoff (1971), Holland and Rubin (1982), or Kolen and Brennan (1995).
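As an illustration of Step 6, the sketch below shows linear equating, one of the simplest methods covered in the equating literature cited above, together with a midpoint percentile rank for building norm tables. Linear equating sets standardized scores equal, so that a raw score x on the adapted form maps to mean_Y + (sd_Y / sd_X)(x - mean_X) on the original form's scale. This is a sketch under a strong assumption: the two score distributions must come from groups that can be treated as equivalent (for example, the same bilingual examinees taking both versions). When the two forms are given to different cultural groups, this assumption may well fail, and form differences become confounded with group differences. The function names are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, adapted_scores, original_scores):
    """Map raw score x on the adapted form onto the original form's scale.

    Solves (x - mean_X) / sd_X = (y - mean_Y) / sd_Y for y. Assumes the
    two samples are equivalent in ability (e.g., one bilingual group).
    """
    mx, sx = mean(adapted_scores), pstdev(adapted_scores)
    my, sy = mean(original_scores), pstdev(original_scores)
    return my + (sy / sx) * (x - mx)


def percentile_rank(x, norm_sample):
    """Midpoint percentile rank: % scoring below x plus half of % scoring at x."""
    below = sum(1 for s in norm_sample if s < x)
    equal = sum(1 for s in norm_sample if s == x)
    return 100 * (below + 0.5 * equal) / len(norm_sample)
```

In practice, `percentile_rank` would be applied to a large, representative target-language norm sample; equipercentile and other nonlinear methods are often preferred when the two score distributions differ in shape.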
7. Perform validation research as needed. The validation research that is needed includes at least research to establish the equivalence to the original measure. However, as noted earlier, the concepts of construct validation represent the ideal to be sought (Embretson, 1983). Some forms of appropriate revalidation are needed before the test can be used with clients in the target language. It is appropriate to perform validation research before the test is used in research projects, as well.
8. Develop a manual and other documents for users of the assessment device. Users of the newly adapted instrument will need information so that they may employ it effectively. A manual that describes administration, scoring, and interpretation should be provided. To support interpretation, the manual should summarize the norms, equating (if any), reliability analyses, validity analyses, and investigations of bias. Statements regarding the process of adaptation should also be included.
9. Train users. New users of any instrument need instruction so that they may use it effectively. Adapted instruments may pose special problems because users may tend to draw on materials and knowledge from the original measure. Although transfer of training is often positive, differences between the language versions can lead to negative consequences.
10. Collect reactions from users. Whether the instrument is to be used for cross-cultural research or with actual clients, it behooves the test adaptation team to collect the thoughts of users (and perhaps of test takers as well) and to do so on a continuing basis. As more individuals take the test, the different experiential backgrounds present may identify concerns. Such comments may lead to changes in future versions of the target-language form.