Problems of Experimental Assessment Methods

Methods that provide benefits rarely come without costs, and experimental assessment methods are no exception. First, these methods require measurement models or laws that account for behavior across several experimental conditions. They can therefore be applied only in fields where appropriate laws or models have already been developed. Hence, progress in theory development must always precede progress in experimental assessment methodology.

Second, if we want to rely on models or laws for psychological assessment, we have to make sure that these models and laws are actually valid. Measuring mental rotation speed using Equation (3) makes sense only if the law used to derive this equation holds for each person. However, evidence from experimental psychology often shows only that laws of behavior hold at the aggregate level. If we use a law for individual assessment, we need to make sure that this law also holds for single individuals.

Third, experimental assessment methods require observations obtained under at least two different conditions, so control techniques must be used to minimize the effects of confounding variables. The choice among control techniques should depend on whether the unit of psychological assessment is (a) an individual or (b) a group of persons. If groups are the unit of assessment, experimental assessments typically require between-subjects designs. Randomization is the most important control technique in experimental psychology. When groups are compared, it should routinely be applied to control for nuisance effects of person attributes, so that unknown confounding influences are balanced across conditions in expectation.

If individuals are the units of assessment, which is the more frequent case in psychological assessment, within-subject designs are required in which each to-be-assessed individual is observed under each condition of the design. In this situation, nuisance effects pose the greatest threat, particularly those associated with the order in which the experimental conditions are arranged. We can distinguish retest effects and carryover effects (Davis, 1995). Retest effects are systematic differences between early and late tests in a test sequence, irrespective of which treatment was administered earlier. Training and fatigue effects are examples of positive and negative retest effects, respectively: Training effects help participants perform better the longer the experiment lasts, and fatigue effects do just the opposite. Carryover effects, in contrast, are not simply additive effects of a treatment's position in a treatment sequence. They are treatment-sequence interactions, that is, the effect of a treatment is modified by the treatment that precedes it. For example, carryover effects can take the form of positive or negative transfer: Practicing a task A trains skills that either facilitate or interfere with performance on a subsequent task B. Another type of carryover effect concerns task comprehension: The way a specific task is understood depends on the experimental conditions experienced previously.
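A toy data-generating model makes the difference between retest and carryover effects concrete. The sketch below is only an illustration and does not come from the text. The condition means, the 20 ms practice gain per position, the 80 ms cost of B following A, and the helper `expected_rt` are all hypothetical. In this model, retest effects depend only on a trial's position, whereas carryover depends on the pair (preceding treatment, current treatment).

```python
# Hypothetical values chosen for illustration only.
BASE_RT = {"A": 600.0, "B": 750.0}   # true condition means (ms); true B - A = 150 ms
PRACTICE_GAIN = 20.0                 # retest effect: speed-up per earlier trial (ms)
CARRYOVER = {("A", "B"): 80.0}       # carryover: extra cost of B right after A (ms)


def expected_rt(sequence):
    """Expected response time (ms) for each condition in a treatment sequence."""
    rts = []
    for position, condition in enumerate(sequence):
        rt = BASE_RT[condition]
        # Retest effect depends only on the position in the sequence.
        rt -= PRACTICE_GAIN * position
        # Carryover effect depends on the (preceding, current) treatment pair.
        if position > 0:
            rt += CARRYOVER.get((sequence[position - 1], condition), 0.0)
        rts.append(rt)
    return rts


ab = expected_rt(["A", "B"])
ba = expected_rt(["B", "A"])
print(ab, ba)  # [600.0, 810.0] [750.0, 580.0]

# Counterbalanced estimate of B - A: the retest terms cancel, the carryover term does not.
diff = ((ab[1] - ab[0]) + (ba[0] - ba[1])) / 2
print(diff)  # 190.0
```

Averaging over both orders removes the retest term. However, half of the 80 ms carryover cost remains in the estimated difference (190 ms instead of 150 ms).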

Whether the order of conditions should be randomized in within-subject assessment designs depends primarily on how often each individual is observed under each condition. Randomization makes little sense if each individual is observed only once per condition. For example, suppose we want to assess the mental rotation speed of an individual $i$ by applying Equation (3) to just two response times, measured under two different angles of rotation. We must then assume that the order of the two rotation conditions does not affect the response times systematically. If this assumption were false, the estimate of $v_i$ based on Equation (3) would be biased, regardless of whether the condition order was determined randomly. The situation is different if we can observe each individual many times under each of the two angles of rotation. We can then randomize the order of the two rotation conditions and apply Equation (3) to the average response time registered for each rotation angle, which yields unbiased estimates of $v_i$. More precisely, suppose each additional observation causes an additive, treatment-independent increment or decrement to the response times (e.g., reflecting fatigue or training effects). Repeating the treatments several times in a random order then eliminates this effect in expectation. Of course, the efficiency of randomization depends on the number of repetitions: With only a few repetitions, randomization does not necessarily eliminate nuisance effects in the sample at hand.
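A small simulation illustrates this argument. It assumes that Equation (3) follows from a linear Shepard-Metzler law, RT = a + theta / v. Under that assumption the two-angle estimate is v = (theta_2 - theta_1) / (mean RT at theta_2 - mean RT at theta_1). This form is our assumption. The intercept, true speed, angles, 1 ms-per-trial training effect, and helper names are all hypothetical. The response times are noise-free, so any bias comes from trial order alone.

```python
import random
from statistics import mean

# Hypothetical parameters: RT = A_TRUE + theta / V_TRUE (s), i.e., a linear
# Shepard-Metzler law, with every response sped up by 1 ms per earlier trial.
A_TRUE, V_TRUE, PRACTICE_PER_TRIAL = 0.5, 60.0, 0.001   # s, deg/s, s per trial
THETA_1, THETA_2 = 40.0, 160.0                           # rotation angles (deg)


def simulate_rts(order):
    """Noise-free RTs (s) for a sequence of angles, including the training effect."""
    return [A_TRUE + theta / V_TRUE - PRACTICE_PER_TRIAL * pos
            for pos, theta in enumerate(order)]


def estimate_speed(order):
    """Two-angle speed estimate (deg/s) from the mean RT observed at each angle."""
    rts = simulate_rts(order)
    m1 = mean(rt for rt, theta in zip(rts, order) if theta == THETA_1)
    m2 = mean(rt for rt, theta in zip(rts, order) if theta == THETA_2)
    return (THETA_2 - THETA_1) / (m2 - m1)


n = 200                                  # repetitions per angle
blocked = [THETA_1] * n + [THETA_2] * n  # fixed order: all THETA_1 trials first
randomized = blocked[:]
random.Random(1).shuffle(randomized)     # same trials in a random order

print(round(estimate_speed(blocked), 2))     # 66.67, biased upward by practice
print(round(estimate_speed(randomized), 2))  # close to the true 60.0
```

Both orders use the same 400 trials. The blocked order overestimates the speed because every theta_2 trial benefits from more practice. The shuffled order spreads practice roughly evenly across the two angles. A single pair of trials yields about 60.03 deg/s when theta_1 comes first and about 59.97 deg/s when theta_2 comes first. Choosing that order at random does not remove the error from the one estimate actually obtained.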

Besides randomization, single-case experimental designs with a fixed, predefined order of the conditions (e.g., the A-B-B-A design) can be used to control for additive retest effects. By counterbalancing the order of the conditions across replications, these designs can effectively eliminate additive retest effects even with few repetitions. The A-B-B-A sequence, for example, exactly cancels retest effects that change linearly across trial positions. In addition, retest effects can be reduced by practicing the relevant tasks before the experimental session so that further training effects are unlikely.
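A few lines of arithmetic test how far counterbalanced fixed orders cancel additive retest effects. In the sketch below, `condition_means` and both trend functions are hypothetical. The true condition effects are set to zero, so any nonzero B-minus-A difference is pure order bias. Under these assumptions, A-B-B-A cancels a linear retest trend exactly but not a curved (here quadratic) one.

```python
def condition_means(order, trend):
    """Mean nuisance effect per condition for a fixed order such as "ABBA".

    trend(pos) returns the additive retest effect at 1-based trial position pos.
    True condition effects are set to zero, so any difference between the
    returned means is bias caused by the order alone.
    """
    sums, counts = {}, {}
    for pos, cond in enumerate(order, start=1):
        sums[cond] = sums.get(cond, 0.0) + trend(pos)
        counts[cond] = counts.get(cond, 0) + 1
    return {cond: sums[cond] / counts[cond] for cond in sums}


def linear(pos):
    return -10.0 * pos        # hypothetical steady practice gain (ms per trial)


def quadratic(pos):
    return -2.0 * pos ** 2    # hypothetical practice gain that accelerates


for trend in (linear, quadratic):
    for order in ("AB", "ABBA"):
        means = condition_means(order, trend)
        print(trend.__name__, order, means["B"] - means["A"])
# linear AB -10.0
# linear ABBA 0.0
# quadratic AB -6.0
# quadratic ABBA 4.0
```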

Unfortunately, none of these techniques is really helpful for controlling carryover effects. The only possible remedy for carryover effects is to select the order of the conditions and the breaks between them carefully. Carryover effects may be strong for one treatment order, AB, and weak or absent for the reverse order, BA. For example, assume that A is a yes-no recognition test for a set of previously learned words, whereas B is a free recall test for the same words. The recognition test A would clearly have a very strong impact on subsequent free recall performance in B, especially if the time lag between the two tests is short. Reversing the order of conditions (a recall-then-recognition procedure) might be a better idea, although it is not without problems either (Batchelder & Riefer, 1999).

We recommend a two-step strategy for coping with these problems. First, the experimenter should carefully select (a) the order of conditions and (b) the breaks between conditions so as to minimize the likelihood of order effects, especially carryover effects. Second, the experimenter should run a pilot experiment comparing the treatment sequence defined in the first step with several control groups that lack the first treatment(s) of the sequence. If the data patterns do not depend on whether other treatments were administered beforehand, this is evidence that carryover effects do not pose a major problem. Retest effects can be examined in a similar way, using several permutations of the original treatment sequence in the control groups.
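One way to analyze such a pilot is to compare performance on a later treatment B between two groups. One group receives the full sequence (A, then B); a control group receives B without A. The text does not prescribe a statistical test. Welch's two-sample t statistic is our choice, `welch_t` is a hypothetical helper, and the scores are simulated under the assumption of no carryover. A nonsignificant difference in a small pilot is only weak evidence that carryover is absent.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Simulated (hypothetical) pilot data: scores on B in ms, 20 participants per group,
# drawn from the same distribution, i.e., assuming no carryover from A.
rng = random.Random(42)
full_sequence = [rng.gauss(750, 60) for _ in range(20)]  # received A, then B
control = [rng.gauss(750, 60) for _ in range(20)]        # received B only

t, df = welch_t(full_sequence, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```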

Finally, besides the validity of the assessment procedure, we also need to consider its reliability. Other things being equal, measures derived from experimental assessment procedures are likely to be less reliable than measures derived from a single condition, because they typically combine several random influences. For example, using Equation (3) to measure mental rotation speed involves estimating the difference between two mean response times. The variance of the difference between two independent random variables, V(X - Y), equals the sum of their variances, V(X) + V(Y). The standard error of the difference between two independent sample means is therefore always larger than the standard error of either mean alone. As a consequence, the measure of $v_i$ derived from Equation (3) will be less reliable than the mean response times from which it is derived.
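A short calculation shows the loss of precision. For illustration only, the sketch assumes a within-condition standard deviation of 0.3 s in both conditions and m = 25 independent trials per condition. `se_mean` and `se_difference` are hypothetical helper names. With equal variances and equal m, the standard error of the difference is sqrt(2) times that of a single mean, and a brief Monte Carlo run agrees.

```python
import random
from math import sqrt
from statistics import mean, stdev


def se_mean(sd, m):
    """Standard error of a mean of m independent observations."""
    return sd / sqrt(m)


def se_difference(sd1, m1, sd2, m2):
    """SE of the difference of two independent means, using V(X - Y) = V(X) + V(Y)."""
    return sqrt(se_mean(sd1, m1) ** 2 + se_mean(sd2, m2) ** 2)


SD, M = 0.3, 25  # hypothetical within-condition SD (s) and trials per condition
print(round(se_mean(SD, M), 4))               # 0.06
print(round(se_difference(SD, M, SD, M), 4))  # 0.0849

# Monte Carlo check: SD of many simulated differences between two condition means.
rng = random.Random(0)
diffs = [
    mean(rng.gauss(3.2, SD) for _ in range(M)) - mean(rng.gauss(1.2, SD) for _ in range(M))
    for _ in range(5000)
]
print(round(stdev(diffs), 3))  # close to 0.085
```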

Depending on the measurement model involved, there may be several ways to address the reliability problem. In general, increasing the number of observations is an effective remedy. One way is to keep the number of experimental conditions constant and increase m, the number of observations per condition. Another is to add more experimental conditions to the design. In the case of the Shepard-Metzler law, for example, one could add a third or fourth rotation angle. One would then estimate $v_i$ by inverting the slope $b_i$ of the regression line fitted through the response times for the three or four rotation angles.
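The regression approach can be sketched as follows. The sketch assumes that mean response time is linear in rotation angle with slope b_i = 1 / v_i, which is our reading of the Shepard-Metzler law. The helper `rotation_speed` and the response times are hypothetical. The function computes the ordinary least-squares slope and returns its reciprocal.

```python
from statistics import mean


def rotation_speed(angles, rts):
    """Estimate v_i as 1 / b_i, where b_i is the OLS slope of RT (s) on angle (deg)."""
    ma, mr = mean(angles), mean(rts)
    sxy = sum((a - ma) * (r - mr) for a, r in zip(angles, rts))
    sxx = sum((a - ma) ** 2 for a in angles)
    return sxx / sxy  # equals 1 / (sxy / sxx), the inverted slope


angles = [0, 60, 120, 180]           # four rotation angles (deg)
mean_rts = [0.55, 1.45, 2.60, 3.45]  # hypothetical mean RTs (s) for one person
print(round(rotation_speed(angles, mean_rts), 1))  # 60.9 deg/s
```

Returning `sxx / sxy` computes the reciprocal of the slope directly. Under this linear model the slope draws on all angles at once, so each additional angle contributes observations to the single estimate of $v_i$.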
