From the very beginning, the experimental method has been closely tied to hypothesis testing and theory evaluation in psychological research (see Boring, 1950; Bredenkamp, 2001; Calfee, 1985; Cook & Campbell, 1979; Davis, 1995; Shadish, Cook, & Campbell, 2002). During the past 50 to 60 years in particular, experiments have been used fairly routinely for testing hypotheses from different branches of psychology. Typical examples include "frustration causes aggression" (Berkowitz, 1989, p. 61), "elaborate semantic processing of information improves later recall memory" (Craik & Tulving, 1975, p. 270), and "client-centered short-term psychotherapy is more effective than conflict-centered therapy" (Meyer, Stuhr, Wirth, & Ruester, 1988, p. 196). Common to all applications of the experimental method is the comparison of at least two conditions, treatments, or groups of participants with respect to the mean or some other aspect of the distribution of a so-called dependent variable. The experimental conditions define the levels of the so-called independent variable, and the causal effect of the independent variable on the dependent variable is the target of the research. For instance, in the three examples just presented, amount of frustration experienced (strong vs. no frustration), type of information processing during encoding (semantic vs. phonetic processing), and type of psychotherapy received (client-centered vs. conflict-centered), respectively, might be the independent variables, each manipulated at two levels preselected by the experimenter. In the same three experiments, number of aggressive acts shown by the participants, proportion of items recalled 1 week later, and ratings of subjective well-being after psychotherapy, respectively, could serve as dependent variables.
Different group means of the dependent variables are typically taken as evidence in favor of causal effects of the independent variable, as specified by the psychological hypothesis under investigation.
Of course, not every observed mean difference establishes a true causal effect, and not every comparison of two or more conditions meets the criteria of a psychological experiment. A defining feature of the experimental method is that the experimental conditions need to be comparable, that is, they should differ only with respect to the independent variable under scrutiny and not with respect to other variables, so-called confounding variables, that might also affect the dependent variable. Important candidates for confounding variables are person attributes such as aggressiveness, intelligence, or gender and features of the experimental situation such as presence versus absence of other people, background noise, or hour of the day. Ideally, all potential confounding variables should be held constant, a control technique known as fixation, to prevent possible nuisance effects. An example would be the elimination of background noise by using soundproof experimental booths. Typically, however, not all confounding variables can be controlled by fixation. Counterbalancing experimental conditions is often a good remedy in such situations. For example, in a within-subject experiment, each participant is observed under two or more treatment conditions. Thus, a potential confounding variable is the order in which treatments are applied. Its nuisance effects can be controlled by assigning an equal number of participants to each possible treatment order. This method guarantees that the treatments do not differ with respect to their average position in the treatment sequence.
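The counterbalancing scheme just described can be illustrated with a minimal Python sketch. The function name `counterbalance`, its parameters, and the participant labels are hypothetical choices made for illustration only. The sketch assumes full counterbalancing, that is, every permutation of the treatments is used and the number of participants is a multiple of the number of permutations.

```python
import random
from itertools import permutations


def counterbalance(participants, treatments, seed=None):
    """Assign an equal number of participants to every treatment order.

    Hypothetical helper for illustration: full counterbalancing over all
    permutations of `treatments`.
    """
    orders = list(permutations(treatments))
    participants = list(participants)
    if len(participants) % len(orders) != 0:
        raise ValueError(
            "number of participants must be a multiple of the "
            f"{len(orders)} possible treatment orders"
        )
    rng = random.Random(seed)
    rng.shuffle(participants)  # which participant gets which order is random
    # Cycling through the orders gives each order the same number of people.
    return {p: orders[k % len(orders)] for k, p in enumerate(participants)}


def mean_position(assignment, treatment):
    """Average position (1 = first) of a treatment across all participants."""
    positions = [order.index(treatment) + 1 for order in assignment.values()]
    return sum(positions) / len(positions)


assignment = counterbalance(range(12), ["A", "B", "C"], seed=1)
for t in ["A", "B", "C"]:
    print(t, mean_position(assignment, t))
```

With three treatments and twelve participants, each of the six orders is used twice, so every treatment has the same average position in the sequence.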
The control techniques of fixation and counterbalancing can only be applied to confounding variables known in advance. Effects of unknown variables have to be controlled by randomization, that is, by the method of random assignment of experimental units to experimental conditions. Randomization is the most powerful experimental control technique because it makes sure that the distribution of all confounding variables associated with the experimental units, including even the unknown ones, does not differ between experimental conditions. This minimizes the possibility that a treatment effect observed in a randomized experiment is "spurious," that is, artificially caused by one or more confounding variables rather than by the experimental independent variable itself (Shadish et al., 2002; Steyer, Gabler, & Rucai, 1996). It is for this reason that many researchers prefer to tie the definition of the psychological experiment to the method of randomization. Bredenkamp's (2001) definition is a typical example: "An experiment can be defined by the following criteria: The experimenter creates the conditions, systematically varies them, and applies the principle of randomization" (pp. 8226-8227).
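Random assignment of experimental units to conditions can be sketched in a few lines of Python. The function name `randomize` and the condition labels are hypothetical. The sketch assumes simple randomization with group sizes kept as equal as possible, which is only one of several possible randomization schemes.

```python
import random


def randomize(units, conditions, seed=None):
    """Randomly assign units to conditions with (near-)equal group sizes.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order removes any systematic selection
    groups = {c: [] for c in conditions}
    for k, unit in enumerate(shuffled):
        # Deal the shuffled units out to the conditions in turn.
        groups[conditions[k % len(conditions)]].append(unit)
    return groups


groups = randomize(range(20), ["strong frustration", "no frustration"], seed=42)
for condition, members in groups.items():
    print(condition, sorted(members))
```

Because the assignment depends only on the shuffle, no attribute of the units, known or unknown, can systematically determine which condition a unit ends up in.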
WHAT IS AN EXPERIMENTAL ASSESSMENT METHOD?
Compared to its important role in the context of psychological hypothesis testing, the experimental method has been largely neglected or even ignored in the field of psychological assessment and psychological testing. Indeed, if the term experimental has been used in this context, it typically has been associated with meanings different from that defined earlier. Historically, it has been used to refer to (a) new assessment instruments still in the phase of construction (e.g., Goldman & Saunders, 1974, p. xi; Graves, 1991); (b) measurement techniques based on technical equipment such as tachistoscopes, millisecond timers, or response counters (e.g., Kretschmer, 1928); or (c) assessment methods using paradigms, tasks, and measures typically used in experimental cognitive psychology (e.g., Trepagnier, 2002).
In contrast to these historical meanings, we advocate a definition that is consistent with the notion of a psychological experiment discussed in the last section. We propose to call assessment methods "experimental" if and only if the following two conditions are met:
a. Predefined aspects of human behavior are observed under at least two experimental conditions manipulated by the experimenter.
b. A measurement model or law specifies how the to-be-measured psychological construct is related to the behavior observed in the different experimental conditions.
As a possible example, consider Shepard and Metzler's mental rotation task. This task can be converted into a truly experimental assessment method by making use of the linear law of mental rotation proposed by Shepard and Metzler (1971). To illustrate this law, recall that each single mental rotation task consists of pictures of two abstract three-dimensional geometric figures that are either identical or not. If they are identical, then one of the figures can be rotated in three-dimensional space until it coincides with the other. Participants are asked to judge as quickly as possible whether the two figures can be brought to congruence or not. Typically, half the figure pairs to be judged consist of identical figures, whereas the other half consists of figures with identical features but different three-dimensional structures so that they cannot be rotated into each other.
If $\alpha$ denotes the angle of rotation for identical figures, the linear law of mental rotation states that $E(T_i \mid \alpha)$, the conditional expectation of the response time $T_i$ of participant $i$ for identical figures separated by an angle of $\alpha$, is given by

$$E(T_i \mid \alpha) = a_i + b_i \alpha. \qquad (1)$$

In other words, for a given angle $\alpha$, the average response time of any participant $i$ is a linear combination of a sensorimotor component, the simple reaction time $a_i$, and a cognitive processing component, namely, the time required per degree of mental rotation $b_i$. Let us assume now that one is interested in assessing the mental rotation speed $v_i = 1/b_i$ selectively, uncontaminated by the sensorimotor speed component. Obviously, a simple way of achieving this goal would be to observe the response times under two experimental conditions that differ only in the angles of rotation $\alpha_1$ and $\alpha_2$, such that $\alpha_2 > \alpha_1$. According to the linear law of mental rotation, the average difference in response times in these two conditions is

$$E(T_i \mid \alpha_2) - E(T_i \mid \alpha_1) = b_i (\alpha_2 - \alpha_1). \qquad (2)$$

Hence a pure measure of the mental rotation speed $v_i$ of participant $i$ can be derived by simple algebraic manipulation from the participant's mean response times observed under two experimental conditions:

$$v_i = 1/b_i = (\alpha_2 - \alpha_1) / (E(T_i \mid \alpha_2) - E(T_i \mid \alpha_1)). \qquad (3)$$

To preclude misunderstandings, it should be noted that the experimental approach to psychological assessment we advocate here does not necessarily require quantitative laws relating two or more physical variables such as the Shepard-Metzler law. Experimental assessment techniques can be applied to dichotomous categorical data (see Birnbaum, 1992, for examples in the context of utility measurement) and quite simple measurement models, too. For example, the speed of accessing letter meaning can be measured using a very simple additive measurement model in combination with an experimental paradigm suggested by Posner, Boies, Eichelman, and Taylor (1969). The response time of a "same" judgment to physically different letters with the same name (i.e., "Aa"), minus the response time for a "same" judgment to physically identical letters (i.e., "AA"), is generally regarded as a valid measure of the time needed for accessing the name of a letter (Petrill & Brody, 2002, p. 585). In addition, multinomial models for categorical data (Batchelder & Riefer, 1999) provide a very general framework for developing and testing psychologically motivated measurement models that are tailored to specific experimental designs. This class of models is likely to be very useful for solving problems of experimental assessment, at least at the group level (Batchelder, 1998).

These examples may suffice to illustrate that it is not important what particular type of law or model applies to the experimental task. Mandatory for experimental assessment techniques, as we understand them, is that there has to be at least some model or a law that precisely specifies how the to-be-assessed attribute or construct is related to the behavior observed under different experimental conditions.
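As a rough illustration of how two of the measurement models discussed in this section might be applied to a single participant's data, consider the following Python sketch. The function names, parameter names, and response times are hypothetical and invented purely for illustration. The sketch assumes that sample means of the observed response times are used as estimates of the conditional expectations.

```python
from statistics import mean


def rotation_speed(rts_small, rts_large, angle_small, angle_large):
    """Estimate mental rotation speed (degrees per second) from response
    times (seconds) observed at two rotation angles (degrees).

    Hypothetical helper: the angle difference is divided by the difference
    of the mean response times, so the intercept (simple reaction time)
    cancels out.
    """
    if angle_large <= angle_small:
        raise ValueError("angle_large must exceed angle_small")
    rt_diff = mean(rts_large) - mean(rts_small)
    if rt_diff <= 0:
        raise ValueError("mean response time must increase with the angle")
    return (angle_large - angle_small) / rt_diff


def name_access_time(rts_name_match, rts_physical_match):
    """Additive model for the letter-matching paradigm: mean RT for 'Aa'
    pairs minus mean RT for 'AA' pairs (hypothetical helper)."""
    return mean(rts_name_match) - mean(rts_physical_match)


# Invented data for one participant, in seconds.
speed = rotation_speed([1.7, 1.9, 1.8], [3.3, 3.5, 3.4], 40, 120)
access = name_access_time([0.55, 0.57], [0.47, 0.49])
print(f"rotation speed: {speed:.1f} degrees per second")
print(f"name access time: {access:.3f} seconds")
```

In both functions the construct of interest is obtained from a contrast between two experimental conditions, so components common to both conditions, such as simple reaction time, drop out of the measure.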