true status of an individual, for example with regard to his or her guilt in a criminal case, has to be determined. Cooperation cannot be expected under such circumstances because of the severe negative consequences a suspect often has to face in the case of a conviction. There are, however, experimental methods of psychological assessment that can be used to obtain sensitive information even under such adverse conditions. Unlike the RRT and the UCT, these techniques rely on within-subject experimental manipulations and are commonly referred to as the polygraph method.

Polygraphic measurement is certainly among the most controversial experimental methods of psychological assessment. This is perhaps not surprising, given that few psychological assessments have more profound consequences for those who take them. In the United States, the Employee Polygraph Protection Act of 1988 eliminated most private-sector uses of polygraph tests, but there is still widespread reliance on polygraph testing by state and local police departments and by national security and law enforcement agencies. Polygraph testing is also regularly used by the police forces in Canada, Israel, and Japan. In Germany, polygraph tests are also often used in child custody and child abuse cases.

The most frequently used procedures in polygraph testing are the CQT (control question technique) and the GKT (guilty knowledge test). Both rely on a within-subjects experimental manipulation: Suspects are presented with different questions, and their reactions to these questions are used to either judge their guilt (in the case of the CQT) or to demonstrate their knowledge of details of the crime (in the case of the GKT). We will discuss the theoretical rationale of these two different approaches in turn.

The CQT is based on the assumption that guilty people will be more concerned about questions pertaining to their misdeed ("Did you rob the bank?") than to control questions also designed to elicit emotional reactions ("Have you ever taken something from someone who trusted you?"). Accordingly, their nervous system is expected to react more strongly to the relevant than to the control questions. On the other hand, innocent people are assumed to be less concerned about their responses to crime-relevant questions. Instead they are expected to respond more strongly to the control questions because they are led to believe that lying to these questions is also cause for failing the test. Thus the model equations underlying the CQT assume that autonomic responses to critical questions differ by a certain additive increment or decrement from the corresponding responses to control questions, depending on whether a person is actually guilty or not guilty, respectively. The most frequently used measures of autonomic reaction to the test questions are electrodermal responsivity (skin resistance or conductance), respiration, and blood pressure.
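
The additive assumption just described can be written out as a simple sketch. The symbols are introduced here for illustration only and are not taken from the original literature: R_relevant and R_control stand for the autonomic responses to a relevant question and its paired control question, and the increments δ_G and δ_I are assumed to be positive. The notation requires the amsmath package.

```latex
\[
R_{\text{relevant}} =
\begin{cases}
R_{\text{control}} + \delta_{G}, & \text{guilty examinee } (\delta_{G} > 0),\\
R_{\text{control}} - \delta_{I}, & \text{innocent examinee } (\delta_{I} > 0).
\end{cases}
\]
```

Under this reading, the sign of the difference between the two responses is what the test is taken to reveal, which is why the assumption about innocent examinees carries so much weight.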

The standard scoring method for the CQT considers control and relevant questions in pairs. For each physiological channel, a decision is made whether the control or the relevant questions elicit the larger response. Scores are assigned for each channel depending on how much larger or smaller the response to the control questions is as compared to the response to the relevant questions. By summing the scores across all question pairs and the different psychophysiological channels, a total score is obtained that is interpreted as indicating either truthfulness or deception. Middle scores are usually considered inconclusive.
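
The scoring logic described above might be sketched as follows. This is only an illustration under stated assumptions: the function names, the scale from -3 to +3 per channel, the step size used to grade differences, and the cutoff of 6 for a conclusive result are all hypothetical choices made for the example, not values given in the text.

```python
from typing import Dict, List, Tuple

# One response magnitude per physiological channel (arbitrary units).
Channels = Dict[str, float]


def pair_score(control: float, relevant: float,
               step: float = 0.5, cap: int = 3) -> int:
    """Score one channel of one control/relevant question pair.

    Positive when the control response is larger, negative when the
    relevant response is larger. `step` and `cap` are assumed values.
    """
    diff = control - relevant
    magnitude = min(cap, int(abs(diff) / step))
    return magnitude if diff > 0 else -magnitude


def total_score(pairs: List[Tuple[Channels, Channels]]) -> int:
    """Sum the scores over all question pairs and all channels."""
    return sum(
        pair_score(control[ch], relevant[ch])
        for control, relevant in pairs
        for ch in control
    )


def classify(total: int, cutoff: int = 6) -> str:
    """Interpret the total; middle scores are treated as inconclusive."""
    if total >= cutoff:
        return "truthful"
    if total <= -cutoff:
        return "deceptive"
    return "inconclusive"


# Hypothetical data: two question pairs, three channels each.
example_pairs = [
    ({"electrodermal": 3.0, "respiration": 2.0, "blood_pressure": 1.5},
     {"electrodermal": 1.0, "respiration": 1.0, "blood_pressure": 1.5}),
    ({"electrodermal": 2.5, "respiration": 1.0, "blood_pressure": 2.0},
     {"electrodermal": 1.0, "respiration": 1.0, "blood_pressure": 1.5}),
]
example_result = classify(total_score(example_pairs))
```

In this made-up example the control questions elicit the larger responses in most channels, so the summed score falls on the truthful side of the assumed cutoff.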

A major criticism of the CQT, forcefully advanced by Lykken (1974), is that it is biased against innocent people. To the extent that innocent individuals are more disturbed by the threatening accusations contained in the crime-related relevant questions than by the comparatively innocuous control questions, false positives will occur. Skeptics have even argued that the CQT hardly has much more than chance accuracy with innocent subjects. Contrary to these claims, CQT advocates have argued (often based on different studies using different criteria, but sometimes even on the same studies interpreted differently) that existing research supports the conclusion that CQT accuracy with innocent persons (i.e., its specificity) exceeds 90%. (For a more detailed treatment of this controversy, see Iacono, 2000.) With regard to guilty suspects, proponents argue that the sensitivity of the CQT exceeds 95% and that it is very difficult for guilty individuals to learn how to appear nondeceptive by using appropriate countermeasures to defeat the test. In contrast, skeptics argue that the accuracy with guilty subjects is probably closer to 75% when no countermeasures are used and significantly less if countermeasures are used, such as biting the tongue or performing mental arithmetic during the presentation of the control questions. Skeptics also argue that information on how best to use countermeasures is easily accessible nowadays, and that it would be unrealistic to assume that a defendant would undergo polygraph testing in an important issue without trying to use appropriate countermeasures. Unfortunately, it is possible and indeed quite easy to train guilty examinees to "pass" a CQT examination (Ben-Shakhar & Dolev, 1996).

With regard to the objectivity of polygraph testing, interscorer agreement has been shown to be uniformly high across a wide variety of studies. For example, Honts (1996) reported the reliability of blind chart evaluation of numerically scored charts to be over 0.90. When blindly rescoring polygraph examinations conducted by Canada's national police force, however, Patrick and Iacono (1991) found that examiners often relied on information not contained in the original polygraph charts. In 93% of the cases in which they contradicted their own numerical scoring, they favored the truthfulness of the suspect in their report. This finding seems to suggest that examiners were at some level aware of the inherent bias of the CQT against innocent people and tried to counteract this bias by overriding the physiological data if it did not agree with extrapolygraphic information.

Almost nothing is known about polygraph test-retest reliability. This is unfortunate given the lack of standardization of applied polygraph tests and the extent to which subjective factors may influence the outcome. Research on possible differences in outcome between "adversarial" tests administered by law enforcement officials and "friendly" tests arranged by the suspect's attorney is also completely missing, a serious shortcoming, given that it is the results of friendly tests that are most often presented in court (Iacono, 2000).

To assess the validity of polygraph testing, two types of studies have typically been used. Laboratory studies required volunteers to act out a mock crime and then to lie about it on a polygraph test, whereas field studies used criminal suspects who had already taken a test and whose true status could reliably be determined on the basis of independent evidence. Both types of validation studies have serious limitations regarding their generalizability to real-life circumstances, however (Ben-Shakhar & Furedy, 1990; Iacono & Patrick, 1999). Although one might reasonably assume that the embarrassing nature of the control questions is similar in the laboratory and in real-life situations, innocent laboratory subjects are likely to be relatively more responsive to control than to relevant questions because to them, the relevant questions have less emotional impact than in real-life investigations. Laboratory studies are therefore likely to overestimate the accuracy of polygraph tests for innocent individuals. Accordingly, permitting participants in a mock crime study to choose whether they wanted to be "innocent" or "guilty" (to win more money if they passed the CQT) has been shown to reduce CQT accuracy in laboratory tests, presumably due to the participants' increased sense of personal involvement in the mock crime (Forman & McCauley, 1986). Another factor potentially contributing to an overestimation of the validity of polygraph testing is that laboratory tests are usually carried out as part of a standardized experimental procedure, whereas field tests are likely to vary substantially across examiners and suspects.

Criminal investigations in which the suspect is later proved to be deceptive have been used as an alternative way of assessing the validity of polygraph testing. However, it is difficult to collect a sufficiently large number of cases in which the guilt or innocence of a suspect can be determined by a method that is independent of the outcome of the polygraph test. Patrick and Iacono (1991) found independent evidence for only 1 of 402 presumably guilty individuals. More important, the fact that failing the test leads to confessions in a substantial fraction of test administrations is no evidence for the test's validity. If a person passes a polygraph test, he or she will usually not be asked to confess, and the polygrapher will most likely never know if he just produced a false-negative outcome. The kind of feedback the polygrapher is most likely to receive (a confession of a suspect believing in the validity of the polygraph test he just failed, or the conviction by a judge who himself is influenced by the outcome of the polygraph interrogation) constitutes a biased sample and will almost always confirm the test outcome (Fiedler, Schmid, & Stahl, 2002). However, the most severe criticism of the CQT certainly is the lack of convincing evidence for its core assumption that the occurrence of stronger reactions to crime-related than to control questions will always be limited to guilty suspects.

A serious and arguably superior competitor to the CQT that avoids this problematic assumption is the GKT (guilty knowledge test) developed by David Lykken (1959, 1960). Even though the GKT also uses a within-subjects manipulation of question content, it has been argued to have a sounder theoretical rationale and scientific foundation (MacLaren, 2001). The GKT consists of a series of multiple-choice questions, all dealing with facts only the true delinquent can be familiar with. Each question contains one critical crime-related item presented among homogeneous control items unrelated to the crime. If, for example, the amount of money that was stolen in a robbery is not known by the public and only the police and the robber know that the amount stolen was $10,000, the suspect could be asked, "What was the amount stolen . . . $5,000 . . . $10,000 . . . $15,000?" A suspect is incriminated if his or her physiological responses to the crime-related alternatives consistently differ in some way from those evoked by the unrelated control alternatives. In the preceding example, a guilty person's autonomic reaction can be expected to be highest at $10,000, thus revealing his or her knowledge and likely involvement in the crime. Unlike the CQT, the GKT does not have to rely on the questionable assumption that only the guilty react more strongly to critical crime-related questions than to emotionally laden control questions. Rather, the very construction of the GKT ensures that "for the guilty subject only, the 'correct' alternative will have a special significance, an added 'signal value' which will tend to produce a stronger orienting reflex than that subject will show to other alternatives" (Lykken, 1974, p. 728). The special significance of the critical item in the GKT is mediated through simple recognition and need not be attributed to deception, motivation, or fear of punishment. The GKT has, therefore, been called the cognitive approach to psychophysiological detection (Ben-Shakhar & Furedy, 1990).

The power of the GKT to detect the guilty increases in a predictable manner with the number of items asked. Simultaneously, the probability of a false positive decreases with an increasing number of items. A particular strength of the GKT is that when it is competently performed and based on a sufficient number of questions, an innocent person very rarely fails. Therefore, it is not surprising that the GKT's specificity (proportion of innocent classified as innocent) has been reported to average from 94% (Ben-Shakhar & Furedy, 1990) to 98% (Elaad, 1990; Elaad, Ginton, & Jungman, 1992). The sensitivity of the GKT (proportion of guilty classified as guilty) was found to be 76% in a recent meta-analysis by MacLaren (2001). The most frequently cited disadvantage of the GKT, however, is that factual evidence must be available that can be developed into GKT items. Some crimes do not easily lend themselves to the GKT format because GKT items should best be based on information that is known to the police and the perpetrator, but not to innocent suspects. However, details of the crime that are already known by the public will also likely be known by innocent suspects. Accordingly, in a review of FBI case files, Podlesny (1993) concluded that only a minority of the case files could be used to develop GKT items. Moreover, several of the criticisms raised against the CQT also apply to the GKT. Most important, the GKT is also susceptible to countermeasures of suspects who are actually guilty based on the voluntary augmentation of reactions to the control items (Honts, Devitt, Winbush, & Kircher, 1996).
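
The claim that the false-positive probability shrinks as items are added can be illustrated with a short calculation. The sketch below rests on simplifying assumptions that are not stated in the text: an innocent examinee's strongest response is taken to fall on each alternative of an item with equal probability, items are treated as independent, and the decision rule (a required minimum number of "hits" on the critical alternative) is hypothetical. The function name is likewise chosen only for this example.

```python
from math import comb


def false_positive_probability(n_items: int, n_alternatives: int,
                               min_hits: int) -> float:
    """Chance that an innocent examinee reaches `min_hits` or more hits.

    A "hit" means the strongest response falls on the critical
    alternative. Assumes each of the `n_alternatives` is equally likely
    to draw the strongest response and that items are independent, so
    the number of hits follows a binomial distribution.
    """
    p = 1.0 / n_alternatives
    return sum(
        comb(n_items, hits) * p ** hits * (1 - p) ** (n_items - hits)
        for hits in range(min_hits, n_items + 1)
    )


# Requiring a hit on every item: the chance of a false positive drops
# geometrically as more five-alternative items are asked.
all_hits_by_length = {
    n: false_positive_probability(n, 5, n) for n in (1, 3, 6, 10)
}
```

With five alternatives per item and a rule that requires a hit on every item, the probability under these assumptions is one in five for a single item and one in five raised to the sixth power for six items, which is the sense in which adding items protects innocent examinees.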

For several decades, an often-heated controversy surrounded the use of polygraph testing (e.g., Faigman, Kaye, Saks, & Sanders, 1997). Today, virtually all professional polygraphers believe that the existing evidence, despite its limitations, supports the use of polygraph testing as a forensic tool. Social scientists, by contrast, tend to stress the need for more compelling evidence of validity before techniques are adopted that severely affect the judicial system and the civil liberties of those tested. Given the commitment of the large number of professional polygraphers in many countries and the often-fundamental criticism raised against their methodology by basic research scientists, polygraphic measurement will likely continue to be the most controversial experimental method of psychological assessment.

