One can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. After all, statistical prediction rules are usually based on accurate feedback. That is, when deriving statistical prediction rules, accurate criterion scores are usually obtained. Put another way (Garb, 2000a), "In general, statistical prediction rules will do well because they make use of the inductive method. A statistical prediction rule will do well to the extent that one can generalize from a derivation sample to a new sample" (p. 32). In contrast, in the course of clinical practice, it is normally too expensive for clinicians to obtain good criterion scores. For example, clinicians are typically unable to follow up with patients after a 6-month period to learn whether they have become violent. Similarly, when writing a computer-based test interpretation program, an expert clinician will not normally collect criterion information.
There is another important reason one can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. The use of statistical prediction rules can minimize the occurrence of errors and biases, including race bias and gender bias (Garb, 1997). Automated assessment programs may be biased (e.g., descriptions may be more accurate for White clients than Black clients), because criterion scores are not usually obtained to learn whether accuracy varies by client characteristic (e.g., race). Errors and biases that occur when clinicians make judgments will be described in a later section. Suffice it to say that a carefully derived statistical rule will not make predictions that vary as a function of race or gender unless race or gender has been shown to be related to the behavior one is predicting. To make sure that statistical predictions are unbiased, the effects of client characteristics (e.g., race, gender) need to be investigated.
Although there are reasons to believe that statistical prediction rules will transform psychological assessment, it is important to realize that present-day rules are of limited value (Garb, 1994, 1998, 2000a). For tasks involving diagnosis or describing personality traits or psychiatric symptoms, many statistical prediction rules make use of only limited information (e.g., results from only a single psychological test). This might be satisfactory if investigators first determined that the assessment information represents the best information that is available. However, this is not the case. For tasks involving diagnosis and describing personality traits and psychiatric symptoms, investigators rarely collect a large amount of information and identify optimal predictors.
There is a methodological reason why optimal information has rarely been used for the tasks of diagnosis and describing personality traits and psychiatric symptoms. When statistical prediction rules have been derived for these tasks, criterion ratings have usually been made by psychologists who use information that is available in clinical practice (e.g., history and interview information). If information used by criterion judges is also used as input information for statistical prediction rules, criterion contamination can occur. To avoid criterion contamination, information that is given to criterion judges is not used as input information for statistical prediction rules, even though this information may be optimal. Thus, in many studies, statistical predictions are made using results from a psychological test but not results from history and interview information.
To avoid criterion contamination, new methods need to be used to construct and validate statistical rules for the tasks of diagnosis and describing personality traits and psychiatric symptoms (Garb, 1994, 1998, 2000a). For example, by collecting longitudinal information, one can obtain criterion scores that are not based on information that is normally used by mental health professionals. Thus, if a statistical rule makes a diagnosis of major depression, but longitudinal data reveal that the client later developed a manic episode, then we could say that this diagnosis was incorrect.
Criterion contamination is not a problem for behavioral prediction (e.g., predicting suicide), so it is not surprising that statistical prediction rules that have been used to predict behavior have been based on optimal information. For behavioral prediction, outcome scores are obtained after assessment information has been collected and predictions have been made. All of the information that is normally available in clinical practice can be used by a statistical prediction rule without fear of criterion contamination.
Most present-day statistical prediction rules have not been shown to be powerful. As already noted, statistical prediction rules for making diagnoses and describing personality traits and psychiatric symptoms have almost always made use of limited information that has not been shown to be optimal (e.g., Carlin & Hewitt, 1990; Danet, 1965; Goldberg, 1965, 1969, 1970; Grebstein, 1963; Hiler & Nesvig, 1965; Janzen & Coe, 1975; Kleinmuntz, 1967; Lindzey, 1965; Meehl, 1959; Oskamp, 1962; Stricker, 1967; Todd, 1954; Vanderploeg, Sison, & Hickling, 1987). Typically, the statistical prediction rules, and the clinicians to whom they have been compared, have been given results from only a single test.
An example will be given. In one of the best known studies on clinical versus statistical prediction (Goldberg, 1965), MMPI (Hathaway & McKinley, 1942) results were used to discriminate between neurotic and psychotic clients. Goldberg constructed a formula that involves adding and subtracting MMPI T scores: Lie (L) + Paranoia (Pa) + Schizophrenia (Sc) - Hysteria (Hy) - Psychasthenia (Pt). Using data collected by Meehl (1959), hit rates were 74% for the Goldberg index and only 68% for the average clinician. Clinicians in this study were not given any information other than the MMPI protocols. The study is well known not so much because the statistical rule did better than clinicians, but because a simple linear rule was more accurate than complex statistical rules including regression equations, profile typologies, Bayesian techniques, density estimation procedures, the Perceptron algorithm, and sequential analyses. However, one can question whether the Goldberg index should be used by itself in clinical practice to make differential diagnoses of neurosis versus psychosis. As observed by Graham (2000), "It is important to note that the index is useful only when the clinician is relatively sure that the person being considered is either psychotic or neurotic. When the index is applied to the scores of normal persons or those with personality disorder diagnoses, most of them are considered to be psychotic" (p. 252). Thus, before using the Goldberg index, one needs to rule out diagnoses of normal and of personality disorder, either by relying on clinical judgment or another statistical prediction rule. Of course, the other limitation of the Goldberg index is that it is possible, and perhaps even likely, that clinicians could outperform the index if they were given history and interview information in addition to MMPI results.
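To make the arithmetic of the Goldberg index concrete, the following is a minimal sketch in Python. The formula itself comes from the passage above. The function names, the example T scores, and the cutoff value are illustrative assumptions: the passage does not state a cutoff, and the direction of the decision rule (higher values read as psychotic) is inferred from the fact that Pa and Sc are added while Hy and Pt are subtracted.

```python
def goldberg_index(l, pa, sc, hy, pt):
    """Return L + Pa + Sc - Hy - Pt, computed on MMPI T scores."""
    return l + pa + sc - hy - pt


def classify(index, cutoff):
    """Label a profile from its index value.

    `cutoff` is a caller-supplied parameter; the passage gives no cutoff,
    so any value passed here is an assumption. Higher values are treated
    as "psychotic" (also an assumption inferred from the formula's signs).
    """
    return "psychotic" if index >= cutoff else "neurotic"


# Hypothetical T scores, invented purely for illustration.
example = {"l": 50, "pa": 70, "sc": 75, "hy": 60, "pt": 65}
index = goldberg_index(**example)   # 50 + 70 + 75 - 60 - 65
label = classify(index, cutoff=45)  # 45 is a placeholder value
print(index, label)
```

As Graham's caution implies, such a function forces every profile into one of two labels, so it is only meaningful after normal and personality disorder diagnoses have been ruled out by other means.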
In contrast to diagnosis and the description of personality traits and psychiatric symptoms, present-day rules are more promising for the task of prediction. Statistical prediction rules have been developed for predicting violence (e.g., Gardner et al., 1996; Lidz et al., 1993; Monahan et al., 2000), but they are not yet ready for clinical use. In commenting on their prediction rule, Monahan et al. (2000, p. 318) noted that "the extent to which the accuracy of the actuarial tool developed here generalizes to other types of clinical settings (e.g., forensic hospitals) is unknown." One can anticipate (and hope) that actuarial rules for predicting violence will soon be available for widespread use in clinical practice.
Although valuable actuarial rules for predicting violence may soon be available, prospects are less promising for the prediction of suicide. This is such an important task that if a rule could obtain even a low level of accuracy, it might be of use in clinical practice. However, results for actuarial rules have been disappointing. For example, in one study (R. B. Goldstein, Black, Nasrallah, & Winokur, 1991), predictions were made for 1,906 patients who had been followed for several years. Forty-six of the patients committed suicide. Several risk factors for suicide were identified (e.g., history of suicide attempts, suicidal ideation on index admission, and gender). However, these risk factors could not be meaningfully used to make predictions. When the risk factors were incorporated into a statistical rule, five predictions of suicide were made, but only one of them was valid, and predictions of no suicide were made for 45 of the 46 patients who did kill themselves. The statistical rule did not do well even though it was derived and validated on the same data set.
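The counts reported for this study can be turned into standard accuracy statistics. The sketch below is a worked calculation, not part of the original study: it assumes that the five predictions of suicide and the 46 suicides all fall within the same 1,906 patients, and the function and variable names are illustrative.

```python
def confusion_counts(n_total, n_events, n_predicted, n_correct):
    """Derive the four confusion-matrix cells from the reported totals."""
    tp = n_correct                # predicted suicide, suicide occurred
    fp = n_predicted - n_correct  # predicted suicide, no suicide
    fn = n_events - n_correct     # predicted no suicide, suicide occurred
    tn = n_total - tp - fp - fn   # predicted no suicide, no suicide
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Return common accuracy statistics as proportions."""
    total = tp + fp + fn + tn
    return {
        "base_rate": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "overall_accuracy": (tp + tn) / total,
        # Accuracy of a trivial rule that always predicts "no suicide".
        "always_no_accuracy": (tn + fp) / total,
    }


# Counts taken from the passage: 1,906 patients, 46 suicides,
# five predictions of suicide, one of them correct.
cells = confusion_counts(n_total=1906, n_events=46, n_predicted=5, n_correct=1)
for name, value in summarize(*cells).items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the rule detects 1 of 46 suicides (a sensitivity of about 2%), and its overall accuracy (1,857 of 1,906) is slightly lower than that of always predicting no suicide (1,860 of 1,906). This shows why a high overall hit rate is uninformative when the base rate is low.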
Among the most valuable statistical prediction rules currently available are those in the area of behavioral assessment. These rules are helpful for conducting functional analyses. As observed by Schlundt and Bell (1987), "when clients keep a self-monitoring diary, a large amount of data is often generated. Typically, clinicians review the records and use clinical judgment to identify patterns and draw inferences about functional relationships among antecedents, behaviors, and consequences. Although the clinical review of self-monitoring records provides data that might not be otherwise obtained, clinical judgment is known to be subject to inaccuracies . . . and statistical prediction is typically more accurate and reliable than clinical judgment" (p. 216).
The shortcomings of clinical judgment for functional analyses were illustrated in a study by O'Brien (1995). In this study, the self-monitoring data for a client who complained of headaches were given to eight clinical psychology graduate students. Over a period of 14 days, the client monitored a number of variables including stress level, arguments, hours of sleep, number of headaches, headache severity, duration of headaches, and number of painkillers taken. The task for the graduate students was to estimate "the magnitude of functional relationships that existed between pairs of target behaviors and controlling factors by generating a subjective correlation" (p. 352). Results were both surprising and disappointing: The graduate students identified the controlling variables that were most strongly correlated with each headache symptom only 51% of the time.
Given the shortcomings of clinical judgment for describing functional relationships, it is important to note that sequential and conditional probability analyses have been used to analyze self-monitoring data. These statistical analyses have been used to clarify the functional relationships involved in a variety of problems including smoking addiction, bulimia, hypertension, and obesity (e.g., Schlundt & Bell, 1987; Shiffman, 1993).
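A conditional probability analysis of self-monitoring data can be sketched in a few lines of Python. The example below is only an illustration of the general idea, not the procedure used by Schlundt and Bell (1987) or Shiffman (1993). The diary records, field names, and function name are hypothetical and invented for the example, and the sketch does not cover sequential (lagged) analyses.

```python
from collections import Counter


def conditional_probability(records, antecedent, outcome):
    """Return (P(outcome | antecedent), P(outcome | no antecedent)).

    Each record is a dict for one diary day. A value of None is returned
    for a condition that never occurs in the records.
    """
    counts = Counter((bool(r[antecedent]), bool(r[outcome])) for r in records)

    def rate(flag):
        n = counts[(flag, True)] + counts[(flag, False)]
        return counts[(flag, True)] / n if n else None

    return rate(True), rate(False)


# Hypothetical 14-day diary: whether an argument and a headache occurred.
diary = (
    [{"argument": True, "headache": True}] * 4
    + [{"argument": True, "headache": False}] * 1
    + [{"argument": False, "headache": True}] * 2
    + [{"argument": False, "headache": False}] * 7
)

p_with, p_without = conditional_probability(diary, "argument", "headache")
print(f"P(headache | argument)    = {p_with:.2f}")
print(f"P(headache | no argument) = {p_without:.2f}")
```

Comparing the two conditional probabilities gives an explicit, reproducible estimate of how strongly an antecedent is associated with a target behavior, in place of the subjective correlations that the graduate students in O'Brien's (1995) study were asked to generate.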
In conclusion, there are reasons one can expect statistical prediction rules to be more accurate than automated assessment programs and clinical judges. However, relatively few statistical prediction rules can be recommended for clinical use. Substantial progress has occurred in predicting violence and child abuse and neglect among offenders, and it does seem likely that powerful statistical rules for these tasks will become available for use in clinical practice in the near future (see Wood, Garb, Lilienfeld, & Nezworski, 2002). Also, statistical rules for analyzing functional relationships are impressive. On the other hand, before powerful statistical rules become available for other tasks, such as diagnosis, the description of personality and psychopathology, and planning, methodological barriers will have to be overcome.