The Limits Of Psychometrics

Psychological assessment is ultimately about the examinee. A test is merely a tool with which to understand the examinee, and psychometrics are merely rules with which to build the tools. The tools themselves must be sufficiently sound (i.e., valid and reliable) and fair that they introduce only acceptable levels of error into the process of decision-making. The guidelines described above for the psychometrics of test construction and application help us not only to build better tools, but also to use these tools as skilled craftspersons.

As an evolving field of study, psychometrics still has some glaring shortcomings. A long-standing limitation of psychometrics is its systematic overreliance on internal sources of evidence for test validity and fairness. In brief, it is more expensive and more difficult to collect external criterion-based information, especially with special populations; it is simpler and easier to base all analyses on the performance of a normative standardization sample. Leading psychometricians have recognized and acknowledged this dependency on internal methods. In discussing psychometric methods for detecting test bias, for example, Camilli and Shepard cautioned about circular reasoning: "Because DIF indices rely only on internal criteria, they are inherently circular" (p. 17). Similarly, psychometricians have been reluctant to consider attempts to extend the domain of validity into the consequential aspects of test usage (e.g., Lees-Haley, 1996). We have witnessed entire testing approaches built on internal factor analysis and evaluation of content validity (e.g., McGrew & Flanagan, 1998), with negligible attention paid to the external validation of the factors against independent criteria. This shortcoming constitutes a serious limitation of psychometrics, which we have attempted to address by encouraging the use of both internal and external sources of psychometric evidence.

Another long-standing limitation is the tendency of test developers to wait until the test is undergoing standardization to establish its validity. A typical sequence of test development involves pilot studies, a content tryout, and finally a national standardization and supplementary studies (e.g., Robertson, 1992). Harkening back to the stages described by Loevinger (1957), the external criterion-based validation stage comes last in the process, after the test has effectively been built. It is a limitation of psychometric practice that many tests validate their effectiveness for a stated purpose only at the end of the process rather than at the beginning, as the MMPI developers did over half a century ago when they selected items that discriminated between specific diagnostic groups (Hathaway & McKinley, 1943). The utility of a test for its intended application should be at least partially validated at the pilot study stage, prior to norming.

Finally, psychometrics has failed to directly address many of the applied questions of practitioners. Test results often do not readily lend themselves to functional decision-making. For example, psychometricians have been slow to develop consensually accepted ways of measuring growth and maturation, reliable change (as a result of enrichment, intervention, or treatment), and atypical response patterns suggestive of lack of effort or dissimulation. The failure to establish treatment validity and assessment-treatment linkage undermines the central purpose of testing. Moreover, recent challenges to the practice of test profile analysis (e.g., Glutting, McDermott, & Konold, 1997) suggest a need to systematically measure test profile strengths and weaknesses in a clinically relevant way that permits a match to prototypal expectations for specific clinical disorders. The answers to these challenges lie ahead.

