Achenbach, T. M., & Howell, C. T. (1993). Are American children's problems getting worse? A 13-year comparison. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 1145-1154.
American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: Author.
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611.
American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Andrich, D. (1988). Rasch models for measurement. Thousand Oaks, CA: Sage.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
Banaji, M. R., & Crowder, R. C. (1989). The bankruptcy of everyday memory. American Psychologist, 44, 1185-1193.
Barrios, B. A. (1988). On the changing nature of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed., pp. 3-41). New York: Pergamon Press.
Bayley, N. (1993). Bayley Scales of Infant Development second edition manual. San Antonio, TX: The Psychological Corporation.
Beutler, L. E. (1998). Identifying empirically supported treatments: What if we didn't? Journal of Consulting and Clinical Psychology, 66, 113-120.
Beutler, L. E., & Clarkin, J. F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. Philadelphia, PA: Brunner/Mazel.
Beutler, L. E., & Harwood, T. M. (2000). Prescriptive psychotherapy: A practical guide to systematic treatment selection. New York: Oxford University Press.
Binet, A., & Simon, T. (1916). New investigation upon the measure of the intellectual level among school children. In E. S. Kite (Trans.), The development of intelligence in children (pp. 274-329). Baltimore: Williams and Wilkins. (Original work published 1911).
Bracken, B. A. (1987). Limitations of preschool instruments and standards for minimal levels of technical adequacy. Journal of Psychoeducational Assessment, 4, 313-326.
Bracken, B. A. (1988). Ten psychometric reasons why similar tests produce dissimilar results. Journal of School Psychology, 26, 155-166.
Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test examiner's manual. Itasca, IL: Riverside.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.
Bruininks, R. H., Woodcock, R. W., Weatherman, R. F., & Hill, B. K. (1996). Scales of Independent Behavior—Revised comprehensive manual. Itasca, IL: Riverside.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items (Vol. 4). Thousand Oaks, CA: Sage.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand-McNally.
Campbell, S. K., Siegel, E., Parr, C. A., & Ramey, C. T. (1986). Evidence for the need to renorm the Bayley Scales of Infant Development based on the performance of a population-based sample of 12-month-old infants. Topics in Early Childhood Special Education, 6, 83-96.
Carroll, J. B. (1983). Studying individual differences in cognitive abilities: Through and beyond factor analysis. In R. F. Dillon & R. R. Schmeck (Eds.), Individual differences in cognition (pp. 1-33). New York: Academic Press.
Cattell, R. B. (1986). The psychometric properties of tests: Consistency, validity, and efficiency. In R. B. Cattell & R. C. Johnson (Eds.), Functional psychological testing: Principles and instruments (pp. 54-78). New York: Brunner/Mazel.
Chudowsky, N., & Behuniak, P. (1998). Using focus groups to examine the consequential aspect of validity. Educational Measurement: Issues and Practice, 17, 28-38.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309-319.
Cleary, T. A. (1968). Test bias: Prediction of grades for Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115-124.
Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and a taxonomy. Behavior Therapy, 9, 882-888.
Cone, J. D. (1988). Psychometric considerations and the multiple models of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed., pp. 42-66). New York: Pergamon Press.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand-McNally.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671-684.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443-507). Washington, DC: American Council on Education.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Urbana: University of Illinois Press.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
Daniel, M. H. (1999). Behind the scenes: Using new measurement methods on the DAS and KAIT. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 37-63). Mahwah, NJ: Erlbaum.
Elliott, C. D. (1990). Differential Ability Scales: Introductory and technical handbook. San Antonio, TX: The Psychological Corporation.
Embretson, S. E. (1995). The new rules of measurement. Psychological Assessment, 8, 341-349.
Embretson, S. E. (1999). Issues in the measurement of cognitive abilities. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 1-15). Mahwah, NJ: Erlbaum.
Embretson, S. E., & Hershberger, S. L. (Eds.). (1999). The new rules of measurement: What every psychologist and educator should know. Mahwah, NJ: Erlbaum.
Fiske, D. W., & Campbell, D. T. (1992). Citations do not solve problems. Psychological Bulletin, 112, 393-395.
Fleiss, J. L. (1981). Balanced incomplete block designs for interrater reliability studies. Applied Psychological Measurement, 5, 105-112.
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95, 29-51.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171-191.
Flynn, J. R. (1994). IQ gains over time. In R. J. Sternberg (Ed.), The encyclopedia of human intelligence (pp. 617-623). New York: Macmillan.
Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time. American Psychologist, 54, 5-20.
Galton, F. (1879). Psychometric experiments. Brain: A Journal of Neurology, 2, 149-162.
Geisinger, K. F. (1992). The metamorphosis of test validation. Educational Psychologist, 27, 197-222.
Geisinger, K. F. (1998). Psychometric issues in test interpretation. In J. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 17-30). Washington, DC: American Psychological Association.
Gleser, G. C., Cronbach, L. J., & Rajaratnam, N. (1965). Generalizability of scores influenced by multiple sources of variance. Psychometrika, 30, 395-418.
Glutting, J. J., McDermott, P. A., & Konold, T. R. (1997). Ontology, structure, and diagnostic benefits of a normative subtest taxonomy from the WISC-III standardization sample. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 349-372). New York: Guilford Press.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Guilford, J. P. (1950). Fundamental statistics in psychology and education (2nd ed.). New York: McGraw-Hill.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1-10.
Gulliksen, H. (1950). Theory of mental tests. New York: McGraw-Hill.
Hambleton, R. K., & Rodgers, J. H. (1995). Item bias review. Washington, DC: The Catholic University of America, Department of Education. (ERIC Clearinghouse on Assessment and Evaluation, No. EDO-TM-95-9)
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: The Psychological Corporation.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963-974.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238-247.
Heinrichs, R. W. (1990). Current and emergent applications of neuropsychological assessment problems of validity and utility. Professional Psychology: Research and Practice, 21, 171-176.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class in American life. New York: Free Press.
Hills, J. (1999, May 14). Re: Construct validity. Educational Statistics Discussion List (EDSTAT-L). (Available from edstat-l@jse.stat.ncsu.edu)
Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
Hopkins, C. D., & Antes, R. L. (1978). Classroom measurement and evaluation. Itasca, IL: F. E. Peacock.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Jackson, C. B. (1982). Advanced meta-analysis: Quantitative methods of cumulating research findings across studies. San Francisco: Sage.
Ittenbach, R. F., Esters, I. G., & Wainer, H. (1997). The history of test development. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 17-31). New York: Guilford Press.
Jackson, D. N. (1971). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61-92). New York: Academic Press.
Jencks, C., & Phillips, M. (Eds.). (1998). The Black-White test score gap. Washington, DC: Brookings Institute.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176.
Kalton, G. (1983). Introduction to survey sampling. Beverly Hills, CA: Sage.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
Keith, T. Z., & Kranzler, J. H. (1999). The absence of structural fidelity precludes construct validity: Rejoinder to Naglieri on what the Cognitive Assessment System does and does not measure. School Psychology Review, 28, 303-321.
Knowles, E. S., & Condon, C. A. (2000). Does the rose still smell as sweet? Item variability across test forms and revisions. Psychological Assessment, 12, 245-252.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33, 129-140.
Kuhn, T. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of Chicago Press.
Larry P. v. Riles, 343 F. Supp. 1306 (N.D. Cal. 1972) (order granting injunction), aff'd 502 F.2d 963 (9th Cir. 1974); 495 F. Supp. 926 (N.D. Cal. 1979) (decision on merits), aff'd (9th Cir. No. 80-427 Jan. 23, 1984). Order modifying judgment, C-71-2270 RFP, September 25, 1986.
Lazarus, A. A. (1973). Multimodal behavior therapy: Treating the BASIC ID. Journal of Nervous and Mental Disease, 156, 404-411.
Lees-Haley, P. R. (1996). Alice in validityland, or the dangerous consequences of consequential validity. American Psychologist, 51, 981-983.
Levy, P. S., & Lemeshow, S. (1999). Sampling of populations: Methods and applications. New York: Wiley.
Li, H., Rosenthal, R., & Rubin, D. B. (1996). Reliability of measurement in psychology: From Spearman-Brown to maximal reliability. Psychological Methods, 1, 98-107.
Li, H., & Wainer, H. (1997). Toward a coherent view of reliability in test theory. Journal of Educational and Behavioral Statistics, 22, 478-484.
Linacre, J. M., & Wright, B. D. (1999). A user's guide to Winsteps/Ministep: Rasch-model computer programs. Chicago: MESA Press.
Linn, R. L. (1998). Partitioning responsibility for the evaluation of the consequences of assessment programs. Educational Measurement: Issues and Practice, 17, 28-30.
Loevinger, J. (1957). Objective tests as instruments of psychological theory [Monograph]. Psychological Reports, 3, 635-694.
Loevinger, J. (1972). Some limitations of objective personality tests. In J. N. Butcher (Ed.), Objective personality assessment (pp. 45-58). New York: Academic Press.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. New York: Addison-Wesley.
Maruish, M. E. (Ed.). (1999). The use of psychological testing for treatment planning and outcomes assessment. Mahwah, NJ: Erlbaum.
McAllister, P. H. (1993). Testing, DIF, and public policy. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 389-396). Hillsdale, NJ: Erlbaum.
McArdle, J. J. (1998). Contemporary statistical models for examining test-bias. In J. J. McArdle & R. W. Woodcock (Eds.), Human cognitive abilities in theory and practice (pp. 157-195). Mahwah, NJ: Erlbaum.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-battery assessment. Boston: Allyn and Bacon.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III technical manual. Itasca, IL: Riverside.
Meehl, P. E. (1972). Reactions, reflections, projections. In J. N. Butcher (Ed.), Objective personality assessment: Changing perspectives (pp. 131-189). New York: Academic Press.
Mercer, J. R. (1984). What is a racially and culturally nondiscriminatory test? A sociological and pluralistic perspective. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing (pp. 293-356). New York: Plenum Press.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5-11.
Messick, S. (1995a). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14, 5-8.
Messick, S. (1995b). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Millon, T., Davis, R., & Millon, C. (1997). MCMI-III: Millon Clinical Multiaxial Inventory-III manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Naglieri, J. A., & Das, J. P. (1997). Das-Naglieri Cognitive Assessment System interpretive handbook. Itasca, IL: Riverside.
Neisser, U. (1978). Memory: What are the important questions? In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 3-24). London: Academic Press.
Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle Developmental Inventory. Itasca, IL: Riverside.
Newman, J. R. (1956). The world of mathematics: A small library of literature of mathematics from A'h-mose the Scribe to Albert Einstein presented with commentaries and notes. New York: Simon and Schuster.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
O'Brien, M. L. (1992). A Rasch approach to scaling issues in testing Hispanics. In K. F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 43-54). Washington, DC: American Psychological Association.
Peckham, R. F. (1972). Opinion, Larry P. v. Riles. Federal Supplement, 343, 1306-1315.
Peckham, R. F. (1979). Opinion, Larry P. v. Riles. Federal Supplement, 495, 926-992.
Pomplun, M. (1997). State assessment and instructional change: A path model analysis. Applied Measurement in Education, 10, 217-234.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Reckase, M. D. (1998). Consequential validity from the test developer's perspective. Educational Measurement: Issues and Practice, 17, 13-16.
Reschly, D. J. (1997). Utility of individual ability measures and public policy choices for the 21st century. School Psychology Review, 26, 234-241.
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12, 287-297.
Robertson, G. J. (1992). Psychological tests: Development, publication, and distribution. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view (pp. 159-214). Palo Alto, CA: Consulting Psychologists Press.
Salvia, J., & Ysseldyke, J. E. (2001). Assessment (8th ed.). Boston: Houghton Mifflin.
Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modifications. Applied Psychological Measurement, 18, 229-244.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 171-195.
Stinnett, T. A., Coombs, W. T., Oehler-Stinnett, J., Fuqua, D. R., & Palmer, L. S. (1999, August). NEPSY structure: Straw, stick, or brick house? Paper presented at the Annual Convention of the American Psychological Association, Boston, MA.
Suen, H. K. (1990). Principles of test theories. Hillsdale, NJ: Erlbaum.
Swets, J. A. (1992). The science of choosing the right decision threshold in high-stakes diagnostics. American Psychologist, 47, 522-532.
Terman, L. M. (1916). The measurement of intelligence: An explanation of and a complete guide for the use of the Stanford revision and extension of the Binet-Simon Intelligence Scale. Boston: Houghton Mifflin.
Terman, L. M., & Merrill, M. A. (1937). Directions for administering: Forms L and M, Revision of the Stanford-Binet Tests of Intelligence. Boston: Houghton Mifflin.
Tiedeman, D. V. (1978). In O. K. Buros (Ed.), The eighth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Tinsley, H. E. A., & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology, 22, 358-376.
Tulsky, D. S., & Ledbetter, M. F. (2000). Updating to the WAIS-III and WMS-III: Considerations for research and clinical practice. Psychological Assessment, 12, 253-262.
Uniform guidelines on employee selection procedures. (1978). Federal Register, 43, 38296-38309.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.
Walker, K. C., & Bracken, B. A. (1996). Inter-parent agreement on four preschool behavior rating scales: Effects of parent and child gender. Psychology in the Schools, 33, 273-281.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams and Wilkins.
Wechsler, D. (1946). The Wechsler-Bellevue Intelligence Scale: Form II. Manual for administering and scoring the test. New York: The Psychological Corporation.
Wechsler, D. (1949). Wechsler Intelligence Scale for Children manual. New York: The Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation.
Willingham, W. W. (1999). A systematic view of test fairness. In S. J. Messick (Ed.), Assessment in higher education: Issues of access, quality, student development, and public policy (pp. 213-242). Mahwah, NJ: Erlbaum.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The comprehensive system for the Rorschach: A critical examination. Psychological Science, 7, 3-10.
Woodcock, R. W. (1999). What can Rasch-based scores convey about a person's test performance? In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 105-127). Mahwah, NJ: Erlbaum.
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 65-104). Mahwah, NJ: Erlbaum.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Erlbaum.