Achenbach, T. M., & Howell, C. T. (1993). Are American children's problems getting worse? A 13-year comparison. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 1145-1154.
American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: Author.
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611.
American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Andrich, D. (1988). Rasch models for measurement. Thousand Oaks, CA: Sage.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
Banaji, M. R., & Crowder, R. C. (1989). The bankruptcy of everyday memory. American Psychologist, 44, 1185-1193.
Barrios, B. A. (1988). On the changing nature of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed., pp. 3-41). New York: Pergamon Press.
Bayley, N. (1993). Bayley Scales of Infant Development second edition manual. San Antonio, TX: The Psychological Corporation.
Beutler, L. E. (1998). Identifying empirically supported treatments: What if we didn't? Journal of Consulting and Clinical Psychology, 66, 113-120.
Beutler, L. E., & Clarkin, J. F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. Philadelphia, PA: Brunner/Mazel.
Beutler, L. E., & Harwood, T. M. (2000). Prescriptive psychotherapy: A practical guide to systematic treatment selection. New York: Oxford University Press.
Binet, A., & Simon, T. (1916). New investigation upon the measure of the intellectual level among school children. In E. S. Kite (Trans.), The development of intelligence in children (pp. 274-329). Baltimore: Williams and Wilkins. (Original work published 1911).
Bracken, B. A. (1987). Limitations of preschool instruments and standards for minimal levels of technical adequacy. Journal of Psychoeducational Assessment, 4, 313-326.
Bracken, B. A. (1988). Ten psychometric reasons why similar tests produce dissimilar results. Journal of School Psychology, 26, 155-166.
Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test examiner's manual. Itasca, IL: Riverside.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.
Bruininks, R. H., Woodcock, R. W., Weatherman, R. F., & Hill, B. K. (1996). Scales of Independent Behavior-Revised comprehensive manual. Itasca, IL: Riverside.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items (Vol. 4). Thousand Oaks, CA: Sage.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand-McNally.
Campbell, S. K., Siegel, E., Parr, C. A., & Ramey, C. T. (1986). Evidence for the need to renorm the Bayley Scales of Infant Development based on the performance of a population-based sample of 12-month-old infants. Topics in Early Childhood Special Education, 6, 83-96.
Carroll, J. B. (1983). Studying individual differences in cognitive abilities: Through and beyond factor analysis. In R. F. Dillon & R. R. Schmeck (Eds.), Individual differences in cognition (pp. 1-33). New York: Academic Press.
Cattell, R. B. (1986). The psychometric properties of tests: Consistency, validity, and efficiency. In R. B. Cattell & R. C. Johnson (Eds.), Functional psychological testing: Principles and instruments (pp. 54-78). New York: Brunner/Mazel.
Chudowsky, N., & Behuniak, P. (1998). Using focus groups to examine the consequential aspect of validity. Educational Measurement: Issues and Practice, 17, 28-38.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309-319.
Cleary, T. A. (1968). Test bias: Prediction of grades for Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115-124.
Cone, J. D. (1978). The behavioral assessment grid (BAG): A conceptual framework and a taxonomy. Behavior Therapy, 9, 882-888.
Cone, J. D. (1988). Psychometric considerations and the multiple models of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed., pp. 42-66). New York: Pergamon Press.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand-McNally.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671-684.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443-507). Washington, DC: American Council on Education.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Urbana: University of Illinois Press.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability scores and profiles. New York: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
Daniel, M. H. (1999). Behind the scenes: Using new measurement methods on the DAS and KAIT. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 37-63). Mahwah, NJ: Erlbaum.
Elliott, C. D. (1990). Differential Ability Scales: Introductory and technical handbook. San Antonio, TX: The Psychological Corporation.
Embretson, S. E. (1995). The new rules of measurement. Psychological Assessment, 8, 341-349.
Embretson, S. E. (1999). Issues in the measurement of cognitive abilities. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 1-15). Mahwah, NJ: Erlbaum.
Embretson, S. E., & Hershberger, S. L. (Eds.). (1999). The new rules of measurement: What every psychologist and educator should know. Mahwah, NJ: Erlbaum.
Fiske, D. W., & Campbell, D. T. (1992). Citations do not solve problems. Psychological Bulletin, 112, 393-395.
Fleiss, J. L. (1981). Balanced incomplete block designs for interrater reliability studies. Applied Psychological Measurement, 5, 105-112.
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95, 29-51.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171-191.
Flynn, J. R. (1994). IQ gains over time. In R. J. Sternberg (Ed.), The encyclopedia of human intelligence (pp. 617-623). New York: Macmillan.
Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time. American Psychologist, 54, 5-20.
Galton, F. (1879). Psychometric experiments. Brain: A Journal of Neurology, 2, 149-162.
Geisinger, K. F. (1992). The metamorphosis of test validation. Educational Psychologist, 27, 197-222.
Geisinger, K. F. (1998). Psychometric issues in test interpretation. In J. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 17-30). Washington, DC: American Psychological Association.
Gleser, G. C., Cronbach, L. J., & Rajaratnam, N. (1965). Generalizability of scores influenced by multiple sources of variance. Psychometrika, 30, 395-418.
Glutting, J. J., McDermott, P. A., & Konold, T. R. (1997). Ontology, structure, and diagnostic benefits of a normative subtest taxonomy from the WISC-III standardization sample. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 349-372). New York: Guilford Press.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Guilford, J. P. (1950). Fundamental statistics in psychology and education (2nd ed.). New York: McGraw-Hill.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1-10.
Gulliksen, H. (1950). Theory of mental tests. New York: McGraw-Hill.
Hambleton, R. K., & Rodgers, J. H. (1995). Item bias review. Washington, DC: The Catholic University of America, Department of Education. (ERIC Clearinghouse on Assessment and Evaluation, No. EDO-TM-95-9)
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: The Psychological Corporation.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963-974.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238-247.
Heinrichs, R. W. (1990). Current and emergent applications of neuropsychological assessment: Problems of validity and utility. Professional Psychology: Research and Practice, 21, 171-176.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class in American life. New York: Free Press.
Hills, J. (1999, May 14). Re: Construct validity. Educational Statistics Discussion List (EDSTAT-L). (Available from edstat-l@jse.stat.ncsu.edu)
Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
Hopkins, C. D., & Antes, R. L. (1978). Classroom measurement and evaluation. Itasca, IL: F. E. Peacock.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Jackson, C. B. (1982). Advanced meta-analysis: Quantitative methods of cumulating research findings across studies. San Francisco: Sage.
Ittenbach, R. F., Esters, I. G., & Wainer, H. (1997). The history of test development. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 17-31). New York: Guilford Press.
Jackson, D. N. (1971). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61-92). New York: Academic Press.
Jencks, C., & Phillips, M. (Eds.). (1998). The Black-White test score gap. Washington, DC: Brookings Institution.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176.
Kalton, G. (1983). Introduction to survey sampling. Beverly Hills, CA: Sage.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
Keith, T. Z., & Kranzler, J. H. (1999). The absence of structural fidelity precludes construct validity: Rejoinder to Naglieri on what the Cognitive Assessment System does and does not measure. School Psychology Review, 28, 303-321.
Knowles, E. S., & Condon, C. A. (2000). Does the rose still smell as sweet? Item variability across test forms and revisions. Psychological Assessment, 12, 245-252.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33, 129-140.
Kuhn, T. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of Chicago Press.
Larry P. v. Riles, 343 F. Supp. 1306 (N.D. Cal. 1972) (order granting injunction), aff'd 502 F.2d 963 (9th Cir. 1974); 495 F. Supp. 926 (N.D. Cal. 1979) (decision on merits), aff'd (9th Cir. No. 80-427 Jan. 23, 1984). Order modifying judgment, C-71-2270 RFP, September 25, 1986.
Lazarus, A. A. (1973). Multimodal behavior therapy: Treating the BASIC ID. Journal of Nervous and Mental Disease, 156, 404-411.
Lees-Haley, P. R. (1996). Alice in validityland, or the dangerous consequences of consequential validity. American Psychologist, 51, 981-983.
Levy, P. S., & Lemeshow, S. (1999). Sampling of populations: Methods and applications. New York: Wiley.
Li, H., Rosenthal, R., & Rubin, D. B. (1996). Reliability of measurement in psychology: From Spearman-Brown to maximal reliability. Psychological Methods, 1, 98-107.
Li, H., & Wainer, H. (1997). Toward a coherent view of reliability in test theory. Journal of Educational and Behavioral Statistics, 22, 478-484.
Linacre, J. M., & Wright, B. D. (1999). A user's guide to Winsteps/Ministep: Rasch-model computer programs. Chicago: MESA Press.
Linn, R. L. (1998). Partitioning responsibility for the evaluation of the consequences of assessment programs. Educational Measurement: Issues and Practice, 17, 28-30.
Loevinger, J. (1957). Objective tests as instruments of psychological theory [Monograph]. Psychological Reports, 3, 635-694.
Loevinger, J. (1972). Some limitations of objective personality tests. In J. N. Butcher (Ed.), Objective personality assessment (pp. 45-58). New York: Academic Press.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Maruish, M. E. (Ed.). (1999). The use of psychological testing for treatment planning and outcomes assessment. Mahwah, NJ: Erlbaum.
McAllister, P. H. (1993). Testing, DIF, and public policy. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 389-396). Hillsdale, NJ: Erlbaum.
McArdle, J. J. (1998). Contemporary statistical models for examining test-bias. In J. J. McArdle & R. W. Woodcock (Eds.), Human cognitive abilities in theory and practice (pp. 157-195). Mahwah, NJ: Erlbaum.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-battery assessment. Boston: Allyn and Bacon.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III technical manual. Itasca, IL: Riverside.
Meehl, P. E. (1972). Reactions, reflections, projections. In J. N. Butcher (Ed.), Objective personality assessment: Changing perspectives (pp. 131-189). New York: Academic Press.
Mercer, J. R. (1984). What is a racially and culturally nondiscriminatory test? A sociological and pluralistic perspective. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing (pp. 293-356). New York: Plenum Press.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5-11.
Messick, S. (1995a). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14, 5-8.
Messick, S. (1995b). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Millon, T., Davis, R., & Millon, C. (1997). MCMI-III: Millon Clinical Multiaxial Inventory-III manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Naglieri, J. A., & Das, J. P. (1997). Das-Naglieri Cognitive Assessment System interpretive handbook. Itasca, IL: Riverside.
Neisser, U. (1978). Memory: What are the important questions? In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 3-24). London: Academic Press.
Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle Developmental Inventory. Itasca, IL: Riverside.
Newman, J. R. (1956). The world of mathematics: A small library of literature of mathematics from A'h-mose the Scribe to Albert Einstein presented with commentaries and notes. New York: Simon and Schuster.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
O'Brien, M. L. (1992). A Rasch approach to scaling issues in testing Hispanics. In K. F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 43-54). Washington, DC: American Psychological Association.
Peckham, R. F. (1972). Opinion, Larry P. v. Riles. Federal Supplement, 343, 1306-1315.
Peckham, R. F. (1979). Opinion, Larry P. v. Riles. Federal Supplement, 495, 926-992.
Pomplun, M. (1997). State assessment and instructional change: A path model analysis. Applied Measurement in Education, 10, 217-234.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Reckase, M. D. (1998). Consequential validity from the test developer's perspective. Educational Measurement: Issues and Practice, 17, 13-16.
Reschly, D. J. (1997). Utility of individual ability measures and public policy choices for the 21st century. School Psychology Review, 26, 234-241.
Riese, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12, 287-297.
Robertson, G. J. (1992). Psychological tests: Development, publication, and distribution. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside view (pp. 159-214). Palo Alto, CA: Consulting Psychologists Press.
Salvia, J., & Ysseldyke, J. E. (2001). Assessment (8th ed.). Boston: Houghton Mifflin.
Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modifications. Applied Psychological Measurement, 18, 229-244.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 171-195.
Stinnett, T. A., Coombs, W. T., Oehler-Stinnett, J., Fuqua, D. R., & Palmer, L. S. (1999, August). NEPSY structure: Straw, stick, or brick house? Paper presented at the Annual Convention of the American Psychological Association, Boston, MA.
Suen, H. K. (1990). Principles of test theories. Hillsdale, NJ: Erlbaum.
Swets, J. A. (1992). The science of choosing the right decision threshold in high-stakes diagnostics. American Psychologist, 47, 522-532.
Terman, L. M. (1916). The measurement of intelligence: An explanation of and a complete guide for the use of the Stanford revision and extension of the Binet-Simon Intelligence Scale. Boston: Houghton Mifflin.
Terman, L. M., & Merrill, M. A. (1937). Directions for administering: Forms L and M, Revision of the Stanford-Binet Tests of Intelligence. Boston: Houghton Mifflin.
Tiedeman, D. V. (1978). In O. K. Buros (Ed.), The eighth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Tinsley, H. E. A., & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology, 22, 358-376.
Tulsky, D. S., & Ledbetter, M. F. (2000). Updating to the WAIS-III and WMS-III: Considerations for research and clinical practice. Psychological Assessment, 12, 253-262.
Uniform guidelines on employee selection procedures. (1978). Federal Register, 43, 38296-38309.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.
Walker, K. C., & Bracken, B. A. (1996). Inter-parent agreement on four preschool behavior rating scales: Effects of parent and child gender. Psychology in the Schools, 33, 273-281.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams and Wilkins.
Wechsler, D. (1946). The Wechsler-Bellevue Intelligence Scale: Form II. Manual for administering and scoring the test. New York: The Psychological Corporation.
Wechsler, D. (1949). Wechsler Intelligence Scale for Children manual. New York: The Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation.
Willingham, W. W. (1999). A systematic view of test fairness. In S. J. Messick (Ed.), Assessment in higher education: Issues of access, quality, student development, and public policy (pp. 213-242). Mahwah, NJ: Erlbaum.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The comprehensive system for the Rorschach: A critical examination. Psychological Science, 7, 3-10.
Woodcock, R. W. (1999). What can Rasch-based scores convey about a person's test performance? In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 105-127). Mahwah, NJ: Erlbaum.
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know (pp. 65-104). Mahwah, NJ: Erlbaum.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Erlbaum.