Item response theory (IRT) may be traced to two separate lines of development. The first originated in the work of Danish mathematician Georg Rasch (1960), who developed a family of IRT models that separated person and item parameters. Rasch influenced the thinking of leading European and American psychometricians such as Gerhard Fischer and Benjamin Wright. The second stemmed from research at the Educational Testing Service that culminated in Frederick Lord and Melvin Novick's (1968) classic textbook, which included four chapters on IRT written by Allan Birnbaum. This book provided a unified statistical treatment of test theory and moved beyond Gulliksen's earlier classical test theory work.
IRT addresses the issue of how individual test items and observations map in a linear manner onto a targeted construct (termed the latent trait, with the amount of the trait denoted by θ). The frequency distribution of a total score, factor score, or other trait estimate is calculated on a standardized scale with a mean θ of 0 and a standard deviation of 1. An item characteristic curve (ICC) can then be created by plotting the proportion of people who pass the item at each level of θ, so that the probability of a person's passing an item depends solely on the ability of that person and the difficulty of the item. The ICC yields several parameters, including item difficulty and item discrimination. Item difficulty is the location on the latent trait continuum at which the probability of passing the item is 50% (or, when guessing is modeled, midway between the guessing level and 1). Item discrimination is the rate or slope at which the probability of success changes with trait level (i.e., the ability of the item to differentiate those with more of the trait from those with less). A third parameter denotes the probability of passing the item by guessing. IRT based on the one-parameter model (i.e., item difficulty only) assumes equal discrimination for all items and a negligible probability of guessing and is generally referred to as the Rasch model. Two-parameter models (those that estimate both item difficulty and discrimination) and three-parameter models (those that estimate item difficulty, discrimination, and probability of guessing) may also be used.
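The relationship among the three item parameters can be illustrated with a minimal sketch of a logistic item characteristic curve. The logistic form, the function name icc, and the parameter names a, b, and c are assumptions chosen for illustration and do not come from the text above; the parameter values are hypothetical.

```python
import math

def icc(theta, b, a=1.0, c=0.0):
    """Probability of passing an item at trait level theta (logistic form).

    b: item difficulty (location on the trait continuum)
    a: item discrimination (slope); fixed at 1.0 in the one-parameter case
    c: guessing parameter (lower asymptote); 0.0 when guessing is ignored
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# One-parameter (Rasch-type) item: only difficulty is specified.
p_one = icc(theta=0.0, b=0.0)

# Two-parameter item: a steeper slope separates trait levels more sharply.
p_two = icc(theta=1.0, b=0.0, a=2.0)

# Three-parameter item: guessing raises the lower bound of the curve.
p_three = icc(theta=0.0, b=0.0, a=1.0, c=0.2)

print(p_one, p_two, p_three)
```

When θ equals the item difficulty, the sketch returns 0.5 without guessing and 0.6 with a guessing parameter of 0.2, that is, midway between the guessing level and 1.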
IRT posits several assumptions: (a) unidimensionality and stability of the latent trait, which is usually estimated from an aggregation of individual items; (b) local independence of items, meaning that the only influence on item responses is the latent trait and not the other items; and (c) item parameter invariance, which means that item properties are a function of the item itself rather than the sample, test form, or interaction between item and respondent. Knowles and Condon (2000) argue that these assumptions may not always be made safely. Despite this limitation, IRT offers technology that makes test development more efficient than classical test theory.
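The local independence assumption has a practical consequence: given θ, the probability of a whole response pattern is the product of the individual item probabilities. The following is a rough sketch under that assumption, using a one-parameter logistic model and a simple grid search for the most likely θ. The function names, the grid range of -4 to 4, and the item difficulties and responses are all hypothetical choices made for illustration, not a procedure described in the text.

```python
import math

def p_pass(theta, b):
    """One-parameter logistic probability of passing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(theta, difficulties, responses):
    """Likelihood of a 0/1 response pattern at a given theta.

    Local independence lets the item probabilities be multiplied together.
    """
    likelihood = 1.0
    for b, u in zip(difficulties, responses):
        p = p_pass(theta, b)
        likelihood *= p if u == 1 else (1.0 - p)
    return likelihood

def estimate_theta(difficulties, responses):
    """Grid search over theta in [-4, 4] (step 0.01) for the highest likelihood."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: pattern_likelihood(t, difficulties, responses))

# Hypothetical three-item test: easy, medium, and hard items.
difficulties = [-1.0, 0.0, 1.0]
responses = [1, 1, 0]  # passed the two easier items, failed the hardest
theta_hat = estimate_theta(difficulties, responses)
print(theta_hat)
```

A grid search is used here only because it is easy to follow; operational IRT software typically relies on more efficient estimation methods.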