Impact of Local Item Dependence on Item Response Theory Scoring in CAT (CT-98-08)

by Lynda M. Reese, Law School Admission Council
Executive Summary
In computerized adaptive testing (CAT), an attempt is made to
select items for individual test takers that are appropriate for
their ability level.
test to the ability level of the test taker is made possible through
the application of item response theory (IRT). IRT is a mathematical
model that relates the probability that a test taker will answer a
single test item (i.e., test question) correctly to the ability
level of the test taker and specific characteristics of the test
item. In applying IRT, a formal assumption of local item
independence is made. This assumption states that once the ability
level of the test taker is accounted for, the responses of test
takers to individual items on the test should be statistically
independent.
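The following Python sketch makes the model and the independence assumption concrete. It assumes, for illustration only, the three-parameter logistic (3PL) model, one common IRT model. This summary does not state which model the study used, and the item parameters below are hypothetical. Under local item independence, the probability of a whole response pattern, given ability, is simply the product of the individual item probabilities.

```python
import math

# Assumed 3PL model for illustration; the summary does not name the model.
D = 1.7  # conventional logistic scaling constant


def p_correct(theta, a, b, c):
    """Probability of a correct answer at ability theta for an item with
    discrimination a, difficulty b, and lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def pattern_likelihood(theta, items, responses):
    """Likelihood of a whole response pattern given theta.

    Local item independence lets the item-level probabilities be multiplied:
    P(u_1, ..., u_n | theta) = prod_j P_j(theta)^u_j * (1 - P_j(theta))^(1 - u_j).
    """
    likelihood = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        likelihood *= p if u == 1 else 1.0 - p
    return likelihood


# Hypothetical item parameters (a, b, c) and one response pattern (1 = correct).
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
print(pattern_likelihood(0.0, items, [1, 1, 0]))
```

When ability equals an item's difficulty, the 3PL probability lies halfway between the lower asymptote c and 1.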
In a test-taking situation, many circumstances arise that cause the
local item independence assumption to be violated to some degree.
For instance, if a test section is especially difficult, fatigue may
adversely affect the performance of test takers on the items at the
end of the section. In this case, the difficulty level of the items
found at the beginning of the section affects performance on later
items, and so these items are said to exhibit some degree of local
item dependence (LID).
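The Python sketch below illustrates what LID means statistically. It simulates two equally difficult items whose responses share a hypothetical nuisance factor (standing in for something like fatigue) in addition to ability. It then correlates the item residuals after conditioning on ability, a check in the spirit of Yen's Q3 statistic. Because the true ability is used rather than an estimate, this is not Q3 itself. The simulation design and all parameter values are illustrative assumptions, not those of this study.

```python
import math
import random

D = 1.7  # conventional logistic scaling constant


def logistic(z):
    return 1.0 / (1.0 + math.exp(-D * z))


def simulate_pair(n, dependence, seed=0):
    """Simulate two equally difficult items (b = 0) answered by n examinees.

    'dependence' scales a hypothetical nuisance factor shared by both items;
    0 gives responses that are locally independent given theta.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        nuisance = rng.gauss(0.0, 1.0)
        p = logistic(theta + dependence * nuisance)
        u1 = 1 if rng.random() < p else 0
        u2 = 1 if rng.random() < p else 0
        data.append((theta, u1, u2))
    return data


def residual_correlation(data):
    """Correlation of residuals u - P(theta), where P ignores the nuisance.

    Near zero under local independence; positive when the items share LID.
    """
    r1 = [u1 - logistic(t) for t, u1, _ in data]
    r2 = [u2 - logistic(t) for t, _, u2 in data]
    m1, m2 = sum(r1) / len(r1), sum(r2) / len(r2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(r1, r2))
    v1 = sum((x - m1) ** 2 for x in r1)
    v2 = sum((y - m2) ** 2 for y in r2)
    return cov / math.sqrt(v1 * v2)


for d in (0.0, 1.0):
    print(d, round(residual_correlation(simulate_pair(20000, d)), 3))
```

With the dependence set to 0, the residual correlation should be close to zero. With a shared nuisance factor, it becomes clearly positive, which is the signature of LID.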
The impact of LID on various applications of IRT within the
paper-and-pencil mode of testing has been evaluated in earlier
research. Depending on
the particular test design, a computerized test may rely more
heavily upon IRT for such procedures as item selection and ability
estimation, and so the assumptions of the model become even more
important. This study represents a first evaluation of the impact of
LID on IRT scoring in CAT. As such, the most basic CAT design and a
simplified design for simulating CAT item pools with various degrees
of LID were applied. The results indicate that, for certain types of
scoring, an extreme amount of LID may adversely impact the final
score attained by the examinee (i.e., test taker). The estimated
precision of the test was also affected by the extreme LID level
studied here. For the medium level of LID, designed to reflect the
amount of LID typically found in the LSAT, the effects of LID were
not problematic.
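The following minimal sketch of a basic CAT loop shows where IRT enters item selection, scoring, and the estimate of precision. For illustration only, it assumes a 3PL model, maximum-information item selection, and expected a posteriori (EAP) scoring with a standard normal prior. The posterior standard deviation serves as the estimated standard error. The summary does not specify these choices, and the item pool here is randomly generated. Because the likelihood treats responses as independent given ability, dependent responses are counted as separate pieces of evidence. This is one way LID could distort both the score and its estimated precision.

```python
import math
import random

D = 1.7  # conventional logistic scaling constant (assumed 3PL model)


def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info3pl(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


GRID = [-4.0 + 0.05 * k for k in range(161)]    # quadrature points, -4 to 4
PRIOR = [math.exp(-0.5 * t * t) for t in GRID]  # unnormalized N(0, 1) prior


def eap(administered, responses):
    """EAP ability estimate and posterior SD (used as the standard error).
    The likelihood is a product over items, i.e., it assumes local independence."""
    post = []
    for t, w in zip(GRID, PRIOR):
        like = w
        for (a, b, c), u in zip(administered, responses):
            p = p3pl(t, a, b, c)
            like *= p if u else 1.0 - p
        post.append(like)
    total = sum(post)
    mean = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post)) / total
    return mean, math.sqrt(var)


def run_cat(pool, true_theta, test_length, seed=0):
    """Administer test_length items adaptively to a simulated examinee."""
    rng = random.Random(seed)
    available = list(range(len(pool)))
    administered, responses = [], []
    theta_hat, se = 0.0, 1.0  # start at the prior mean
    for _ in range(test_length):
        # Pick the unused item with maximum information at the current estimate.
        best = max(available, key=lambda i: info3pl(theta_hat, *pool[i]))
        available.remove(best)
        item = pool[best]
        # Simulate a locally independent response from the true ability.
        u = 1 if rng.random() < p3pl(true_theta, *item) else 0
        administered.append(item)
        responses.append(u)
        theta_hat, se = eap(administered, responses)
    return theta_hat, se


# Hypothetical pool of 300 items with parameters (a, b, c).
rng_pool = random.Random(1)
pool = [(rng_pool.uniform(0.6, 1.6), rng_pool.gauss(0.0, 1.0), 0.2) for _ in range(300)]

theta_hat, se = run_cat(pool, true_theta=1.0, test_length=25)
print(f"estimate {theta_hat:.2f}, standard error {se:.2f}")
```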
Future research in this area should focus on some of the
computerized testing designs that are currently being evaluated for
the LSAT. Also, future research should be carried out to evaluate
LID levels that represent situations likely to arise in building an
item pool for computerized testing. For example, the effect of 100
items displaying an extreme level of LID within a medium-LID CAT
pool should be evaluated.