TY - JOUR
T1 - Fitting Polytomous Item Response Theory Models to Multiple-Choice Tests
AU - Drasgow, Fritz
AU - Levine, Michael V.
AU - Tsien, Sherman
AU - Williams, Bruce
AU - Mead, Alan D.
PY - 1995/6
Y1 - 1995/6
N2 - This study examined how well current software implementations of four polytomous item response theory models fit several multiple-choice tests. The models were Bock's (1972) nominal model, Samejima's (1979) multiple-choice Model C, Thissen & Steinberg's (1984) multiple-choice model, and Levine's (1993) maximum-likelihood formula scoring model. The parameters of the first three of these models were estimated with Thissen's (1986) MULTILOG computer program; Williams & Levine's (1993) FORSCORE program was used for Levine's model. Tests from the Armed Services Vocational Aptitude Battery, the Scholastic Aptitude Test, and the American College Test Assessment were analyzed. The models were fit in estimation samples of approximately 3,000; cross-validation samples of approximately 3,000 were used to evaluate goodness of fit. Both fit plots and χ² statistics were used to determine the adequacy of fit. Bock's model provided surprisingly good fit; adding parameters to the nominal model did not yield improvements in fit. FORSCORE provided generally good fit for Levine's nonparametric model across all tests. Index terms: Bock's nominal model, FORSCORE, maximum likelihood formula scoring, MULTILOG, polytomous IRT.
AB - This study examined how well current software implementations of four polytomous item response theory models fit several multiple-choice tests. The models were Bock's (1972) nominal model, Samejima's (1979) multiple-choice Model C, Thissen & Steinberg's (1984) multiple-choice model, and Levine's (1993) maximum-likelihood formula scoring model. The parameters of the first three of these models were estimated with Thissen's (1986) MULTILOG computer program; Williams & Levine's (1993) FORSCORE program was used for Levine's model. Tests from the Armed Services Vocational Aptitude Battery, the Scholastic Aptitude Test, and the American College Test Assessment were analyzed. The models were fit in estimation samples of approximately 3,000; cross-validation samples of approximately 3,000 were used to evaluate goodness of fit. Both fit plots and χ² statistics were used to determine the adequacy of fit. Bock's model provided surprisingly good fit; adding parameters to the nominal model did not yield improvements in fit. FORSCORE provided generally good fit for Levine's nonparametric model across all tests. Index terms: Bock's nominal model, FORSCORE, maximum likelihood formula scoring, MULTILOG, polytomous IRT.
UR - http://www.scopus.com/inward/record.url?scp=84976985197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976985197&partnerID=8YFLogxK
U2 - 10.1177/014662169501900203
DO - 10.1177/014662169501900203
M3 - Article
AN - SCOPUS:84976985197
SN - 0146-6216
VL - 19
SP - 143
EP - 166
JO - Applied Psychological Measurement
JF - Applied Psychological Measurement
IS - 2
ER -