Two psychometric models with very different parametric formulas and item response functions can make virtually the same predictions in all applications. By applying some basic results from the theory of hypothesis testing and from signal detection theory, the power of the most powerful test for distinguishing the models can be computed. It is proposed that model misspecification be measured by this power. If the power of the most powerful test is low, the two models will make nearly the same predictions in every application; if the power is high, there will be applications in which the models make different predictions. This measure places various types of model misspecification (item parameter estimation error, multidimensionality, failure of local independence, and learning and/or fatigue during testing) on a common scale. The theory supporting the method is presented and illustrated with a systematic study of misspecification due to item response function estimation error. In these studies, two joint maximum likelihood estimation methods (LOGIST 2B and LOGIST 5) and two marginal maximum likelihood estimation methods (BILOG and ForScore) were contrasted by measuring the difference between a simulation model and the model obtained by applying an estimation method to simulated data. Marginal estimation was generally found to be superior to joint estimation. The parametric marginal method (BILOG) was superior to the nonparametric marginal method (ForScore) only for three-parameter logistic models; ForScore excelled for more general models. Of the two joint maximum likelihood methods studied, LOGIST 5 appeared to be more accurate than LOGIST 2B.
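The idea of scoring misspecification by the power of the most powerful test can be illustrated with a minimal sketch. It assumes two hypothetical five-item tests whose items follow a two-parameter logistic (2PL) response function, local independence, and a standard normal ability distribution approximated on a simple grid. The item parameters, the grid, and the helper names (`pattern_probs`, `ideal_observer_2afc`, `np_power`) are illustrative assumptions, not the procedure or software used in the study. The sketch enumerates every response pattern and ranks patterns by the likelihood ratio, which by the Neyman-Pearson lemma yields the most powerful test. It reports two quantities: the power of that test at a chosen level, and the proportion correct of a likelihood-ratio "ideal observer" in a two-alternative forced-choice task, which in signal detection terms is the area under the test's ROC curve.

```python
import itertools
import math

# Assumed ability distribution: standard normal on a grid from -4 to 4.
THETAS = [-4.0 + 0.1 * k for k in range(81)]
_RAW = [math.exp(-t * t / 2.0) for t in THETAS]
_TOTAL = sum(_RAW)
WEIGHTS = [w / _TOTAL for w in _RAW]


def irf_2pl(theta, a, b):
    """2PL item response function (an assumed model form for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_probs(items, thetas, weights):
    """Marginal probability of every 0/1 response pattern, assuming local
    independence given theta and integrating theta over the grid weights."""
    probs = {}
    for u in itertools.product((0, 1), repeat=len(items)):
        total = 0.0
        for theta, w in zip(thetas, weights):
            prob = w
            for (a, b), ui in zip(items, u):
                p = irf_2pl(theta, a, b)
                prob *= p if ui else 1.0 - p
            total += prob
        probs[u] = total
    return probs


def ideal_observer_2afc(pa, pb):
    """Proportion correct of the likelihood-ratio observer when shown one
    pattern from model A and one from model B (ties are guessed at 0.5)."""
    lr = {u: pb[u] / pa[u] for u in pa}
    correct = 0.0
    for u, p_u in pa.items():        # pattern generated by model A
        for v, p_v in pb.items():    # pattern generated by model B
            if lr[v] > lr[u]:
                correct += p_u * p_v
            elif lr[v] == lr[u]:
                correct += 0.5 * p_u * p_v
    return correct


def np_power(pa, pb, alpha):
    """Power of the randomized Neyman-Pearson level-alpha test of
    H0: model A against H1: model B."""
    patterns = sorted(pa, key=lambda u: pb[u] / pa[u], reverse=True)
    size, power = 0.0, 0.0
    for u in patterns:
        if size + pa[u] <= alpha:
            size += pa[u]
            power += pb[u]
        else:
            # Randomize on the boundary pattern so the size is exactly alpha.
            power += (alpha - size) / pa[u] * pb[u]
            break
    return power


# Hypothetical (a, b) parameters: model A plays the "true" simulation model,
# model B the item parameters an estimation method might recover.
model_a = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0), (1.0, 0.0)]
model_b = [(1.1, -0.9), (1.2, 0.1), (0.7, 0.5), (1.4, 1.1), (1.0, -0.1)]
pa = pattern_probs(model_a, THETAS, WEIGHTS)
pb = pattern_probs(model_b, THETAS, WEIGHTS)
print("2AFC proportion correct:", ideal_observer_2afc(pa, pb))
print("power at alpha = 0.05:", np_power(pa, pb, 0.05))
```

If the two models are identical, the forced-choice proportion correct is 0.5 and the power equals alpha; values near those bounds mean the models are practically indistinguishable, in the sense described above. Exhaustive enumeration is only feasible for short tests; longer tests would need sampling instead.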
- forced-choice experiment
- ideal observer method
- item response theory
- multilinear formula score theory