
Assessing Fit of Item Response Models for Performance Assessments using Bayesian Analysis

Zhu, Xiaowen (2009) Assessing Fit of Item Response Models for Performance Assessments using Bayesian Analysis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Assessing IRT model-fit and comparing different IRT models from a Bayesian perspective is gaining attention. This research evaluated the performance of Bayesian model-fit and model-comparison techniques in assessing the fit of unidimensional Graded Response (GR) models and in comparing different GR models for performance assessment applications.

The study explored the general performance of the posterior predictive model checking (PPMC) method and a variety of discrepancy measures (test-level, item-level, and pair-wise measures) in evaluating different aspects of fit for unidimensional GR models. Previous findings that the PPMC method is conservative were confirmed. In addition, PPMC was found to have adequate power in detecting different aspects of misfit when appropriate discrepancy measures were used. Pair-wise measures were more powerful than test-level and item-level measures in detecting violations of the unidimensionality and local independence assumptions, with Yen's Q3 measure performing best. The power of PPMC also increased as the degree of multidimensionality or local dependence among item responses increased. Two classical item-fit statistics were effective for detecting item misfit arising from discrepancies with the GR model boundary curves.

The study also compared the relative effectiveness of three Bayesian model-comparison indices (DIC, CPO, and PPMC) for model selection. The results showed that these indices performed equally well in selecting a preferred model for an overall test. The advantage of PPMC, however, is that it can not only compare the relative fit of different models but also evaluate the absolute fit of each individual model, whereas the DIC and CPO indices only compare relative fit.

The study further applied these Bayesian model-fit and model-comparison methods to three real datasets from the QCAI performance assessment. The results indicated that these datasets were essentially unidimensional and exhibited local independence among items. A 2P GR model provided better fit than a 1P GR model, and a two-dimensional model was not preferred. These findings were consistent with previous studies, although Stone's fit statistics in the PPMC context identified fewer misfitting items than previous studies did. Limitations of and future directions for Bayesian applications to IRT are discussed.
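As a minimal illustration of two ideas from the abstract — not the dissertation's actual code — the sketch below computes Yen's Q3 statistic (pairwise correlations of item-score residuals, used here as a pair-wise discrepancy measure) and a posterior predictive p-value, the quantity PPMC reports for a discrepancy measure. The function names and the simulated data are our own assumptions for demonstration; a real PPMC analysis would draw replicated datasets and expected scores from the posterior of a fitted GR model.

```python
import numpy as np

def q3_matrix(observed, expected):
    """Yen's Q3: pairwise correlations of item-score residuals
    (observed minus model-expected scores); both inputs are
    persons x items arrays.  Large |Q3| for an item pair signals
    local dependence between those items."""
    residuals = observed - expected
    return np.corrcoef(residuals, rowvar=False)

def ppp_value(realized, replicated):
    """Posterior predictive p-value: the proportion of posterior
    draws whose replicated discrepancy is at least as extreme as
    the realized (observed-data) discrepancy.  Values near 0 or 1
    flag misfit; values near 0.5 suggest adequate fit."""
    return float(np.mean(np.asarray(replicated) >= realized))

# Toy demonstration with simulated polytomous scores (0-3).
rng = np.random.default_rng(7)
observed = rng.integers(0, 4, size=(200, 5)).astype(float)
# Crude stand-in for the model-expected scores E[score | theta]:
expected = np.full_like(observed, observed.mean())

q3 = q3_matrix(observed, expected)
print(q3.shape)  # 5x5; off-diagonal entries are the Q3 statistics

# Realized Q3 for items (0, 1) compared against values from
# 100 simulated "replicated" datasets.
replicated_q3 = [
    q3_matrix(rng.integers(0, 4, size=(200, 5)).astype(float), expected)[0, 1]
    for _ in range(100)
]
print(ppp_value(q3[0, 1], replicated_q3))
```

In an actual PPMC run, each replicated dataset is generated from one posterior draw of the GR model parameters, so the p-value reflects both parameter uncertainty and sampling variability.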




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Zhu, Xiaowen (Email: xiz28@pitt.edu; Pitt Username: XIZ28)
ETD Committee:
Committee Chair: Stone, Clement A (Email: cas@pitt.edu; Pitt Username: CAS)
Committee Member: Ye, Feifei (Email: feifeiye@pitt.edu; Pitt Username: FEIFEIYE)
Committee Member: Bost, James
Committee Member: Lane, Suzanne (Email: sl@pitt.edu; Pitt Username: SL)
Date: 11 December 2009
Date Type: Completion
Defense Date: 20 November 2009
Approval Date: 11 December 2009
Submission Date: 7 December 2009
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Education > Psychology in Education
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: IRT model-comparison; Local independence; MCMC; Multidimensional models; Polytomous IRT models; PPMC; Unidimensionality; WinBUGS; IRT model-fit; Item-fit
Other ID: etd-12072009-163421
Date Deposited: 10 Nov 2011 20:09
Last Modified: 15 Nov 2016 13:53


