Zhu, Xiaowen (2009) Assessing Fit of Item Response Models for Performance Assessments using Bayesian Analysis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Assessing IRT model-fit and comparing different IRT models from a Bayesian perspective is gaining attention. This research evaluated the performance of Bayesian model-fit and model-comparison techniques in assessing the fit of unidimensional Graded Response (GR) models and comparing different GR models for performance assessment applications.

The study explored the general performance of the PPMC method and a variety of discrepancy measures (test-level, item-level, and pair-wise measures) in evaluating different aspects of fit for unidimensional GR models. Previous findings that the PPMC method is conservative were confirmed. In addition, PPMC was found to have adequate power in detecting different aspects of misfit when appropriate discrepancy measures were used. Pair-wise measures were more powerful than test-level and item-level measures in detecting violations of the unidimensionality and local independence assumptions, and Yen's Q3 measure appeared to perform best. The power of PPMC also increased as the degree of multidimensionality or local dependence among item responses increased. Two classical item-fit statistics were found effective for detecting item misfit due to discrepancies from GR model boundary curves.

The study also compared the relative effectiveness of three Bayesian model-comparison indices (DIC, CPO, and PPMC) for model selection. The results showed that these indices performed equally well in selecting a preferred model for an overall test. However, PPMC applications have the advantage that they can not only compare the relative fit of different models but also evaluate the absolute fit of each individual model, whereas the DIC and CPO indices only compare relative fit.

This study further applied the Bayesian model-fit and model-comparison methods to three real datasets from the QCAI performance assessment. The results indicated that these datasets were essentially unidimensional and exhibited local independence among items. A 2P GR model provided better fit than a 1P GR model, and a two-dimensional model was not preferred. These findings were consistent with previous studies, although Stone's fit statistics in the PPMC context identified fewer misfitting items than previous studies did. Limitations and directions for future research on Bayesian applications to IRT are discussed.
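To make the PPMC procedure described above concrete, the following is a minimal Python sketch (assuming NumPy) of how a posterior predictive p-value might be computed for a scalar discrepancy measure, such as the Q3 value for a single item pair. The names simulate_replicate and discrepancy are hypothetical placeholders; this is an illustrative sketch, not the dissertation's WinBUGS implementation.

import numpy as np

def yens_q3(responses, expected):
    # Yen's Q3: correlations among item residuals d_ij = u_ij - E(u_ij | theta_i).
    # responses, expected: (n_persons, n_items) arrays of observed and
    # model-implied expected item scores; returns the item-by-item Q3 matrix.
    residuals = responses - expected
    return np.corrcoef(residuals, rowvar=False)

def ppmc_pvalue(observed, posterior_draws, simulate_replicate, discrepancy):
    # Posterior predictive p-value: the proportion of posterior draws for which
    # the discrepancy of data replicated under the model meets or exceeds the
    # realized discrepancy of the observed data. Values near 0 or 1 flag misfit;
    # values near 0.5 indicate adequate fit (hence the method's conservatism).
    # simulate_replicate(draw) -> (replicated data, expected scores) and
    # discrepancy(data, expected) -> scalar are hypothetical placeholders.
    exceed = 0
    for draw in posterior_draws:
        rep_data, expected = simulate_replicate(draw)
        if discrepancy(rep_data, expected) >= discrepancy(observed, expected):
            exceed += 1
    return exceed / len(posterior_draws)

For a pair-wise check of local independence, discrepancy could return yens_q3(data, expected)[j, k] for a fixed item pair (j, k), yielding one p-value per pair.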
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zhu, Xiaowen
ETD Committee:
Date: 11 December 2009
Date Type: Completion
Defense Date: 20 November 2009
Approval Date: 11 December 2009
Submission Date: 7 December 2009
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Education > Psychology in Education
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: IRT model-comparison; IRT model-fit; Item-fit; Local independence; MCMC; Multidimensional models; Polytomous IRT models; PPMC; Unidimensionality; WinBUGS
Other ID: http://etd.library.pitt.edu/ETD/available/etd-12072009-163421/, etd-12072009-163421
Date Deposited: 10 Nov 2011 20:09
Last Modified: 15 Nov 2016 13:53
URI: http://d-scholarship.pitt.edu/id/eprint/10162