
Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments

Hansen, Mary A (2004) Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



One aspect of evaluating model-data fit in the context of Item Response Theory involves assessing item fit using chi-square goodness-of-fit tests. In the current study, a goodness-of-fit statistic appropriate for assessing item fit on performance-based assessments was investigated. The statistic utilized a pseudo-observed score distribution that used examinees' entire posterior distributions of ability to form item fit tables. Due to dependencies in the pseudo-observed score distribution, or pseudocounts, the statistic could not be tested for significance using a theoretical chi-square distribution. However, past research suggested that the Pearson and likelihood ratio forms of the pseudocounts-based statistic (χ²* and G²*) may follow scaled chi-square distributions.

The purpose of this study was to determine whether item and sample characteristics could be used to predict the scaling corrections needed to rescale χ²* and G²* statistics, so that significance tests against theoretical chi-square distributions were possible. Test length (12, 24, and 36 items) and number of item score category levels (2- to 5-category items) were manipulated. Sampling distributions of χ²* and G²* statistics were generated, and scaling corrections obtained using the method of moments were applied to the simulated distributions. Two multilevel equations for predicting the scaling corrections (a scaling factor and degrees of freedom value for each item) were then estimated from the simulated data.

Overall, when scaling corrections were obtained with the method of moments, sampling distributions of rescaled χ²* and G²* statistics closely approximated theoretical chi-square distributions across test configurations. Scaling corrections obtained using multilevel prediction equations did not adequately rescale simulated χ²* distributions for 2- to 5-category tests, or simulated G²* distributions for 2- and 3-category tests. Applications to real items showed that the prediction equations were inadequate across score category levels when χ²* was used, and for 2- and 3-category items when G²* was used. However, for 4- and 5-category tests, the predicted scaling corrections did adequately rescale empirical sampling distributions of G²* statistics. In addition, applications to real items indicated that use of the multilevel prediction equations with G²* would result in correct identification of item misfit for 5-category, and potentially 4-category, items.
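
As an illustration of the method-of-moments step mentioned in the abstract, the following is a minimal sketch and not the dissertation's own code. It assumes a statistic X is distributed as γ times a chi-square variable with ν degrees of freedom, so that E[X] = γν and Var[X] = 2γ²ν, which gives γ = Var[X] / (2·E[X]) and ν = 2·E[X]² / Var[X]. The function name `moment_scaling`, the simulated data, and the "true" values 1.6 and 7.0 are hypothetical and chosen only for demonstration.

```python
import numpy as np


def moment_scaling(stats):
    """Method-of-moments fit of stats ~ gamma * chi2(nu).

    Hypothetical helper: matches the sample mean and variance to
    E[X] = gamma * nu and Var[X] = 2 * gamma**2 * nu.
    """
    stats = np.asarray(stats, dtype=float)
    mean = stats.mean()
    var = stats.var(ddof=1)          # unbiased sample variance
    gamma = var / (2.0 * mean)       # scaling factor
    nu = 2.0 * mean**2 / var         # degrees of freedom
    return gamma, nu


# Stand-in for a simulated sampling distribution of a fit statistic.
# The generating values below are assumptions for the demo only.
rng = np.random.default_rng(0)
true_gamma, true_nu = 1.6, 7.0
simulated = true_gamma * rng.chisquare(true_nu, size=200_000)

gamma_hat, nu_hat = moment_scaling(simulated)

# Dividing by the estimated scaling factor gives values that can be
# referred to a chi-square distribution with nu_hat degrees of freedom.
rescaled = simulated / gamma_hat
print(f"gamma_hat={gamma_hat:.3f}, nu_hat={nu_hat:.3f}")
```

Under these assumptions, a rescaled statistic would be compared against a chi-square distribution with the estimated (generally non-integer) degrees of freedom. The multilevel prediction equations described in the abstract are not reproduced here.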




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Hansen, Mary A (Email: madst46@pitt.edu; Pitt Username: MADST46)
ETD Committee:
Committee Chair: Stone, Clement A (Email: cas@pitt.edu; Pitt Username: CAS)
Committee Member: Baker, Carol E (Email: ceb@pitt.edu; Pitt Username: CEB)
Committee Member: Irrgang, James J (Email: irrgangjj@upmc.edu; Pitt Username: JIRRGANG)
Committee Member: Lane, Suzanne (Email: sl@pitt.edu; Pitt Username: SL)
Date: 13 December 2004
Date Type: Completion
Defense Date: 14 July 2004
Approval Date: 13 December 2004
Submission Date: 11 December 2004
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Education > Psychology in Education
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Goodness-of-Fit; Graded Response Model; Item Fit; Item Response Theory
Other ID: etd-12112004-230948
Date Deposited: 10 Nov 2011 20:10
Last Modified: 19 Dec 2016 14:38

