
Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments

Hansen, Mary A (2004) Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments. Doctoral Dissertation, University of Pittsburgh.


    Abstract

    One aspect of evaluating model-data fit in the context of Item Response Theory involves assessing item fit using chi-square goodness-of-fit tests. In the current study, a goodness-of-fit statistic appropriate for assessing item fit on performance-based assessments was investigated. The statistic utilized a pseudo-observed score distribution that used examinees' entire posterior distributions of ability to form item fit tables. Due to dependencies in the pseudo-observed score distribution, or pseudocounts, the statistic could not be tested for significance using a theoretical chi-square distribution. However, past research suggested that the Pearson and likelihood ratio forms of the pseudocounts-based statistic (χ²* and G²*) may follow scaled chi-square distributions. The purpose of this study was to determine whether item and sample characteristics could be used to predict the scaling corrections needed to rescale χ²* and G²* statistics, so that significance tests against theoretical chi-square distributions were possible. Test length (12, 24, and 36 items) and number of item score category levels (2- to 5-category items) were manipulated. Sampling distributions of χ²* and G²* statistics were generated, and scaling corrections obtained using the method of moments were applied to the simulated distributions. Two multilevel equations for predicting the scaling corrections (a scaling factor and degrees of freedom value for each item) were then estimated from the simulated data. Overall, when scaling corrections were obtained with the method of moments, sampling distributions of rescaled χ²* and G²* statistics closely approximated theoretical chi-square distributions across test configurations. Scaling corrections obtained using multilevel prediction equations did not adequately rescale simulated χ²* distributions for 2- to 5-category tests, or simulated G²* distributions for 2- and 3-category tests.
Applications to real items showed that the prediction equations were inadequate across score category levels when χ²* was used, and for 2- and 3-category items when G²* was used. However, for 4- and 5-category tests, the predicted scaling corrections did adequately rescale empirical sampling distributions of G²* statistics. In addition, applications to real items indicated that use of the multilevel prediction equations with G²* would result in correct identification of item misfit for 5-category, and potentially 4-category, items.
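
The abstract mentions scaling corrections (a scaling factor and a degrees of freedom value) obtained by the method of moments. The dissertation's exact procedure is not reproduced on this page, so the following is only a minimal sketch of the standard moment-matching idea: if a statistic X is distributed as γ times a chi-square variable with ν degrees of freedom, then E[X] = γν and Var[X] = 2γ²ν, which gives γ = Var[X] / (2 E[X]) and ν = 2 E[X]² / Var[X]. The function names, the simulated draws, and the values 0.6 and 7.5 are illustrative assumptions, not quantities taken from the study.

```python
import random
import statistics


def moment_scaling(stats):
    """Estimate (gamma, nu) so that stats / gamma is roughly chi-square(nu).

    Hypothetical helper: matches the sample mean and variance to those of
    gamma * chi-square(nu), where mean = gamma * nu and var = 2 * gamma**2 * nu.
    """
    mean = statistics.fmean(stats)
    var = statistics.variance(stats)  # sample variance (n - 1 denominator)
    gamma = var / (2.0 * mean)
    nu = 2.0 * mean ** 2 / var
    return gamma, nu


def simulate_scaled_chi_square(gamma, nu, n, seed=0):
    """Draw n values of gamma * chi-square(nu) for illustration only.

    A chi-square(nu) variable is a Gamma variable with shape nu / 2 and
    scale 2, which random.gammavariate(alpha, beta) can generate.
    """
    rng = random.Random(seed)
    return [gamma * rng.gammavariate(nu / 2.0, 2.0) for _ in range(n)]


# Assumed "true" corrections, chosen arbitrarily for the demonstration.
draws = simulate_scaled_chi_square(gamma=0.6, nu=7.5, n=20000)

gamma_hat, nu_hat = moment_scaling(draws)

# Rescaled statistics can then be compared with a chi-square(nu_hat) reference.
rescaled = [x / gamma_hat for x in draws]
```

By construction, the rescaled values have a sample mean equal to the estimated degrees of freedom and a sample variance equal to twice that value, which are the first two moments of the reference chi-square distribution.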


    Details

    Item Type: University of Pittsburgh ETD
    ETD Committee:
    ETD Committee Type | Committee Member | Email
    Committee Chair | Stone, Clement A | cas@pitt.edu
    Committee Member | Baker, Carol E | ceb@pitt.edu
    Committee Member | Irrgang, James J | irrgangjj@upmc.edu
    Committee Member | Lane, Suzanne | sl@pitt.edu
    Title: Predicting the Distribution of a Goodness-Of-Fit Statistic Appropriate For Use With Performance-Based Assessments
    Status: Unpublished
    Date: 13 December 2004
    Date Type: Completion
    Defense Date: 14 July 2004
    Approval Date: 13 December 2004
    Submission Date: 11 December 2004
    Access Restriction: No restriction; Release the ETD for access worldwide immediately.
    Patent pending: No
    Institution: University of Pittsburgh
    Thesis Type: Doctoral Dissertation
    Refereed: Yes
    Degree: PhD - Doctor of Philosophy
    URN: etd-12112004-230948
    Uncontrolled Keywords: Goodness-of-Fit; Graded Response Model; Item Fit; Item Response Theory
    Schools and Programs: School of Education > Psychology in Education
    Date Deposited: 10 Nov 2011 15:10
    Last Modified: 25 May 2012 11:39
    Other ID: http://etd.library.pitt.edu/ETD/available/etd-12112004-230948/, etd-12112004-230948
