
Equating with local dependence under the anchor test design

Xu, Ting (2017) Equating with local dependence under the anchor test design. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.



Item response theory (IRT) models are often used in test equating. The effectiveness of IRT equating depends on how well the test data meet the assumptions of the IRT model. When tests are composed of testlets (i.e., groups of items sharing a common stimulus), the assumption of local item independence is likely to be violated. When examinees are nested within groups (e.g., classrooms or schools), the assumption of local person independence (i.e., independence of subjects) is unlikely to hold. Multilevel models offer the flexibility to model item and person dependence structures simultaneously.
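To make the testlet problem concrete, the sketch below assumes one common parameterization of a Rasch testlet model, P(X = 1 | θ, γ) = 1/(1 + exp(-(θ - b - γ))), where γ is a person-specific effect shared by all items in a testlet and drawn from N(0, σ²). Sign conventions vary in the literature, and the function names are hypothetical. For a fixed θ, it integrates γ out numerically and returns the covariance of two items from the same testlet: with σ = 0 the items are locally independent, and with σ > 0 the conditional covariance is positive, which is the violation of local item independence described above.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rasch_testlet_prob(theta: float, b: float, gamma: float) -> float:
    # Assumed Rasch testlet form: P(X = 1) = logistic(theta - b - gamma),
    # where gamma is the examinee's effect for the testlet containing the item.
    return logistic(theta - b - gamma)


def conditional_covariance(theta: float, b1: float, b2: float, sigma: float,
                           n_points: int = 2001, width: float = 8.0) -> float:
    """Cov(X1, X2 | theta) for two items in the same testlet (hypothetical helper).

    gamma ~ N(0, sigma^2) is integrated out with an equally spaced grid
    spanning +/- `width` standard deviations.
    """
    if sigma == 0.0:
        return 0.0  # no testlet variance: responses are locally independent
    step = 2.0 * width * sigma / (n_points - 1)
    total = e1 = e2 = e12 = 0.0
    for k in range(n_points):
        gamma = -width * sigma + k * step
        w = math.exp(-0.5 * (gamma / sigma) ** 2)  # unnormalized normal weight
        p1 = rasch_testlet_prob(theta, b1, gamma)
        p2 = rasch_testlet_prob(theta, b2, gamma)
        total += w
        e1 += w * p1
        e2 += w * p2
        e12 += w * p1 * p2
    # Given theta and gamma the two responses are independent, so
    # Cov(X1, X2 | theta) = E[P1 * P2] - E[P1] * E[P2], expectations over gamma.
    return e12 / total - (e1 / total) * (e2 / total)


print(conditional_covariance(0.0, 0.0, 0.5, sigma=1.0))
```

Because both response probabilities move in the same direction as γ changes, the covariance cannot be negative, and it grows with the testlet variance σ².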
This research investigated the effectiveness of multilevel models as concurrent calibration models for test equating under the anchor test design in the presence of local dependence. Through two simulation studies, the performance of multilevel models was compared with that of traditional IRT models and a testlet response theory (TRT) model. The first study considered local item dependence (LID) only, whereas the second study considered both LID and person dependence.
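Under the anchor test design, each group takes its own form, and the forms share a set of common (anchor) items. Concurrent calibration fits one model to the combined data: items a group never saw are missing by design, and the anchor items link the groups onto one scale. The sketch below only builds that combined response layout; the helper names and the dict-based input format are hypothetical, and the calibration itself (e.g., an HGLM or IRT fit) is not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# 1 = correct, 0 = incorrect, None = item not administered to this group
Response = Optional[int]


def anchor_items(x_items: Sequence[str], y_items: Sequence[str]) -> List[str]:
    """Items that appear on both forms (the anchor set)."""
    y_set = set(y_items)
    return [item for item in x_items if item in y_set]


def stack_for_concurrent_calibration(
    form_x: List[Dict[str, int]],
    form_y: List[Dict[str, int]],
    x_items: Sequence[str],
    y_items: Sequence[str],
) -> Tuple[List[str], List[List[Response]]]:
    """Stack two groups into one person-by-item matrix (hypothetical helper).

    Anchor items get a single column, so both groups contribute to their
    estimates; the unique items of the other form are missing by design.
    """
    item_ids = list(dict.fromkeys([*x_items, *y_items]))  # ordered, no duplicates
    rows = [[person.get(item) for item in item_ids] for person in [*form_x, *form_y]]
    return item_ids, rows
```

In practice a group indicator (Form X vs. Form Y) would typically be kept alongside the rows so the calibration can allow different ability distributions for the two groups; it is omitted here for brevity.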
The first study compared four concurrent calibration approaches for equating testlet-based tests: (a) modeling LID with a three-level hierarchical generalized linear model (HGLM); (b) ignoring LID and using a two-level HGLM; (c) ignoring LID and using the Rasch model; and (d) applying testlet scoring and fitting the graded response model (GRM). In terms of expected score recovery, the results suggested that the two-level HGLM and Rasch approaches were robust to violations of the local item independence assumption. In addition, the first three approaches produced better equating results than concurrent calibration with the GRM. Further research confirmed previous findings that the degree of LID affected the precision of person parameter estimates.
The second study compared three models (the 3PL IRT model, the 3PL TRT model, and the 3PL multilevel TRT model) as concurrent calibration models for equating testlet-based tests when examinees were nested within groups. The results showed that ignoring LID affected item parameter recovery. In the presence of both LID and person dependence, the 3PL multilevel TRT model provided the most accurate estimates of person parameters, especially when the degree of person dependence was high.
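The 3PL multilevel TRT model combines both sources of dependence. The sketch below assumes one common form, P(X = 1) = c + (1 - c)/(1 + exp(-a(θ - b - γ))), with a person-by-testlet effect γ ~ N(0, σ_γ²) and ability θ = μ_g + u, where μ_g ~ N(0, τ²) is a group effect and u ~ N(0, σ²). It only simulates response data with that structure and does not estimate the model. The intraclass correlation τ²/(τ² + σ²) is used here as one illustrative index of the degree of person dependence; all names and parameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Item = Tuple[float, float, float, str]  # (a, b, c, testlet id)


def p_3pl_testlet(theta: float, a: float, b: float, c: float, gamma: float) -> float:
    # Assumed 3PL testlet form: c + (1 - c) * logistic(a * (theta - b - gamma))
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b - gamma)))


def intraclass_correlation(tau: float, sigma: float) -> float:
    """Share of ability variance lying between groups (one index of person dependence)."""
    return tau ** 2 / (tau ** 2 + sigma ** 2)


def simulate_multilevel_testlet(n_groups: int, n_per_group: int, items: List[Item],
                                tau: float, sigma: float, testlet_sd: float,
                                seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """Generate 0/1 responses for examinees nested in groups (hypothetical helper)."""
    rng = random.Random(seed)
    testlets = sorted({item[3] for item in items})
    group_ids: List[int] = []
    responses: List[List[int]] = []
    for g in range(n_groups):
        mu_g = rng.gauss(0.0, tau)  # group effect, e.g. a school
        for _ in range(n_per_group):
            theta = mu_g + rng.gauss(0.0, sigma)  # ability of a person in group g
            # person-by-testlet effects shared by all items in the same testlet
            gamma: Dict[str, float] = {t: rng.gauss(0.0, testlet_sd) for t in testlets}
            row = [int(rng.random() < p_3pl_testlet(theta, a, b, c, gamma[t]))
                   for a, b, c, t in items]
            group_ids.append(g)
            responses.append(row)
    return group_ids, responses
```

In this sketch, raising `tau` while holding `sigma` fixed increases person dependence, and setting `testlet_sd = 0` removes LID, so the two kinds of dependence can be varied separately.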



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Xu, Ting
ETD Committee:
Committee Chair: Ye,
Committee Member: Stone, Clement
Committee Member: Lane,
Committee Member: Yu,
Date: 13 January 2017
Date Type: Publication
Defense Date: 20 April 2016
Approval Date: 13 January 2017
Submission Date: 8 January 2017
Access Restriction: Access restricted to the University of Pittsburgh for a period of 5 years.
Number of Pages: 132
Institution: University of Pittsburgh
Schools and Programs: School of Education > Psychology in Education
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Item response theory, equating, testlet, local dependence, multilevel models, hierarchical generalized linear model (HGLM)
Date Deposited: 13 Jan 2017 22:38
Last Modified: 13 Jan 2022 06:15

Available Versions of this Item

  • Equating with local dependence under the anchor test design. (deposited 13 Jan 2017 22:38) [Currently Displayed]

