Kastango, Kari B.
(2006)
Assessing Agreement Among Raters And Identifying Atypical Raters Using A Log-Linear Modeling Approach.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
When an outcome is rated by several raters, ensuring consistency across raters increases the reliability of the measurement. Tanner and Young (1985) proposed a general class of log-linear models to assess agreement among K raters using a rating scale with C nominal categories. Their methodology can be used to assess pair-wise agreement among three or more raters. Rogel et al. (1996, 1998) extended this work by assessing various patterns of agreement among rater sub-groups of size K-1. These models can be used to test the assumption of rater exchangeability. Although parameters from these models can be used to identify atypical raters, no formal inferential procedures are available. I propose a formal inferential approach that can be used to test the assumption of rater exchangeability and to identify an atypical rater. The global and heterogeneous partial agreement model is fit to the data, and pair-wise comparisons of the K partial agreement parameters are made, with the p-values adjusted for multiple comparisons. The heterogeneous partial agreement parameter that is consistently involved in the statistically significant pair-wise comparisons is then singled out. The premise is that, if there is an atypical rater, at least one heterogeneous partial agreement parameter will differ from at least one of the remaining K-1 partial agreement parameters. The approach is illustrated using published data from an intestinal biopsy rating study with six raters (Rogel et al., 1998). Overall Type I error and the power of the inferential approach to correctly identify atypical raters are assessed via simulation with rater sub-groups of size 5. The Bonferroni and Sidak procedures, and Holm's step-down procedure with the Bonferroni and Sidak adjustments, are used to control the overall Type I error.
Being able to correctly identify an atypical rater, if present, and improving the consistency of ratings directly influence the reliability of the measurement and the power of the study for a given sample size. Consequently, more informative studies can be conducted of interventions (e.g., behavioral, medicinal) that may have a significant positive impact on the public's health.
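The multiple-comparison step named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the pair-wise p-values for the K partial agreement parameters have already been obtained from the fitted model, and the function names, the parameter labels 0 to 5, and the p-values in the usage lines are all hypothetical.

```python
from itertools import combinations


def bonferroni(pvals):
    """Single-step Bonferroni adjustment: multiply each p-value by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def sidak(pvals):
    """Single-step Sidak adjustment: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals, use_sidak=False):
    """Holm's step-down adjustment with a Bonferroni or Sidak step."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        k = m - rank  # number of hypotheses still under test
        if use_sidak:
            adj = 1.0 - (1.0 - pvals[i]) ** k
        else:
            adj = min(1.0, k * pvals[i])
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def significant_involvement(pairwise_pvals, alpha=0.05, adjust=holm):
    """Count how often each parameter appears in a significant comparison.

    pairwise_pvals maps a pair of parameter labels (i, j) to a raw p-value
    (hypothetical input; the source does not specify a data layout).
    """
    pairs = list(pairwise_pvals)
    adjusted = adjust([pairwise_pvals[pair] for pair in pairs])
    counts = {}
    for pair, p in zip(pairs, adjusted):
        if p < alpha:
            for label in pair:
                counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical example: K = 6 parameters give K(K-1)/2 = 15 comparisons.
K = 6
raw = {pair: (0.0005 if 0 in pair else 0.4) for pair in combinations(range(K), 2)}
counts = significant_involvement(raw)
```

In this made-up input, parameter 0 is the only label that appears in every significant comparison, which mirrors the abstract's idea of singling out the parameter that is consistently involved in the significant pair-wise differences. Passing `adjust=bonferroni`, `adjust=sidak`, or `adjust=lambda p: holm(p, use_sidak=True)` swaps in the other adjustments listed in the abstract.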
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 6 June 2006
Date Type: Completion
Defense Date: 23 March 2006
Approval Date: 6 June 2006
Submission Date: 30 March 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: heterogeneity; homogeneity; nominal; reliability
Other ID: http://etd.library.pitt.edu/ETD/available/etd-03302006-125650/, etd-03302006-125650
Date Deposited: 10 Nov 2011 19:33
Last Modified: 19 Dec 2016 14:35
URI: http://d-scholarship.pitt.edu/id/eprint/6648