Chen, Huanyu
(2007)
Experimental Design for Unbalanced Data Involving a Two level Logistic Model.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
The multilevel logistic model is used to analyze hierarchical data with binary outcomes and to detect variation both between and within clusters. I extended explicit variance formulae for a fixed effect in a two-level model for balanced binary data to account for imbalance both between and within clusters. The derivation of the variance is based on a linearization of the two-level logistic model using first-order marginal quasi-likelihood (MQL1) estimation. In a simulation study, I used second-order penalized quasi-likelihood (PQL2) estimation to corroborate the accuracy of the analytic variance formula based on the observed racial distribution in a multi-center study of racial disparities. Using the site-specific racial distributions, I simulated the log odds ratio for black race that could be detected with 80% power. These methods are illustrated in the context of a multi-center study of racial disparities in 30-day mortality in the Veterans Affairs (VA) Healthcare System, where the racial distributions are dramatically unbalanced across the 149 sites. I also consider a subset of 42 sites that include a majority of the black hospitalizations. The same analytic variance is obtained when there are equal numbers of observations per site, a constant proportion of black veterans across sites, or both. The observed racial imbalance both within and across sites increases the variance of the race coefficient more in the random coefficient (RC) model than in the random intercept (RI) model. Compared to PQL2, the analytic variances using MQL1 are severely downwardly biased with smaller variance components. The simulation variances are virtually identical to the analytic variances for these data. For a given power, somewhat smaller log odds ratios can be detected in the RI model than in the RC model. The derived formulas provide a basis for planning multi-center studies when a predictor of primary importance is highly imbalanced both between and within sites.
In studies of racial disparities in health care, the site-specific population distributions are often known from administrative data. The public health relevance of this work is that these methods for unbalanced data may facilitate more effective planning of multi-center studies of racial disparities.
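The link the abstract draws between the variance of the race coefficient and the log odds ratio detectable at 80% power can be illustrated with a minimal sketch. It assumes the usual normal approximation for a two-sided Wald test and is not the dissertation's own variance formula, which is not reproduced in this record. The function name `detectable_log_odds_ratio` and the variance values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def detectable_log_odds_ratio(variance, power=0.80, alpha=0.05):
    """Approximate smallest |log odds ratio| detectable by a two-sided Wald test.

    `variance` is the sampling variance of the estimated coefficient.
    It is a hypothetical input here; the dissertation derives its own
    analytic variance, which this sketch does not attempt to reproduce.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sqrt(variance)


# Illustrative variances only, not values from the study.
for var in (0.01, 0.04):
    effect = detectable_log_odds_ratio(var)
    print(f"variance={var:.2f} -> detectable log odds ratio ~ {effect:.3f}")
```

Under this approximation the detectable effect scales with the standard error, so a model with a larger variance for the race coefficient can only detect a correspondingly larger log odds ratio at the same power.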
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Chen, Huanyu
Date: 21 June 2007
Date Type: Completion
Defense Date: 23 April 2007
Approval Date: 21 June 2007
Submission Date: 13 April 2007
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: random coefficient model; first order marginal quasi-likelihood estimation; health service research; random intercept model; second order penalized quasi-likelihood estimation; racial disparities
Other ID: http://etd.library.pitt.edu/ETD/available/etd-04132007-121242/, etd-04132007-121242
Date Deposited: 10 Nov 2011 19:37
Last Modified: 15 Nov 2016 13:40
URI: http://d-scholarship.pitt.edu/id/eprint/7108