
A permutation-based correction for Pearson's chi-square test on data with an imputed complex outcome / A modified EM algorithm for contingency table analysis with missing data

Olson Hunt, Megan (2014) A permutation-based correction for Pearson's chi-square test on data with an imputed complex outcome / A modified EM algorithm for contingency table analysis with missing data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Submitted Version



Studies on human subjects often yield missing data, so methodological progress in this area has inherent public health relevance. Here, two statistical methods are proposed for the analysis of discrete data with missing values.

First, when one variable is subject to missingness, it has been noted that applying Pearson’s chi-square test to singly imputed data does not account for the variability due to imputation, leading to a type I error rate larger than the nominal level. This research concerns Pearson’s test on data with an imputed complex outcome, one of whose components has missing values. Imputation in this context may be performed either directly, through conditional imputation of the complex outcome given the covariates, or indirectly, through conditional imputation of the missing component given the covariates and the other, observed component. Although the latter imputation scheme is shown to be more efficient, an existing adjustment method cannot be extended to this scenario because the variables constituting the complex outcome are not independent. A novel permutation-based correction method for Pearson’s test is therefore proposed. Simulation studies indicate that it provides the nominal rejection rate under the null hypothesis.

Second, a modification of the expectation-maximization (EM) algorithm for the analysis of discrete data with missing values is presented. In general, the update in the M-step requires either knowing or modeling the missing-data mechanism, and misspecification of this mechanism may lead to biased estimates of the model parameters. Given consistent initial estimates of the parameters (which may be obtained from an external, complete data set, or by recalling a random sample of subjects), the target function is approximated in the M-step with empirical estimates, allowing for unbiased estimation without specifying or modeling the often intangible missing-data mechanism. Simulation studies show that this modified algorithm yields consistent estimates that are potentially more efficient than the initial estimates, even under non-ignorable missingness.
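The general idea behind the first method, comparing Pearson's statistic on imputed data against a permutation reference distribution rather than the chi-square distribution, can be illustrated with a minimal sketch. This is not the dissertation's correction method, whose details are not given in the abstract. It is a generic permutation test that assumes a binary group label and a binary outcome coded 0/1, missing outcomes stored as `None`, hot-deck imputation within group, and a null hypothesis under which group is unrelated to both the outcome and its missingness. The function names `pearson_chi2`, `impute_and_test`, and `permutation_p_value` are hypothetical.

```python
import random


def pearson_chi2(table):
    """Pearson's chi-square statistic for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells with an empty margin
                stat += (obs - expected) ** 2 / expected
    return stat


def impute_and_test(groups, outcomes, rng):
    """Singly impute missing outcomes, then compute Pearson's statistic.

    Assumption: hot-deck imputation, drawing each missing outcome from the
    observed outcomes in the same group (falling back to all observed
    outcomes if the group has none).
    """
    donors = {
        g: [y for gg, y in zip(groups, outcomes) if gg == g and y is not None]
        for g in set(groups)
    }
    all_donors = [y for y in outcomes if y is not None]
    table = [[0, 0], [0, 0]]  # rows: group 0/1, columns: outcome 0/1
    for g, y in zip(groups, outcomes):
        if y is None:
            y = rng.choice(donors[g] or all_donors)
        table[g][y] += 1
    return pearson_chi2(table)


def permutation_p_value(groups, outcomes, n_perm=999, seed=0):
    """Permutation p-value for the impute-then-test pipeline.

    Group labels are shuffled and the whole pipeline, including the
    imputation step, is rerun for each shuffle, so the reference
    distribution reflects the variability introduced by imputation.
    """
    rng = random.Random(seed)
    observed = impute_and_test(groups, outcomes, rng)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if impute_and_test(shuffled, outcomes, rng) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Illustrative data (made up): 30 subjects per group, 6 missing outcomes each.
example_groups = [0] * 30 + [1] * 30
example_outcomes = [0] * 14 + [1] * 10 + [None] * 6 + [0] * 9 + [1] * 15 + [None] * 6
print(permutation_p_value(example_groups, example_outcomes, n_perm=199, seed=1))
```

Rerunning the imputation inside every permutation is the design choice that matters here: referring the statistic from a single imputed table to the chi-square distribution treats imputed values as if they had been observed.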




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Olson Hunt, Megan
ETD Committee:
Committee Chair: Tang, Gong (got1@pitt.edu; Pitt username GOT1)
Committee Member: Bandos, Andriy (anb61@pitt.edu; Pitt username ANB61)
Committee Member: Brooks, Maria (mbrooks@pitt.edu; Pitt username MBROOKS)
Committee Member: Chang,
Date: 27 June 2014
Date Type: Publication
Defense Date: 3 April 2014
Approval Date: 27 June 2014
Submission Date: 6 April 2014
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 116
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: single imputation, discrete data, bias, consistency, efficiency, MNAR, empirical
Date Deposited: 27 Jun 2014 20:22
Last Modified: 01 May 2019 05:15

