Latent Variable Models for Analyses of Diagnostic Tests and Regression Analyses with Hierarchical Missing Covariates
Wang, Xianling (2021) Latent Variable Models for Analyses of Diagnostic Tests and Regression Analyses with Hierarchical Missing Covariates. Doctoral Dissertation, University of Pittsburgh. (Unpublished) This is the latest version of this item.
Abstract
This dissertation concerns statistical analyses with latent variables under two scenarios. Many discrete diagnostic markers, such as breast cancer tumor grade, are important prognostic factors yet suffer from poor reproducibility because of their subjective nature. When multiple independent ratings are available, latent class models are the usual choice for statistical inference. However, the model parameters are estimable only up to a permutation of the labels of the underlying truth. When an auxiliary variable whose association with the underlying truth follows a known trend is observed, we proposed a joint model that achieves global identification and yields more efficient estimates. A remedy for a specific violation of the conditional independence assumption in those classical models was also provided. The methods were illustrated in the analysis of a tumor grade reading dataset from the National Surgical Adjuvant Breast and Bowel Project (NSABP). The improved efficiency was also demonstrated through simulation studies.
The second part of this dissertation concerns regression analyses in which a covariate is subject to missing values arising from a hierarchical missing data mechanism. In electronic health records (EHR) data, some important biomarkers, such as lab test results, are missing for various reasons. Patients in remission are less likely to take those specialized tests. Furthermore, records of tested patients may be missing because of how the EHR data are assembled. In practice, the exact nature of such missingness is unavailable to the investigators. Standard methods such as the maximum likelihood method and inverse probability weighting typically ignore such heterogeneity and may produce biased estimates. We introduced a latent variable model that models the hierarchical missing data process and yields valid parameter estimates. The maximum likelihood method was used for estimation and inference.
The proposed method was applied to a motivating EHR dataset from an inflammatory bowel disease registry at the University of Pittsburgh Medical Center. The performance of the proposed method was evaluated by simulation studies.
Public health significance: We proposed novel statistical methods to address missing data under two different scenarios. By yielding valid inference under those circumstances, application of the proposed methods has important public health implications.