
New statistical methods for complex survival data with high-dimensional covariates

Sun, Tao (2020) New statistical methods for complex survival data with high-dimensional covariates. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Complex survival outcomes, such as multivariate and interval-censored endpoints, are increasingly used in clinical trials. At the same time, rapid advances in genetic technologies allow the generation of large-scale genetic data. This dissertation proposes new statistical methods for complex survival outcomes with high-dimensional covariates.

In the first part, to deal with bivariate interval-censored data, we propose a flexible two-parameter copula-based model with semiparametric transformation margins. We estimate the model parameters by the sieve likelihood approach and establish the asymptotic properties of the sieve estimators. We demonstrate satisfactory estimation and inference performance in simulation studies. Lastly, we apply our method to the Age-Related Eye Disease Study (AREDS) data and successfully identify novel genetic variants associated with the progression of age-related macular degeneration (AMD). An R package, CopulaCenR, is published for analyzing bivariate censored data in a regression setting.
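
As a rough illustration of how a copula links two marginal survival functions under interval censoring, the following is a minimal sketch and not the dissertation's model. It substitutes the one-parameter Clayton copula for the two-parameter copula and exponential margins for the semiparametric transformation margins; the function names and parameters (clayton_copula, interval_prob, rate1, rate2, theta) are hypothetical. For a subject whose two event times are known only to lie in (l1, r1] and (l2, r2], the likelihood contribution is the probability of that rectangle under the joint survival function.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (stand-in for the two-parameter copula)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary case: C(u, 0) = C(0, v) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def exp_survival(t, rate):
    """Exponential marginal survival S(t) = exp(-rate * t); an assumed margin."""
    return math.exp(-rate * t)


def joint_survival(t1, t2, rate1, rate2, theta):
    """Joint survival P(T1 > t1, T2 > t2) built from the copula and the margins."""
    return clayton_copula(exp_survival(t1, rate1), exp_survival(t2, rate2), theta)


def interval_prob(l1, r1, l2, r2, rate1, rate2, theta):
    """P(l1 < T1 <= r1, l2 < T2 <= r2): one subject's likelihood contribution."""
    def s(a, b):
        return joint_survival(a, b, rate1, rate2, theta)

    return s(l1, l2) - s(l1, r2) - s(r1, l2) + s(r1, r2)


if __name__ == "__main__":
    # Both events observed between visits at times 1 and 2 (arbitrary units).
    print(interval_prob(1.0, 2.0, 1.0, 2.0, rate1=0.5, rate2=0.5, theta=2.0))
```

Setting a right endpoint to math.inf covers right censoring in this sketch, since the marginal survival function is then zero at that endpoint.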

In the second part, we develop a novel information-ratio-based test statistic to evaluate the goodness-of-fit of copula survival models. We establish the asymptotic properties of our test statistic. The simulation studies demonstrate that our method performs well under interval and right censoring. Lastly, we apply our method to multiple real data sets. To the best of our knowledge, our method is the first approach that can test any parametric copula model under both interval and right censoring.

In the third part, motivated by the growing need for accurate survival prediction models that use rich genetic data, we develop a novel framework for constructing and evaluating a deep neural network (DNN) based survival model. Our simulation results demonstrate the high predictive power of the DNN survival model, especially in the presence of complex data structures. We also build an accurate and interpretable DNN survival prediction model for AMD progression using AREDS data.
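
The abstract does not say which metrics the evaluation framework uses. As one common way to score a survival prediction model, the sketch below computes Harrell's concordance index for right-censored data; this is an assumption for illustration, not the dissertation's stated procedure, and the function name and arguments are hypothetical. A pair of subjects is comparable when the one with the shorter time had an observed event, and it is concordant when that subject also received the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch only).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(concordance_index([1, 2, 3, 4], [1, 0, 1, 1], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Interval-censored outcomes such as those in AREDS would need an adapted measure, which this sketch does not attempt.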

Public health significance: This dissertation provides a comprehensive set of novel statistical and computational tools for analyzing bivariate survival outcomes with large-scale genetic data, which have the potential to fundamentally improve the current practice in analyzing such clinical studies, and thus to enhance the understanding of disease progression and to increase the success of individualized risk management and precision medicine.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Sun, Tao (Email: tas184@pitt.edu; Pitt Username: tas184)
ETD Committee:
Committee Chair: Ding,
Committee Member: Chen,
Committee Member: Jeong,
Committee Member: Cheng,
Committee Member: Weeks, Daniel (Email: weeks@pitt.edu; ORCID: 0000-0001-9410-7228)
Date: 30 July 2020
Date Type: Publication
Defense Date: 7 April 2020
Approval Date: 30 July 2020
Submission Date: 20 March 2020
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 155
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: copula, deep learning, goodness-of-fit, interval-censored, sieve, survival prediction
Related URLs:
Date Deposited: 30 Jul 2020 21:23
Last Modified: 30 Jun 2022 15:18

