
Prognostic biomarker detection, machine learning bias correction, and differential coexpression module detection

Ding, Ying (2014) Prognostic biomarker detection, machine learning bias correction, and differential coexpression module detection. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



In this thesis, we present three projects on prognostic biomarker detection, machine learning bias correction, and identification of differential coexpression modules in complex diseases. In the first project, we aimed to identify fusion transcripts that predict prostate cancer prognosis, an important task for avoiding overtreatment of patients. We discovered eight fusion transcripts from 19 RNA-seq datasets and validated their predictive value in more than 200 patients from three sites (Pittsburgh, Stanford and Wisconsin). The constructed prediction model showed consistently high accuracy in predicting prostate cancer recurrence and aggressiveness in all three cohorts. In the second project, we consider the common practice of applying many (up to several hundred) machine learning classifiers to a dataset and reporting the best cross-validated accuracy. We demonstrated that this practice introduces a downward bias in the reported error rate and proposed an inverse power law (IPL) method to correct the bias. The method was compared with several existing methods using simulated and real datasets and showed superior performance. In the third project, we developed a computational algorithm (MetaDiffNetwork) to identify coexpression modules that are consistently differential across disease conditions in multiple transcriptomic studies. We demonstrated good performance of the algorithm using simulated data and applied it to combine eight major depressive disorder studies (cases vs. controls) and four breast cancer studies (ER+ vs. ER-). The identified modules were validated against existing knowledge of disease pathways and can be used to help generate new hypotheses regarding suspected disease genes. In conclusion, the three areas of research covered in this thesis are critical bioinformatic elements for biomedical applications and can be used to help understand the underlying disease mechanisms and ultimately improve patient treatment.
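The selection bias described in the second project can be illustrated with a small simulation. The sketch below is not the thesis's IPL correction and does not use its data. It assumes pure-noise labels, so every classifier has a true error rate of 0.5, and it treats the classifiers as independent, which is a simplification. The function names and parameter values are hypothetical choices made for this illustration only.

```python
import random


def min_cv_error(n_samples, n_classifiers, rng):
    """Smallest of n_classifiers simulated error rates on pure-noise labels."""
    errors = []
    for _ in range(n_classifiers):
        # On noise, each held-out prediction is wrong with probability 0.5.
        wrong = sum(rng.random() < 0.5 for _ in range(n_samples))
        errors.append(wrong / n_samples)
    return min(errors)


def mean_min_error(n_samples=50, n_classifiers=100, n_repeats=200, seed=0):
    """Average, over repeated simulated datasets, of the best reported error."""
    rng = random.Random(seed)
    total = sum(
        min_cv_error(n_samples, n_classifiers, rng) for _ in range(n_repeats)
    )
    return total / n_repeats


# Assumed settings: 50 samples, 200 simulated datasets per estimate.
single = mean_min_error(n_classifiers=1)
best_of_many = mean_min_error(n_classifiers=100)
print(f"one classifier: {single:.3f}")
print(f"best of 100:    {best_of_many:.3f}")
```

With a single classifier the average reported error stays near the true value of 0.5, while reporting only the best of 100 classifiers gives a clearly lower average even though no classifier has learned anything. This gap is the kind of downward bias in the error rate that a correction method has to remove.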




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators (Name; Email; Pitt Username; ORCID):
Ding, Ying; yid5@pitt.edu; YID5
ETD Committee (Title; Member; Email Address; Pitt Username; ORCID):
Thesis Advisor; Tseng, George; ctseng@pitt.edu; CTSENG
Committee Member; Benos, Panayiotis V.; benos@pitt.edu; BENOS
Committee Member; Sibille,
Committee Member; Bar-Joseph,
Committee Member; Weeks, Daniel E.; weeks@pitt.edu; WEEKS; 0000-0001-9410-7228
Date: 6 May 2014
Date Type: Publication
Defense Date: 8 April 2014
Approval Date: 6 May 2014
Submission Date: 6 May 2014
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 128
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: machine learning, prognosis biomarker, differential coexpression
Date Deposited: 06 May 2014 17:18
Last Modified: 30 Jun 2022 16:18

