
On Gains from Biomarker Optimization toward ROC-Related Targets in Real-Life Data

He, Jian (2021) On Gains from Biomarker Optimization toward ROC-Related Targets in Real-Life Data. Master's Thesis, University of Pittsburgh. (Unpublished)



In biomedical studies, it is often of interest to classify or predict a subject’s condition using a combination of multiple markers. With the introduction of additional markers, one might expect the classification performance of a combined score to be better than that of any single marker. However, this is not always the case. For example, a logistic regression model combining two markers can be less discriminative than one of the markers alone. This phenomenon stems from the fact that logistic regression optimizes a likelihood function that is not directly related to measures of classification performance. Because of this and related problems, recent methods for marker development recommend matching the optimization target to the performance index most relevant for the intended application. Such optimization targets include the area under the receiver operating characteristic (ROC) curve (AUC), the partial AUC (pAUC) over a clinically relevant range, and the sensitivity at the lowest “tolerable” level of specificity.
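As a rough illustration of two of the targets named above (this is not code from the thesis), the sketch below computes the empirical AUC as a Mann-Whitney statistic and the empirical sensitivity at a specificity floor. The function names and the toy scores are hypothetical, and the decision rule "call positive when the score exceeds the threshold" is an assumption. The pAUC is not covered here.

```python
def empirical_auc(cases, controls):
    """Mann-Whitney estimate of the AUC: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(cases, controls, min_specificity):
    """Highest empirical sensitivity over thresholds whose empirical
    specificity is at least min_specificity (positive if score > threshold)."""
    best = 0.0
    # Candidate thresholds: every observed score, plus "call everyone positive".
    thresholds = [float("-inf")] + sorted(set(cases) | set(controls))
    for t in thresholds:
        specificity = sum(y <= t for y in controls) / len(controls)
        if specificity >= min_specificity:
            sensitivity = sum(x > t for x in cases) / len(cases)
            best = max(best, sensitivity)
    return best


# Hypothetical scores, for illustration only (not data from the thesis).
toy_case_scores = [0.9, 0.8, 0.4]
toy_control_scores = [0.7, 0.3, 0.2, 0.1]

print(empirical_auc(toy_case_scores, toy_control_scores))
print(sensitivity_at_specificity(toy_case_scores, toy_control_scores, 0.75))
```

A screening task that demands high specificity would typically set `min_specificity` close to 1, which is where a marker with a good overall AUC can still perform poorly.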
In this work, I investigated and implemented several distribution-free approaches to optimizing linear combinations of prostate cancer biomarkers for a screening task, which requires a decision rule with high specificity. The primary objective was to study the gains from using task-specific objective functions to optimize meaningful combinations of markers in a real-life dataset. The approaches considered range from combining markers sequentially with grid-search methods to combining more than two markers simultaneously using gradient-based optimization of smooth approximations to classification-related objective functions.
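The two ends of that range can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the angle parameterization of the grid, the sigmoid smoothing with bandwidth `h`, the step size, and all function names and toy data are hypothetical choices. The target here is the full AUC; a pAUC or fixed-specificity target would need a different objective.

```python
import math


def empirical_auc(cases, controls):
    """Mann-Whitney estimate of the AUC for one-dimensional scores."""
    wins = 0.0
    for x in cases:
        for y in controls:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(cases) * len(controls))


def combine(rows, w):
    """Linear score w . x for each subject's marker vector x."""
    return [sum(wk * xk for wk, xk in zip(w, row)) for row in rows]


def grid_search_two_markers(cases, controls, n_angles=360):
    """Scan directions (cos t, sin t); the AUC depends only on direction."""
    best_w, best_auc = None, -1.0
    for k in range(n_angles):
        t = 2.0 * math.pi * k / n_angles
        w = (math.cos(t), math.sin(t))
        auc = empirical_auc(combine(cases, w), combine(controls, w))
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc


def smoothed_auc_and_gradient(cases, controls, w, h=0.2):
    """Sigmoid-smoothed AUC, mean of sigmoid(w.(x - y) / h), and its gradient."""
    value, grad = 0.0, [0.0] * len(w)
    for x in cases:
        for y in controls:
            d = [xk - yk for xk, yk in zip(x, y)]
            z = sum(wk * dk for wk, dk in zip(w, d)) / h
            z = max(-50.0, min(50.0, z))  # guard against overflow in exp
            s = 1.0 / (1.0 + math.exp(-z))
            value += s
            for k, dk in enumerate(d):
                grad[k] += s * (1.0 - s) * dk / h
    n = len(cases) * len(controls)
    return value / n, [g / n for g in grad]


def gradient_ascent(cases, controls, w0, steps=200, rate=0.5, h=0.2):
    """Plain gradient ascent on the smoothed AUC, renormalizing w each step."""
    w = list(w0)
    for _ in range(steps):
        _, grad = smoothed_auc_and_gradient(cases, controls, w, h)
        w = [wk + rate * gk for wk, gk in zip(w, grad)]
        norm = math.sqrt(sum(wk * wk for wk in w))
        w = [wk / norm for wk in w]
    return w


# Hypothetical two-marker data, for illustration only (not from the thesis).
toy_cases = [(2.0, 1.0), (1.5, 2.0), (3.0, 0.5), (1.0, 2.5)]
toy_controls = [(1.0, 0.5), (0.5, 1.0), (2.0, 0.0), (0.0, 2.0)]

w_grid, auc_grid = grid_search_two_markers(toy_cases, toy_controls)
w_smooth = gradient_ascent(toy_cases, toy_controls, [1.0, 0.0])
print(w_grid, auc_grid)
print(w_smooth, empirical_auc(combine(toy_cases, w_smooth),
                              combine(toy_controls, w_smooth)))
```

The grid scan is exhaustive but scales poorly beyond two markers, which is one reason to move to a smooth surrogate and gradient steps when combining several markers at once. The smoothed objective is only an approximation, so the result is best judged on the empirical (unsmoothed) criterion.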
The results indicate that combinations of real-life biomarkers can benefit substantially from optimizing an objective function tailored to the targeted classification task. The same phenomenon, possibly to a lesser degree, can be expected for less interpretable non-linear classification approaches. These findings are important for public health and medicine because targeted optimization of biomarker combinations can substantially improve the performance of the resulting decision rules in specific tasks, such as screening a large population or triaging patients with symptoms.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: He, Jian (Email: jih81@pitt.edu; Pitt Username: jih81)
ETD Committee:
Committee Chair: Bandos, Andriy (Email: anb61@pitt.edu; Pitt Username: anb61)
Committee Member: Jeong, Jong-Hyeon (Email: jjeong@pitt.edu; Pitt Username: jjeong)
Committee Member: Lee, Ju Hun (Email: jul78@pitt.edu; Pitt Username: jul78)
Date: 11 May 2021
Date Type: Publication
Defense Date: 20 April 2021
Approval Date: 11 May 2021
Submission Date: 30 April 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 88
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Optimization; ROC; Grid search; Classification; Cross-validation; Prostate cancer biomarkers; Logistic regression; Random forests
Date Deposited: 11 May 2021 19:24
Last Modified: 11 May 2021 19:24

