He, Jian
(2021)
On Gains from Biomarker Optimization toward ROC-Related Targets in Real-Life Data.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
In biomedical studies, it is often of interest to classify or predict a subject’s condition using a combination of multiple markers. Because a combined classification score uses more information, one might expect it to classify better than any single marker. However, this is not always the case: for example, a logistic regression combining two markers can be less discriminative than one of those markers alone. This happens because logistic regression maximizes a likelihood function that is not directly related to measures of classification performance. Because of this and related problems, recent methods for marker development recommend matching the optimization target to the performance index most relevant to the intended application. Such targets include the area under the ROC curve (AUC), the partial AUC (pAUC) over a clinically relevant range, and the sensitivity at the lowest “tolerable” level of specificity.
In this work, I investigated and implemented several distribution-free approaches to optimizing linear combinations of prostate cancer biomarkers for a screening task, which requires a decision rule with high specificity. The primary objective is to study the gains from using task-specific objective functions to optimize meaningful combinations of markers in a real-life dataset. The approaches considered range from combining markers sequentially with grid search to combining more than two markers simultaneously with gradient-based optimization of smooth approximations to classification-related objective functions.
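The two families of approaches described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in the thesis. The grid search scans directions of a two-marker linear combination and keeps the one with the highest empirical AUC; a sequential scheme would repeat this, treating the current combined score as one marker. The gradient-based part maximizes a sigmoid-smoothed AUC, which is one common smooth approximation. The bandwidth `sigma`, step size `lr`, the unit-norm projection, all function names, and the simulated data are hypothetical choices; the same pattern could target a smoothed pAUC or sensitivity at a fixed specificity instead.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def score(w: Sequence[float], x: Point) -> float:
    """Linear combination score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_auc(w: Sequence[float], cases: List[Point], controls: List[Point]) -> float:
    """Mann-Whitney AUC of the combined score; ties count 1/2."""
    sc = [score(w, x) for x in cases]
    sn = [score(w, y) for y in controls]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in sc for b in sn)
    return total / (len(sc) * len(sn))


def grid_search_two_markers(cases, controls,
                            n_angles: int = 360) -> Tuple[Tuple[float, float], float]:
    """Scan directions w = (cos t, sin t). Rank-based targets do not change under
    positive rescaling of w, so only the direction of w matters."""
    best_w, best_auc = (1.0, 0.0), -1.0
    for k in range(n_angles):
        t = 2.0 * math.pi * k / n_angles
        w = (math.cos(t), math.sin(t))
        auc = empirical_auc(w, cases, controls)
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_auc_ascent(cases, controls, sigma: float = 0.1,
                        lr: float = 0.5, steps: int = 100) -> List[float]:
    """Projected gradient ascent on the sigmoid-smoothed AUC
    S(w) = mean over (case, control) pairs of sigmoid(w . (x - y) / sigma),
    rescaling w to unit length after every step."""
    d = len(cases[0])
    w = [1.0 / math.sqrt(d)] * d  # start from the equal-weight combination
    diffs = [[xi - yi for xi, yi in zip(x, y)] for x in cases for y in controls]
    for _ in range(steps):
        grad = [0.0] * d
        for diff in diffs:
            s = sigmoid(score(w, diff) / sigma)
            g = s * (1.0 - s) / sigma  # d/du of sigmoid(u / sigma), u = w . diff
            for j in range(d):
                grad[j] += g * diff[j]
        w = [wj + lr * gj / len(diffs) for wj, gj in zip(w, grad)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        w = [wj / norm for wj in w]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical simulated data: marker 1 is shifted in cases, marker 2 is noise.
    cases = [(rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
    controls = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
    print("grid search:", grid_search_two_markers(cases, controls))
    w = smoothed_auc_ascent(cases, controls)
    print("smoothed ascent:", w, empirical_auc(w, cases, controls))
```

Because the smoothed objective is differentiable, the same loop applies unchanged to more than two markers, whereas the cost of a grid grows quickly with each added marker; this is the practical reason for moving from grid search to gradient-based optimization.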
The results indicate that combinations of real-life biomarkers can benefit substantially from optimizing an objective function tailored to the targeted classification task. The same phenomenon, possibly to a lesser degree, can be expected for less interpretable non-linear classification approaches. These findings matter for public health and medicine because targeted optimization of biomarker combinations can substantially improve the performance of the resulting decision rules in specific tasks, such as screening a large population or triaging patients with symptoms.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: He, Jian
ETD Committee:
Date: 11 May 2021
Date Type: Publication
Defense Date: 20 April 2021
Approval Date: 11 May 2021
Submission Date: 30 April 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 88
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Optimization, ROC, Grid search, Classification, Cross-validation, Prostate cancer biomarkers, Logistic regression, Random forests
Date Deposited: 11 May 2021 19:24
Last Modified: 11 May 2021 19:24
URI: http://d-scholarship.pitt.edu/id/eprint/40969