Ganchev, Philip
(2011)
Transfer rule learning for biomarker discovery and verification from related data sets.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Biomarkers are a critical tool for the detection, diagnosis, monitoring and prognosis of diseases, and for understanding disease mechanisms in order to create treatments. Unfortunately, finding reliable biomarkers is often hampered by a number of practical problems, including scarcity of samples, the high dimensionality of the data, and measurement error. An important opportunity to make the most of these scarce data is to combine information from multiple related data sets for more effective biomarker discovery. Because the costs of creating large data sets for every disease of interest are likely to remain prohibitive, methods for more effectively making use of related biomarker data sets continue to be important.

This thesis develops TRL, a novel framework for integrative biomarker discovery from related but separate data sets, such as those generated for similar biomarker profiling studies. TRL alleviates the problem of data scarcity by providing a way to validate knowledge learned from one data set and simultaneously learn new knowledge on a related data set. Unlike other transfer learning approaches, TRL takes prior knowledge in the form of interpretable, modular classification rules, and uses them to seed learning on a new data set.

We evaluated TRL on 13 pairs of real-world biomarker discovery data sets, and found that TRL improves accuracy twice as often as it degrades it. TRL consists of four alternative methods for transfer and three measures of the amount of information transferred. By experimenting with these methods, we investigate the kinds of information necessary to preserve for transfer learning from related data sets. We found it is important to keep track of the relationships between biomarker values and disease state, and to consider during learning how rules will interact in the final model. If the source and target data are drawn from the same distribution, we found that the performance improvement and the amount of transfer increase with increasing size of the source compared to the target data.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Ganchev, Philip
Date: 30 January 2011
Date Type: Completion
Defense Date: 3 December 2010
Approval Date: 30 January 2011
Submission Date: 24 November 2010
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: transfer learning
Other ID: http://etd.library.pitt.edu/ETD/available/etd-11242010-001158/, etd-11242010-001158
Date Deposited: 10 Nov 2011 20:06
Last Modified: 15 Nov 2016 13:52
URI: http://d-scholarship.pitt.edu/id/eprint/9790