
Transfer rule learning for biomarker discovery and verification from related data sets

Ganchev, Philip (2011) Transfer rule learning for biomarker discovery and verification from related data sets. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Biomarkers are a critical tool for the detection, diagnosis, monitoring and prognosis of diseases, and for understanding disease mechanisms in order to create treatments. Unfortunately, finding reliable biomarkers is often hampered by a number of practical problems, including scarcity of samples, the high dimensionality of the data, and measurement error. An important opportunity to make the most of these scarce data is to combine information from multiple related data sets for more effective biomarker discovery. Because the costs of creating large data sets for every disease of interest are likely to remain prohibitive, methods for more effectively making use of related biomarker data sets continue to be important.

This thesis develops TRL, a novel framework for integrative biomarker discovery from related but separate data sets, such as those generated for similar biomarker profiling studies. TRL alleviates the problem of data scarcity by providing a way to validate knowledge learned from one data set and simultaneously learn new knowledge on a related data set. Unlike other transfer learning approaches, TRL takes prior knowledge in the form of interpretable, modular classification rules, and uses them to seed learning on a new data set.

We evaluated TRL on 13 pairs of real-world biomarker discovery data sets, and found that TRL improves accuracy twice as often as it degrades it. TRL consists of four alternative methods for transfer and three measures of the amount of information transferred. By experimenting with these methods, we investigate the kinds of information necessary to preserve for transfer learning from related data sets. We found it is important to keep track of the relationships between biomarker values and disease state, and to consider during learning how rules will interact in the final model. If the source and target data are drawn from the same distribution, we found the performance improvement and amount of transfer increase with increasing size of the source compared to the target data.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Ganchev, Philip
ETD Committee:
Committee Chair: Gopalakrishnan, Vanathi (vanathi@pitt.edu, Pitt username VANATHI)
Committee Member: Tsui, Fuchiang (TSUI2@pitt.edu, Pitt username TSUI2)
Committee Member: Bowser,
Committee Member: Visweswaran, Shyam (shv3@pitt.edu, Pitt username SHV3)
Date: 30 January 2011
Date Type: Completion
Defense Date: 3 December 2010
Approval Date: 30 January 2011
Submission Date: 24 November 2010
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: transfer learning
Other ID: etd-11242010-001158
Date Deposited: 10 Nov 2011 20:06
Last Modified: 15 Nov 2016 13:52

