Transfer rule learning for biomarker discovery and verification from related data sets

Ganchev, Philip (2011) Transfer rule learning for biomarker discovery and verification from related data sets. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Biomarkers are a critical tool for the detection, diagnosis, monitoring and prognosis of diseases, and for understanding disease mechanisms in order to create treatments. Unfortunately, finding reliable biomarkers is often hampered by a number of practical problems, including scarcity of samples, the high dimensionality of the data, and measurement error. An important opportunity to make the most of these scarce data is to combine information from multiple related data sets for more effective biomarker discovery. Because the costs of creating large data sets for every disease of interest are likely to remain prohibitive, methods that make more effective use of related biomarker data sets continue to be important.

This thesis develops TRL, a novel framework for integrative biomarker discovery from related but separate data sets, such as those generated for similar biomarker profiling studies. TRL alleviates the problem of data scarcity by providing a way to validate knowledge learned from one data set while simultaneously learning new knowledge on a related data set. Unlike other transfer learning approaches, TRL takes prior knowledge in the form of interpretable, modular classification rules and uses them to seed learning on a new data set.

We evaluated TRL on 13 pairs of real-world biomarker discovery data sets and found that TRL improves accuracy twice as often as it degrades it. TRL consists of four alternative methods for transfer and three measures of the amount of information transferred. By experimenting with these methods, we investigate the kinds of information that must be preserved for transfer learning from related data sets.

We found that it is important to keep track of the relationships between biomarker values and disease state, and to consider during learning how rules will interact in the final model. If the source and target data are drawn from the same distribution, we found that both the performance improvement and the amount of transfer increase as the size of the source data grows relative to the target data.


Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Ganchev, Philip	phil.ganchev@gmail.com

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Gopalakrishnan, Vanathi	vanathi@pitt.edu	VANATHI
Committee Member	Tsui, Fuchiang	TSUI2@pitt.edu	TSUI2
Committee Member	Bowser, Robert	bowserrp@upmc.edu
Committee Member	Visweswaran, Shyam	shv3@pitt.edu	SHV3

Date:

30 January 2011

Date Type:

Completion

Defense Date:

3 December 2010

Approval Date:

30 January 2011

Submission Date:

24 November 2010

Access Restriction:

5 year -- Restrict access to University of Pittsburgh for a period of 5 years.

Institution:

University of Pittsburgh

Schools and Programs:

Dietrich School of Arts and Sciences > Intelligent Systems

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

transfer learning

Other ID:

http://etd.library.pitt.edu/ETD/available/etd-11242010-001158/, etd-11242010-001158

Date Deposited:

10 Nov 2011 20:06

Last Modified:

15 Nov 2016 13:52

URI:

http://d-scholarship.pitt.edu/id/eprint/9790
