ROBUST PREDICTIVE MODELING OF RELATED GENE EXPRESSION DATA VIA MULTI-SOURCE TRANSFER RULE LEARNING

Ogoe, Henry A (2016) ROBUST PREDICTIVE MODELING OF RELATED GENE EXPRESSION DATA VIA MULTI-SOURCE TRANSFER RULE LEARNING. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

Abstract

The advent of high-throughput genomics has led to the accumulation of copious amounts of biomedical data, such as gene expression data, made available through public repositories like the NCBI’s GEO. Meanwhile, the digitization of biomedical literature into repositories such as PubMed has motivated the creation of curated knowledge bases like the Gene Ontology. Pooling information from such repositories and integrating it with predictive modeling of similar biomedical data from multiple studies could lead to models that are more robust. Most current methods are unable to leverage background knowledge, and often produce black-box models that are difficult for humans to interpret.

In this era of precision medicine, there is thus a critical need for effective methods that can incorporate background knowledge from multiple sources and yet produce simple-to-understand models from biomedical datasets. This dissertation develops four novel frameworks: (i) TRL-FM, (ii) KARL, (iii) MS-TRL, and (iv) iTRL, which use transfer rule learning to incorporate background knowledge from multiple sources for predictive modeling of gene expression datasets. They provide significant extensions to an existing method, TRL, which leveraged background knowledge from a single source. This work tests the hypothesis that “incorporating background knowledge from multiple sources into predictive modeling via the transfer rule learning approach leads to models that contain more robust propositional rule patterns than learning without any background knowledge or just from a single source.”

To test this hypothesis, I compared the accuracy and coverage of predictive models produced with the methods developed herein against those of baseline models, using 25 gene expression datasets from 5 studies of brain, breast, colon, lung, and prostate cancers. The results showed that the former produce, on average, statistically significantly more robust models than the latter. Also, KARL, MS-TRL, and iTRL provide mechanisms that could be used to discover both domain-specific and domain-independent robust rule patterns.

The methods developed herein augment extant capabilities of predictive modeling techniques to utilize and build robust, easy-to-interpret rule models from sparse, single, diverse sources of biomedical data and knowledge. These methods can be easily extended to other application domains beyond biomedicine.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
  • Ogoe, Henry A (Email: hao9@pitt.edu; Pitt Username: HAO9)
ETD Committee:
  • Committee Chair: Gopalakrishnan, Vanathi (Email: vanathi@pitt.edu; Pitt Username: VANATHI)
  • Committee Member: Cooper, Gregory F (Email: gfc@pitt.edu; Pitt Username: GFC)
  • Committee Member: Lu, Xinghua (Email: xinghua@pitt.edu; Pitt Username: XINGHUA)
  • Committee Member: Visweswaran, Shyam (Email: shv3@pitt.edu; Pitt Username: SHV3)
Date: 2 September 2016
Date Type: Publication
Defense Date: 5 July 2016
Approval Date: 2 September 2016
Submission Date: 16 August 2016
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 296
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: predictive modeling, transfer learning, gene expression, multiple sources, knowledge discovery, rule learning
Date Deposited: 02 Sep 2016 18:05
Last Modified: 02 Sep 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/29428

Available Versions of this Item

  • ROBUST PREDICTIVE MODELING OF RELATED GENE EXPRESSION DATA VIA MULTI-SOURCE TRANSFER RULE LEARNING. (deposited 02 Sep 2016 18:05) [Currently Displayed]
