ROBUST PREDICTIVE MODELING OF RELATED GENE EXPRESSION DATA VIA MULTI-SOURCE TRANSFER RULE LEARNING
Ogoe, Henry A (2016) ROBUST PREDICTIVE MODELING OF RELATED GENE EXPRESSION DATA VIA MULTI-SOURCE TRANSFER RULE LEARNING. Doctoral Dissertation, University of Pittsburgh. (Unpublished) This is the latest version of this item.
Abstract
The advent of high-throughput genomics has led to the accumulation of copious amounts of biomedical data, such as gene expression data, made available through public repositories like the NCBI’s GEO. Meanwhile, the digitization of biomedical literature into repositories such as PubMed has motivated the creation of curated knowledge bases like the Gene Ontology. Pooling information from such repositories and integrating it with predictive modeling of similar biomedical data from multiple studies could lead to models that are more robust. Most current methods are unable to leverage background knowledge, and they often produce black-box models that are difficult for humans to interpret. In this era of precision medicine, there is thus a critical need for effective methods that can incorporate background knowledge from multiple sources and yet produce simple-to-understand models from biomedical datasets. This dissertation develops four novel frameworks: (i) TRL-FM, (ii) KARL, (iii) MS-TRL, and (iv) iTRL, which use transfer rule learning to incorporate background knowledge from multiple sources for predictive modeling of gene expression datasets. They provide significant extensions to an existing method, TRL, which leveraged background knowledge from a single source. This work tests the hypothesis that “incorporating background knowledge from multiple sources into predictive modeling via the transfer rule learning approach leads to models that contain more robust propositional rule patterns than learning without any background knowledge or just from a single source.” To test this hypothesis, I compared the accuracy and coverage of predictive models produced with the methods developed herein to those of the baseline models, using 25 gene expression datasets from 5 studies of brain, breast, colon, lung, and prostate cancers.
The results showed that the models produced by the methods developed herein are, on average, statistically significantly more robust than the baseline models. Also, KARL, MS-TRL, and iTRL provide mechanisms that could be used to discover both domain-specific and domain-independent robust rule patterns. The methods developed herein augment extant capabilities of predictive modeling techniques to utilize and build robust, easy-to-interpret rule models from sparse, single, diverse sources of biomedical data and knowledge. These methods can be easily extended to other application domains beyond biomedicine.