Jordan, Rick
(2016)
LITERATURE MINING SUSTAINS AND ENHANCES KNOWLEDGE DISCOVERY FROM OMIC STUDIES.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Genomic, proteomic and other experimentally generated data from studies of biological systems aiming to discover disease biomarkers are currently analyzed without sufficient supporting evidence from the literature due to complexities associated with automated processing. Extracting prior knowledge about markers associated with biological sample types and disease states from the literature is tedious, and little research has been performed to understand how to use this knowledge to inform the generation of classification models from ‘omic’ data. Using pathway analysis methods to better understand the underlying biology of complex diseases such as breast and lung cancers is state-of-the-art. However, how to combine literature-mining evidence with pathway analysis evidence remains an open problem in biomedical informatics research.
This dissertation presents a novel semi-automated framework, named Knowledge Enhanced Data Analysis (KEDA), which incorporates the following components: 1) literature mining of text; 2) classification modeling; and 3) pathway analysis. This framework aids researchers in assigning literature-mining-based prior knowledge values to genes and proteins associated with disease biology. It incorporates prior knowledge into the modeling of experimental datasets, enriching the development process with current findings from the scientific community.
New knowledge is presented in the form of lists of known disease-specific biomarkers and their accompanying scores obtained through literature mining of millions of lung and breast cancer abstracts. These scores can subsequently be used as prior knowledge values in Bayesian modeling and pathway analysis. Ranked, newly discovered biomarker-disease-biofluid relationships that identify biomarker specificity across biofluids are presented. A novel method of identifying biomarker relationships is discussed that examines the attributes from the best-performing models. Pathway analysis results obtained with the addition of prior information ultimately lead to more robust evidence for pathway involvement in diseases of interest, based on statistically significant standard measures of impact factor and p-values.
The outcome of implementing the KEDA framework is enhanced modeling and pathway analysis findings. Enhanced knowledge discovery analysis leads to new disease-specific entities and relationships that otherwise would not have been identified. Increased disease understanding, as well as identification of biomarkers for disease diagnosis, treatment, or therapy targets should ultimately lead to validation and clinical implementation.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Jordan, Rick
Date: 23 May 2016
Date Type: Publication
Defense Date: 2 December 2015
Approval Date: 23 May 2016
Submission Date: 20 May 2016
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 227
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: literature mining, text-mining, pathway analysis, Bayesian modeling, lung cancer, breast cancer
Date Deposited: 23 May 2016 14:15
Last Modified: 23 May 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/28062