
Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data

Stokes, Matthew (2014) Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data. Doctoral Dissertation, University of Pittsburgh.


Abstract

One primary goal of analyzing genomic data is to identify biomarkers that may be causative of, correlated with, or otherwise biologically relevant to disease phenotypes. In this work, I implement and extend a multivariate feature ranking algorithm called label propagation (LP) for biomarker discovery in genome-wide single-nucleotide polymorphism (SNP) data. This graph-based algorithm uses an iterative propagation method to efficiently compute the strength of association between a SNP and a phenotype.
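The abstract does not state the update rule, so the following is only a minimal sketch of the generic textbook form of iterative label propagation on a weighted graph, not necessarily the formulation used in the dissertation. The function name `propagate_labels`, the damping parameter `alpha`, and the four-node chain graph are all assumptions made for illustration.

```python
def propagate_labels(weights, initial, alpha=0.8, tol=1e-9, max_iter=1000):
    """Generic label propagation: F <- alpha * P F + (1 - alpha) * Y.

    weights: symmetric, non-negative adjacency matrix (list of lists).
    initial: starting label per node (Y), e.g. +1 case, -1 control, 0 unlabeled.
    This is a textbook sketch, not the dissertation's exact algorithm.
    """
    n = len(weights)
    # Row-normalise the weights so each node averages over its neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    scores = list(initial)
    for _ in range(max_iter):
        updated = [
            alpha * sum(transition[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
        # Stop once no score moves by more than the tolerance.
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 with a positive label at
# node 0 and a negative label at node 3; nodes 1 and 2 start unlabeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
chain_scores = propagate_labels(chain, [1.0, 0.0, 0.0, -1.0])
```

In this toy case the unlabeled nodes end up with scores whose sign and magnitude reflect their distance from each labeled node, which is the intuition behind using propagated scores as a measure of association.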
I developed three extensions to the LP algorithm, with the goal of tailoring it to genomic data. The first extension is a modification to the LP score which yields a variable-level score for each SNP, rather than a score for each SNP genotype. The second extension incorporates prior biological knowledge that is encoded as a prior value for each SNP. The third extension enables the combination of rankings produced by LP and another feature ranking algorithm.
The LP algorithm, its extensions, and two control algorithms (chi-squared and sparse logistic regression) were applied to 11 genomic datasets: a synthetic dataset, a semi-synthetic dataset, and nine genome-wide association study (GWAS) datasets covering eight diseases. The quality of each feature ranking algorithm was evaluated by using a subset of top-ranked SNPs to construct a classifier, whose predictive power was measured as the area under the Receiver Operating Characteristic curve (AUC). Top-ranked SNPs were also checked against the literature for prior evidence of association with disease.
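The evaluation pipeline described above (rank SNPs, keep the top-ranked ones, measure AUC) can be illustrated with a small sketch. It uses the chi-squared control as the ranking method and, as a stand-in for a trained classifier, scores each sample by the genotype of the single top-ranked SNP. The data, the SNP names, and that stand-in scoring rule are all hypothetical; a real evaluation would train a classifier and measure AUC on held-out samples.

```python
from collections import Counter


def chi_squared(genotypes, phenotypes):
    """Pearson chi-squared statistic for a genotype-by-phenotype table."""
    n = len(genotypes)
    observed = Counter(zip(genotypes, phenotypes))
    genotype_totals = Counter(genotypes)
    phenotype_totals = Counter(phenotypes)
    stat = 0.0
    for g, g_count in genotype_totals.items():
        for p, p_count in phenotype_totals.items():
            expected = g_count * p_count / n
            stat += (observed[(g, p)] - expected) ** 2 / expected
    return stat


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. Labels are 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 cases, 4 controls, genotypes coded 0/1/2.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 2, 0, 0, 1, 0],  # genotype tracks the phenotype
    "snp_b": [0, 1, 2, 0, 0, 1, 2, 0],  # genotype independent of phenotype
}

# Rank SNPs by chi-squared statistic, strongest association first.
ranking = sorted(snps, key=lambda name: chi_squared(snps[name], phenotype),
                 reverse=True)

# Stand-in "classifier": use the top-ranked SNP's genotype as the score.
top_snp_auc = auc(snps[ranking[0]], phenotype)
```

Because this toy AUC is computed on the same samples used for ranking, it is optimistic; it is meant only to show how a ranking feeds into an AUC measurement.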
The LP algorithm was found to be effective at identifying predictive and biologically meaningful SNPs. The single-score extension performed significantly better than the original algorithm on the GWAS datasets. The prior knowledge extension did not improve on the feature ranking results, and in some cases it reduced the predictive power of top-ranked variants. The ranking combination method was effective for some pairs of algorithms, but not for others. Overall, this work’s main results are the formulation and evaluation of several algorithmic extensions of LP for use in the analysis of genomic data, as well as the identification of several disease-associated SNPs.



Details

Item Type: University of Pittsburgh ETD
Status: Published
Creators/Authors: Stokes, Matthew (email: mattstokes42@gmail.com)
ETD Committee:
Committee Chair: Visweswaran, Shyam (email: shv3@pitt.edu; Pitt username: SHV3)
Committee Member: Cooper, Gregory F (email: gfc@cbmi.pitt.edu; Pitt username: GFC)
Committee Member: Hauskrecht, Milos (email: milos@cs.pitt.edu; Pitt username: MILOS)
Committee Member: Barmada, M. Michael (email: barmada@pitt.edu; Pitt username: BARMADA)
Date: 25 September 2014
Date Type: Publication
Defense Date: 17 July 2014
Approval Date: 25 September 2014
Submission Date: 14 August 2014
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: feature selection, dimensionality reduction, bioinformatics, label propagation, SNP, genomics, biomarker discovery
Date Deposited: 25 Sep 2014 14:52
Last Modified: 15 Nov 2016 14:23
URI: http://d-scholarship.pitt.edu/id/eprint/22722
