Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data

Stokes, Matthew (2014) Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

One primary goal of analyzing genomic data is to identify biomarkers that may be causative of, correlated with, or otherwise biologically relevant to disease phenotypes. In this work, I implement and extend a multivariate feature ranking algorithm called label propagation (LP) for biomarker discovery in genome-wide single-nucleotide polymorphism (SNP) data. This graph-based algorithm uses an iterative propagation method to efficiently compute the strength of association between a SNP and a phenotype.
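The abstract does not give the graph construction or the update rule used in the dissertation. As a rough illustration of what iterative propagation on a graph looks like, the sketch below applies a widely used generic update, f <- alpha * S f + (1 - alpha) * y, to a hypothetical toy graph. The function name, the value of alpha, and the graph are all assumptions made for illustration, not the author's method.

```python
def propagate(weights, initial, alpha=0.5, tol=1e-9, max_iter=1000):
    """Generic label propagation sketch (not the dissertation's exact algorithm).

    weights: symmetric, non-negative adjacency matrix as a list of lists
    initial: starting label per node (+1, -1, or 0 for unlabeled)
    Iterates f <- alpha * S f + (1 - alpha) * y until the scores stop changing.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2); zero weights stay zero.
    s = [[weights[i][j] / (degree[i] * degree[j]) ** 0.5 if weights[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(initial)
    for _ in range(max_iter):
        new_f = [alpha * sum(s[i][j] * f[j] for j in range(n))
                 + (1 - alpha) * initial[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


# Hypothetical path graph 0 - 1 - 2 - 3 with one positive and one negative seed.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
scores = propagate(path, [1, 0, 0, -1])
```

In this toy case the unlabeled node next to the positive seed ends with a positive score and the one next to the negative seed with a negative score, which is the behaviour a propagation-based association score relies on.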
I developed three extensions to the LP algorithm, with the goal of tailoring it to genomic data. The first extension is a modification to the LP score which yields a variable-level score for each SNP, rather than a score for each SNP genotype. The second extension incorporates prior biological knowledge that is encoded as a prior value for each SNP. The third extension enables the combination of rankings produced by LP and another feature ranking algorithm.
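The abstract does not say how the third extension combines rankings. One simple possibility, shown here only as an assumption, is to order SNPs by the sum of their positions in the two rankings (a Borda-style scheme). The function name and the SNP identifiers are hypothetical placeholders.

```python
def combine_rankings(ranking_a, ranking_b):
    """Combine two rankings (best first) by summed rank position.

    Illustrative assumption only; the dissertation's combination rule
    is not described in the abstract. Ties are broken by ranking_a.
    """
    pos_a = {snp: i for i, snp in enumerate(ranking_a)}
    pos_b = {snp: i for i, snp in enumerate(ranking_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("both rankings must cover the same SNPs")
    return sorted(ranking_a, key=lambda snp: (pos_a[snp] + pos_b[snp], pos_a[snp]))


# Placeholder SNP names, not real identifiers from the study.
lp_ranking = ["snp1", "snp2", "snp3", "snp4"]
other_ranking = ["snp3", "snp1", "snp2", "snp4"]
combined = combine_rankings(lp_ranking, other_ranking)
# combined == ["snp1", "snp3", "snp2", "snp4"]
```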
The LP algorithm, its extensions, and two control algorithms (chi-squared and sparse logistic regression) were applied to 11 genomic datasets, including a synthetic dataset, a semi-synthetic dataset, and nine genome-wide association study (GWAS) datasets covering eight diseases. The quality of each feature ranking algorithm was evaluated by using a subset of top-ranked SNPs to construct a classifier, whose predictive power was evaluated in terms of the area under the Receiver Operating Characteristic curve. Top-ranked SNPs were also checked against the literature for prior evidence of association with disease.
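The area under the ROC curve used in this evaluation equals the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. A minimal sketch of that computation follows; the function name and the example labels and scores are made up for illustration and are not results from the dissertation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for case, 0 for control
    scores: classifier outputs, higher meaning more likely a case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four case/control pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```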
The LP algorithm was found to be effective at identifying predictive and biologically meaningful SNPs. The single-score extension performed significantly better than the original algorithm on the GWAS datasets. The prior knowledge extension did not improve on the feature ranking results, and in some cases it reduced the predictive power of top-ranked variants. The ranking combination method was effective for some pairs of algorithms, but not for others. Overall, this work’s main results are the formulation and evaluation of several algorithmic extensions of LP for use in the analysis of genomic data, as well as the identification of several disease-associated SNPs.

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:

Creators	Email	Pitt Username	ORCID
Stokes, Matthew	mattstokes42@gmail.com

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Visweswaran, Shyam	shv3@pitt.edu	SHV3
Committee Member	Cooper, Gregory F	gfc@cbmi.pitt.edu	GFC
Committee Member	Hauskrecht, Milos	milos@cs.pitt.edu	MILOS
Committee Member	Barmada, M. Michael	barmada@pitt.edu	BARMADA

Date: 25 September 2014
Date Type: Publication
Defense Date: 17 July 2014
Approval Date: 25 September 2014
Submission Date: 14 August 2014
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 134
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: feature selection, dimensionality reduction, bioinformatics, label propagation, SNP, genomics, biomarker discovery
Date Deposited: 25 Sep 2014 14:52
Last Modified: 15 Nov 2016 14:23
URI: http://d-scholarship.pitt.edu/id/eprint/22722
