Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

Jiang, X and Neapolitan, RE (2012) Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality. PLoS ONE, 7 (10).

Abstract

Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets.

Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects.

Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan.
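The two ideas the abstract opens with, strict epistasis and the size of an exhaustive search, can be illustrated with a small sketch. It is a toy illustration only, not the authors' method: the penetrance table, the variant frequency, the restriction to two binary loci (real SNPs have three genotypes), and the function names `marginal_risk` and `n_patterns` are all assumptions made for this example.

```python
from math import comb

# Hypothetical penetrance table: P(disease | a, b) for two binary loci.
# XOR-style risk: high only when exactly one locus carries the variant.
PENETRANCE = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.1}
P_VARIANT = 0.5  # assumed variant frequency at each locus; loci independent


def marginal_risk(locus: int, value: int) -> float:
    """P(disease | locus = value), averaged over the other locus."""
    total = 0.0
    for other in (0, 1):
        genotype = (value, other) if locus == 0 else (other, value)
        p_other = P_VARIANT if other == 1 else 1.0 - P_VARIANT
        total += p_other * PENETRANCE[genotype]
    return total


def n_patterns(n_snps: int, k: int) -> int:
    """Number of distinct k-SNP combinations an exhaustive search must score."""
    return comb(n_snps, k)


if __name__ == "__main__":
    # Each locus alone shows the same risk for both values (no marginal
    # effect), even though the pair jointly determines the risk.
    for locus in (0, 1):
        print(locus, marginal_risk(locus, 0), marginal_risk(locus, 1))
    # The candidate count grows quickly with the number of SNPs and the order k.
    for k in (2, 3):
        print(k, n_patterns(1000, k))
```

In this toy model neither locus changes the disease risk on its own, so a search that first screens single SNPs by marginal effect would discard both, while scoring every pair or triple quickly becomes expensive as the number of SNPs grows.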

Details

Item Type:

Article

Status:

Published

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Jiang, X	xij6@pitt.edu	XIJ6
Neapolitan, RE

Contributors:

Contribution	Contributors Name	Email	Pitt Username	ORCID
Editor	Wang, Xiaofeng	UNSPECIFIED	UNSPECIFIED	UNSPECIFIED

Date:

12 October 2012

Date Type:

Publication

Journal or Publication Title:

PLoS ONE

Volume:

7

Number:

10

DOI or Unique Handle:

10.1371/journal.pone.0046771

Schools and Programs:

School of Medicine > Biomedical Informatics

Refereed:

Yes

PubMed ID:

23071633

Date Deposited:

18 Oct 2012 20:47

Last Modified:

02 Feb 2019 14:55

URI:

http://d-scholarship.pitt.edu/id/eprint/16063
