
Meta-analysis framework for peak calling by combining multiple ChIP-seq algorithms and gene clustering by combining multiple transcriptomic studies

Chen, Rui (2015) Meta-analysis framework for peak calling by combining multiple ChIP-seq algorithms and gene clustering by combining multiple transcriptomic studies. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



With the growing availability of genomic studies, integrating information from multiple sources improves knowledge discovery. Given the complexity of the genome and its numerous genetic features, meta-analysis, which aggregates information across studies, achieves higher statistical power for the measure of interest and helps identify patterns among study results as well as sources of disagreement among them.

As next-generation sequencing (NGS) technologies become affordable and provide per-base resolution, NGS data have become an appealing resource for analyzing genomic features. Among the various applications of NGS, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is primarily used to provide quantitative, genome-wide mapping of interactions between a target protein and DNA. Peak calling algorithms identify the target regions of interest that are enriched in vitro. Although programs exist for the earlier ChIP-Chip platforms, calling peaks at putative protein binding sites from large, sequencing-based data sets presents a bioinformatic challenge that has required considerable computational innovation. Popular peak calling algorithms, such as MACS, SPP, CisGenome, SISSRs, USeq, and PeakSeq, are widely applied, but each places a different emphasis on sensitivity or specificity, or selects peaks of different sizes and shapes. In the first project of this dissertation, we propose a meta-analysis framework, ChIP-MetaCaller, that combines multiple top-performing algorithms to identify and reprioritize peaks. We provide a forward selection algorithm to decide which combination of algorithms' outputs is best for the meta-analysis, and we show that the result improves motif enrichment and sensitivity. The results are more tractable for biologists to use in further validation and hypothesis generation.
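The abstract does not spell out how ChIP-MetaCaller combines peak lists, so the following is only a minimal sketch of the general idea of pooling and reprioritizing peaks from several callers, not the dissertation's method. It assumes BED-style half-open intervals, a per-caller score where higher means stronger, and a simple mean of rank fractions as the combined score; the function names `rank_fractions` and `combine_peaks` and the toy caller names are hypothetical.

```python
def rank_fractions(peaks):
    """Turn one caller's scores into rank fractions in [0, 1); 0 is the best peak."""
    order = sorted(peaks, key=lambda p: -p[3])
    n = len(order)
    return [(c, s, e, i / n) for i, (c, s, e, _) in enumerate(order)]


def combine_peaks(calls):
    """calls: dict mapping caller name -> list of (chrom, start, end, score).

    Overlapping peaks from any caller are merged into one region. Each region
    is scored by the mean rank fraction over all callers, where a caller that
    reports no peak in the region contributes the worst value, 1.0
    (an assumption made for this sketch).
    """
    pooled = []
    for caller, peaks in calls.items():
        for c, s, e, r in rank_fractions(peaks):
            pooled.append((c, s, e, caller, r))
    pooled.sort(key=lambda p: (p[0], p[1]))

    regions = []
    for c, s, e, caller, r in pooled:
        last = regions[-1] if regions else None
        if last and last["chrom"] == c and s < last["end"]:
            # Overlaps the current region: extend it.
            last["end"] = max(last["end"], e)
            reg = last
        else:
            reg = {"chrom": c, "start": s, "end": e, "ranks": {}}
            regions.append(reg)
        # Keep each caller's best (smallest) rank fraction within the region.
        reg["ranks"][caller] = min(r, reg["ranks"].get(caller, 1.0))

    k = len(calls)
    for reg in regions:
        missing = k - len(reg["ranks"])
        reg["score"] = (sum(reg["ranks"].values()) + missing * 1.0) / k

    # Smaller combined score = higher priority.
    regions.sort(key=lambda reg: reg["score"])
    return regions


# Toy input: two hypothetical callers, three peaks in total.
calls = {
    "callerA": [("chr1", 100, 200, 50.0), ("chr1", 500, 600, 10.0)],
    "callerB": [("chr1", 150, 250, 8.0)],
}
regions = combine_peaks(calls)
```

In this toy input the two overlapping peaks collapse into one region supported by both callers, which is ranked ahead of the region that only one caller reports.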

The mechanisms of complex diseases such as cancers involve changes in multiple genes, each conferring a small, incremental risk, that potentially converge in deregulated biological pathways, cellular functions and local circuit changes. Understanding this complex network requires the discovery of co-expression gene modules. Pilot studies in the literature show that meta-analysis can improve the performance of machine learning techniques in identifying these modules. In the second project of this dissertation, we propose an approach based on the clustering results of each individual study. Combining the standardized distances from genes to the medoids yields an integrated distance matrix, on which the meta-clustering is performed. We compare the performance of the proposed approach with that of Meta Clustering combining distance under three simulation settings and three real data sets, and provide guidance for practitioners.
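The abstract gives only the outline of the second method, so the sketch below is one possible reading of it rather than the dissertation's algorithm. It assumes each study supplies an expression matrix with genes in the same order and the indices of the medoid genes from its own clustering, that distances are Euclidean and z-scored within each study, and that the integrated gene-by-gene distance is the Euclidean distance between the concatenated standardized profiles. All function names are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardized_medoid_distances(expr, medoids):
    """Distances from every gene to each medoid in one study, z-scored within the study.

    expr: list of per-gene expression vectors; medoids: indices of medoid genes.
    """
    d = [[euclid(g, expr[m]) for m in medoids] for g in expr]
    flat = [x for row in d for x in row]
    mean = sum(flat) / len(flat)
    sd = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat)) or 1.0
    return [[(x - mean) / sd for x in row] for row in d]


def integrated_distance(studies):
    """studies: list of (expr, medoids) pairs with genes in the same order.

    Concatenates each gene's standardized medoid distances across studies and
    returns the gene-by-gene distance matrix between those profiles
    (an assumed way of forming the integrated matrix).
    """
    profiles = None
    for expr, medoids in studies:
        z = standardized_medoid_distances(expr, medoids)
        profiles = z if profiles is None else [p + r for p, r in zip(profiles, z)]
    n = len(profiles)
    return [[euclid(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


# Toy data: genes 0 and 1 co-vary, as do genes 2 and 3, in both studies.
study1 = ([[0, 0], [0, 1], [10, 10], [10, 11]], [0, 2])
study2 = ([[1, 1], [1, 2], [9, 9], [9, 8]], [1, 3])
D = integrated_distance([study1, study2])
```

Any distance-based clustering method could then be run on `D` to obtain the meta-clusters; which one the dissertation uses is not stated in the abstract.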

The two projects in this dissertation tackle different biological questions based on genomics data. Both improve on the performance of existing methods by integrating information through meta-analysis frameworks, and both provide comprehensive biomarker detection. This work could improve public health by providing more effective methodologies for biomarker detection when integrating multiple genomic studies.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators (Name | Email | Pitt Username | ORCID):
Chen, Rui | ruc9@pitt.edu | RUC9 |
ETD Committee (Title | Member | Email Address | Pitt Username | ORCID):
Committee Chair | Tseng, George | ctseng@pitt.edu | CTSENG |
Committee Member | Tang, Gong | got1@pitt.edu | GOT1 |
Committee Member | Barmada, M. Michael | barmada@pitt.edu | BARMADA |
Committee Member | Park, Yong Seok | yongpark@pitt.edu | YONGPARK |
Date: 28 January 2015
Date Type: Publication
Defense Date: 17 December 2014
Approval Date: 28 January 2015
Submission Date: 6 January 2015
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 93
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: meta-analysis, ChIP-seq, co-expression modules
Date Deposited: 28 Jan 2015 16:04
Last Modified: 01 Jan 2017 06:15

