Santana dos Santos, Lucas
(2017)
COMPUTATIONAL METHODS FOR THE FUNCTIONAL ANALYSIS OF DNA SEQUENCE VARIANTS.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Complex diseases, such as cancer and inflammatory bowel disease, are caused by a combination of genetic and environmental factors. The advent of next-generation sequencing (NGS) technology has enabled genome-wide investigation of the underlying genetic causes of complex disorders. Analyzing the large amount of data generated by NGS is computationally intensive and requires new computational methods. One current problem in genomic data analysis is the lack of computational methods for the functional annotation of DNA sequence variants (DSVs), especially regulatory DNA sequence variants (rDSVs). In recent years, rDSVs have been shown to be the primary cause of complex diseases, a finding supported by the facts that functional regulatory sites are more polymorphic than coding regions and that rDSVs vastly outnumber coding variants. In addition, genome-wide association studies (GWAS) of complex traits have shown that the SNPs with the strongest association signals lie outside known genes, in non-coding regions of the genome.
This dissertation contributes to addressing the lack of computational methods for the analysis of DNA sequence variants. Two novel computational methods for the analysis of DSVs are proposed here: 1) an algorithm, called is-miRSNP, for predicting the effects of DSVs on miRNA binding, and 2) a pipeline for the functional annotation of DSVs using NGS. The is-miRSNP algorithm uses a binding-energy approach to predict the effects of DSVs on miRNA binding. The algorithm is flexible enough to process large amounts of data and can be easily integrated into existing pipelines. Experiments using a manually curated set of experimentally validated DSV-miRNA pairs showed that is-miRSNP outperforms the most popular existing methods. The pipeline for the functional annotation of DSVs utilizes state-of-the-art existing computational methods. The pipeline has been applied to an effector memory T cell RNA-Seq dataset related to inflammatory bowel disease, where it identified biologically relevant genes and isoforms that are differentially expressed upon treatment with Prostaglandin E2. Important pathways and biologically relevant DSVs were also identified and recovered. These methods have the potential to help clinicians and researchers analyze and interpret genomic datasets, and might in the future aid the development of new diagnostic methods and treatments.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 18 May 2017
Date Type: Publication
Defense Date: 4 April 2017
Approval Date: 18 May 2017
Submission Date: 16 May 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 109
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Genomics, Bioinformatics, inflammatory bowel disease, miRNA, regulatory variants, DNA sequence variants, NGS, RNA-Seq
Date Deposited: 18 May 2017 14:06
Last Modified: 18 May 2017 14:06
URI: http://d-scholarship.pitt.edu/id/eprint/32008