

Santana dos Santos, Lucas (2017) COMPUTATIONAL METHODS FOR THE FUNCTIONAL ANALYSIS OF DNA SEQUENCE VARIANTS. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Complex diseases, such as cancer and inflammatory bowel disease, are caused by a combination of genetic and environmental factors. The advent of next-generation sequencing (NGS) technology has allowed genome-wide investigation of the underlying genetic causes of complex disorders. Analysis of the large amount of data generated by NGS is computationally intensive and requires new computational methods. One of the current problems in genomic data analysis is the lack of computational methods for functional annotation of DNA sequence variants (DSVs), especially regulatory DNA sequence variants (rDSVs). In recent years, rDSVs have been shown to be the primary cause of complex diseases, a finding supported by the fact that functional regulatory sites are more polymorphic than coding regions and that rDSVs vastly outnumber coding variants. Also, genome-wide association studies (GWAS) of complex traits have shown that the SNPs with the strongest association signals lie outside known genes, in non-coding regions of the genome.
This dissertation contributes to a solution to the lack of computational methods for the analysis of DNA sequence variants. Two novel computational methods for the analysis of DSVs are proposed here: 1) an algorithm, called is-miRSNP, that predicts the effects of DSVs on miRNA binding, and 2) a pipeline for the functional annotation of DSVs using NGS data. The is-miRSNP algorithm uses a binding-energy approach to predict the effects of DSVs on miRNA binding. The algorithm is flexible enough to process large amounts of data and can be easily integrated into existing pipelines. Experiments using a manually curated set of experimentally validated DSV-miRNA pairs showed that is-miRSNP outperforms the most popular existing methods. The pipeline for functional annotation of DSVs utilizes state-of-the-art existing computational methods. The pipeline has been applied to an effector memory T cell RNA-Seq dataset that is related to inflammatory bowel disease and has identified biologically relevant genes and isoforms that are differentially expressed upon treatment with Prostaglandin E2. Important pathways and biologically relevant DSVs were also identified and recovered. These methods have the potential to help clinicians and researchers analyze and interpret genomic datasets, and might in the future help the development of new diagnostic methods and treatments.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
Santana dos Santos, Lucas | lss19@pitt.edu | lss19@pitt.edu | 0000-0002-0872-5431
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Benos, Panayiotis | benos@pitt.edu | benos |
Committee Member | Duerr, Richard | rduerr@pitt.edu | ruder |
Committee Member | Gopalakrishnan, Vanathi | vanathi@pitt.edu | vanathi |
Committee Member | Jiang, Xia | xij6@pitt.edu | xij6 |
Date: 18 May 2017
Date Type: Publication
Defense Date: 4 April 2017
Approval Date: 18 May 2017
Submission Date: 16 May 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 109
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Genomics, Bioinformatics, inflammatory bowel disease, miRNA, regulatory variants, DNA sequence variants, NGS, RNA-Seq
Date Deposited: 18 May 2017 14:06
Last Modified: 18 May 2017 14:06

