
Unsupervised methods for pattern discovery in high-throughput genomic data

Buschur, Kristina (2019) Unsupervised methods for pattern discovery in high-throughput genomic data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Full text: PDF (4MB)

Abstract

Large -omics datasets are being generated at an increasingly fast pace. They offer rich opportunities for insight into complex diseases and systems, but they also pose new analytical challenges. Novel approaches are needed to make sense of these high-throughput data, and especially to analyze them jointly for a more complete picture of the system's biology. In this dissertation, we focused on improving clustering of high-throughput biological datasets by developing a variety of new features that are specifically tailored to reflect the biological properties of the systems we are trying to understand. We started by proposing new features for representing transcription factor (TF) binding sites that capture both the DNA sequence composition of the binding region and the TF-DNA binding strength. We observed that these new features improved clustering and, in turn, DNA binding motif discovery. Next, we presented a new method, single sample network perturbation assessment (ssNPA), and demonstrated how causal network learning algorithms can be used to build features that capture the complex interactions among variables in biological systems such as gene regulatory networks, and to cluster samples based on how these networks are deregulated in different subtypes. We validated this method on a murine liver cell development dataset and on transcriptomic datasets comparing breast cancer and lung adenocarcinoma tumor samples to normal tissue. We then used ssNPA to describe new subtypes of chronic obstructive pulmonary disease (COPD) defined by their gene network deregulation relative to normal samples. Finally, we applied causal network modeling techniques to two chronic lung disease datasets, exploring the systems biology of lung function decline in COPD at the level of body systems, and cell type interactions in idiopathic pulmonary fibrosis (IPF) at the level of single-cell gene expression.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Buschur, Kristina (Email: klbuschur@gmail.com; Pitt Username: klb170)
ETD Committee:
Committee Chair: Kostka, Dennis (kostka@pitt.edu)
Committee Member: Hinman, Veronica (vhinman@andrew.cmu.edu)
Committee Member: Oesterreich, Steffi (oesterreichs@upmc.edu)
Thesis Advisor: Benos, Panayiotis (benos@pitt.edu)
Date: 29 May 2019
Date Type: Publication
Defense Date: 26 April 2019
Approval Date: 29 May 2019
Submission Date: 15 May 2019
Access Restriction: 2 years (access restricted to the University of Pittsburgh for a period of 2 years)
Number of Pages: 119
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: genomics, clustering
Date Deposited: 29 May 2019 19:52
Last Modified: 29 May 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/36728
