Buschur, Kristina
(2019)
Unsupervised methods for pattern discovery in high-throughput genomic data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Large -omics experiment datasets are being generated at an increasingly fast pace. They present bountiful opportunities for insight into complex diseases and systems, but they also pose new challenges for analysis. Novel approaches are needed to make sense of these high-throughput data, and especially to consider them jointly for a more complete picture of the system's biology. In this dissertation, we focused on improving clustering in high-throughput biological datasets by developing a variety of new features specifically tailored to reflect the biological properties of the systems we are trying to understand. We started by proposing new features for representing transcription factor (TF) binding sites that capture both the DNA sequence composition of the binding region and the TF-DNA binding strength. We observed that these new features aided clustering and improved DNA binding motif discovery. Next, we presented a new method, single sample network perturbation assessment (ssNPA). We demonstrated how causal network learning algorithms can be used to build features that capture the complex interactions of variables within biological systems such as gene regulatory networks, and how samples can then be clustered based on how these networks are deregulated in different subtypes. We validated this method on a murine liver cell development dataset and on transcriptomic datasets comparing breast cancer and lung adenocarcinoma tumor samples to normal tissue. We then used ssNPA to describe new subtypes of chronic obstructive pulmonary disease (COPD) defined by their gene network deregulation relative to normal samples. Finally, we applied causal network modeling techniques to two datasets of chronic lung diseases, exploring the systems biology of lung function decline in COPD at the level of body systems, and cell type interactions in idiopathic pulmonary fibrosis (IPF) at the level of single-cell gene expression.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Buschur, Kristina
ETD Committee:
Date: 29 May 2019
Date Type: Publication
Defense Date: 26 April 2019
Approval Date: 29 May 2019
Submission Date: 15 May 2019
Access Restriction: 2 year: Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 119
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: genomics, clustering
Date Deposited: 29 May 2019 19:52
Last Modified: 29 May 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/36728