Liang, Lifan (2021) Integrative Analysis of Modular Structure of Genes in High-throughput Tumor Profiles. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Cellular functions, such as signal transduction, transport, the cell cycle, and various metabolic processes, require the cooperation of many gene products. Following the central dogma, such large-scale cooperation within and across cells often leaves traces in different omics profiles. One major clue is strong correlation among genes in genomic, epigenetic, transcriptomic, and proteomic data. Based on this premise, we first identified functional modules by integrating pairwise correlations among genes from different information sources into multiplex networks. Although all layers of the multiplex share the same protein interactome as the skeleton, the edge weights in each layer represent pairwise correlations from a different type of information source. This formulation allows information to flow from one data source to another. We also designed a novel graph clustering algorithm to detect gene sets with strong internal correlations.
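The multiplex construction described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the gene names, the layer names, the toy values, and the use of absolute Pearson correlation as the edge weight are all assumptions made for illustration. Every layer reuses the same skeleton edges, and only the weights differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_multiplex(skeleton_edges, profiles):
    """Return {layer: {(gene_a, gene_b): weight}} over one shared edge set.

    skeleton_edges: gene pairs taken from an interactome (hypothetical here).
    profiles: {layer: {gene: [value per sample]}}.
    The weight is |Pearson r|, an assumed choice for this sketch.
    """
    return {
        layer: {
            (a, b): abs(pearson(values[a], values[b]))
            for a, b in skeleton_edges
        }
        for layer, values in profiles.items()
    }


# Toy inputs: placeholder gene names and made-up measurements.
skeleton = [("G1", "G2"), ("G2", "G3")]
profiles = {
    "transcriptomics": {
        "G1": [1.0, 2.0, 3.0, 4.0],
        "G2": [2.0, 4.0, 6.0, 8.0],
        "G3": [4.0, 3.0, 2.0, 1.0],
    },
    "proteomics": {
        "G1": [1.0, 2.0, 3.0, 4.0],
        "G2": [1.0, 3.0, 2.0, 4.0],
        "G3": [5.0, 5.0, 5.0, 5.0],
    },
}
multiplex = build_multiplex(skeleton, profiles)
```

Because the edge set is identical across layers, a clustering method can compare or combine the weights of the same gene pair across data types without any node or edge alignment step.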
However, the multiplex integration yielded only marginal improvement over single-omics analysis. We therefore turned to the mutual exclusivity patterns in cancer genomics. These patterns suggest that a single somatic alteration event may be sufficient to promote tumorigenesis. We pushed the assumption further and posited that disruption of a single pathway could lead to differential expression of a large set of genes, which is supported by our work on Boolean matrix factorization. We then proposed the OR-gate network (ORN) to model the causal mechanism from somatic alterations to transcriptomics. Results showed that it can recover the heterogeneity among cancer samples and the functional modules responsible for certain dysregulation in cancer transcriptomics.
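The OR-gate idea above can be sketched with a Boolean matrix product, in which an output entry is 1 if any intermediate term is 1. This is only an illustration of the logic, not the ORN model itself: the two-step layout, the matrix names, and the toy values are assumptions.

```python
def boolean_product(A, B):
    """Boolean matrix product: out[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    return [
        [int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
        for row in A
    ]


# Toy inputs (all hypothetical).
# alterations: samples x altered genes; 1 means the gene is altered in that sample.
alterations = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# membership: altered genes x pathways; genes 0 and 1 belong to pathway 0.
membership = [
    [1, 0],
    [1, 0],
    [0, 1],
]
# targets: pathways x expression genes regulated by each pathway.
targets = [
    [1, 1, 0],
    [0, 0, 1],
]

# A pathway counts as disrupted if ANY of its member genes is altered (OR gate).
disrupted = boolean_product(alterations, membership)
# An expression gene is dysregulated if ANY pathway that regulates it is disrupted.
dysregulated = boolean_product(disrupted, targets)
```

In this toy example the first two samples carry mutually exclusive alterations in the same pathway and therefore receive the same predicted expression pattern, which is the intuition linking mutual exclusivity to a shared downstream effect.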
Still, ORN has two major limitations. One is co-amplification: ORN cannot distinguish drivers from passengers located in the same copy number variation hotspot. To address this, we applied the word2vec model to extract gene embeddings from biomedical literature. The other is that the transcriptional regulation module may not be accurate. To address this, we developed a novel algorithm (peak2vec) to uncover transcriptional motif patterns and coregulation from chromatin accessibility profiles.
In the future, we will integrate gene embeddings and peak2vec into the ORN framework to better understand the causal impact of somatic alterations as functional modules.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Liang, Lifan
Date: 17 December 2021
Date Type: Publication
Defense Date: 31 September 2021
Approval Date: 17 December 2021
Submission Date: 13 October 2021
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 165
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: somatic genomic alteration; multi-omics analysis; high-throughput technology; word embedding; ATAC-seq; RNA-seq; matrix factorization; deep learning
Date Deposited: 17 Dec 2021 14:09
Last Modified: 17 Dec 2022 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/41853