Integrative Analysis of Modular Structure of Genes in High-throughput Tumor Profiles

Liang, Lifan (2021) Integrative Analysis of Modular Structure of Genes in High-throughput Tumor Profiles. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Cellular functions, such as signal transduction, transport, the cell cycle, and various metabolic processes, require the cooperation of many gene products. Following the central dogma, such large-scale cooperation within and across cells often leaves traces in different omics profiles. One major clue is strong correlation among genes in genomic, epigenomic, transcriptomic, and proteomic data. Based on this premise, we set out to identify functional modules by integrating pairwise correlations among genes from different information sources into multiplex networks. All layers of the multiplex share the same protein interactome as their skeleton, but the edge weights in each layer represent pairwise correlations from a different information source. This formulation allows information to flow from one data source to another. We also designed a novel graph clustering algorithm to detect gene sets with strong internal correlations.
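As a rough illustration of the multiplex idea described above, the following Python sketch builds one weighted layer per data source over a shared edge list. It is not the dissertation's implementation: the function names (`pearson`, `build_multiplex`), the use of absolute Pearson correlation as the edge weight, and all gene names and values are assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat the pair as uncorrelated
    return cov / (sx * sy)


def build_multiplex(skeleton, layers):
    """Return {layer_name: {(gene_a, gene_b): weight}} over a shared edge set.

    skeleton: list of (gene_a, gene_b) pairs, standing in for the interactome.
    layers:   {layer_name: {gene: [values across samples]}}.
    The weight is |Pearson r|, which is an assumption of this sketch.
    """
    multiplex = {}
    for name, profiles in layers.items():
        multiplex[name] = {
            (a, b): abs(pearson(profiles[a], profiles[b]))
            for a, b in skeleton
        }
    return multiplex


# Toy data: three genes, four samples, two layers (all values invented).
skeleton = [("G1", "G2"), ("G2", "G3")]
layers = {
    "expression": {
        "G1": [1, 2, 3, 4],
        "G2": [2, 4, 6, 8],
        "G3": [4, 3, 2, 1],
    },
    "methylation": {
        "G1": [0.1, 0.9, 0.2, 0.8],
        "G2": [0.5, 0.5, 0.5, 0.5],
        "G3": [0.3, 0.1, 0.4, 0.2],
    },
}
multiplex = build_multiplex(skeleton, layers)
```

Every layer carries exactly the same edges, so a clustering method can compare or combine weights edge by edge across data sources.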
However, the multiplex integration yielded only marginal improvement over single-omics analysis. We therefore turned to the mutual exclusivity patterns observed in cancer genomics. These patterns suggest that a single somatic alteration event may be sufficient to promote tumorigenesis. We pushed this assumption further and hypothesized that disruption of a single pathway can lead to differential expression of a large set of genes, a hypothesis supported by our work on Boolean matrix factorization. We then proposed the OR-gate network (ORN) to model the causal mechanism linking somatic alterations to the transcriptome. Results showed that ORN can recover both the heterogeneity among cancer samples and the functional modules responsible for specific dysregulation in cancer transcriptomes.
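The OR-gate intuition behind Boolean matrix factorization can be sketched with a plain Boolean matrix product: a module is marked as dysregulated in a sample if any one of the alterations mapped to it is present. The snippet below is only a toy illustration under that assumption, not the ORN model itself; the helper `boolean_product` and both matrices are invented for this example.

```python
def boolean_product(A, B):
    """Boolean matrix product: C[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    return [
        [int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
        for row in A
    ]


# Rows: tumor samples; columns: somatic alterations (toy, invented data).
alterations = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

# Rows: alterations; columns: gene modules they are assumed to disrupt.
# Alterations 0 and 1 belong to the same hypothetical pathway (module 0).
targets = [
    [1, 0],
    [1, 0],
    [0, 1],
]

dysregulated = boolean_product(alterations, targets)
```

The first two samples carry different, mutually exclusive alterations but receive the same module-level output, which is the pattern the OR-gate assumption is meant to capture.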
Still, ORN has two major limitations. The first is co-amplification: ORN cannot distinguish drivers from passengers located in the same copy number variation hotspot. To address this, we applied the word2vec model to extract gene embeddings from the biomedical literature. The second is that the inferred transcriptional regulation modules may not be accurate. To address this, we developed a novel algorithm (peak2vec) to uncover transcriptional motif patterns and coregulation from chromatin accessibility profiles.
In the future, we will integrate gene embeddings and peak2vec into the ORN framework to better understand the causal impact of somatic alterations in terms of functional modules.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Liang, Lifan (Email: lil115@pitt.edu; Pitt Username: lil115; ORCID: 0000-0002-2495-4779)
ETD Committee:
Committee Chair: Lu,
Thesis Advisor: Lu,
Committee Member: George,
Committee Member: Greg,
Date: 17 December 2021
Date Type: Publication
Defense Date: 31 September 2021
Approval Date: 17 December 2021
Submission Date: 13 October 2021
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 165
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: somatic genomic alteration; multi-omics analysis; high-throughput technology; word embedding; ATAC-seq; RNA-seq; matrix factorization; deep learning
Date Deposited: 17 Dec 2021 14:09
Last Modified: 17 Dec 2022 06:15

