Generative Models of Biological Variations in Bulk and Single-cell RNA-seq

Mao, Weiguang (2020) Generative Models of Biological Variations in Bulk and Single-cell RNA-seq. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


The explosive growth of next-generation sequencing data enhances our ability to understand biological processes at an unprecedented resolution. Meanwhile, organizing and utilizing this tremendous amount of data has become a major challenge. High-throughput technology provides a snapshot of all underlying biological activities, but such extremely high-dimensional data is hard to interpret. Due to the curse of dimensionality, the measurements are sparse and far from sufficient to capture the actual manifold in the high-dimensional space. In addition, the measurements may contain structured noise, such as technical or nuisance biological variation, which can interfere with downstream interpretation. Generative modeling is a powerful tool for making sense of the data and generating compact representations that summarize the embedded biological information. This thesis introduces three generative models that help amplify biological signals buried in noisy bulk and single-cell RNA-seq data.

In Chapter 2, we propose a semi-supervised deconvolution framework called PLIER, which can identify regulation in cell-type proportions and in specific pathways that control gene expression. PLIER has inspired the development of MultiPLIER and has been used to infer context-specific genotype effects in the brain.

In Chapter 3, we construct a supervised transformation named DataRemix that normalizes bulk gene expression profiles to maximize biological findings across a variety of downstream tasks. By reweighting the contributions of hidden factors, we are able to reveal hidden biological signals without any external dataset-specific knowledge. We apply DataRemix to the ROSMAP dataset and report the first replicable trans-eQTL effect in the human brain.

In Chapter 4, we focus on scRNA-seq and introduce NIFA, an unsupervised decomposition framework that combines the desired properties of PCA, ICA and NMF. It simultaneously models uni- and multi-modal factors, isolating discrete cell-type identity and continuous pathway-level variation into separate components.

The work presented in Chapter 2 has been published as a journal article. The work in Chapters 3 and 4 is under submission and is available as preprints on bioRxiv.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Mao, Weiguang
Email: mwg10.thu@gmail.com
Pitt Username: wem26
ORCID: 0000-0002-5288-4309
ETD Committee:
Thesis Advisor: Chikina, Maria
Committee Chair: Kostka, Dennis
Committee Member: Chennubhotla, Chakra
Committee Member: Greene, Casey
Committee Member: Ma, Jian
Date: 18 May 2020
Date Type: Publication
Defense Date: 21 February 2020
Approval Date: 18 May 2020
Submission Date: 4 May 2020
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 145
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational and Systems Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Generative Models Bulk RNA-seq Single-cell RNA-seq
Date Deposited: 19 May 2020 01:56
Last Modified: 19 May 2020 01:56

