
Novel statistical methods in analyzing single cell sequencing data

Sun, Zhe (2019) Novel statistical methods in analyzing single cell sequencing data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Submitted Version



Understanding biological systems requires knowledge of their individual components. Single cell RNA sequencing (scRNA-Seq) has become a revolutionary tool for investigating cell-to-cell transcriptomic heterogeneity, which cannot be captured by population-averaged measurements such as bulk RNA-Seq. This dissertation focuses on developing novel statistical methods for analyzing droplet-based single cell data, including clustering methods to identify cell types from single or multiple individuals, and a joint clustering approach to analyze paired data from Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-Seq), a new state-of-the-art technology that allows the detection of cell surface proteins and transcriptome profiling within the same cell simultaneously.
In the first part of this dissertation, I developed DIMM-SC, a Dirichlet mixture model that explicitly models the raw UMI counts for clustering droplet-based scRNA-Seq data and produces cluster membership with uncertainties. Both simulation studies and real data applications demonstrated that, overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other clustering methods.
In the second part, I developed BAMM-SC, a novel Bayesian hierarchical Dirichlet mixture model to cluster droplet-based scRNA-Seq data from population studies. BAMM-SC takes raw count data as input and accounts for data heterogeneity and batch effects among multiple individuals in a unified Bayesian hierarchical model framework. Extensive simulation studies and applications to multiple in-house scRNA-Seq datasets demonstrated that BAMM-SC outperformed existing clustering methods in clustering accuracy.
In the third part, I developed BREM-SC, a novel random effects model that jointly clusters the paired data from CITE-Seq. Simulations and analyses of in-house real datasets demonstrated the validity and advantages of the method in understanding the heterogeneity and dynamics of various cell populations.

Contribution to public health:
Recent droplet-based single cell sequencing technology and its extensions have brought revolutionary insights into cell heterogeneity and molecular processes at single cell resolution. I believe the statistical approaches proposed in this dissertation for single cell data will help us fully understand cell identity and function, which in turn will promote innovation in traditional public health and medical research.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Sun, Zhe
Email: zhs31@pitt.edu
Pitt Username: zhs31
ETD Committee:
Committee Chair: Ding,
Committee CoChair: Chen,
Committee Member: Park,
Committee Member: Chen,
Committee Member: Hu,
Date: 26 September 2019
Date Type: Publication
Defense Date: 25 July 2019
Approval Date: 26 September 2019
Submission Date: 23 July 2019
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 136
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Bioinformatics
Date Deposited: 26 Sep 2019 16:30
Last Modified: 01 Sep 2021 05:15

