Sun, Zhe (2019) Novel statistical methods in analyzing single cell sequencing data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Understanding biological systems requires knowledge of their individual components. Single cell RNA sequencing (scRNA-Seq) has become a revolutionary tool for investigating cell-to-cell transcriptomic heterogeneity, which cannot be obtained from population-averaged measurements such as bulk RNA-Seq. This dissertation focuses on developing novel statistical methods for analyzing droplet-based single cell data, including clustering methods to identify cell types from single or multiple individuals, and a joint clustering approach to analyze paired data from Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-Seq), a new state-of-the-art technology that allows the detection of cell surface proteins and transcriptome profiling within the same cell simultaneously.
In the first part of this dissertation, I developed DIMM-SC, a Dirichlet mixture model that explicitly models raw UMI counts for clustering droplet-based scRNA-Seq data and produces cluster memberships with uncertainties. Both simulation studies and real data applications demonstrated that, overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other clustering methods.
In the second part, I developed BAMM-SC, a novel Bayesian hierarchical Dirichlet mixture model to cluster droplet-based scRNA-Seq data from population studies. BAMM-SC takes raw count data as input and accounts for data heterogeneity and batch effects among multiple individuals in a unified Bayesian hierarchical model framework. Extensive simulation studies and applications to multiple in-house scRNA-Seq datasets demonstrated that BAMM-SC outperformed existing clustering methods with improved clustering accuracy.
In the third part, I developed BREM-SC, a novel random effects model that jointly clusters the paired data from CITE-Seq. Simulations and analyses of in-house real data sets demonstrated the validity and advantages of our method in understanding the heterogeneity and dynamics of various cell populations.
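As a rough illustration of what "cluster membership with uncertainties" can mean for a count-based mixture model, the following minimal sketch computes soft membership probabilities for one cell under a Dirichlet-multinomial mixture. It is not the DIMM-SC implementation: the Dirichlet-multinomial form of the likelihood, the function names, the toy parameters, and the assumption that cluster parameters have already been fitted are all illustrative assumptions not taken from this record.

```python
import math


def dm_log_likelihood(counts, alpha):
    """Log Dirichlet-multinomial likelihood of one cell's gene counts.

    The multinomial coefficient is omitted because it is identical for
    every cluster and cancels when memberships are normalized.
    """
    total_alpha = sum(alpha)
    total_count = sum(counts)
    ll = math.lgamma(total_alpha) - math.lgamma(total_alpha + total_count)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(x + a) - math.lgamma(a)
    return ll


def membership_probabilities(counts, alphas, weights):
    """Posterior probability that a cell belongs to each cluster.

    alphas  : one Dirichlet parameter vector per cluster (assumed fitted)
    weights : mixing proportions of the clusters (assumed fitted)
    """
    log_post = [
        math.log(w) + dm_log_likelihood(counts, a)
        for a, w in zip(alphas, weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    unnormalized = [math.exp(v - peak) for v in log_post]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Hypothetical toy setup: 3 genes, 2 clusters with mirrored expression profiles.
alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
weights = [0.5, 0.5]

cell = [9, 1, 0]  # raw counts dominated by gene 0
probs = membership_probabilities(cell, alphas, weights)
print(probs)  # soft assignment: one probability per cluster
```

In a full clustering procedure these membership probabilities would typically be recomputed alongside parameter updates, for example inside an EM loop; here the parameters are simply fixed by hand to keep the sketch short.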
Contribution to public health:
Recent droplet-based single cell sequencing technology and its extensions have brought revolutionary insights into cell heterogeneity and molecular processes at single cell resolution. I believe the statistical approaches proposed in this dissertation for single cell data will help us more fully understand cell identity and function. This will promote innovation in traditional public health and medical research.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Sun, Zhe
Date: 26 September 2019
Date Type: Publication
Defense Date: 25 July 2019
Approval Date: 26 September 2019
Submission Date: 23 July 2019
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 136
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Bioinformatics
Date Deposited: 26 Sep 2019 16:30
Last Modified: 01 Sep 2021 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/37171