Statistical Learning and Analysis of Single-Cell Multi-Omics Data

Wang, Xinjun (2022) Statistical Learning and Analysis of Single-Cell Multi-Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Droplet-based single-cell transcriptome sequencing (scRNA-seq) technology can measure gene expression in tens of thousands of single cells simultaneously. More recently, single-cell bi-modal assays, such as Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and 10x Multiome, and tri-modal assays, such as transcription, epitopes, and chromatin accessibility by sequencing (TEA-seq) and DOGMA-seq, have been developed. These assays allow immunophenotyping of single cells and/or analysis of chromatin accessibility together with transcriptome profiling in the same cell. Although these single-cell technologies have gained much popularity, novel methods for analyzing these new types of single-cell multi-omics data are urgently needed.

In the first part, a model-based, data-driven approach, namely BREM-SC, is proposed for joint clustering of single-cell multi-omics data. Specifically, cell-specific random effects are introduced in the model to integrate multiple data modalities, and Markov chain Monte Carlo (MCMC) is used for parameter estimation. Both simulation studies and real data applications show strong performance of BREM-SC. In addition, a new algorithm with GPU acceleration has been developed to speed up BREM-SC.

In the second part, a model-based, biology-driven approach, namely SECANT, is proposed to analyze CITE-seq data or to jointly analyze paired CITE-seq and scRNA-seq data. Surface protein data are treated as providing general guidance for clustering cells with RNA data, and a novel statistical model is built under a semi-supervised learning framework. The performance of SECANT is demonstrated through both simulation studies and real data applications.

In the third part, a new computational pipeline under a supervised learning setting, which utilizes k*-Nearest Neighbors, is proposed for identifying cell doublets in single-cell multi-omics experiments. The superiority of the proposed pipeline over existing methods is demonstrated through the analysis of multiple in-house tri-modal DOGMA-seq datasets, with approximate ground-truth labels generated from an additional cell hashing experiment.

Public Health Significance: The development of cutting-edge single-cell bi-modal and tri-modal multi-omics methods allows a more complete understanding of complex gene regulatory networks than scRNA-seq alone. In this dissertation, I propose three novel statistical methods for analyzing single-cell multi-omics data, each with different goals and approaches, which should help biological and clinical researchers understand cell identity and function in complex tissue types.



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Wang, Xinjun (Email: xiw119@pitt.edu; Pitt Username: xiw119)
ETD Committee:
Committee Chair: Chen,
Committee CoChair: Ding, Ying (Email: yingding@pitt.edu; Pitt Username: yingding)
Committee Member: Wang, Jiebiao (Email: jbwang@pitt.edu; Pitt Username: jbwang)
Committee Member: Forno,
Date: 1 July 2022
Date Type: Publication
Defense Date: 3 June 2022
Approval Date: 1 July 2022
Submission Date: 23 June 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 185
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Single Cell, Multi-omics, Statistical Learning
Date Deposited: 01 Jul 2022 18:36
Last Modified: 01 Jul 2023 05:15
