Statistical Learning and Analysis of Single-Cell Multi-Omics Data

Wang, Xinjun (2022) Statistical Learning and Analysis of Single-Cell Multi-Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Droplet-based single-cell transcriptome sequencing (scRNA-seq) technology can measure gene expression in tens of thousands of single cells simultaneously. More recently, single-cell bi-modal assays such as Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and 10x Multiome, and tri-modal assays such as transcription, epitopes, and chromatin accessibility by sequencing (TEA-seq) and DOGMA-seq, have been developed. These assays allow immunophenotyping of single cells and/or analysis of chromatin accessibility together with transcriptome profiling in the same cell. Although these single-cell technologies have gained much popularity, novel methods for analyzing these new types of single-cell multi-omics data are urgently needed.

In the first part, a model-based, data-driven approach, namely BREM-SC, is proposed for jointly clustering single-cell multi-omics data. Specifically, cell-specific random effects are introduced in the model to integrate multiple data modalities, and Markov chain Monte Carlo (MCMC) is used for parameter estimation. Both simulation studies and real data applications show outstanding performance of BREM-SC. In addition, a new algorithm with GPU acceleration has been developed to speed up BREM-SC.

In the second part, a model-based, biology-driven approach, namely SECANT, is proposed to analyze CITE-seq data or to jointly analyze paired CITE-seq and scRNA-seq data. Surface protein data are treated as general guidance for clustering cells with RNA data, and a novel statistical model is built under a semi-supervised learning framework. The performance of SECANT is demonstrated through both simulation studies and real data applications.

In the third part, a new computational pipeline under a supervised learning setting, which uses k*-Nearest Neighbors, is proposed for identifying cell doublets in single-cell multi-omics experiments. The superiority of the proposed pipeline over existing methods is demonstrated through the analysis of multiple in-house tri-modal DOGMA-seq datasets, with approximate ground-truth labels generated from an additional cell hashing experiment.

Public Health Significance: The development of cutting-edge single-cell bi-modal and tri-modal multi-omics methods allows a more complete understanding of complex gene regulatory networks than scRNA-seq alone. In this dissertation, I propose three novel statistical methods for analyzing single-cell multi-omics data, with different goals and approaches, which should be useful for biological and clinical researchers seeking to understand cell identity and function in complex tissue types.

Details

Item Type: University of Pittsburgh ETD

Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Wang, Xinjun	xiw119@pitt.edu	xiw119

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Chen, Wei	wei.chen@chp.edu
Committee CoChair	Ding, Ying	yingding@pitt.edu	yingding
Committee Member	Wang, Jiebiao	jbwang@pitt.edu	jbwang
Committee Member	Forno, Erick	erick.forno@chp.edu

Date: 1 July 2022

Date Type: Publication

Defense Date: 3 June 2022

Approval Date: 1 July 2022

Submission Date: 23 June 2022

Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.

Number of Pages: 185

Institution: University of Pittsburgh

Schools and Programs: School of Public Health > Biostatistics

Degree: PhD - Doctor of Philosophy

Thesis Type: Doctoral Dissertation

Refereed: Yes

Uncontrolled Keywords: Single Cell, Multi-omics, Statistical Learning

Date Deposited: 01 Jul 2022 18:36

Last Modified: 01 Jul 2023 05:15

URI: http://d-scholarship.pitt.edu/id/eprint/43208
