Statistical Learning and Analysis of Single-Cell Multi-Omics Data

Wang, Xinjun (2022) Statistical Learning and Analysis of Single-Cell Multi-Omics Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract

Droplet-based single-cell transcriptome sequencing (scRNA-seq) technology can measure gene expression in tens of thousands of single cells simultaneously. More recently, single-cell bi-modal assays, such as Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and 10x Multiome, and tri-modal assays, such as transcription, epitopes, and chromatin accessibility by sequencing (TEA-seq) and DOGMA-seq, have been developed. These assays allow immunophenotyping of single cells and/or analysis of chromatin accessibility together with transcriptome profiling in the same cell. Although these single-cell technologies have gained much popularity, novel methods for analyzing these new types of single-cell multi-omics data are urgently needed.

In the first part, a model-based, data-driven approach, BREM-SC, is proposed for joint clustering of single-cell multi-omics data. Specifically, cell-specific random effects are introduced in the model to integrate multiple data modalities, and MCMC is utilized for optimization. Both simulation studies and real data applications have shown outstanding performance of BREM-SC. In addition, a new algorithm with GPU acceleration has been developed to speed up BREM-SC.

In the second part, a model-based, biology-driven approach, SECANT, is proposed to analyze CITE-seq data or to jointly analyze paired CITE-seq and scRNA-seq data. Surface protein data are treated as providing general guidance for cell clustering with RNA data, and a novel statistical model is built under a semi-supervised learning framework. The performance of SECANT is demonstrated through both simulation studies and real data applications.

In the third part, a new computational pipeline under a supervised learning setting, which utilizes k*-Nearest Neighbors, is proposed for identifying cell doublets in single-cell multi-omics experiments. The superiority of the proposed pipeline over existing methods is demonstrated through the analysis of multiple in-house tri-modal DOGMA-seq datasets, with approximate ground-truth labels generated from an additional cell hashing experiment.

Public Health Significance: The development of cutting-edge single-cell bi-modal and tri-modal multi-omics methods allows a more complete understanding of complex gene regulatory networks than scRNA-seq alone. In this dissertation, I propose three novel statistical methods for analyzing single-cell multi-omics data, with different goals and approaches, which would be useful for biological and clinical researchers seeking to understand cell identity and function in complex tissue types.