Kang, Rui (2024) Sparse Heteroskedastic PCA in High Dimensions. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Although sparse PCA has been well studied in high dimensions, little is known about the setting where the noise is heteroskedastic, which is ubiquitous in practice. We propose an iterative algorithm, called SparseHPCA, for the sparse PCA problem in the presence of heteroskedastic noise. It alternates between two steps: one updates the estimates of the sparse eigenvectors using orthogonal iteration with adaptive thresholding, and the other imputes the diagonal entries of the sample covariance matrix to reduce the estimation bias caused by heteroskedastic noise. Our procedure is computationally fast and provably optimal under the generalized spiked covariance model when the leading eigenvectors are sparse. A comprehensive simulation study demonstrates its robustness and effectiveness under various settings. Applied to two high-dimensional genomics datasets, namely microarray and single-cell RNA-sequencing (scRNA-seq) data, the method preserves inherent cluster structures in downstream analyses. Additionally, we extend SparseHPCA to the sparse singular value decomposition (sparse SVD) problem in the presence of heteroskedastic noise, further showcasing its versatility.
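The two alternating steps described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's SparseHPCA procedure: the function names (`soft_threshold`, `sparse_hpca_sketch`), the fixed soft-threshold level (the abstract specifies adaptive thresholding), the zero-diagonal initialization, and the fixed iteration count are all assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(M, t):
    """Entrywise soft-thresholding: shrink magnitudes by t, zeroing small entries."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def sparse_hpca_sketch(S, r, thresh, n_iter=50):
    """Hypothetical sketch of alternating thresholded orthogonal iteration
    and diagonal imputation. S: (p, p) sample covariance; r: target rank;
    thresh: fixed threshold (an assumption; the source uses adaptive rules)."""
    N = S.copy()
    # The diagonal carries the heteroskedastic noise variances, so discard it.
    np.fill_diagonal(N, 0.0)
    # Initialize with the r leading eigenvectors (eigh sorts eigenvalues ascending).
    _, vecs = np.linalg.eigh(N)
    Q = vecs[:, -r:]
    for _ in range(n_iter):
        # Step 1: one orthogonal-iteration update, thresholded to promote
        # sparsity, then re-orthonormalized by a QR factorization.
        T = soft_threshold(N @ Q, thresh)
        Q, _ = np.linalg.qr(T)
        # Step 2: impute the diagonal of N from the current rank-r approximation.
        low_rank = Q @ (Q.T @ N @ Q) @ Q.T
        np.fill_diagonal(N, np.diag(low_rank))
    return Q

# Small synthetic demo: one sparse spike plus noise with unequal variances.
rng = np.random.default_rng(0)
p, n, k = 50, 500, 5
u = np.zeros(p)
u[:k] = 1.0 / np.sqrt(k)                  # sparse leading eigenvector
sigma = rng.uniform(0.5, 1.5, size=p)     # heteroskedastic noise std devs
X = np.sqrt(10.0) * rng.standard_normal((n, 1)) * u + rng.standard_normal((n, p)) * sigma
S = np.cov(X, rowvar=False)
Q_hat = sparse_hpca_sketch(S, r=1, thresh=0.5)
print("alignment |u^T q|:", abs(u @ Q_hat[:, 0]))
```

In this sketch the diagonal of the working matrix is rebuilt from the low-rank fit at every pass, so the unequal noise variances on the diagonal of the sample covariance never feed directly into the eigenvector update.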
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 18 December 2024
Date Type: Publication
Defense Date: 20 September 2024
Approval Date: 18 December 2024
Submission Date: 16 September 2024
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 105
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Principal component analysis (PCA); High-dimensional statistics; Heteroskedastic data; Minimax optimality; Sparse SVD; Dimensionality reduction; Eigenspace estimation; Generalized spiked covariance model; Adaptive estimation
Date Deposited: 18 Dec 2024 20:44
Last Modified: 18 Dec 2024 20:44
URI: http://d-scholarship.pitt.edu/id/eprint/46944