Sparse Heteroskedastic PCA in High Dimensions

Kang, Rui (2024) Sparse Heteroskedastic PCA in High Dimensions. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Primary Text (PDF, 2MB)
Restricted to University of Pittsburgh users only until 18 December 2026.

Abstract

Principal component analysis (PCA) is one of the most commonly used techniques for dimension reduction and feature extraction. Although high-dimensional sparse PCA has been well studied, little is known about the setting in which the noise is heteroskedastic, which turns out to be ubiquitous in many scenarios. We propose an iterative algorithm, called SparseHPCA, for the sparse PCA problem in the presence of heteroskedastic noise. The algorithm alternates between two steps: one updates the estimates of the sparse eigenvectors using orthogonal iteration with adaptive thresholding, and the other imputes the diagonal values of the sample covariance matrix to reduce the estimation bias due to heteroskedastic noise. Our procedure is computationally fast and provably optimal under the generalized spiked covariance model, assuming the leading eigenvectors are sparse. A comprehensive simulation study shows its robustness and effectiveness under various settings. Applying the new method to two high-dimensional genomics datasets, namely microarray and single-cell RNA-sequencing (scRNA-seq) data, demonstrates its ability to preserve inherent cluster structures in downstream analyses. Additionally, we extend SparseHPCA to address the sparse singular value decomposition (sparse SVD) problem in the presence of heteroskedastic noise, further showcasing its versatility.
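
The alternating structure described in the abstract can be illustrated with a minimal rank-one sketch. This is not the dissertation's SparseHPCA implementation: the function names `soft_threshold` and `sparse_hetero_pc1` are hypothetical, a fixed soft threshold stands in for the adaptive thresholding, a single eigenvector (power iteration) stands in for orthogonal iteration, and imputing the diagonal from the current rank-one fit is an assumption about how the imputation step might look.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become 0."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_hetero_pc1(S, threshold, n_outer=30, n_inner=20):
    """Hypothetical rank-one sketch: thresholded power iteration
    alternated with diagonal imputation. S is a symmetric p x p
    covariance matrix given as a list of lists."""
    p = len(S)
    # Keep only the off-diagonal entries; the diagonal of S carries the
    # heteroskedastic noise variances and is never used directly.
    M = [[S[i][j] if i != j else 0.0 for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # simple dense starting vector (assumption)
    lam = 0.0
    for _ in range(n_outer):
        # Step 1: power iteration with soft thresholding for sparsity.
        for _ in range(n_inner):
            w = [sum(M[i][j] * v[j] for j in range(p)) for i in range(p)]
            w = [soft_threshold(x, threshold) for x in w]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # threshold removed every coordinate; keep previous v
            v = [x / norm for x in w]
        # Step 2: impute the diagonal from the current rank-one fit.
        lam = sum(v[i] * M[i][j] * v[j] for i in range(p) for j in range(p))
        for i in range(p):
            M[i][i] = lam * v[i] * v[i]
    return v, lam


# Toy population covariance: one sparse spike plus unequal noise variances.
u = [0.5] * 4 + [0.0] * 6
noise_var = [0.1, 8.0, 0.2, 0.1, 6.0, 0.1, 3.0, 0.2, 0.1, 7.0]
S = [[5.0 * u[i] * u[j] + (noise_var[i] if i == j else 0.0)
      for j in range(10)] for i in range(10)]
v_hat, lam_hat = sparse_hetero_pc1(S, threshold=0.05)
```

In this idealized example the diagonal of `S` is discarded at the start, so the unequal values in `noise_var` do not enter the estimate; with a sample covariance matrix the off-diagonal entries would also be noisy, which is where the thresholding level matters.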


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
  Kang, Rui (Email: ruk18@pitt.edu; Pitt Username: ruk18)
ETD Committee:
  Committee Chair: Ren, Zhao (Email: zren@pitt.edu; Pitt Username: ZREN)
  Committee Member: McKennan, Christopher Gordon (Email: chm195@pitt.edu; Pitt Username: chm195)
  Committee Member: Iyengar, Satish (Email: ssi@pitt.edu; Pitt Username: ssi)
  Committee Member: Wei, Chen (Email: wei.chen@pitt.edu; Pitt Username: wei.chen)
Date: 18 December 2024
Date Type: Publication
Defense Date: 20 September 2024
Approval Date: 18 December 2024
Submission Date: 16 September 2024
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 105
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Principal component analysis (PCA); High-dimensional statistics; Heteroskedastic data; Minimax optimality; Sparse SVD; Dimensionality reduction; Eigenspace estimation; Generalized spiked covariance model; Adaptive estimation
Date Deposited: 18 Dec 2024 20:44
Last Modified: 18 Dec 2024 20:44
URI: http://d-scholarship.pitt.edu/id/eprint/46944
