Zhou, Xueping
(2023)
Biological Network Guided Variable Selection and Outcome Prediction for High-Dimensional Multi-Omics Data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Developing efficient feature selection and accurate outcome prediction algorithms is a major and often difficult task in analyzing high-dimensional data. This dissertation focuses on feature selection and outcome prediction guided by grouping and correlation structures, for data with potentially high-dimensional predictors, with applications to multi-omics data.
In Chapter 2, we propose a novel feature selection and prediction algorithm for binary outcomes, called local-network guided logistic regression. The method adopts a multivariate screening procedure that integrates inter-feature correlations into the feature selection process. In Chapter 3, we propose a novel supervised learning algorithm that performs feature selection and multivariate outcome prediction for data with potentially high-dimensional predictors and responses. The method incorporates known genome hierarchy grouping and correlation structures into feature selection, regression coefficient estimation, and outcome prediction under a penalized multivariate multiple linear regression model. In Chapter 4, within the framework of Fisher's discriminant analysis, we propose a binary classification method that incorporates variable selection for high-dimensional predictors. The proposed method can select both the differentially expressed genes and the differentially connected genes between biological states. In Chapter 5, we propose a novel feature selection and prediction pipeline for binary outcomes, with an application to age-related macular degeneration, a progressive neurodegenerative disease and the leading cause of blindness in developed countries.
Contribution to public health: The dissertation proposes several methods for feature selection and outcome prediction using high-dimensional omics data: 1) weak signal detection for outcome classification; 2) genome grouping structure and correlation guided feature selection for multivariate outcome prediction; 3) detection of differentially connected features for outcome classification. The proposed methods are tools for selecting outcome-related features, which help to uncover health-related mechanisms and to predict disease status or other health-related outcomes of interest.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zhou, Xueping
Date: 24 August 2023
Date Type: Publication
Defense Date: 27 July 2023
Approval Date: 24 August 2023
Submission Date: 14 June 2023
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 133
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Association study; Biomarker discovery; Cell-type deconvolution; DNA methylation; Feature selection; Grouping structure; High-dimensional data; Multivariate regression; Prediction; Weak signal detection
Date Deposited: 24 Aug 2023 13:22
Last Modified: 24 Aug 2024 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/44990