Singhabahu, Dilrukshika M
(2013)
Robust Partial Least Squares Regression and Outlier Detection Using Minimum Covariance Determinant Method and A Resampling Method.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Partial Least Squares Regression (PLSR) is often used for high dimensional data analysis where the sample size is limited, the number of variables is large, and the variables are collinear. Like other regression methods, PLSR is sensitive to outliers and influential observations. Since PLSR is based on the covariance matrix of the outcome and the predictor variables, this matrix is a natural starting point for developing techniques that identify outliers and provide stable estimates in their presence. We focus on the minimum covariance determinant (MCD) method for robust estimation of the covariance matrix when n >> p and modify this method for application to a magnetic resonance imaging (MRI) data set. We extend this approach by applying the MCD separately to the Y vector and the X matrix to obtain robust Mahalanobis squared distances (RMSD), and we identify outliers based on these distances. We then remove these observations from the data set and apply PLSR to the remaining data. This procedure is repeated until no new outliers are detected. Simulation studies show that this approach improves the PLSR results.
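The iterative MCD/RMSD procedure above can be sketched in Python. This is a minimal illustration under our own assumptions, not the dissertation's implementation: `mcd` is a simplified FAST-MCD (random starting subsets refined by concentration steps) followed by a median-based consistency rescaling; the chi-square cutoff uses the Wilson-Hilferty approximation so that only NumPy and the standard library are needed; `alpha=0.999` is an arbitrary choice; and PLSR is a single-response NIPALS PLS1. All function names (`chi2_quantile`, `mcd`, `rmsd_outliers`, `iterative_clean`, `pls1_fit`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile (avoids SciPy)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * np.sqrt(a)) ** 3


def sq_mahalanobis(Z, mu, S):
    """Squared Mahalanobis distance of each row of Z from centre mu, scatter S."""
    D = Z - mu
    return np.einsum("ij,ij->i", D @ np.linalg.pinv(S), D)


def mcd(Z, n_starts=30, n_csteps=10, seed=0):
    """Simplified FAST-MCD: random (p+1)-subsets refined by concentration steps."""
    n, p = Z.shape
    h = (n + p + 1) // 2                        # size of the "clean" subset; needs h > p
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_csteps):               # C-step: keep the h closest rows
            mu = Z[idx].mean(axis=0)
            S = np.atleast_2d(np.cov(Z[idx], rowvar=False))
            idx = np.argsort(sq_mahalanobis(Z, mu, S))[:h]
        det = np.linalg.det(np.atleast_2d(np.cov(Z[idx], rowvar=False)))
        if det < best_det:                      # MCD criterion: smallest determinant
            best_det, best_idx = det, idx
    mu = Z[best_idx].mean(axis=0)
    S = np.atleast_2d(np.cov(Z[best_idx], rowvar=False))
    # Assumption: rescale so distances of clean rows are roughly chi-square(p).
    S = S * np.median(sq_mahalanobis(Z, mu, S)) / chi2_quantile(0.5, p)
    return mu, S


def rmsd_outliers(Z, alpha=0.999):
    """Boolean mask: robust squared Mahalanobis distance above chi2_p(alpha)."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)   # y vector -> n x 1 matrix
    mu, S = mcd(Z)
    # Assumption: alpha=0.999 is an illustrative cutoff, not the author's choice.
    return sq_mahalanobis(Z, mu, S) > chi2_quantile(alpha, Z.shape[1])


def iterative_clean(X, y, max_iter=10):
    """Drop rows flagged in X or in y (assessed separately); repeat until none are new."""
    keep = np.arange(len(y))
    for _ in range(max_iter):
        out = rmsd_outliers(X[keep]) | rmsd_outliers(y[keep])
        if not out.any():
            break
        keep = keep[~out]
    return keep


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1; returns xm, ym, B so that y_hat = ym + (X - xm) @ B."""
    xm, ym = X.mean(axis=0), y.mean()
    E, f = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)                  # weight vector
        t = E @ w                               # score vector
        tt = t @ t
        p_load, q_load = E.T @ t / tt, f @ t / tt
        E = E - np.outer(t, p_load)             # deflate X and y
        f = f - q_load * t
        W.append(w)
        P.append(p_load)
        Q.append(q_load)
    W, P = np.array(W).T, np.array(P).T
    B = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return xm, ym, B
```

For example, `keep = iterative_clean(X, y)` followed by `pls1_fit(X[keep], y[keep], n_comp=2)` fits PLSR on the retained rows. A strict cutoff is used in this sketch because the robust distances are recomputed after every removal, and with a looser cutoff the loop tends to keep trimming the tails of clean data instead of reaching a state with no new outliers. The MCD step also needs more rows than columns, which matches the n >> p setting described above.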
We also explore a second approach to outlier detection, for the setting where n < p. This approach, resampling by half-means (RHM), was introduced in 1998 by William Egan and Stephen Morgan. We adapt this method to MRI data to detect outliers and then to build a robust PLSR model. This method can be used for small or large data sets, overcoming a limitation of leading multivariate outlier detection methods, such as the MSD method, which cannot be used when the sample size is smaller than the number of variables (n < p).
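The resampling-by-half-means idea can be sketched as follows. As commonly described, each resample draws half of the observations without replacement, autoscales every observation using that half's column means and standard deviations, and records each observation's vector length. Because only per-variable means and standard deviations are needed, no p x p covariance matrix has to be inverted, which is why the method remains usable when n < p. The function names, `n_resamples`, and the final flagging rule (a robust z-score of each observation's median length, flagged above `k = 3.5`) are illustrative assumptions, not Egan and Morgan's exact rule or the dissertation's MRI adaptation.

```python
import numpy as np


def rhm_lengths(X, n_resamples=500, seed=0):
    """Vector length of every row of X under many half-sample autoscalings."""
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    L = np.empty((n, n_resamples))
    for r in range(n_resamples):
        half = rng.choice(n, size=n // 2, replace=False)   # half-sample, no replacement
        mu = X[half].mean(axis=0)
        sd = X[half].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                                   # guard constant columns
        Z = (X - mu) / sd              # autoscale ALL rows with half-sample statistics
        L[:, r] = np.sqrt((Z ** 2).sum(axis=1))             # Euclidean vector length
    return L


def rhm_outliers(X, k=3.5, **kwargs):
    """Flag rows whose typical vector length is far above the rest.

    Assumption: robust z-score (median/MAD) of each row's median length;
    this rule is illustrative, not necessarily Egan and Morgan's.
    """
    L = rhm_lengths(X, **kwargs)
    med = np.median(L, axis=1)                        # typical length of each row
    center = np.median(med)
    mad = 1.4826 * np.median(np.abs(med - center))    # MAD scaled to a normal SD
    score = (med - center) / mad
    return np.flatnonzero(score > k), score
```

For example, `flagged, score = rhm_outliers(X)` returns the indices of suspected outliers, which could then be removed before fitting PLSR.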
The two proposed methods improve the accuracy of predictions from brain imaging data (MRI in our example). The public health significance of this work is therefore improved accuracy in brain imaging diagnosis and prediction.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Singhabahu, Dilrukshika M
Date: 30 September 2013
Date Type: Publication
Defense Date: 5 June 2013
Approval Date: 30 September 2013
Submission Date: 9 July 2013
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 56
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Partial Least Squares Regression, Multivariate Outlier Detection, Minimum Covariance Determinant
Date Deposited: 30 Sep 2013 14:12
Last Modified: 01 Sep 2018 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/19282