
Robust Partial Least Squares Regression and Outlier Detection Using Minimum Covariance Determinant Method and A Resampling Method

Singhabahu, Dilrukshika M (2013) Robust Partial Least Squares Regression and Outlier Detection Using Minimum Covariance Determinant Method and A Resampling Method. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Partial Least Squares Regression (PLSR) is often used for high dimensional data analysis where the sample size is limited, the number of variables is large, and the variables are collinear. Like other types of regression, PLSR is influenced by outliers and/or influential observations. Since PLSR is based on the covariance matrix of the outcome and the predictor variables, this is a natural starting point for the development of techniques that can be used to identify outliers and to provide stable estimates in the presence of outliers. We focus on the use of the minimum covariance determinant (MCD) method for robust estimation of the covariance matrix when n >> p and modify this method for application to a magnetic resonance imaging (MRI) data set. We extend this approach by applying the MCD to generate robust Mahalanobis squared distances (RMSD) in the Y vector and the X matrix separately and then identify the outliers based on the RMSD. We then remove these observations from the data set and apply PLSR to the remaining data. This approach is applied iteratively until no new outliers are detected. Simulation studies demonstrate that the PLSR results are improved when using this approach.
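The flag-and-remove loop described above can be illustrated with a minimal sketch for the Y vector only. It is not the dissertation's implementation. It assumes an exact univariate MCD (the tightest window of the sorted values), a normal-consistency correction of the variance, and a chi-square cutoff at the 0.975 quantile with one degree of freedom. The function names, the `support` fraction of 0.75, and the cutoff are all illustrative assumptions. The X matrix would need a multivariate MCD (for example, scikit-learn's `MinCovDet`), and the PLSR refit is marked only as a comment.

```python
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def univariate_mcd(values, support=0.75):
    """Return (location, variance) from the tightest h-point subset (hypothetical helper)."""
    n = len(values)
    h = min(n, max(2, int(support * n + 0.5)))
    ordered = sorted(values)
    best_mean, best_var = None, float("inf")
    # In one dimension the minimum-variance h-subset is a contiguous window
    # of the sorted values, so scanning all windows gives the exact answer.
    for start in range(n - h + 1):
        window = ordered[start:start + h]
        mean = fmean(window)
        var = sum((v - mean) ** 2 for v in window) / h
        if var < best_var:
            best_mean, best_var = mean, var
    alpha = h / n
    if alpha < 1.0:
        # Divide by the variance of a standard normal truncated to its central
        # alpha mass, so the estimate is consistent for normal data.
        a = _STD_NORMAL.inv_cdf((1.0 + alpha) / 2.0)
        best_var /= 1.0 - 2.0 * a * _STD_NORMAL.pdf(a) / alpha
    return best_mean, best_var


def iterative_outlier_removal(y, support=0.75, quantile=0.975, max_iter=20):
    """Repeat: robust squared distance -> flag -> remove, until nothing new is flagged."""
    # Chi-square(1) quantile, computed as the square of a normal quantile.
    cutoff = _STD_NORMAL.inv_cdf((1.0 + quantile) / 2.0) ** 2
    keep = list(range(len(y)))
    removed = []
    for _ in range(max_iter):
        if len(keep) < 3:
            break
        loc, var = univariate_mcd([y[i] for i in keep], support)
        if var <= 0.0:
            break  # degenerate subset (ties): distances are undefined
        flagged = {i for i in keep if (y[i] - loc) ** 2 / var > cutoff}
        if not flagged:
            break
        removed.extend(sorted(flagged))
        keep = [i for i in keep if i not in flagged]
        # In the full procedure, PLSR would be refit on the rows in `keep` here.
    return keep, sorted(removed)


# Made-up data: ten clean values near 10 plus two gross outliers.
y = [10.0, 9.7, 25.0, 10.3, 9.9, 10.5, 9.6, -4.0, 10.2, 9.8, 10.4, 10.1]
keep, removed = iterative_outlier_removal(y)
print("removed indices:", removed)
```

On this made-up example the first pass flags the two gross outliers and the second pass flags nothing, so the loop stops. On real data the stopping behavior depends on the cutoff and on the subset fraction.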
Another approach to outlier detection is explored for the setting where n < p. This approach, resampling by half-means (RHM), was introduced in 1998 by William Egan and Stephen Morgan. We adapt this method for use with MRI data to detect outliers and then to develop a robust PLSR model. This method can be used for small or large data sets, overcoming a limitation of leading multivariate outlier detection methods, such as the MSD method, which cannot be used when the sample size is small (n < p).
The two proposed methods improve the accuracy of predictions from brain imaging data (MRI in our example). The public health significance of this work is therefore improved accuracy in diagnosis and prediction based on brain imaging.
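As a rough illustration of RHM as it is commonly described (not the MRI adaptation developed in the dissertation): repeatedly draw half of the observations without replacement, autoscale the full data set with that half-sample's column means and standard deviations, compute each observation's vector length, and count how often each observation falls among the longest. The function name, the number of rounds, and the `top_fraction` parameter below are assumptions chosen for the sketch. No covariance matrix is inverted, so the procedure still runs when n < p.

```python
import random
from statistics import fmean, stdev


def rhm_outlier_counts(rows, n_rounds=500, top_fraction=0.05, seed=0):
    """Count how often each row ranks among the longest vectors (hypothetical helper)."""
    n, p = len(rows), len(rows[0])
    if n < 4:
        raise ValueError("need at least 4 observations to form a half-sample")
    rng = random.Random(seed)
    n_top = max(1, round(top_fraction * n))
    counts = [0] * n
    for _ in range(n_rounds):
        # Half-sample drawn without replacement.
        half = rng.sample(range(n), n // 2)
        columns = [[rows[i][j] for i in half] for j in range(p)]
        means = [fmean(col) for col in columns]
        # Fall back to 1.0 if a column is constant within the half-sample.
        sds = [stdev(col) or 1.0 for col in columns]
        # Autoscale ALL rows with the half-sample statistics, then take
        # the Euclidean length of each scaled row.
        lengths = [
            sum(((row[j] - means[j]) / sds[j]) ** 2 for j in range(p)) ** 0.5
            for row in rows
        ]
        ranked = sorted(range(n), key=lengths.__getitem__, reverse=True)
        for i in ranked[:n_top]:
            counts[i] += 1
    return counts


# Made-up data with n = 8 observations and p = 12 variables (n < p);
# row 3 is replaced by a gross outlier.
data_rng = random.Random(1)
rows = [[data_rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(8)]
rows[3] = [50.0] * 12
counts = rhm_outlier_counts(rows, n_rounds=200)
print("times flagged per row:", counts)
```

Observations flagged in a large share of the rounds are outlier candidates. How large a share counts as an outlier is a tuning choice that this sketch leaves open.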




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Singhabahu, Dilrukshika
ETD Committee:
Committee Chair: Weissfeld, Lisa A (lweis@pitt.edu; Pitt username LWEIS)
Committee Member: Aizenstein, Howard J (aizensteinhj@upmc.edu; Pitt username AIZEN)
Committee Member: Chang, Chung-Chou Ho (changj@pitt.edu; Pitt username CHANGJ)
Committee Member: Lin, Yan (yal14@pitt.edu; Pitt username YAL14)
Date: 30 September 2013
Date Type: Publication
Defense Date: 5 June 2013
Approval Date: 30 September 2013
Submission Date: 9 July 2013
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 56
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Partial Least Squares Regression, Multivariate Outlier Detection, Minimum Covariance Determinant
Date Deposited: 30 Sep 2013 14:12
Last Modified: 01 Sep 2018 05:15

