Using the dimension reduction method FAMD in the data pre-processing step for risk prediction and for unsupervised clustering

Ran, Xinhui (2019) Using the dimension reduction method FAMD in the data pre-processing step for risk prediction and for unsupervised clustering. Master's Thesis, University of Pittsburgh. (Unpublished)

Abstract

High-dimensional data generated from various sources, including electronic health records (EHRs), Medicare, and Medicaid, are used in multiple research areas such as public health and medical research. However, working with high-dimensional data is no easy task because of methodological challenges. Dimensionality reduction techniques have been used to transform high-dimensional data into a lower-dimensional space while preserving meaningful characteristics of the original data. Principal component analysis (PCA) is the most widely used method for dimension reduction. However, it is limited by its linearity assumption and is unsuitable for data containing both numeric and categorical variables. Factor analysis of mixed data (FAMD) is a dimension reduction method that can be used for data with mixed types of variables. Dimension reduction is often used as a data pre-processing step prior to further analyses. However, this approach should be used with caution, as its suitability depends on the purpose of the application. In this thesis, I demonstrate that using the dimension reduction method FAMD in the data pre-processing step for risk prediction can achieve prediction performance comparable to that of the traditional variable selection procedure; however, when classifying individuals into similar groups using unsupervised clustering techniques, the clustering results obtained from the principal components generated by FAMD are substantially different from those obtained from the original variables.
PUBLIC HEALTH SIGNIFICANCE: High-dimensional data often present challenges in building a risk prediction model or in classifying individuals into groups with more homogeneous characteristics. Dimension reduction techniques can be incorporated in the data pre-processing step for high-dimensional data collected from public health or medical records. The results of this thesis show that a dimension reduction method (e.g., FAMD for mixed variable types) should be used with caution as a data pre-processing step.
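
To illustrate the pre-processing step described in the abstract, the following is a minimal sketch of FAMD written with NumPy. It is not the code used in the thesis. It assumes the common formulation of FAMD: numeric variables are standardized, each category indicator is divided by the square root of its category proportion and then centered, and PCA is run on the combined matrix. The function name `famd_scores`, the toy variables (`age`, `sbp`, `sex`, `smoker`), and the choice of two components are hypothetical and chosen only for illustration.

```python
import numpy as np


def famd_scores(numeric, categorical, n_components=2):
    """Sketch of FAMD: returns component scores and explained-variance shares.

    numeric:     (n, p) array of numeric variables (each must have nonzero variance)
    categorical: list of length-n label sequences, one per categorical variable
    """
    numeric = np.asarray(numeric, dtype=float)

    # Numeric block: center and scale to unit variance, as in standardized PCA.
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]

    for column in categorical:
        column = np.asarray(column)
        levels = np.unique(column)
        # One indicator (0/1) column per category level.
        indicators = (column[:, None] == levels[None, :]).astype(float)
        proportions = indicators.mean(axis=0)
        # Weight each indicator by 1/sqrt(proportion), then center it.
        scaled = indicators / np.sqrt(proportions)
        blocks.append(scaled - scaled.mean(axis=0))

    z = np.hstack(blocks)

    # PCA of the combined matrix via singular value decomposition.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy mixed-type data (hypothetical values, for illustration only).
age = [54, 61, 47, 70, 66, 58, 73, 49]
sbp = [128, 140, 118, 152, 135, 130, 160, 121]
sex = ["F", "M", "F", "M", "M", "F", "M", "F"]
smoker = ["never", "former", "never", "current",
          "former", "never", "current", "former"]

scores, explained = famd_scores(np.column_stack([age, sbp]), [sex, smoker])
print(scores.shape)
print(explained)
```

The returned `scores` matrix is what would be passed to a downstream risk model or clustering algorithm in place of the original variables. As the abstract cautions, results obtained from such scores, particularly cluster assignments, may differ from those obtained from the original variables.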

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Ran, Xinhui	xir7@pitt.edu	xir7

ETD Committee:

Title	Member	Email Address
Committee Chair	Chang, Chung-Chou	changj@pitt.edu
Committee Member	Yabes, Jonathan	jgy2@pitt.edu
Committee Member	Mayr, Florian	mayrfb@upmc.edu

Date:

25 June 2019

Date Type:

Publication

Defense Date:

12 April 2019

Approval Date:

25 June 2019

Submission Date:

19 April 2019

Access Restriction:

3 year -- Restrict access to University of Pittsburgh for a period of 3 years.

Number of Pages:

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

MS - Master of Science

Thesis Type:

Master's Thesis

Refereed:

Yes

Uncontrolled Keywords:

Dimension reduction, high dimensional analysis

Date Deposited:

25 Jun 2019 17:24

Last Modified:

01 May 2022 05:15

URI:

http://d-scholarship.pitt.edu/id/eprint/36564
