Enhancements of Sparse Clustering with Resampling and Considerations on Tuning Parameter

Bi, Wenzhu (2012) Enhancements of Sparse Clustering with Resampling and Considerations on Tuning Parameter. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

Abstract

Clustering methods are widely used to explore subgroupings in data when the true group membership is unknown. These techniques are very useful for identifying potential subpopulations of interest in medical and public health settings. Examples of such subpopulations include subjects who have certain gene expression profiles related to a cancer subtype, and subjects who are in the very early, asymptomatic phase of a chronic illness. Both examples are of great public health relevance.
Many of the datasets of interest arise from the development of new technologies and are subject to the common problem where p, the number of variables, is significantly larger than the sample size, n. The relatively small sample size may result from the difficulties of subject recruitment and/or the financial burden of the actual data collection in fields such as imaging and genetic analysis. Earlier approaches to clustering treat all of the variables equally, which may not work well when not all of them are relevant to the subgroupings. Clustering methods with variable selection, also called sparse clustering, have recently been developed to deal with this problem. We propose a method that adds resampling to sparse clustering to improve upon the current clustering methodology. Adding resampling to sparse clustering yields more accurate variable selection. The method is also used to assign an “observed proportion of cluster membership” to each observation, providing a new metric by which to measure membership certainty. The performance of the method is studied via simulation and illustrated in the motivating data example.
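As a rough illustration of how an "observed proportion of cluster membership" could be tallied, the Python sketch below counts how often each observation lands in each cluster across resamples. It is not the dissertation's procedure: it assumes the sparse clustering has already been run on every resample and that cluster labels have been aligned across resamples, and the function name `membership_proportions`, the variable `runs`, and the sample identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


def membership_proportions(resample_labels):
    """Tally cluster membership per observation across resamples.

    resample_labels: list of dicts {observation_id: cluster_label}, one dict
    per resample. An observation may be absent from some resamples.
    Assumption: labels are already aligned, so "A" means the same cluster
    in every resample.
    Returns {observation_id: {cluster_label: proportion}}.
    """
    counts = defaultdict(Counter)
    for labels in resample_labels:
        for obs, cluster in labels.items():
            counts[obs][cluster] += 1

    proportions = {}
    for obs, counter in counts.items():
        # Divide by the number of resamples in which the observation appeared.
        total = sum(counter.values())
        proportions[obs] = {c: k / total for c, k in counter.items()}
    return proportions


# Hypothetical labels from four resamples of four subjects.
runs = [
    {"s1": "A", "s2": "A", "s3": "B"},
    {"s1": "A", "s3": "B", "s4": "B"},
    {"s1": "A", "s2": "B", "s4": "B"},
    {"s2": "A", "s3": "B", "s4": "A"},
]
props = membership_proportions(runs)
```

In this toy input, a subject that is always assigned to the same cluster gets a proportion of 1 for that cluster, while a subject that switches clusters between resamples gets a split, which is the sense in which the proportion can be read as a measure of membership certainty.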
We also propose an alternative approach for choosing the tuning parameter, based on an adjusted Bayesian Information Criterion (BIC). Variable selection in sparse clustering is realized by applying Lasso or related penalties, and the tuning parameter for these penalties has to be determined beforehand. The gap statistic, a distance-based approach, is used to choose the tuning parameter through permutation, and it may behave poorly at times. The proposed BIC approach is an alternative developed under the more sophisticated model-based likelihood framework. Its performance is evaluated with simulations.
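To make the role of the tuning parameter concrete, the sketch below shows one well-known way a Lasso-type constraint produces variable selection in sparse clustering: the variable-weight update of the sparse k-means formulation of Witten and Tibshirani. The abstract does not say which formulation the dissertation uses, so this choice is an assumption, and the names `sparse_weights`, `bcss`, and `s` are illustrative. Given a nonnegative score per variable (for example, its between-cluster sum of squares), the weights are soft-thresholded and rescaled to unit L2 norm, with the threshold chosen so that the L1 norm does not exceed the tuning parameter `s`.

```python
import math


def soft_threshold(values, delta):
    """Shrink each nonnegative value toward zero by delta, flooring at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Rescale to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def sparse_weights(bcss, s, iterations=60):
    """Variable weights w with ||w||_2 = 1, ||w||_1 <= s and w_j >= 0.

    bcss: per-variable scores, e.g. between-cluster sums of squares.
    s: L1 bound (the tuning parameter); values near 1 keep few variables,
       values near sqrt(len(bcss)) keep nearly all of them.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalize(a)
    if sum(w) <= s:
        return w  # constraint already satisfied, no thresholding needed

    # Bisection on the threshold: the L1 norm of the normalized,
    # soft-thresholded vector does not increase as the threshold grows.
    lo, hi = 0.0, max(a)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(a, mid))) > s:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, hi))


# Hypothetical scores: two informative variables and two near-noise ones.
weights = sparse_weights([10.0, 8.0, 0.5, 0.2], s=1.2)
```

With a small `s`, the low-scoring variables receive a weight of exactly zero and drop out of the clustering, which is why the choice of tuning parameter (by gap statistic, BIC, or otherwise) determines how many variables are selected.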

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Bi, Wenzhu	web10@pitt.edu	WEB10

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Weissfeld, Lisa A.	lweis@pitt.edu	LWEIS
Committee Member	Tseng, George C.	ctseng@pitt.edu	CTSENG
Committee Member	Lin, Yan	yal14@pitt.edu	YAL14
Committee Member	Price, Julie C.	pricjc@UPMC.EDU

Date: 29 June 2012
Date Type: Completion
Defense Date: 9 April 2012
Approval Date: 29 June 2012
Submission Date: 2 April 2012
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages:
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Sparse clustering, Resampling, High-dimension but small sample size, Imaging, Microarray, Tuning Parameter
Date Deposited: 29 Jun 2012 18:15
Last Modified: 29 Jun 2017 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/11885
