Variable Selection when Confronted with Missing Data

Ziegler, Melissa Lynn (2006) Variable Selection when Confronted with Missing Data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Variable selection is a common problem in linear regression. Stepwise methods, such as forward selection, are popular and are easily available in most statistical packages. The models selected by these methods have a number of drawbacks: they are often unstable, with changes in the set of variables selected due to small changes in the data, and they provide upwardly biased regression coefficient estimates. Recently proposed methods, such as the lasso, provide accurate predictions via a parsimonious, interpretable model.

Missing data values are also a common problem, especially in longitudinal studies. One approach to account for missing data is multiple imputation. Simulation studies were conducted comparing the lasso to standard variable selection methods under different missing data conditions, including the percentage of missing values and the missing data mechanism. Under missing at random mechanisms, missing data were created at the 25 and 50 percent levels with two types of regression parameters, one containing large effects and one containing several small, but nonzero, effects. Five correlation structures were used in generating the data: independent, autoregressive with correlation 0.25 and 0.50, and equicorrelated, again with correlation 0.25 and 0.50. Three different missing data mechanisms were used to create the missing data: linear, convex and sinister. These mechanisms [...]

Least angle regression performed well under all conditions when the true regression parameter vector contained large effects, with its dominance increasing as the correlation between the predictor variables increased. This is consistent with complete data simulation studies suggesting the lasso performed poorly in situations where the true beta vector contained small, nonzero effects. When the true beta vector contained small, nonzero effects, the performance of the variable selection methods considered was situation dependent.

Ordinary least squares had superior performance in terms of confidence interval coverage under the independent correlation structure and with correlated data when the true regression parameter vector consisted of small, nonzero effects. A variety of methods performed well when the regression parameter vector consisted of large effects and the predictor variables were correlated, depending on the missing data situation.
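The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the function names `simulate_row` and `make_mar` are hypothetical, and because the abstract does not define the linear, convex and sinister mechanisms, the logistic missingness rule below is only an assumed stand-in for a missing at random mechanism. The sketch draws predictors under the three families of correlation structure named in the abstract and then deletes a fixed share of one column, with the deletion probability depending only on another, fully observed column. Multiple imputation and the variable selection step are not shown.

```python
import math
import random


def simulate_row(p, rho, structure, rng):
    """Draw one row of p standard-normal predictors with the given correlation structure."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(p)]
    if structure == "independent":
        return noise
    if structure == "autoregressive":
        # Recursive construction: corr(x_i, x_j) = rho ** |i - j|.
        row = [noise[0]]
        scale = math.sqrt(1.0 - rho ** 2)
        for j in range(1, p):
            row.append(rho * row[-1] + scale * noise[j])
        return row
    if structure == "equicorrelated":
        # A shared factor gives corr(x_i, x_j) = rho for all i != j (requires rho >= 0).
        shared = rng.gauss(0.0, 1.0)
        return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * e for e in noise]
    raise ValueError(f"unknown structure: {structure}")


def make_mar(X, target, driver, rate, rng):
    """Return a copy of X with round(rate * n) values of column `target` set to NaN.

    Assumed mechanism (not from the source): rows with a larger value in the
    fully observed `driver` column are more likely to lose their `target`
    value, so missingness depends only on observed data (missing at random).
    """
    n = len(X)
    k = round(rate * n)
    weights = [1.0 / (1.0 + math.exp(-row[driver])) for row in X]
    # Weighted sampling without replacement: keep the k largest keys u ** (1 / w).
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    chosen = {i for _, i in sorted(keys, reverse=True)[:k]}
    out = [list(row) for row in X]
    for i in chosen:
        out[i][target] = math.nan
    return out


rng = random.Random(2006)
X = [simulate_row(5, 0.50, "autoregressive", rng) for _ in range(200)]
X_missing = make_mar(X, target=0, driver=1, rate=0.25, rng=rng)
n_missing = sum(math.isnan(row[0]) for row in X_missing)
print(f"{n_missing} of {len(X_missing)} values missing in column 0")
```

Sampling a fixed number of rows, rather than flipping an independent coin per row, keeps the missing percentage exactly at the requested level, which makes the 25 and 50 percent conditions directly comparable across replications.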

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Ziegler, Melissa Lynn	melissa.ziegler@usa.dupont.com

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Iyengar, Satish	si@stat.pitt.edu	SSI
Committee Member	Williamson, Douglas E	williamsonde@upmc.edu
Committee Member	Block, Henry	hwb@stat.pitt.edu	HWB
Committee Member	Gleser, Leon J	ljg@stat.pitt.edu	GLESER

Date: 2 October 2006
Date Type: Completion
Defense Date: 2 June 2006
Approval Date: 2 October 2006
Submission Date: 16 August 2006
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: MICE; missing data; multiple imputation; lasso; variable selection
Other ID: http://etd.library.pitt.edu/ETD/available/etd-08162006-001054/, etd-08162006-001054
Date Deposited: 10 Nov 2011 19:59
Last Modified: 15 Nov 2016 13:49
URI: http://d-scholarship.pitt.edu/id/eprint/9122
