# Optimal Procedures in High-Dimensional Variable Selection

Zhang, Qi (2013) Optimal Procedures in High-Dimensional Variable Selection. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

## Abstract

Motivated by the recent trend toward "Big Data", we are interested in the case where both $p$, the number of variables, and $n$, the number of subjects, are large, and possibly $p \gg n$. When $p \gg n$, the signals are usually rare and weak, and the observation units are correlated in complicated ways. When the signals are rare and weak, it may be hard to recover them individually. In this thesis, we study the problem of recovering rare and weak signals with the help of the correlation structure of the data.
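To make the Rare/Weak setting concrete, here is a minimal Python sketch (standard library only) in the simpler normal means setting, without any correlation. It assumes the calibration commonly used in this line of work: each of the $p$ means is nonzero with probability $\epsilon_p = p^{-\vartheta}$, and every nonzero mean equals $\tau_p = \sqrt{2r\log p}$. The function names, parameter values, and threshold choice are illustrative assumptions, not code from the dissertation. The sketch applies coordinate-wise hard thresholding and counts sign errors, one common form of the Hamming distance.

```python
import math
import random


def rare_weak_means(p, theta, r, rng):
    """Sparse mean vector under an assumed Rare/Weak calibration:
    each mu_j is nonzero with probability eps = p**(-theta), and every
    nonzero entry equals tau = sqrt(2 * r * log p)."""
    eps = p ** (-theta)
    tau = math.sqrt(2 * r * math.log(p))
    return [tau if rng.random() < eps else 0.0 for _ in range(p)]


def hard_threshold(y, t):
    """Keep y_j when |y_j| > t; set it to zero otherwise."""
    return [yj if abs(yj) > t else 0.0 for yj in y]


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def hamming_error(est, truth):
    """Count coordinates where sgn(est_j) differs from sgn(truth_j)."""
    return sum(sign(a) != sign(b) for a, b in zip(est, truth))


def simulate(p=10_000, theta=0.5, r=0.8, seed=0):
    """Return (Hamming error of hard thresholding, number of true signals)."""
    rng = random.Random(seed)
    mu = rare_weak_means(p, theta, r, rng)
    # Normal means model: Y_j = mu_j + Z_j with independent Z_j ~ N(0, 1).
    y = [m + rng.gauss(0.0, 1.0) for m in mu]
    # Threshold at which eps * phi(t - tau) and phi(t) cross:
    # t = (theta + r) / (2 * sqrt(r)) * sqrt(2 * log p).
    t = (theta + r) / (2 * math.sqrt(r)) * math.sqrt(2 * math.log(p))
    est = hard_threshold(y, t)
    return hamming_error(est, mu), sum(m != 0 for m in mu)


if __name__ == "__main__":
    for r in (0.8, 2.0, 5.0):
        errors, k = simulate(r=r)
        print(f"r = {r}: {errors} sign errors among {k} true signals")
```

Varying `r` (signal strength) and `theta` (sparsity) in `simulate` gives a rough picture of where recovering signals individually by thresholding succeeds or fails.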

We consider help from two types of correlation structure: the correlation structure of the observed units, and the dependence among the unobserved factors. In the chapter on Graphlet Screening, set in high-dimensional linear regression, we study the variable selection problem when the observed predictors are correlated. In the chapter on Graphical Model Assisted Selection, we consider recovering the sparse mean vector of Stein's normal means model, where the elements of the unobserved mean vector are dependent through an Ising model. In each chapter, we study optimality in variable selection, show that conventional methods such as the lasso, subset selection, and hard thresholding are not optimal, and propose *Screen and Clean* type variable selection procedures that are optimal in terms of the Hamming distance. The theoretical findings are supported by simulation results and applications.
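The normal means setting with dependent signals can be illustrated by generating data in which the support of the mean vector follows an Ising model. The abstract does not specify the graph or the parameters, so this sketch assumes a chain graph with external field `h`, coupling `J`, and a common signal strength `tau` (all hypothetical), and draws the spins approximately by single-site Gibbs sampling. It only generates data; it is not the dissertation's selection procedure.

```python
import math
import random


def gibbs_chain_ising(p, h, J, sweeps, rng):
    """Approximate draw from P(x) proportional to
    exp(h * sum_j x_j + J * sum_j x_j * x_{j+1}) on a chain graph,
    x_j in {-1, +1}, using single-site Gibbs sampling."""
    x = [-1] * p
    for _ in range(sweeps):
        for j in range(p):
            left = x[j - 1] if j > 0 else 0
            right = x[j + 1] if j < p - 1 else 0
            field = h + J * (left + right)
            # P(x_j = +1 | neighbors) = e^field / (e^field + e^-field).
            prob_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[j] = 1 if rng.random() < prob_plus else -1
    return x


def ising_normal_means(p=2000, h=-1.0, J=0.8, tau=3.0, sweeps=50, seed=1):
    """Return (mu, y): mu_j = tau where x_j = +1, else 0; y_j = mu_j + Z_j."""
    rng = random.Random(seed)
    x = gibbs_chain_ising(p, h, J, sweeps, rng)
    mu = [tau if s == 1 else 0.0 for s in x]
    y = [m + rng.gauss(0.0, 1.0) for m in mu]
    return mu, y


def adjacent_pairs(mu):
    """Count neighboring coordinates that are both nonzero."""
    return sum(1 for a, b in zip(mu, mu[1:]) if a != 0 and b != 0)


if __name__ == "__main__":
    mu, y = ising_normal_means()
    k = sum(m != 0 for m in mu)
    print(f"{k} signals, {adjacent_pairs(mu)} adjacent signal pairs")
```

With a positive coupling `J`, nonzero means tend to appear in runs of neighboring coordinates, which is the kind of dependence among unobserved signals that a selection procedure can exploit.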

## Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Zhang, Qi (karlmzhang@hotmail.com)
ETD Committee:
Committee Chair: Gleser, Leon (gleser@pitt.edu, GLESER)
Committee Co-Chair: Jin, Jiashun (jiashun@stat.cmu.edu)
Committee Member: Cheng, Yu (yucheng@pitt.edu, YUCHENG)
Committee Member: Krafty, Robert (krafty@pitt.edu, KRAFTY)
Date: 3 July 2013
Date Type: Publication
Defense Date: 12 April 2013
Approval Date: 3 July 2013
Submission Date: 17 April 2013
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 104
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Asymptotic minimaxity, Hamming distance, predictor correlation, signal dependency, Graphlet Screening, Graphical Model Assisted Selection, phase diagram, Rare and Weak signal model, Screen and Clean, structured sparsity, Ising model, sparse graphical model
Date Deposited: 03 Jul 2013 14:07