Zhang, Qi (2013) Optimal procedures in high-dimensional variable selection. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Motivated by the recent trend in ``Big data'', we are interested in the case where both $p$, the number of variables, and $n$, the number of subjects, are large, and possibly $p \gg n$. When $p \gg n$, the signals are usually rare and weak, and the observation units are correlated in a complicated way. When the signals are rare and weak, it may be hard to recover them individually. In this thesis, we are interested in the problem of recovering rare and weak signals with the assistance of the correlation structure of the data.
We consider the help available from two types of correlation structure: the correlation structure of the observed units, and the dependency among the unobserved factors. In Chapter \ref{chapter:gs}, in the setting of high-dimensional linear regression, we study the variable selection problem when the observed predictors are correlated. In Chapter \ref{chapter:gmas}, we consider recovering the sparse mean vector of Stein's normal means model, where the elements of the unobserved mean vector are dependent through an Ising model. In each chapter, we study optimality in variable selection, show the non-optimality of conventional methods such as the lasso, subset selection, and hard thresholding, and propose {\it Screen and Clean} type variable selection procedures that are optimal in terms of the Hamming distance. The theoretical findings are supported by simulation results and applications.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 3 July 2013
Date Type: Publication
Defense Date: 12 April 2013
Approval Date: 3 July 2013
Submission Date: 17 April 2013
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 104
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Asymptotic minimaxity, Hamming distance, predictor correlation, signal dependency, Graphlet Screening, Graphical Model Assisted Selection, phase diagram, Rare and Weak signal model, Screen and Clean, structured sparsity, Ising model, sparse graphical model
Date Deposited: 03 Jul 2013 14:07
Last Modified: 15 Nov 2016 14:11
URI: http://d-scholarship.pitt.edu/id/eprint/18454