Zhang, Peiliang (2023) Robust Estimation and Inference under Huber’s Contamination Model. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Huber’s contamination model is widely used to analyze distributional robustness when the true underlying data distribution deviates from the assumed model. Specifically, it assumes that a small fraction of the observed data is drawn from an arbitrary, unknown contaminating distribution. In this dissertation, we study robust regression and robust density estimation under Huber’s contamination model.
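For reference, the model is commonly written as the mixture below. The symbols $P$ (assumed distribution), $Q$ (arbitrary contaminating distribution), and $\epsilon$ (contamination proportion) are notation assumed here for illustration and may differ from the dissertation's own.

$$
X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P_\epsilon = (1 - \epsilon)\, P + \epsilon\, Q, \qquad 0 \le \epsilon < 1 .
$$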
In the regression setting, we assume that the noise is heavy-tailed and that a small fraction of it may be arbitrarily contaminated, under an increasing-dimension regime. We show that robust M-estimators achieve the minimax convergence rate (except for the intercept when the uncontaminated noise distribution is asymmetric). We develop a multiplier bootstrap technique to construct confidence intervals for linear functionals of the coefficients. When the contamination proportion is relatively large, we further provide a bias-correction procedure that reduces the bias due to contamination. The robust estimation and inference framework extends to a distributed learning setting. Specifically, we demonstrate that a communication-efficient M-estimator attains the centralized minimax rate (as if one had access to the entire data set). Moreover, building on this communication-efficient M-estimator, we propose a distributed multiplier bootstrap method that runs only on the master machine and generates confidence intervals with optimal widths. A comprehensive simulation study demonstrates the effectiveness of the proposed procedures.
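As a rough illustration of the regression setting, the sketch below fits a Huber M-estimator to simulated data with heavy-tailed, contaminated noise and then forms a percentile interval from a generic multiplier bootstrap. It is not the dissertation's procedure: the function name `huber_irls`, the tuning constant `c = 1.345`, the MAD scale estimate, the exponential multipliers, and all simulation settings are assumptions chosen for the example.

```python
import numpy as np


def huber_irls(X, y, obs_w=None, c=1.345, n_iter=100, tol=1e-8):
    """Huber M-estimator for linear regression, fitted by iteratively
    reweighted least squares. obs_w holds optional per-observation
    weights (used below as bootstrap multipliers)."""
    n = X.shape[0]
    obs_w = np.ones(n) if obs_w is None else np.asarray(obs_w, dtype=float)
    # Start from the (weighted) least-squares solution.
    Xw = X * obs_w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: median absolute deviation (MAD).
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))
        # Huber weights: 1 if |r| <= c * scale, else c * scale / |r|.
        hub_w = np.minimum(1.0, c * scale / np.maximum(np.abs(r), 1e-12))
        Xw = X * (obs_w * hub_w)[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Simulated data (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, eps = 2000, 3, 0.10
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])

noise = rng.standard_t(df=3, size=n)        # heavy-tailed clean noise
is_outlier = rng.random(n) < eps            # contaminated observations
noise[is_outlier] = rng.normal(loc=20.0, scale=1.0, size=int(is_outlier.sum()))
y = X @ beta_true + noise

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hub = huber_irls(X, y)

# Generic multiplier bootstrap: refit with i.i.d. Exp(1) weights
# (mean 1, variance 1) and take percentiles of the first slope.
B = 200
boot = np.empty((B, X.shape[1]))
for b in range(B):
    multipliers = rng.exponential(scale=1.0, size=n)
    boot[b] = huber_irls(X, y, obs_w=multipliers)
lo, hi = np.quantile(boot[:, 1], [0.025, 0.975])

print("OLS  :", np.round(beta_ols, 3))
print("Huber:", np.round(beta_hub, 3))
print("95% bootstrap interval, first slope:", (round(lo, 3), round(hi, 3)))
```

Because the simulated contamination is one-sided, one would expect the fitted intercept to shift away from its true value while the slopes stay close to theirs, which is in line with the intercept caveat in the abstract.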
In the density estimation setting, we aim to robustly estimate a multivariate density function on $\mathbb{R}^d$ under $L_p$ loss from contaminated data. To investigate the effect of contamination on optimal density estimation, we first establish the minimax rate under the assumption that the density belongs to an anisotropic Nikol’skii class. We then develop a data-driven bandwidth selection procedure for kernel estimators via a robust generalization of the Goldenshluger-Lepski method. We show that the proposed bandwidth selection rule yields an estimator that is minimax adaptive to either the smoothness parameter or the contamination proportion. When both are unknown, we prove that minimax-rate adaptation is impossible. Extensions to smooth contamination cases are also discussed.
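For orientation, a product-kernel density estimator with a separate bandwidth per coordinate is typically written as below. The kernel $K$, the bandwidth vector $h$, and the indexing are notation assumed here; the dissertation's precise definitions may differ.

$$
\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \frac{1}{h_j}\, K\!\left( \frac{x_j - X_{i,j}}{h_j} \right), \qquad h = (h_1, \dots, h_d),
$$

with accuracy measured by the $L_p$ risk $\mathbb{E}\,\lVert \hat f_h - f \rVert_p$. A bandwidth selection rule chooses $h$ from the data.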
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zhang, Peiliang
Date: 26 January 2023
Date Type: Publication
Defense Date: 21 November 2022
Approval Date: 26 January 2023
Submission Date: 5 December 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 163
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Robust statistics; Huber’s contamination model; heavy-tailed distribution; M-estimation; multiplier bootstrap; communication-efficient estimator; distributed inference; minimax rate; adaptive density estimation; the Goldenshluger-Lepski method.
Date Deposited: 26 Jan 2023 15:32
Last Modified: 06 Jul 2023 15:11
URI: http://d-scholarship.pitt.edu/id/eprint/43938