
Robust Estimation and Inference under Huber’s Contamination Model

Zhang, Peiliang (2023) Robust Estimation and Inference under Huber’s Contamination Model. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Huber’s contamination model is widely used to analyze distributional robustness when the true underlying data distribution deviates from the assumed model. Specifically, it assumes that a small fraction of the observed data is drawn from an arbitrary, unknown contaminating distribution. In this dissertation, we study robust regression and robust density estimation under Huber’s contamination model.
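As a minimal illustration of the model: under Huber’s contamination model the observations are drawn i.i.d. from $P_\epsilon = (1-\epsilon)P + \epsilon Q$, where $P$ is the assumed distribution, $Q$ is an arbitrary unknown distribution, and $\epsilon$ is the contamination proportion. The Python sketch below simulates this with $P = N(\theta, 1)$ and, purely as a hypothetical choice, $Q$ a point mass at 50. The function name `sample_huber` is illustrative and not taken from the dissertation.

```python
import random
import statistics


def sample_huber(n, eps, theta=0.0, outlier=50.0, seed=0):
    """Draw n points from (1 - eps) * N(theta, 1) + eps * Q.

    Q is a hypothetical point mass at `outlier`; under Huber's model
    Q may be any distribution.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < eps:
            data.append(outlier)                 # contaminated draw from Q
        else:
            data.append(rng.gauss(theta, 1.0))  # clean draw from P = N(theta, 1)
    return data


x = sample_huber(2000, eps=0.1)
print(f"mean   = {statistics.mean(x):.2f}")
print(f"median = {statistics.median(x):.2f}")
```

Because every draw from $Q$ sits at 50 here, the sample mean shifts by roughly $\epsilon \times 50$, whereas the median moves only slightly, which is why estimators with bounded sensitivity to individual observations are preferred under this model.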
In the regression setting, we assume that the noise follows a heavy-tailed distribution and that a small fraction of it may be arbitrarily contaminated, in a regime where the dimension grows with the sample size. We show that robust M-estimators can achieve the minimax convergence rate (except for the intercept when the uncontaminated noise distribution is asymmetric). We develop a multiplier bootstrap technique to construct confidence intervals for linear functionals of the coefficients. When the contamination proportion is relatively large, we further provide a bias correction procedure to alleviate the bias caused by contamination. The robust estimation and inference framework extends to a distributed learning setting: we show that a communication-efficient M-estimator can attain the centralized minimax rate (the rate achievable with access to the entire data set). Moreover, based on this communication-efficient M-estimator, we propose a distributed multiplier bootstrap method that runs only on the master machine and generates confidence intervals with optimal widths. A comprehensive simulation study demonstrates the effectiveness of the proposed procedures.
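The following Python sketch illustrates the two basic regression ingredients, robust M-estimation and multiplier bootstrap intervals, in their simplest form. Its assumptions are ours rather than the dissertation's: a single covariate, Student-t noise with 3 degrees of freedom, a hypothetical 5% of the noise shifted by +20, a Huber loss with the conventional fixed threshold $\tau = 1.345$ fitted by iteratively reweighted least squares, and a multiplier bootstrap that refits the estimator with i.i.d. Exp(1) weights and reports a percentile interval for the slope (the linear functional $u^\top\beta$ with $u = (0, 1)^\top$). The helper names (`wls`, `huber_fit`, `multiplier_bootstrap_ci`) are hypothetical; the dissertation's tuning of $\tau$, its increasing-dimension setting, the bias correction, and the distributed variant are not reproduced.

```python
import math
import random
import statistics


def wls(x, y, c):
    """Closed-form weighted least squares for y ~ a + b*x with weights c."""
    s0 = sum(c)
    sx = sum(ci * xi for ci, xi in zip(c, x))
    sy = sum(ci * yi for ci, yi in zip(c, y))
    sxx = sum(ci * xi * xi for ci, xi in zip(c, x))
    sxy = sum(ci * xi * yi for ci, xi, yi in zip(c, x, y))
    det = s0 * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (s0 * sxy - sx * sy) / det


def huber_fit(x, y, tau=1.345, mult=None, start=(0.0, 0.0), iters=20):
    """Huber M-estimator of (intercept, slope) via iteratively reweighted least squares.

    `mult` holds optional per-observation multiplier weights (used by the bootstrap).
    """
    m = mult if mult is not None else [1.0] * len(x)
    a, b = start
    for _ in range(iters):
        c = []
        for xi, yi, mi in zip(x, y, m):
            r = abs(yi - a - b * xi)
            # Huber weight psi(r)/r: 1 for |r| <= tau, tau/|r| beyond it
            c.append(mi * (1.0 if r <= tau else tau / r))
        a, b = wls(x, y, c)
    return a, b


def multiplier_bootstrap_ci(x, y, fit, B=100, alpha=0.05, seed=1):
    """Percentile CI for the slope from refits with Exp(1) multiplier weights."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(B):
        mult = [rng.expovariate(1.0) for _ in x]  # i.i.d. weights, mean 1, variance 1
        slopes.append(huber_fit(x, y, mult=mult, start=fit)[1])
    # Cut points at alpha/2, ..., 1 - alpha/2 (assumes 2/alpha is an integer)
    q = statistics.quantiles(slopes, n=round(2 / alpha))
    return q[0], q[-1]


rng = random.Random(0)
n, eps = 400, 0.05
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]


def t3_noise():
    """Student-t noise with 3 degrees of freedom (heavy-tailed)."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3)


# True model y = 1 + 2x + noise; a fraction eps of the noise is shifted by +20.
y = [1.0 + 2.0 * xi + t3_noise() + (20.0 if rng.random() < eps else 0.0) for xi in x]

ols = wls(x, y, [1.0] * n)
fit = huber_fit(x, y, start=ols)
lo, hi = multiplier_bootstrap_ci(x, y, fit)
print("OLS   (intercept, slope):", ols)
print("Huber (intercept, slope):", fit)
print("95% multiplier-bootstrap CI for the slope:", (lo, hi))
```

With this setup the ordinary least-squares intercept is pulled upward by roughly $\epsilon \times 20$, while the Huber fit stays much closer to the true intercept of 1; a small bias remains because the contamination is one-sided, which is the kind of effect a bias correction targets.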
In the density estimation setting, we aim to robustly estimate a multivariate density function on $\mathbb{R}^d$ under $L_p$ loss from contaminated data. To investigate how contamination affects optimal density estimation, we first establish the minimax rate under the assumption that the density belongs to an anisotropic Nikol’skii class. We then develop a data-driven bandwidth selection procedure for kernel estimators via a robust generalization of the Goldenshluger-Lepski method. We show that the proposed bandwidth selection rule yields an estimator that is minimax adaptive to either the smoothness parameter or the contamination proportion when the other is known. When both are unknown, we prove that no minimax-rate adaptive procedure exists. Extensions to smooth contamination are also discussed.
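The Goldenshluger-Lepski idea can be sketched in one dimension. For each candidate bandwidth $h$, it estimates the bias by comparing the doubly smoothed estimate $\hat f_{h,h'} = K_h * \hat f_{h'}$ with $\hat f_{h'}$ over all candidates $h'$, computing $A(h) = \max_{h'} \big[\|\hat f_{h,h'} - \hat f_{h'}\|_2 - M(h')\big]_+$, and then selects $\hat h = \arg\min_h \{A(h) + M(h)\}$, where the majorant $M(h) = \kappa\|K_h\|_2/\sqrt{n}$ bounds the stochastic error. The Python sketch below implements this classical, non-robust rule under our own assumptions: a univariate Gaussian kernel, $L_2$ loss approximated on a grid, a hypothetical bandwidth set, a hypothetical tuning constant $\kappa = 1.2$, and simulated data with a point-mass contamination at 6. It does not reproduce the dissertation's robust generalization or its anisotropic multivariate setting, and the function names are illustrative.

```python
import math
import random
from functools import lru_cache

rng = random.Random(0)
# Hypothetical contaminated sample: N(0, 1) with probability 0.95, a point mass at 6 otherwise.
data = [6.0 if rng.random() < 0.05 else rng.gauss(0.0, 1.0) for _ in range(200)]
dx = 0.05
grid = [-5.0 + dx * i for i in range(241)]  # evaluation grid on [-5, 7]


@lru_cache(maxsize=None)
def kde(h):
    """Gaussian-kernel density estimate with bandwidth h, evaluated on the grid."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return tuple(c * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data) for g in grid)


def l2_dist(f, g):
    """Riemann-sum approximation of the L2 distance between two gridded functions."""
    return math.sqrt(dx * sum((a - b) ** 2 for a, b in zip(f, g)))


def majorant(h, n, kappa=1.2):
    """kappa * ||K_h||_2 / sqrt(n) for the Gaussian kernel; kappa is a hypothetical tuning constant."""
    return kappa / math.sqrt(2.0 * math.sqrt(math.pi) * n * h)


def gl_bandwidth(bandwidths, n):
    """Classical Goldenshluger-Lepski rule: minimise A(h) + majorant(h) over the candidates."""
    best_h, best_crit = None, math.inf
    for h in bandwidths:
        # K_h * K_h' is a Gaussian kernel with bandwidth sqrt(h^2 + h'^2),
        # so kde(hypot(h, h')) is the doubly smoothed estimate f_{h,h'}.
        a = max(max(l2_dist(kde(math.hypot(h, hp)), kde(hp)) - majorant(hp, n), 0.0)
                for hp in bandwidths)
        crit = a + majorant(h, n)
        if crit < best_crit:
            best_h, best_crit = h, crit
    return best_h


H = (0.05, 0.1, 0.2, 0.3, 0.5, 0.8)
h_hat = gl_bandwidth(H, len(data))
print("selected bandwidth:", h_hat)
```

The convolution identity for Gaussian kernels is a convenience choice here; with other kernels, $\hat f_{h,h'}$ would have to be obtained by numerical convolution.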




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators:
Name | Email | Pitt Username
Zhang, Peiliang | PEZ35@pitt.edu | pez35
ETD Committee:
Title | Member | Email Address | Pitt Username
Committee Chair | Ren, Zhao | zren@pitt.edu | ZREN
Committee Member | Iyengar, Satish | ssi@pitt.edu | SSI
Committee Member | Chen, Kehui | khchen@pitt.edu | khchen
Committee Member | Zhou,
Date: 26 January 2023
Date Type: Publication
Defense Date: 21 November 2022
Approval Date: 26 January 2023
Submission Date: 5 December 2022
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 163
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Robust statistics; Huber’s contamination model; heavy-tailed distribution; M-estimation; multiplier bootstrap; communication-efficient estimator; distributed inference; minimax rate; adaptive density estimation; the Goldenshluger-Lepski method.
Date Deposited: 26 Jan 2023 15:32
Last Modified: 06 Jul 2023 15:11

