Robust Estimation and Inference under Huber’s Contamination Model

Zhang, Peiliang (2023) Robust Estimation and Inference under Huber’s Contamination Model. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Huber’s contamination model is widely used for analyzing distributional robustness when the true underlying data distribution deviates from the assumed model. Specifically, it assumes that a small fraction of the observed data is drawn from an arbitrary, unknown distribution. In this dissertation, we study robust regression and robust density estimation under Huber’s contamination model.
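For readers unfamiliar with the model, it is commonly written as a two-component mixture. The symbols $P$ (the assumed distribution), $Q$ (the arbitrary contaminating distribution), and $\epsilon$ (the contamination proportion) are notation assumed here for illustration and may differ from the dissertation's own:

$$
X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P_{\epsilon} = (1 - \epsilon)\, P + \epsilon\, Q, \qquad 0 \le \epsilon < 1 .
$$

With $\epsilon = 0$ the data follow $P$ exactly; as $\epsilon$ grows, a larger share of observations can come from $Q$, about which nothing is assumed.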
In the regression setting, we assume that the noise is heavy-tailed and that a small fraction of it may be arbitrarily contaminated, in an increasing-dimension regime. We show that robust M-estimators achieve the minimax convergence rate (except for the intercept when the uncontaminated noise distribution is asymmetric). We develop a multiplier bootstrap technique to construct confidence intervals for linear functionals of the coefficients. When the contamination proportion is relatively large, we further provide a bias correction procedure to reduce the bias due to contamination. The robust estimation and inference framework extends to a distributed learning setting. Specifically, we demonstrate that a communication-efficient M-estimator attains the centralized minimax rate (as if one had access to the entire data). Moreover, based on this communication-efficient M-estimator, we propose a distributed multiplier bootstrap method that runs only on the master machine and generates confidence intervals with optimal widths. A comprehensive simulation study demonstrates the effectiveness of the proposed procedures.
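As a rough illustration of the regression setting, the sketch below fits a Huber M-estimator by iteratively reweighted least squares on simulated data with heavy-tailed, contaminated noise. It is a generic textbook construction, not the dissertation's procedure: the function name `huber_regression`, the threshold `tau = 1.345`, the point-mass contamination at 50, and all simulation settings are assumptions chosen for the example. The multiplier bootstrap, bias correction, and distributed steps are not implemented here.

```python
import numpy as np

def huber_regression(X, y, tau=1.345, n_iter=100, tol=1e-8):
    """Huber M-estimator for linear regression via iteratively reweighted least squares.

    X   : (n, p) design matrix (include a column of ones for an intercept).
    tau : robustification threshold (assumed default; in practice it needs tuning).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for |r| <= tau, tau / |r| for larger residuals
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        Xw = X * w[:, None]
        # Weighted least squares step: solve (X^T W X) beta = X^T W y
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data (all settings are illustrative assumptions)
rng = np.random.default_rng(0)
n, p, eps = 2000, 5, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.0])

noise = rng.standard_t(df=3, size=n)   # heavy-tailed, symmetric clean noise
contaminated = rng.random(n) < eps      # roughly an eps fraction of the sample
noise[contaminated] = 50.0              # one arbitrary choice of contamination
y = X @ beta_true + noise

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_regression(X, y)

print("OLS   intercept error:", abs(beta_ols[0] - beta_true[0]))
print("Huber intercept error:", abs(beta_huber[0] - beta_true[0]))
print("Huber max slope error:", np.max(np.abs(beta_huber[1:] - beta_true[1:])))
```

Because the contamination here is one-sided, it shifts the least-squares intercept substantially, while the bounded Huber score limits how much each contaminated residual can pull the fit.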
In the density estimation setting, we aim to robustly estimate a multivariate density function on $\mathbb{R}^d$ under $L_p$ loss from contaminated data. To investigate the effect of contamination on optimal density estimation, we first establish the minimax rate under the assumption that the density belongs to an anisotropic Nikol’skii class. We then develop a data-driven bandwidth selection procedure for kernel estimators via a robust generalization of the Goldenshluger-Lepski method. We show that the proposed bandwidth selection rule yields an estimator that is minimax adaptive to either the smoothness parameter or the contamination proportion. When both are unknown, we prove that no minimax-rate adaptive method exists. Extensions to smooth contamination cases are also discussed.
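To make the density estimation setting concrete, the following sketch computes a plain kernel density estimate from contaminated data. It is a simplified univariate example with a fixed bandwidth: the helper `gaussian_kde`, the bandwidth `h = 0.3`, and the simulated contamination are assumptions for illustration, and the robust Goldenshluger-Lepski bandwidth selection described in the abstract is not implemented.

```python
import numpy as np

def gaussian_kde(x_eval, data, h):
    """Kernel density estimate with a Gaussian kernel and a fixed bandwidth h."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

# Simulated contaminated sample (all settings are illustrative assumptions)
rng = np.random.default_rng(1)
n, eps = 5000, 0.05
data = rng.normal(size=n)                 # clean density: standard normal
outliers = rng.random(n) < eps            # roughly an eps fraction of the sample
data[outliers] = rng.uniform(8.0, 10.0, size=outliers.sum())  # arbitrary contamination

grid = np.linspace(-4.0, 4.0, 201)
dx = grid[1] - grid[0]
f_hat = gaussian_kde(grid, data, h=0.3)
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

max_err = np.max(np.abs(f_hat - f_true))  # sup-norm error on the grid
mass_on_grid = f_hat.sum() * dx           # estimated mass near the clean density

print("max abs error on grid:", max_err)
print("estimated mass on [-4, 4]:", mass_on_grid)
```

In this example the contaminated points sit far from the clean data, so the estimate over the grid integrates to roughly $1 - \epsilon$ rather than 1: the kernel estimator targets the mixture, not the clean density.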

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Zhang, Peiliang	PEZ35@pitt.edu	pez35

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Ren, Zhao	zren@pitt.edu	ZREN
Committee Member	Iyengar, Satish	ssi@pitt.edu	SSI
Committee Member	Chen, Kehui	khchen@pitt.edu	khchen
Committee Member	Zhou, Wenxin	wez243@ucsd.edu

Date:

26 January 2023

Date Type:

Publication

Defense Date:

21 November 2022

Approval Date:

26 January 2023

Submission Date:

5 December 2022

Access Restriction:

1 year -- Restrict access to University of Pittsburgh for a period of 1 year.

Number of Pages:

163

Institution:

University of Pittsburgh

Schools and Programs:

Dietrich School of Arts and Sciences > Statistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Robust statistics; Huber’s contamination model; heavy-tailed distribution; M-estimation; multiplier bootstrap; communication-efficient estimator; distributed inference; minimax rate; adaptive density estimation; the Goldenshluger-Lepski method.

Date Deposited:

26 Jan 2023 15:32

Last Modified:

06 Jul 2023 15:11

URI:

http://d-scholarship.pitt.edu/id/eprint/43938
