Zhou, Siyu (2022) Random Forests and Regularization. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Random forests have a long-standing reputation as excellent off-the-shelf statistical learning methods. Despite their empirical success and the many studies of their statistical properties, a full and satisfying explanation for that success has yet to be put forth. This work takes a step in that direction by demonstrating that random feature subsetting provides an implicit form of regularization, making random forests particularly advantageous in low signal-to-noise ratio (SNR) settings. Moreover, this is not a tree-specific finding; it extends to ensembles of base learners constructed in a greedy fashion. Inspired by this, we find that the inclusion of additional noise features can serve as another implicit form of regularization and thereby lead to substantially more accurate models. As a result, intuitive notions of variable importance based on improvements in model accuracy may be deeply flawed, as even purely random noise can routinely register as statistically significant. Along these lines, we further investigate the effect of pruning trees in random forests. Although full-depth trees are recommended in many textbooks, we show that tree depth should be seen as a natural form of regularization across the entire procedure, with shallow trees preferred in low-SNR settings.
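To make the regularization analogy concrete, the sketch below (not taken from the dissertation) compares bagging, a feature-subsetted forest, and a shallow feature-subsetted forest on a simulated low-SNR regression problem. It assumes scikit-learn and NumPy; the data-generating process, noise level, and parameter values are illustrative choices, not the author's experimental setup.

```python
# Minimal sketch: random feature subsetting (max_features) and tree depth
# as regularization knobs in a random forest, evaluated on low-SNR data.
# All simulation settings below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
signal = X[:, :5].sum(axis=1)          # only 5 of the 20 features carry signal
noise_sd = 5.0 * signal.std()          # heavy noise => low signal-to-noise ratio
y = signal + rng.normal(scale=noise_sd, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

settings = {
    "bagging (all features, full depth)": dict(max_features=None, max_depth=None),
    "forest  (p/3 features, full depth)": dict(max_features=p // 3, max_depth=None),
    "forest  (p/3 features, shallow)   ": dict(max_features=p // 3, max_depth=3),
}
for name, kw in settings.items():
    rf = RandomForestRegressor(n_estimators=300, random_state=0, **kw)
    rf.fit(X_tr, y_tr)
    print(f"{name}: test MSE = {mean_squared_error(y_te, rf.predict(X_te)):.2f}")
```

In heavily noisy runs like this one, the more constrained settings (smaller max_features, shallower trees) often achieve lower test error, in line with the abstract's claim that feature subsetting and limited tree depth act as implicit regularization in low-SNR settings.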
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Zhou, Siyu
ETD Committee:
Date: 12 October 2022
Date Type: Publication
Defense Date: 16 June 2022
Approval Date: 12 October 2022
Submission Date: 5 July 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 133
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Random Forests, Bagging, Regularization, Interpolation, Ridge Regression, Model Selection
Date Deposited: 12 Oct 2022 20:35
Last Modified: 12 Oct 2022 20:35
URI: http://d-scholarship.pitt.edu/id/eprint/43285