
Random Forests and Regularization

Zhou, Siyu (2022) Random Forests and Regularization. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Random forests have a long-standing reputation as excellent off-the-shelf statistical learning methods. Despite their empirical success and numerous studies of their statistical properties, a full and satisfying explanation for that success has yet to be put forth. This work takes a step in this direction by demonstrating that random feature subsetting provides an implicit form of regularization, making random forests more advantageous in low signal-to-noise ratio (SNR) settings. Moreover, this is not a tree-specific finding: it extends to ensembles of base learners constructed in a greedy fashion. Inspired by this, we find that the inclusion of additional noise features can serve as another implicit form of regularization and thereby lead to substantially more accurate models. As a result, intuitive notions of variable importance based on improved model accuracy may be deeply flawed, as even purely random noise can routinely register as statistically significant. Along these lines, we further investigate the effect of pruning trees in random forests. Although full-depth trees are recommended in many textbooks, we show that tree depth should be seen as a natural form of regularization across the entire procedure, with shallow trees preferred in low SNR settings.
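The feature-subsetting effect described in the abstract can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not code from the dissertation: it uses single-split "stumps" rather than full trees, and all function names and parameter choices here are illustrative. Only one of ten features carries signal (a low-SNR setting), and we compare bagging, where every feature is eligible at each split, against a forest restricted to a random feature subset — the `mtry`-style restriction the abstract identifies as implicit regularization.

```python
import random
import statistics

random.seed(0)

def make_data(n=200, p=10, snr=0.2):
    # Low-SNR regression data: only feature 0 carries signal, the rest are noise.
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [snr * row[0] + random.gauss(0, 1) for row in X]
    return X, y

def fit_stump(X, y, features):
    # Greedy single split: best (feature, median threshold) among allowed features.
    best = None
    for j in features:
        thr = statistics.median(row[j] for row in X)
        left = [yi for row, yi in zip(X, y) if row[j] <= thr]
        right = [yi for row, yi in zip(X, y) if row[j] > thr]
        if not left or not right:
            continue
        ml, mr = statistics.mean(left), statistics.mean(right)
        sse = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or sse < best[0]:
            best = (sse, j, thr, ml, mr)
    if best is None:  # degenerate case: fall back to the overall mean
        m = statistics.mean(y)
        return lambda row: m
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr

def forest_mse(mtry, n_trees=50):
    # Bootstrap each tree's training set and restrict it to `mtry` random features.
    Xtr, ytr = make_data()
    Xte, yte = make_data()
    p = len(Xtr[0])
    trees = []
    for _ in range(n_trees):
        idx = [random.randrange(len(Xtr)) for _ in range(len(Xtr))]
        Xb, yb = [Xtr[i] for i in idx], [ytr[i] for i in idx]
        feats = random.sample(range(p), mtry)
        trees.append(fit_stump(Xb, yb, feats))
    preds = [statistics.mean(t(row) for t in trees) for row in Xte]
    return statistics.mean((ph - yh) ** 2 for ph, yh in zip(preds, yte))

# Bagging (all 10 features eligible) vs a heavily subsetted forest (mtry = 2).
mse_bagging = forest_mse(mtry=10)
mse_subset = forest_mse(mtry=2)
print(f"bagging test MSE: {mse_bagging:.3f}, mtry=2 test MSE: {mse_subset:.3f}")
```

In this toy setup, lowering `mtry` forces many trees to split on pure-noise features, which decorrelates the ensemble and shrinks its effective fit toward the mean — the same mechanism the dissertation analyzes as implicit regularization, advantageous precisely when the SNR is low.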




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Zhou, Siyu
Email: siz25@pitt.edu
Pitt Username: siz25
ORCID: 0000-0001-7502-9316
ETD Committee:
  Committee Chair: Mentch
  Committee Member: Cheng
  Committee Member: Iyengar
  Committee Member: Wasserman
Date: 12 October 2022
Date Type: Publication
Defense Date: 16 June 2022
Approval Date: 12 October 2022
Submission Date: 5 July 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 133
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Random Forests, Bagging, Regularization, Interpolation, Ridge Regression, Model Selection
Date Deposited: 12 Oct 2022 20:35
Last Modified: 12 Oct 2022 20:35
