
Advancing Inference in Supervised Learning Procedures via Permutation Tests and Importance Sampling, with Applications to Environmental Science

Coleman, Timothy (2021) Advancing Inference in Supervised Learning Procedures via Permutation Tests and Importance Sampling, with Applications to Environmental Science. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Since being proposed by Breiman (2001), random forests have become popular supervised regression and classification techniques. Their popularity stems from being easy to implement: the default hyper-parameter settings are often not far from optimal, and the resulting models are often competitive with more involved supervised models. While random forests are complex, they are not completely impenetrable to theoretical analysis. In this thesis, we present several contributions to random forest methodology. First, we provide a motivating application of random forests to ornithological data, where we develop a novel hypothesis test for equality in distribution of random forest curves. Then, we refine an observation made during that application into a means of testing hypotheses about the validation error of random forests, allowing for computationally efficient tests that are analogous to the F-test for linear regression. Finally, we propose a means of accounting for a discrepancy between test and training distributions, motivated by the problem of forecasting power outages from hurricanes.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Coleman, Timothy (Email: tsc35@pitt.edu; Pitt Username: tsc35; ORCID: 0000-0002-9837-4275)
ETD Committee:
Committee Chair: Mentch, Lucas (Email: lkm31@pitt.edu; Pitt Username: lkm31)
Committee Member: Iyengar, Satish (Email: ssi@pitt.edu; Pitt Username: ssi)
Committee Member: Chen, Kehui (Email: khchen@pitt.edu; Pitt Username: khchen)
Committee Member: Wasserman, Larry (Email: larry@stat.cmu.edu)
Date: 20 January 2021
Date Type: Publication
Defense Date: 19 November 2020
Approval Date: 20 January 2021
Submission Date: 3 December 2020
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 148
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Random Forests, Machine Learning, Environmental Statistics
Date Deposited: 20 Jan 2021 18:21
Last Modified: 20 Jan 2021 18:21
URI: http://d-scholarship.pitt.edu/id/eprint/39985
