
Quantifying Uncertainty in context of Natural Language Processing

Jung, Taehee (2022) Quantifying Uncertainty in context of Natural Language Processing. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Despite recent advances in statistical machine learning that have significantly improved performance, the uncertainty behind models remains largely underexplored. In this dissertation, we identify two sources of uncertainty: one arising from learning sources such as algorithms or datasets, and the other from the model's predicted output. To better understand or even improve the model's results, we then quantify both sources of uncertainty. In particular, we study three topics in uncertainty quantification in the context of natural language processing (NLP). First, we quantify model and corpus biases in text summarization along three sub-aspects: position, importance, and diversity. Second, we develop a simple but effective end-to-end procedure for improving both the performance of text classification tasks and the quality of model calibration. Finally, we propose a new model calibration framework that interprets individual point estimates with confidence and yields a less biased relative frequency approximation in classification.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creator | Email | Pitt Username | ORCID
Jung, Taehee | taj41@pitt.edu | taj41 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Mentch, Lucas K. | lkm31@pitt.edu | |
Committee Member | Cheng, Yu | yucheng@pitt.edu | |
Committee Member | Chen, Kehui | KHCHEN@pitt.edu | |
Committee Member | Wallace, Meredith L | mel20@pitt.edu | |
Date: 11 October 2022
Date Type: Publication
Defense Date: 21 July 2022
Approval Date: 11 October 2022
Submission Date: 30 July 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 97
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Model Uncertainty, Model Calibration, Confidence Interval, Natural Language Processing, Text Summarization, Text Classification
Date Deposited: 11 Oct 2022 20:33
Last Modified: 11 Oct 2022 20:33
URI: http://d-scholarship.pitt.edu/id/eprint/43419
