
Quantifying Uncertainty in context of Natural Language Processing

Jung, Taehee (2022) Quantifying Uncertainty in context of Natural Language Processing. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Despite recent advances in statistical machine learning that have significantly improved performance, the uncertainty behind models remains largely underexplored. In this dissertation, we identify two sources of uncertainty: one arising from learning sources such as algorithms or datasets, and the other from the model's predicted output. To better understand, and even improve, the model's results, we then quantify both uncertainties. In particular, we study three topics of uncertainty quantification in the context of natural language processing (NLP). First, we quantify model and corpus biases in text summarization along three sub-aspects: position, importance, and diversity. Second, we develop a simple but effective end-to-end procedure that improves both the performance on text classification tasks and the quality of model calibration. Finally, we propose a new model calibration framework that interprets individual point estimates with confidence and yields a less-biased approximation of relative frequencies in classification.
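The abstract refers to the quality of model calibration without naming a metric. For orientation only, one widely used way to score calibration in classification is the binned expected calibration error (ECE): predictions are grouped by confidence, and the gap between average confidence and observed accuracy in each group is averaged, weighted by group size. The sketch below assumes equal-width bins and top-label confidences. It is not necessarily the measure or framework used in the dissertation, and the function name `expected_calibration_error` is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: binned ECE over equal-width confidence bins.

    confidences: the model's top-label probability for each example, in [0, 1].
    correct: whether the top-label prediction was right for each example.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to one of n_bins equal-width bins on [0, 1];
    # a confidence of exactly 1.0 is placed in the last bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Weighted average of |accuracy - mean confidence| across non-empty bins.
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, `expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, False, True, True])` places two predictions in a bin whose average confidence is 0.9 but whose accuracy is 0.5, and two in a bin with confidence 0.6 and accuracy 1.0, giving an ECE of about 0.4. A perfectly calibrated model scores 0. Binned ECE is simple to compute but depends on the choice of bins, which is one reason calibration remains an active research topic.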




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Jung, Taehee (Email: taj41@pitt.edu; Pitt Username: taj41)
ETD Committee:
Committee Chair: Mentch, Lucas
Committee Member: Cheng,
Committee Member: Chen,
Committee Member: Wallace, Meredith
Date: 11 October 2022
Date Type: Publication
Defense Date: 21 July 2022
Approval Date: 11 October 2022
Submission Date: 30 July 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 97
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Statistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Model Uncertainty, Model Calibration, Confidence Interval, Natural Language Processing, Text Summarization, Text Classification
Date Deposited: 11 Oct 2022 20:33
Last Modified: 11 Oct 2022 20:33

