
Robust Parsing for Ungrammatical Sentences

Baradaran Hashemi, Homa (2018) Robust Parsing for Ungrammatical Sentences. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Natural Language Processing (NLP) is a research area that studies computational approaches to human language. However, not all natural language sentences are grammatically correct. Sentences that are ungrammatical, awkward, or too casual or colloquial appear in a variety of NLP applications, from product reviews and social media analysis to intelligent language tutors and multilingual processing. In this thesis, we focus on parsing because it is an essential component of many NLP applications. We investigate in what ways the performance of statistical parsers degrades when dealing with ungrammatical sentences. We also hypothesize that breaking up parse trees at their problematic parts prevents NLP applications from degrading due to incorrect syntactic analysis.

A parser is robust if it can overlook problems such as grammar mistakes and produce a parse tree that closely resembles the correct analysis for the intended sentence. We develop a robustness evaluation metric and conduct a series of experiments to compare the performance of state-of-the-art parsers on ungrammatical sentences. The evaluation results show that ungrammatical sentences present challenges for statistical parsers, because the well-formed syntactic trees they produce may not be appropriate for ungrammatical sentences. We also define a new framework for reviewing the parses of ungrammatical sentences and extracting the coherent parts whose syntactic analyses make sense. We call this task parse tree fragmentation. The experimental results suggest that the proposed fragmentation framework is a promising way to handle syntactically unusual sentences.
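The idea of a parse that "closely resembles the correct analysis for the intended sentence" can be illustrated with a small sketch. The code below is not the robustness metric developed in the dissertation, whose definition is not given in this abstract. It is a generic labeled-arc F1 overlap, assuming dependency parses and assuming that words in the ungrammatical sentence and its corrected version share hypothetical alignment ids. The function names, arc labels, and example parses are all illustrative assumptions.

```python
from typing import Set, Tuple

# An arc is (head_id, dependent_id, label). Ids are hypothetical alignment
# ids shared by the ungrammatical sentence and its correction; 0 is the root.
Arc = Tuple[int, int, str]


def restrict_to_shared(arcs: Set[Arc], shared_ids: Set[int]) -> Set[Arc]:
    """Keep arcs whose dependent is a shared word and whose head is a shared word or the root."""
    return {
        (head, dep, label)
        for head, dep, label in arcs
        if dep in shared_ids and (head == 0 or head in shared_ids)
    }


def arc_f1(predicted: Set[Arc], reference: Set[Arc]) -> float:
    """F1 overlap between two sets of labeled dependency arcs."""
    matched = len(predicted & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Intended sentence: "I(1) am(2) going(3) to(4) school(5)".
# Illustrative reference parse of the corrected sentence (an assumption).
reference_arcs: Set[Arc] = {
    (3, 1, "nsubj"), (3, 2, "aux"), (0, 3, "root"),
    (5, 4, "case"), (3, 5, "obl"),
}

# Ungrammatical input: "I(1) going(3) to(4) school(5)" ("am" is missing).
# Illustrative parser output with two incorrect arcs (an assumption).
predicted_arcs: Set[Arc] = {
    (3, 1, "nsubj"), (0, 3, "root"), (3, 4, "mark"), (4, 5, "obj"),
}

shared_ids = {1, 3, 4, 5}
score = arc_f1(
    restrict_to_shared(predicted_arcs, shared_ids),
    restrict_to_shared(reference_arcs, shared_ids),
)
print(f"arc F1 over shared words: {score:.2f}")
```

Restricting both parses to the shared words is one possible way to keep the comparison from penalizing arcs that cannot exist in the ungrammatical sentence, here the auxiliary arc to the missing "am".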




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Baradaran Hashemi, Homa (Email: hob10@pitt.edu; Pitt Username: HOB10)
ETD Committee:
Committee Chair: Hwa
Committee Member: Litman
Committee Member: Schunn
Committee Member: Han
Date: 31 January 2018
Date Type: Publication
Defense Date: 17 October 2017
Approval Date: 31 January 2018
Submission Date: 30 November 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 162
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Artificial Intelligence, Natural Language Processing, Syntactic Parsing, Ungrammatical Sentences, Parse Tree Fragmentation
Date Deposited: 31 Jan 2018 14:24
Last Modified: 31 Jan 2018 14:24

