
Leveraging Data Augmentation and Semantic Role Labeling for Enhanced Legal Risk Prediction in Lease Contractual Text

Wang, Mengdi (2024) Leveraging Data Augmentation and Semantic Role Labeling for Enhanced Legal Risk Prediction in Lease Contractual Text. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Contracts are formal statements of intent that regulate behavior among organizations and individuals, serving as foundational legal texts. They play an essential role in economic transactions and in commercial or organizational agreements, ranging from trivial purchases to major national infrastructure projects. A contract not only states the obligations of the parties involved but also specifies permissions, prohibitions, and even penalties or rewards. Given this critical role, contract review is essential for companies, law firms, government agencies, and individuals. This process involves verifying and clarifying the contract’s facts and provisions, assessing feasibility, and identifying potential risks.
While researchers have tried to build computational models of intelligent behavior for contracts, most existing models perform relatively “shallow” tasks, such as extracting entities (e.g., titles, contracting parties) or classifying the basic types of sentences (e.g., obligations or provisions). Beyond obligations and permissions, commercial companies (e.g., Ravn, Kira, LawGeex) have attempted to identify specific types of provisions. Still, their models rely on in-house datasets annotated by legal experts, which are typically not publicly accessible. These limitations stem from two key challenges in the legal domain. First, although electronic and online versions of many legal resources are available, producing the labeled data needed to train supervised algorithms still requires resource-intensive annotation. Second, legal texts not only contain terms and phrases whose semantics differ in a legal context, but also follow a syntax that differs from that of general-language text.
This dissertation addresses these challenges by leveraging advanced deep learning models to tackle the data scarcity problem in predicting risk sentences in lease agreements, one of the most common and significant contract types. Specifically, this work makes the following contributions. First, it identifies the data shortage problem in risk sentence prediction. Second, it proposes three distinct methods to address this data limitation. Lastly, it explores various strategies for combining these three augmentation methods and recommends the most effective approach.



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Wang, Mengdi (Email: mew133@pitt.edu; Pitt Username: mew133)
ETD Committee:
Committee Chair: He, Daqing (Email: dah44@pitt.edu; Pitt Username: dah44)
Committee Member: Ashley, Kevin D (Email: ashley@pitt.edu; Pitt Username: ASHLEY)
Committee Member: Munro, Paul (Email: pwm@pitt.edu; Pitt Username: pwm)
Committee Member: Liu, Xiaozhong (Email: xliu14@wpi.edu)
Date: 20 December 2024
Date Type: Publication
Defense Date: 22 October 2024
Approval Date: 20 December 2024
Submission Date: 3 December 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 119
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Legal Risk Prediction, Lease Contracts, Text Data Augmentation, Semantic Role Labeling (SRL), Text Classification
Date Deposited: 20 Dec 2024 13:48
Last Modified: 20 Dec 2024 13:48
URI: http://d-scholarship.pitt.edu/id/eprint/47159
