Wang, Mengdi (2024) Leveraging Data Augmentation and Semantic Role Labeling for Enhanced Legal Risk Prediction in Lease Contractual Text. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Contracts are formal statements of intent that regulate behavior among organizations and individuals, serving as foundational legal texts. They play an essential role in economic transactions and in commercial or organizational agreements, ranging from trivial purchases to major national infrastructure projects. A contract not only states the obligations of the parties involved but also specifies permissions, prohibitions, and even penalties or rewards. Given their critical role, contract review is essential for companies, law firms, government agencies, and individuals. This process involves verifying and clarifying the contract’s facts and provisions, assessing feasibility, and identifying potential risks.
While researchers have tried to build computational models of intelligent behavior for contracts, most existing models perform relatively “shallow” tasks, such as extracting entities (e.g., titles, contracting parties) or classifying basic sentence types (e.g., obligations or provisions). Beyond obligations and permissions, commercial companies (e.g., Ravn, Kira, LawGeex) have attempted to identify specific types of provisions, but their models rely on in-house datasets annotated by legal experts, which are typically not publicly accessible. These limitations stem from two key challenges in the legal domain. First, although electronic and online versions of many legal resources are available, producing the labeled data needed to train supervised algorithms still requires resource-intensive annotation. Second, legal texts not only contain terms and phrases whose meaning differs in a legal context, but their syntax also differs from that of general-language text.
This dissertation addresses these challenges by leveraging advanced deep learning models to tackle the data scarcity problem in predicting risk sentences in lease agreements, one of the most common and significant contract types. Specifically, it makes three contributions. First, it identifies the data shortage problem in risk sentence prediction. Second, it proposes three distinct data augmentation methods to address this limitation. Lastly, it explores various strategies for combining those three augmentation methods and recommends the most effective approach.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Wang, Mengdi
Date: 20 December 2024
Date Type: Publication
Defense Date: 22 October 2024
Approval Date: 20 December 2024
Submission Date: 3 December 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 119
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Legal Risk Prediction, Lease Contracts, Text Data Augmentation, Semantic Role Labeling (SRL), Text Classification
Date Deposited: 20 Dec 2024 13:48
Last Modified: 20 Dec 2024 13:48
URI: http://d-scholarship.pitt.edu/id/eprint/47159