Keyphrasification: Summarizing text into keyphrases - using neural language generation methods

Meng, Rui (2024) Keyphrasification: Summarizing text into keyphrases - using neural language generation methods. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

Keyphrases encapsulate the core information of a text, serving as effective tools for organizing and retrieving large collections of documents. Their utility spans applications including information retrieval, document classification, and automatic summarization. Because manual keyphrase assignment is costly and limited in scale, there has been growing interest in automating this process. Traditional approaches to keyphrase assignment fall into two categories: extraction, which selects phrases directly from the text, and tagging, which applies tags from a predefined set. Both approaches struggle with the variability of natural language. For instance, a substantial fraction of keyphrases do not appear verbatim in the source text, so extraction methods cannot predict them, while tagging is confined to its predefined vocabulary. This observation highlights the need to reevaluate the paradigms of keyphrase research and refine the methodologies for automatic keyphrase prediction.
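The distinction between keyphrases that are present in the text and those that are absent can be made concrete with a small check. The sketch below is illustrative only. It assumes simple lowercase word matching, whereas evaluations in the literature typically also apply stemming. The function names `tokenize`, `contains_phrase`, and `split_present_absent` are hypothetical and are not taken from the dissertation.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    """Return True if phrase_tokens occurs as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def split_present_absent(text: str, keyphrases: list[str]) -> tuple[list[str], list[str]]:
    """Partition keyphrases into those appearing verbatim in text and those absent from it."""
    doc_tokens = tokenize(text)
    present: list[str] = []
    absent: list[str] = []
    for kp in keyphrases:
        bucket = present if contains_phrase(doc_tokens, tokenize(kp)) else absent
        bucket.append(kp)
    return present, absent


if __name__ == "__main__":
    text = "Keyphrases help organize and retrieve documents using neural language generation."
    gold = ["neural language generation", "keyphrases", "information retrieval"]
    present, absent = split_present_absent(text, gold)
    print("present:", present)  # present: ['neural language generation', 'keyphrases']
    print("absent:", absent)    # absent: ['information retrieval']
```

In this example "information retrieval" is absent because the text contains only "retrieve". A method that copies spans from this text could never output that keyphrase, but a generation model could.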
This dissertation introduces keyphrasification as a formulation of the task of keyphrase prediction. By developing a conceptual framework and defining its essential properties, this work aims to deepen the understanding of keyphrase prediction and facilitate the development of more effective techniques. Furthermore, I propose a novel modeling approach, keyphrase generation (KPGEN), which uses neural language generation to learn the mapping between texts and keyphrases directly from data, so that it can predict contextually relevant phrases in varied forms. The dissertation also presents a range of enhancements and mechanisms that refine this approach.
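Learning a text-to-keyphrase mapping with a language generation model requires casting each document and its keyphrases as an input/output pair. The sketch below shows one common convention, which is assumed here and not taken from the dissertation. All keyphrases, present or absent, are joined into a single target string with a separator token. The model's output is then split back into a deduplicated list. The names `SEP`, `make_training_pair`, and `parse_generated` are hypothetical, and the neural model itself is omitted.

```python
SEP = "<sep>"  # hypothetical separator token placed between keyphrases in the target string


def make_training_pair(text: str, keyphrases: list[str], sep: str = SEP) -> tuple[str, str]:
    """Turn a document and its keyphrases into a (source, target) pair for a seq2seq model."""
    source = " ".join(text.split())  # collapse runs of whitespace in the document
    target = f" {sep} ".join(kp.strip().lower() for kp in keyphrases)
    return source, target


def parse_generated(output: str, sep: str = SEP) -> list[str]:
    """Split a generated target string back into distinct keyphrases, preserving order."""
    phrases: list[str] = []
    for chunk in output.split(sep):
        kp = " ".join(chunk.split()).lower()  # normalize whitespace and case
        if kp and kp not in phrases:          # drop empty chunks and duplicates
            phrases.append(kp)
    return phrases


if __name__ == "__main__":
    src, tgt = make_training_pair("Keyphrases  help   retrieval.",
                                  ["Keyphrases", "Information Retrieval"])
    print(src)  # Keyphrases help retrieval.
    print(tgt)  # keyphrases <sep> information retrieval
    print(parse_generated("keyphrases <sep> information retrieval <sep> keyphrases <sep>"))
    # ['keyphrases', 'information retrieval']
```

The target is free text rather than a set of spans copied from the source. As a result, this format handles absent keyphrases exactly as it handles present ones.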
This work makes several pivotal contributions. First, it reformulates keyphrase prediction as a specialized form of summarization, broadening the scope of prior research. Second, it introduces a data-driven approach to automatic keyphrasification that employs neural networks to predict context-relevant phrases, overcoming the limitations of earlier methods. Finally, it explores a range of language generation techniques, from basic neural generation models to pre-trained and large language models, making it a comprehensive investigation of the keyphrasification task.

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Meng, Rui | Email: rui.meng@pitt.edu | Pitt Username: rum20 | ORCID: 0000-0001-5583-4924
ETD Committee:
Committee Chair: He,
Committee Member: Brusilovsky,
Committee Member: Munro,
Committee Member: Caragea,
Date: 13 May 2024
Date Type: Publication
Defense Date: 2 April 2024
Approval Date: 13 May 2024
Submission Date: 18 April 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 146
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Keyphrase; Keyphrasification; Keyphrase Generation; Language Generation
Date Deposited: 13 May 2024 16:08
Last Modified: 13 May 2024 16:08