Krishnaswamy, Raja
(2024)
A Framework for Concept Extraction from Course Presentations.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
Information on the concepts covered in specific course materials has the potential to enhance the quality of education, but manually extracting such concepts is time-consuming and laborious. In addition, to our knowledge, there are no publicly available datasets of concept-labeled course presentations. To address this gap, this thesis compiles a novel dataset of computer science lecture presentations from the University of Pittsburgh, annotated with concepts using the BIO span-labeling framework. The dataset comprises a mix of human-expert-labeled and automatically labeled data, which alleviates the heavy workload of manual annotation. This thesis presents a transformer-based system trained on this dataset to automatically extract concepts from course presentations, a task similar to Named Entity Recognition. To reduce the noise from automated labeling, the model is additionally trained on unlabeled data through self-training. Results show that this framework outperforms state-of-the-art Named Entity Recognition systems, with an absolute F1-score improvement of more than 10% in three evaluations.
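As an illustration of the BIO span-labeling scheme mentioned in the abstract, the following minimal sketch converts token-level concept spans into B/I/O tags. The function name `spans_to_bio`, the example sentence, and the span offsets are hypothetical and are not taken from the thesis or its dataset; the thesis may use a different tokenization or tag inventory.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Convert concept spans to BIO tags.

    Each span is (start, end) in token indices, start inclusive, end exclusive.
    Tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"              # first token of a concept
        for i in range(start + 1, end):
            tags[i] = "I"              # continuation tokens of the same concept
    return tags


# Hypothetical slide text with one annotated concept: "binary search tree".
tokens = ["A", "binary", "search", "tree", "stores", "sorted", "keys"]
concept_spans = [(1, 4)]
bio_tags = spans_to_bio(tokens, concept_spans)

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

A sequence tagger trained on such labels predicts one tag per token, and contiguous B/I runs are read back as extracted concepts.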
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Krishnaswamy, Raja
ETD Committee:
Date: 9 May 2024
Date Type: Publication
Defense Date: 18 April 2024
Approval Date: 9 May 2024
Submission Date: 22 April 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 60
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: nlp, self-training, xlnet, gpt, llm, distant-learning
Date Deposited: 09 May 2024 15:36
Last Modified: 09 May 2024 15:36
URI: http://d-scholarship.pitt.edu/id/eprint/46187