
A Framework for Concept Extraction from Course Presentations

Krishnaswamy, Raja (2024) A Framework for Concept Extraction from Course Presentations. Master's Thesis, University of Pittsburgh. (Unpublished)



Information on the concepts covered in specific course materials has the potential to enhance the quality of education, but manually extracting such concepts is time-consuming and laborious. In addition, to our knowledge there are no publicly available datasets of concept-labeled course presentations, so data for this task is scarce. To address this, this thesis compiles a novel dataset of computer science lecture presentations from the University of Pittsburgh, annotated with concepts using the BIO span-labeling framework. The dataset comprises a mix of human-expert-labeled and automatically labeled data; the automatic labeling alleviated the heavy workload of manually annotating the new dataset. This thesis presents a transformer-based system trained on this dataset to automatically extract these concepts from course presentations, a task similar to Named Entity Recognition. To reduce the noise from automated labeling, the model is additionally trained through self-training methods on unlabeled data. Results show that this framework outperforms state-of-the-art Named Entity Recognition systems, with an absolute F1-score improvement of over 10% in three evaluations.
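As an illustration of the BIO span-labeling framework mentioned in the abstract, the following is a minimal sketch of how per-token B/I/O tags can be decoded into concept spans. It is not taken from the thesis: the function name `bio_to_spans`, the untyped tag set ("B", "I", "O"), and the example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end_exclusive, text) spans.

    Hypothetical helper: assumes untyped tags "B" (begin), "I" (inside),
    and "O" (outside); the thesis may use a different tag inventory.
    """
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I":
            # Continue the open span; tolerate a stray "I" by opening one.
            if start is None:
                start = i
        else:
            # "O" closes any open span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans

# Made-up example sentence with one labeled concept.
tokens = ["A", "binary", "search", "tree", "stores", "keys", "in", "sorted", "order"]
tags = ["O", "B", "I", "I", "O", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [(1, 4, 'binary search tree')]
```

Under this scheme, a multi-word concept such as "binary search tree" is marked by one "B" tag followed by "I" tags, which is what lets a token-level classifier recover whole phrases.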




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
Krishnaswamy, Raja | rek94@pitt.edu | rek94 | 0009-0008-3708-7185
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Mosse, Daniel | mosse@pitt.edu | mosse | 0000-0002-9508-9815
Committee Member | Shi, Ryan | ryanshi@pitt.edu | ryanshi | 0000-0001-7899-4680
Committee Member | Yoder, Michael Miller | mmy29@pitt.edu | mmy29 | 0000-0002-0489-3358
Date: 9 May 2024
Date Type: Publication
Defense Date: 18 April 2024
Approval Date: 9 May 2024
Submission Date: 22 April 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 60
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: nlp, self-training, xlnet, gpt, llm, distant-learning
Date Deposited: 09 May 2024 15:36
Last Modified: 09 May 2024 15:36

