
Efficient Learning with Soft Label Information and Multiple Annotators

Nguyen, Quang (2014) Efficient Learning with Soft Label Information and Multiple Annotators. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Submitted Version



Large real-world data sets are now routinely collected in science, engineering, health care, and other fields. These data are a great resource for building automated learning systems. However, for many machine learning applications, data must be annotated (labeled) by humans before they can be used for learning. Unfortunately, annotation by a human expert is often very time-consuming and costly. As a result, the amount of labeled training data available to learn from may be limited, which in turn affects the learning process and the quality of the learned models. In this thesis, we investigate ways of improving the learning process in supervised classification settings in which labels are provided by human annotators. First, we study and propose a new classification learning framework that learns not only from binary class labels but also from soft-label information reflecting the certainty of, or belief in, the class label. We propose multiple methods, based on regression, max-margin, and ranking methodologies, that use the soft-label information to learn better classifiers from smaller training sets and hence with less annotation effort. We also study our soft-label approach when the examples to be labeled next are selected online using active learning. Second, we study ways of distributing the annotation effort among multiple experts. We develop a new multiple-annotator learning framework that explicitly models and embraces annotator differences and biases in order to learn both a consensus model and annotator-specific models. We demonstrate the benefits of our frameworks on both UCI data sets and our real-world clinical data extracted from electronic health records.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Nguyen, Quang (Email: qun2@pitt.edu; Pitt Username: QUN2)
ETD Committee:
Committee Chair: Hauskrecht, Milos (Email Address: milos@cs.pitt.edu; Pitt Username: MILOS)
Committee Member: Wiebe, Janyce (Email Address: wiebe@cs.pitt.edu; Pitt Username: JMW106)
Committee Member: Wang, Jingtao (Email Address: jingtaow@cs.pitt.edu; Pitt Username: JINGTAOW)
Committee Member: Cooper, Gregory (Email Address: gfc@pitt.edu; Pitt Username: GFC)
Date: 29 May 2014
Date Type: Publication
Defense Date: 24 March 2014
Approval Date: 29 May 2014
Submission Date: 23 April 2014
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 139
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: classification, soft label, efficient learning, multiple annotators, active learning, electronic health records
Date Deposited: 29 May 2014 22:07
Last Modified: 15 Nov 2016 14:19

