
Learning visual attributes from contextual explanations

Murrugarra-Llerena, Nils (2019) Learning visual attributes from contextual explanations. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



In computer vision, attributes are mid-level concepts shared across categories. They provide a natural channel of communication between humans and machines for image retrieval, convey detailed information about objects, and can describe properties of unfamiliar objects. These appealing properties make attributes attractive, but learning them is challenging. Because attributes are less well-defined than object categories, capturing them with computational models poses a different set of challenges. Attributes are also prone to miscommunication between humans and machines, since a machine may not understand what a human has in mind when referring to a particular attribute. Humans usually label only whether an object or attribute is present, without any explanation; attributes, however, are more complex and may require explanations to be properly understood.

This Ph.D. thesis tackles these challenges in learning automatic attribute prediction models. In particular, it focuses on enhancing attribute predictive power with contextual explanations. These explanations enrich data quality with human knowledge, which can be expressed in the form of interactions and may be affected by personality.

First, we emulate the human ability to make sense of unfamiliar situations. Humans infer properties of the unfamiliar from what they already know (background knowledge). Hence, we study attribute learning in data-scarce, unrelated domains, discovering transferable knowledge that allows attributes to be learned across different domains.

This project inspires us to solicit contextual explanations to improve attribute learning. We therefore enhance attribute learning with context in the form of gaze, captions, and sketches. Human gaze captures subconscious intuition and associates particular image regions with the meaning of an attribute; for example, gaze ties the toe of a shoe to the attribute "pointy". Complementing this gaze representation, captions reflect conscious reasoning after deliberate analysis: an annotator examining an image might provide the description "This shoe is pointy because of its sharp form at the toe." Finally, in image search, sketches provide a holistic view of an image query, complementing the specific details captured by attribute comparisons. In conclusion, our methods with contextual explanations outperform numerous baselines in both quantitative and qualitative evaluations.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Murrugarra-Llerena, Nils (Email: nineil.cs@gmail.com; Pitt Username: nim60)
ETD Committee:
Committee Chair: Kovashka, Adriana (kovashka@cs.pitt.edu; Pitt Username: AIK85)
Committee Member: Hwa, Rebecca (hwa@cs.pitt.edu; Pitt Username: reh23)
Committee Member: Hauskrecht, Milos (milos@cs.pitt.edu; Pitt Username: MILOS)
Committee Member: He, Daqing (dah44@pitt.edu; Pitt Username: dah44)
Date: 30 August 2019
Date Type: Publication
Defense Date: 12 April 2019
Approval Date: 30 August 2019
Submission Date: 5 August 2019
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 136
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: computer vision, machine learning, attribute learning, metric learning, reinforcement learning, transfer learning.
Date Deposited: 30 Aug 2019 15:43
Last Modified: 30 Aug 2019 15:43

