Learning visual attributes from contextual explanations
Murrugarra-Llerena, Nils (2019) Learning visual attributes from contextual explanations. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
In computer vision, attributes are mid-level concepts shared across categories. They provide a natural means of communication between humans and machines for image retrieval, they convey detailed information about objects, and they can describe properties of unfamiliar objects. These are appealing properties, but learning attributes is a challenging task. Because attributes are less well-defined than object categories, capturing them with computational models poses a different set of challenges. Attributes are easily miscommunicated between humans and machines, since a machine may not understand what a human has in mind when referring to a particular attribute. Humans usually label whether an object or attribute is present without any explanation, yet attributes are complex and may require explanations to be understood properly. This Ph.D. thesis tackles these challenges in learning automatic attribute prediction models. In particular, it focuses on enhancing attribute predictive power with contextual explanations. These explanations enrich data quality with human knowledge, which can be expressed in the form of interactions and may be shaped by our personality.

First, we emulate the human ability to understand unfamiliar situations: humans infer properties from what they already know (background knowledge). Hence, we study attribute learning in data-scarce and unrelated domains, discovering transferable knowledge that lets us learn attributes across domains. This work inspires us to request contextual explanations to improve attribute learning, so we enhance attribute learning with context in the form of gaze, captioning, and sketches. Human gaze captures subconscious intuition and associates certain image regions with the meaning of an attribute; for example, gaze associates the tip of a shoe with the pointy attribute. Complementing this gaze representation, captioning reflects conscious reasoning after analysis: an annotator may examine an image and provide a description such as "This shoe is pointy because of the sharp form at its tip." Finally, in image search, sketches provide a holistic view of an image query, complementing the specific details encapsulated in attribute comparisons. In quantitative and qualitative evaluations, our methods with contextual explanations outperform many baselines.