
Common sense for visual reasoning tasks

Kovashka, Adriana (2020) Common sense for visual reasoning tasks. In: Pitt Momentum Fund 2020, University of Pittsburgh, Pittsburgh, Pa. (Unpublished)


When autonomous or semi-autonomous agents are used to assist blind, disabled, or aging users, these agents will need to apply common sense in potentially life-or-death situations. For example, an agent might be asked to fetch water for its short-of-breath owner and find no cups in sight; a human would guess that time is of the essence and that pouring water into a bowl is acceptable, but a machine agent would not draw this conclusion and would lose time in a critical situation. Beyond daily experiences, reasoning is also important for correctly interpreting today’s increasingly influential and complex media. Because the media targets human audiences, it assumes the ability to reason with common sense. For example, a human can infer that a person buying flowers while dressed in a suit is likely going either on a date or to a funeral; thus, a film need only imply where someone is going, and does not have to state it explicitly. An image in a news article showing children near a chain-link fence implies that these children are likely either refugees or prisoners. Computer vision systems lack the ability to reason in a structured manner and make the above inferences. Some recent methods augment algorithms with limited reasoning capabilities, drawing on external knowledge captured in graphs. However, these methods have struggled to show large benefits from the external knowledge they use, even in settings like broad, “AI-complete” question-answering, where external knowledge should intuitively help. We argue this is because reasoning has been approached as predicting single-hop shortcut connections between image and outputs. In other words, existing methods perform reasoning in a simplistic fashion, using straightforward adaptations of classification approaches. We propose two techniques to unleash the benefits of proper reasoning with external information, i.e., information complementary to that found in the labeled dataset for the target task.
First, we prevent the model from looking for easy give-aways and shortcuts between image and answers, since relying on such shortcuts keeps the model from learning to reason in a generalizable fashion. Second, we enable the model to find a multi-hop path between image and answers, which mimics how humans reason through a sequence of inference steps. We believe our initial exploration of these two directions will shed light on approaches to visual reasoning that are significantly more promising than what is currently possible.
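The idea of a multi-hop path between image content and a candidate answer can be illustrated with a minimal sketch. This is not the method proposed in the abstract, which does not specify an implementation; it is a plain breadth-first search over a hand-written toy graph, standing in for whatever learned reasoning the project develops. The graph contents, the name `KNOWLEDGE_GRAPH`, and the function `find_multi_hop_path` are all hypothetical and chosen only to mirror the flowers-and-suit example above.

```python
from collections import deque

# Hypothetical toy knowledge graph (an assumption for illustration only):
# each concept maps to the concepts it is directly related to.
KNOWLEDGE_GRAPH = {
    "suit": ["formal occasion"],
    "flowers": ["gift", "mourning"],
    "formal occasion": ["date", "funeral"],
    "gift": ["date"],
    "mourning": ["funeral"],
}


def find_multi_hop_path(graph, sources, target):
    """Return a shortest chain of concepts from any source to target, or None.

    `sources` stands in for concepts detected in an image and `target`
    for a candidate answer; both are hypothetical inputs.
    """
    queue = deque([s] for s in sources)  # each queue entry is a path so far
    visited = set(sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    # Concepts "seen" in the image, and one candidate answer to explain.
    print(find_multi_hop_path(KNOWLEDGE_GRAPH, ["flowers"], "funeral"))
```

A single-hop classifier would have to map "flowers" directly to "funeral"; the path returned here makes the intermediate inference step ("mourning") explicit, which is the kind of chain the second proposed technique aims to recover.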




Item Type: Conference or Workshop Item (Poster)
Status: Unpublished
Creators: Kovashka, Adriana (Email: kovashka@pitt.edu; Pitt Username: kovashka; ORCID: 0000-0003-1901-9660)
Centers: Other Centers, Institutes, Offices, or Units > Office of Sponsored Research > Pitt Momentum Fund
Date: 2020
Event Title: Pitt Momentum Fund 2020
Event Type: Other
DOI or Unique Handle: 10.18117/ca3d-q397
Schools and Programs: School of Computing and Information > Computer Science
Refereed: No
Date Deposited: 24 Feb 2020 17:08
Last Modified: 24 Feb 2020 18:13

