Kovashka, Adriana and Litman, Diane and Alikhani, Malihe (2022) Situated Task-Driven Multimodal Intent Modeling and Applications. In: Pitt Momentum Fund 2022.
Abstract
How should AI systems support multimodal task-driven interactions? What cues do they use to gauge the role of each communicative modality, i.e., what role each verbal utterance, naming, pointing gesture, or visual demonstration plays in accomplishing goals? How does the meaning of each action change with the actor’s/speaker’s goal? How do actions vary depending on physical context? We will first capture the purpose of multimodal interactions by asking humans to collaborate to complete a set of tasks (e.g., finding items in a physical or virtual environment) and then reflect retrospectively on the intent of each communicative action used. We will use cameras and proximity sensors to learn the relationship between (a) the physical placement of humans and (b) how humans use visual and language actions. Second, we will train machine learning models to represent the meaning and purpose of each action and to predict what actions a human might perform next, in order to anticipate what assistance the human might need. Third, we will test the model’s ability to anticipate actions, after first estimating intent, in a physical setting (collaborative cooking, toy house building, or pretend-doctor play using children’s toys) and in a disembodied one (designing an information campaign with captioned images or infographics).
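The next-action prediction component described above could, as one illustrative possibility, be framed as sequence modeling over integer-encoded communicative actions. The sketch below is a minimal, assumption-laden example: the four-way action vocabulary, the GRU encoder, and the use of PyTorch are all choices made here for brevity and are not part of the proposal.

# Illustrative sketch (not from the abstract): a minimal next-action predictor
# over a history of multimodal communicative actions. The action vocabulary
# and GRU-based architecture are assumptions chosen for brevity.
import torch
import torch.nn as nn

# Hypothetical vocabulary of communicative actions observed in a collaboration.
ACTIONS = ["utterance", "naming", "pointing_gesture", "visual_demonstration"]

class NextActionPredictor(nn.Module):
    """Embeds a history of action tokens and scores the next action type."""
    def __init__(self, num_actions: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_actions, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, action_ids: torch.Tensor) -> torch.Tensor:
        # action_ids: (batch, seq_len) integer-encoded action history
        embedded = self.embed(action_ids)
        _, last_hidden = self.encoder(embedded)
        return self.head(last_hidden.squeeze(0))  # logits over the next action

# Example: given "utterance -> pointing_gesture -> naming", score the next action.
model = NextActionPredictor(num_actions=len(ACTIONS))
history = torch.tensor([[0, 2, 1]])
probs = torch.softmax(model(history), dim=-1)
print(dict(zip(ACTIONS, probs[0].tolist())))

In a fuller system, the inputs would presumably also encode visual and proximity-sensor features rather than action type alone; this sketch only shows the basic shape of anticipating a collaborator's next communicative action.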