Joint Visual Perception and Linguistic Commonsense for Daily Events Causality Reasoning

  • Bole Ma
  • Chao Tong*
  • *Corresponding author for this work
  • Beihang University
  • Yunnan Key Laboratory of Blockchain Application Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal networks that juxtapose visual and linguistic modalities are currently widely adopted for solving vision-and-language tasks. They perform well in simple and intuitive tasks, but are prone to mistakes in tasks involving latent or implicit details, due to the difficulty of capturing crucial but imperceptible visual signals in the real world. Perception errors lead to nonsensical results, but can be corrected by commonsense knowledge. To this end, we combine visual perception and linguistic commonsense to solve the challenging daily events causality reasoning task. We propose a novel Object-Aware Reasoning Network that refines visual perception by focusing on object interaction while ignoring distracting information. Further, a language branch with an independent prediction head is supervised to learn causality commonsense to help correct obvious perception errors, resulting in more plausible conclusions. Extensive experiments demonstrate that our method achieves new state-of-the-art results on the Vis-Causal dataset.

Original language: English
Title of host publication: ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665485630
DOIs
State: Published - 2022
Event: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022 - Taipei, Taiwan, Province of China
Duration: 18 Jul 2022 → 22 Jul 2022

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2022-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 18/07/22 → 22/07/22

Keywords

  • Causality reasoning
  • commonsense knowledge
  • multimodal network
  • visual perception
