
Label-Free Contrastive Learning for Open-World Multimodal Social Event Detection

  • Zhiwei Yang*
  • Haimei Qin*
  • Hao Peng
  • Xiaoyan Yu
  • Li Sun
  • Lei Jiang
  • *Corresponding author for this work
  • CAS - Institute of Information Engineering
  • University of Chinese Academy of Sciences
  • Beijing Institute of Technology
  • North China Electric Power University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal content on social media contains abundant cues about real-world events, and detecting these events automatically is critical for public safety and social governance. However, existing methods for Multimodal Social Event Detection (MSED) in the open world face two major challenges: (1) they depend on supervised event labels or structured information, both of which open-world social media data often lack, making it difficult for such methods to adapt to the dynamic nature of social media; and (2) they rely on predefined label sets, i.e., the total number of events must generally be specified during detection, whereas in the open world this number is inherently difficult to estimate. To tackle these challenges, this paper proposes LFEvent, a label-free contrastive learning framework for MSED. To address the first challenge, we design a label-free multimodal contrastive learning strategy that relies solely on positive samples. Specifically, a multimodal large language model-based semantic enhancement strategy, driven by carefully crafted prompts, enriches raw image-text pairs along three dimensions - event theme, event type, and image description - to construct robust positive samples. A dedicated Siamese network then performs self-supervised cross-modal alignment and representation learning. To address the second challenge, we introduce unsupervised clustering into the MSED task for the first time: a novel structural entropy-guided hierarchical clustering method automatically determines the number of event clusters, enabling the detection of events unseen during training. Experiments on multiple social media datasets demonstrate that LFEvent significantly outperforms existing methods, especially in detecting previously unseen events.
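The abstract does not specify the exact training objective, but contrastive learning that "relies solely on positive samples" with a Siamese network is commonly realized as a SimSiam-style loss: negative cosine similarity between a predicted view and the other view, with the latter treated as a constant (stop-gradient), so no negative pairs are required. The sketch below is a minimal illustration under that assumption; the predictor matrix `W`, the embedding size, and the toy text/image views are hypothetical stand-ins, not LFEvent's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x)

def simsiam_style_loss(z1, z2, predictor):
    """Symmetrized negative cosine similarity between a predicted view
    and the other view (the latter conceptually under stop-gradient).
    Only positive pairs of the same post are used."""
    p1, p2 = predictor @ z1, predictor @ z2
    loss = -0.5 * (np.dot(l2_normalize(p1), l2_normalize(z2))
                   + np.dot(l2_normalize(p2), l2_normalize(z1)))
    return float(loss)

# Toy positive pair: two views of the same post's embedding,
# e.g. the raw view and an MLLM-enhanced view.
z_text  = l2_normalize(rng.normal(size=64))
z_image = z_text + 0.02 * rng.normal(size=64)  # slightly perturbed positive view
W = np.eye(64)                                  # identity predictor for the sketch

loss = simsiam_style_loss(z_text, z_image, W)
print(loss)  # near -1 when the two views agree
```

In a real implementation the stop-gradient on the second branch (and a learned predictor head) is what prevents representational collapse without negatives; this numpy sketch only shows the loss geometry.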

Original language: English
Title of host publication: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 818-827
Number of pages: 10
ISBN (Electronic): 9798400722929
DOIs
State: Published - 21 Feb 2026
Event: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026 - Boise, United States
Duration: 22 Feb 2026 - 26 Feb 2026

Publication series

Name: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining

Conference

Conference: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
Country/Territory: United States
City: Boise
Period: 22/02/26 - 26/02/26

Keywords

  • contrastive learning
  • multimodal social event detection
  • structural entropy

