
DuoNet: Joint optimization of representation learning and prototype classifier for unbiased scene graph generation

  • Zhaodi Wang
  • Biao Leng*
  • Shuo Zhang

*Corresponding author for this work

  • Beihang University
  • Beijing Jiaotong University

Research output: Contribution to journal › Article › peer-review

Abstract

Unbiased Scene Graph Generation (SGG) aims to parse visual scenes into highly informative graphs under the long-tail challenge. While prototype-based methods have shown promise in unbiased SGG, they highlight the importance of learning discriminative features that are intra-class compact and inter-class separable. In this paper, we revisit prototype-based methods and analyze critical roles of representation learning and prototype classifier in driving unbiased SGG, and accordingly propose a novel framework DuoNet. To enhance intra-class compactness, we introduce a Bi-Directional Representation Refinement (BiDR2) module that captures relation-sensitive visual variability and within-relation visual consistency of entities. This module adopts relation-to-entity-to-relation refinement by integrating dual-level relation pattern modeling with a relation-specific entity constraint. Furthermore, a Knowledge-Guided Prototype Learning (KGPL) module is devised to strengthen inter-class separability by constructing an equidistributed prototypical classifier with maximum inter-class margins. The equidistributed prototype classifier is frozen during SGG training to mitigate long-tail bias, thus a knowledge-driven triplet loss is developed to strengthen the learning of BiDR2, enhancing relation-prototype matching. Extensive experiments demonstrate the effectiveness of our method, which sets new state-of-the-art performance on Visual Genome, GQA and Open Images datasets.

Original language: English
Article number: 113152
Journal: Pattern Recognition
Volume: 176
State: Published - Aug 2026

Keywords

  • Long-tail distribution
  • Prototype learning
  • Unbiased scene graph generation
  • Visual scene understanding
