TY - JOUR
T1 - DuoNet
T2 - Joint optimization of representation learning and prototype classifier for unbiased scene graph generation
AU - Wang, Zhaodi
AU - Leng, Biao
AU - Zhang, Shuo
N1 - Publisher Copyright: © 2026 Elsevier Ltd
PY - 2026/8
Y1 - 2026/8
N2 - Unbiased Scene Graph Generation (SGG) aims to parse visual scenes into highly informative graphs under the long-tail challenge. While prototype-based methods have shown promise in unbiased SGG, they highlight the importance of learning discriminative features that are intra-class compact and inter-class separable. In this paper, we revisit prototype-based methods and analyze the critical roles of representation learning and the prototype classifier in driving unbiased SGG, and accordingly propose a novel framework, DuoNet. To enhance intra-class compactness, we introduce a Bi-Directional Representation Refinement (BiDR2) module that captures relation-sensitive visual variability and within-relation visual consistency of entities. This module adopts relation-to-entity-to-relation refinement by integrating dual-level relation pattern modeling with a relation-specific entity constraint. Furthermore, a Knowledge-Guided Prototype Learning (KGPL) module is devised to strengthen inter-class separability by constructing an equidistributed prototypical classifier with maximum inter-class margins. The equidistributed prototype classifier is frozen during SGG training to mitigate long-tail bias; a knowledge-driven triplet loss is therefore developed to strengthen the learning of BiDR2, enhancing relation-prototype matching. Extensive experiments demonstrate the effectiveness of our method, which sets new state-of-the-art performance on the Visual Genome, GQA, and Open Images datasets.
AB - Unbiased Scene Graph Generation (SGG) aims to parse visual scenes into highly informative graphs under the long-tail challenge. While prototype-based methods have shown promise in unbiased SGG, they highlight the importance of learning discriminative features that are intra-class compact and inter-class separable. In this paper, we revisit prototype-based methods and analyze the critical roles of representation learning and the prototype classifier in driving unbiased SGG, and accordingly propose a novel framework, DuoNet. To enhance intra-class compactness, we introduce a Bi-Directional Representation Refinement (BiDR2) module that captures relation-sensitive visual variability and within-relation visual consistency of entities. This module adopts relation-to-entity-to-relation refinement by integrating dual-level relation pattern modeling with a relation-specific entity constraint. Furthermore, a Knowledge-Guided Prototype Learning (KGPL) module is devised to strengthen inter-class separability by constructing an equidistributed prototypical classifier with maximum inter-class margins. The equidistributed prototype classifier is frozen during SGG training to mitigate long-tail bias; a knowledge-driven triplet loss is therefore developed to strengthen the learning of BiDR2, enhancing relation-prototype matching. Extensive experiments demonstrate the effectiveness of our method, which sets new state-of-the-art performance on the Visual Genome, GQA, and Open Images datasets.
KW - Long-tail distribution
KW - Prototype learning
KW - Unbiased scene graph generation
KW - Visual scene understanding
UR - https://www.scopus.com/pages/publications/105029257827
U2 - 10.1016/j.patcog.2026.113152
DO - 10.1016/j.patcog.2026.113152
M3 - Article
AN - SCOPUS:105029257827
SN - 0031-3203
VL - 176
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 113152
ER -