
Parametric Visual Program Induction with Function Modularization

  • Xuguang Duan
  • Xin Wang*
  • Ziwei Zhang
  • Wenwu Zhu*
  • *Corresponding author for this work
  • Tsinghua University

Research output: Contribution to journal › Conference article › peer-review

Abstract

Generating programs to describe visual observations has gained much research attention recently. However, most existing approaches are based on non-parametric primitive functions, making them unable to handle complex visual scenes involving many attributes and details. In this paper, we propose the concept of parametric visual program induction. Learning to generate parametric programs for visual scenes is challenging due to the huge number of function variants and the complex correlations between functions. To address these challenges, we propose the method of function modularization, which is capable of dealing with numerous function variants and complex correlations. Specifically, we model each parametric function as a multi-head self-contained neural module to cover different function variants. Moreover, to eliminate the complex correlations between functions, we propose the hierarchical heterogeneous Monte-Carlo tree search (H2MCTS) algorithm, which provides high-quality uncorrelated supervision during training and serves as an efficient search technique during testing. We demonstrate the superiority of the proposed method on three visual program induction datasets involving parametric primitive functions. Experimental results show that our proposed model significantly outperforms the state-of-the-art baseline methods in terms of generating accurate programs.

Original language: English
Pages (from-to): 5643-5658
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 2022
Externally published: Yes
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 to 23 Jul 2022
