Sequential alignment attention model for scene text recognition

  • Yan Wu
  • Jiaxin Fan
  • Renshuai Tao*
  • Jiakai Wang
  • Haotong Qin
  • Aishan Liu
  • Xianglong Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Scene text recognition has been a hot research topic in computer vision due to its wide range of applications. State-of-the-art solutions usually rely on an attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. Unfortunately, severe misalignment between feature areas and text labels often arises in real-world scenarios. To address this problem, this paper proposes a sequential alignment attention model that enhances the alignment between input images and output character sequences. In this model, an attention-gated recurrent unit (AGRU) is first devised to distinguish text from background regions and to extract localized features focusing on sequential text regions. Furthermore, a CTC-guided decoding strategy is integrated into the popular attention-based decoder, which not only accelerates training convergence but also improves well-aligned sequence recognition. Extensive experiments on various benchmarks, including the IIIT5k, SVT, and ICDAR datasets, show that our method substantially outperforms state-of-the-art methods.
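The core idea of an attention-gated recurrent unit can be illustrated with a small sketch: an external attention weight replaces the usual learned update gate of a GRU, so steps that attend to background regions (weight near 0) leave the hidden state almost unchanged, while text regions (weight near 1) drive a full update. This is a minimal, hypothetical NumPy sketch of one such gating variant, not the authors' implementation; all names and dimensions are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def agru_step(x_t, h_prev, attn_t, params):
    """One step of an attention-gated recurrent unit (illustrative sketch).

    attn_t is a scalar attention weight in [0, 1] that replaces the GRU's
    learned update gate: attn_t near 0 (background) keeps the previous
    state, attn_t near 1 (text region) adopts the candidate state.
    """
    Wr, Ur, br, Wh, Uh, bh = params
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)    # candidate state
    return (1.0 - attn_t) * h_prev + attn_t * h_cand       # attention-driven update

# Toy dimensions and random parameters for demonstration only.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = (
    rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h),
    rng.standard_normal((d_h, d_in)), rng.standard_normal((d_h, d_h)), np.zeros(d_h),
)
h = np.zeros(d_h)
x = rng.standard_normal(d_in)
h_bg = agru_step(x, h, attn_t=0.0, params=params)   # background step: state preserved
h_txt = agru_step(x, h, attn_t=1.0, params=params)  # text step: full update
```

With zero attention the hidden state is carried through unchanged, which is what lets such a gate suppress background clutter along the feature sequence.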

Original language: English
Article number: 103289
Journal: Journal of Visual Communication and Image Representation
Volume: 80
DOIs
State: Published - Oct 2021

Keywords

  • Attention mechanism
  • Attention-gated recurrent unit
  • Connectionist temporal classification
  • Scene text recognition

