Exploring Out-of-Distribution Scene Text Recognition for Driving Scenes with Hybrid Test-Time Adaptation

  • Xiaoyu Xian
  • Jinghui Qin
  • Yukai Shi
  • Daxin Tian*
  • Liang Lin

*Corresponding author for this work

  • Beihang University
  • CRRC Corporation Limited
  • Guangdong University of Technology
  • Sun Yat-Sen University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Scene Text Recognition (STR) in dynamic driving scenes is important for recognizing real-world kilometer markers, which supports scheduling and operation in industrial settings. For example, a train's location affects the safe and reliable operation of the transportation system, and that location can be determined effectively by identifying kilometer markers with STR technology. However, most existing STR models make the independent and identically distributed (i.i.d.) assumption that all training data and test data are drawn from the same data distribution. Although they achieve satisfactory performance under the i.i.d. assumption, existing STR models remain notoriously weak at generalizing to out-of-distribution (o.o.d.) data, which makes a system unreliable and unsafe. To address this weakness, we propose a new hybrid test-time adaptation (HTTA) method to improve the performance of an STR model on o.o.d. test data. Previous test-time adaptation methods target classification models and do not consider the multi-step reasoning characteristic of sequence learning tasks. In HTTA, we deploy multiple semantics-preserving image augmentations and design a semantically consistent auxiliary task to perform continual adaptation. Additionally, we construct a new Real-world Subway Kilometer Marker (RSKM) dataset for o.o.d. STR under dynamic driving scenes. We conduct extensive experiments on RSKM, embedding HTTA into multiple classical STR methods to show its effectiveness. The experimental results show that our semantically consistent augmentation and HTTA significantly improve generalization performance on o.o.d. STR.

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 7th Chinese Conference, PRCV 2024, Proceedings
Editors: Zhouchen Lin, Hongbin Zha, Ming-Ming Cheng, Ran He, Cheng-Lin Liu, Kurban Ubul, Wushouer Silamu, Jie Zhou
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 65-80
Number of pages: 16
ISBN (Print): 9789819784868
DOIs
State: Published - 2025
Event: 7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024 - Urumqi, China
Duration: 18 Oct 2024 to 20 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15031 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024
Country/Territory: China
City: Urumqi
Period: 18/10/24 to 20/10/24

Keywords

  • Data Augmentation
  • Driving Scenes
  • Out-of-Distribution
  • Scene Text Recognition
