Empirical Study of Unsupervised Pre-Training in CNN and Transformer Based Visual Tracking

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning-based visual object tracking has seen the emergence of CNN-based and Transformer-based algorithms built upon the Siamese-based pipeline to pursue robustness and accuracy. However, the performance gap between them requires high-quality and large-scale labeled data for sufficient training. In this work, we design an unsupervised pre-training scheme based on data augmentation to reduce the dependence on expensive labeled data. The core step is the object localization pretext task, which randomly crops the object and pastes it onto several background images. Moreover, we apply the method to both CNN-based and Transformer-based visual trackers. Extensive experiments on public datasets demonstrate that our method outperforms prevailing unsupervised trackers on large-scale benchmarks such as LaSOT and TrackingNet. Additionally, a simple strategy of freezing the CNN backbone during Transformer-based pre-training proves to be effective.
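
The object localization pretext task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the patch sizes, the number of backgrounds, or whether any blending is used, so the function name `make_pretext_samples`, its parameters, and the uniform sampling below are assumptions made for illustration only. The idea shown is that pasting a random crop at a known position yields a bounding-box label without manual annotation.

```python
import numpy as np


def make_pretext_samples(source, backgrounds, rng, min_size=16, max_size=64):
    """Crop a random patch from `source` and paste it onto each background.

    Hypothetical helper (not from the paper). Assumes `source` is at least
    `min_size` pixels on each side. Returns the cropped patch (usable as the
    template) and a list of (image, box) pairs, where box = (x, y, w, h) is
    the pasted location and acts as a free localization label.
    """
    src_h, src_w = source.shape[:2]
    # Sample a patch size that fits inside the source image.
    w = int(rng.integers(min_size, min(max_size, src_w) + 1))
    h = int(rng.integers(min_size, min(max_size, src_h) + 1))
    # Sample the top-left corner of the crop.
    x0 = int(rng.integers(0, src_w - w + 1))
    y0 = int(rng.integers(0, src_h - h + 1))
    patch = source[y0:y0 + h, x0:x0 + w].copy()

    samples = []
    for bg in backgrounds:
        bg_h, bg_w = bg.shape[:2]
        if w > bg_w or h > bg_h:
            raise ValueError("background is smaller than the cropped patch")
        # Paste the patch at a random position; that position is the label.
        x = int(rng.integers(0, bg_w - w + 1))
        y = int(rng.integers(0, bg_h - h + 1))
        image = bg.copy()
        image[y:y + h, x:x + w] = patch
        samples.append((image, (x, y, w, h)))
    return patch, samples


# Usage with synthetic data standing in for unlabeled images.
rng = np.random.default_rng(0)
source = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
backgrounds = [
    rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8) for _ in range(3)
]
patch, samples = make_pretext_samples(source, backgrounds, rng)
```

Each `(image, box)` pair could then serve as a search image with its target box, and `patch` as the template, for a Siamese-style tracker. How the paper actually forms its training pairs is not specified in the abstract.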

Original language: English
Title of host publication: 2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 291-295
Number of pages: 5
ISBN (Electronic): 9798350323313
DOIs
State: Published - 2023
Event: 2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023 - Dalian, China
Duration: 28 Nov 2023 to 30 Nov 2023

Publication series

Name: 2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023

Conference

Conference: 2023 5th International Conference on Artificial Intelligence and Computer Applications, ICAICA 2023
Country/Territory: China
City: Dalian
Period: 28/11/23 to 30/11/23

Keywords

  • CNN
  • Transformer
  • unsupervised
  • visual tracking
