
An automatic quality evaluator for video object segmentation masks

Research output: Contribution to journal › Article › peer-review

Abstract

Video object segmentation (VOS) has been an active research topic in recent years. However, evaluating the performance of different VOS methods requires manually labeled mask annotations, which are labor-intensive and time-consuming to produce, making it hard to validate algorithm quality in field tests. In this paper, we tackle the problem of automatically measuring mask quality for video object segmentation without access to manual annotations. We propose that, with an elaborately designed network structure, quality-sensitive features can be extracted to predict mask quality scores without ground-truth labels. To achieve this, we train an end-to-end convolutional neural network that captures quality-sensitive features using both a spatial reference and a temporal reference. In the proposed Video Object Segmentation Evaluation Network (VOSE-Net), the corresponding video frame serves as the spatial reference and motion amplitude information serves as the temporal reference. Instead of directly concatenating the features of the mask and the references, we extract spatial quality cues through feature correlation, which is better suited and more effective for this specific task. Taking as input the segmented mask, its corresponding frame image, and the optical flow map, VOSE-Net provides an accurate quality estimate without human intervention. To train and validate the proposed network, we construct a new dataset from the DAVIS video segmentation benchmark and the results of many public video object segmentation algorithms. We also demonstrate the robustness and usefulness of the proposed method in several applications, namely proposal selection, parameter optimization, and arbitrary video mask evaluation. The experimental results and analysis show that VOSE-Net is fast, effective, and of practical use.
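The abstract does not state how the ground-truth quality scores in the new dataset are computed. The following is a minimal sketch, assuming the target is the region similarity J (intersection-over-union between a predicted mask and the DAVIS annotation). It shows how algorithm outputs could be turned into labeled training pairs and how per-mask scores could drive proposal selection. The helpers `jaccard`, `build_training_pairs`, and `select_best` are hypothetical illustrations, not the authors' code, and masks are plain nested lists rather than image files.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A binary mask as an H x W grid; 1 marks an object pixel.
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J = |pred ∩ gt| / |pred ∪ gt| for two binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += bool(p) and bool(g)
            union += bool(p) or bool(g)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def build_training_pairs(
    gt_masks: Sequence[Mask],
    outputs: Dict[str, Sequence[Mask]],
) -> List[Tuple[str, int, float]]:
    """Hypothetical dataset step: label each algorithm's per-frame mask with
    its J score against the annotation (assumed quality target)."""
    pairs = []
    for algo, masks in outputs.items():
        if len(masks) != len(gt_masks):
            raise ValueError(f"{algo}: expected {len(gt_masks)} frames")
        for t, (mask, gt) in enumerate(zip(masks, gt_masks)):
            pairs.append((algo, t, jaccard(mask, gt)))
    return pairs


def select_best(proposals: Sequence[Mask], score: Callable[[Mask], float]) -> int:
    """Proposal selection: index of the candidate with the highest score.
    At test time `score` would be a learned quality predictor (hypothetical)."""
    return max(range(len(proposals)), key=lambda i: score(proposals[i]))


if __name__ == "__main__":
    gt = [[1, 1], [0, 0]]
    candidates = [[[1, 0], [0, 0]], [[1, 1], [0, 0]], [[0, 0], [1, 1]]]
    # Oracle scorer that uses the annotation, standing in for a learned model.
    best = select_best(candidates, lambda m: jaccard(m, gt))
    print(best, build_training_pairs([gt], {"algoA": [candidates[0]]}))
    # prints: 1 [('algoA', 0, 0.5)]
```

In the setting the abstract describes, the annotation-based `jaccard` would only be used to create training labels. At inference time a network such as VOSE-Net, operating on the mask, frame, and optical flow, would play the role of `score`, so no ground truth is needed.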

Original language: English
Article number: 111003
Journal: Measurement: Journal of the International Measurement Confederation
Volume: 194
State: Published, 15 May 2022

Keywords

  • Deep learning
  • Mask quality estimation
  • Objective quality prediction
  • Video object segmentation

