VTLayout: Fusion of Visual and Text Features for Document Layout Analysis

  • Shoubin Li*
  • Xuyan Ma
  • Shuaiqun Pan
  • Jun Hu
  • Lin Shi
  • Qing Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Documents often contain complex physical structures, which make the Document Layout Analysis (DLA) task challenging. As a pre-processing step for content extraction, DLA has the potential to capture rich information from historical or scientific documents on a large scale. Although many deep-learning-based methods from computer vision already achieve excellent performance in detecting Figure blocks in documents, they remain unsatisfactory at recognizing the List, Table, Text, and Title category blocks in DLA. This paper proposes VTLayout, a model that fuses a document's deep visual, shallow visual, and text features to localize and identify blocks of different categories. The model has two stages, and the three feature extractors are built in the second stage. In the first stage, the Cascade Mask R-CNN model is applied directly to localize all category blocks in the documents. In the second stage, the deep visual, shallow visual, and text features are extracted and fused to identify the category of each block. As a result, we strengthen the classification power for different category blocks on top of the existing localization technique. The experimental results on the PubLayNet dataset show that the identification capability of VTLayout is superior to that of the most advanced DLA method, with an F1 score as high as 0.9599.
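
The second stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three feature vectors are combined or which classifier is used, so everything below is an assumption made for illustration only: fusion by concatenating L2-normalized vectors, a plain linear classifier, and the hypothetical names `fuse_features` and `classify`. The five category names are the ones the abstract mentions.

```python
import math
from typing import List, Sequence

# Block categories named in the abstract.
CATEGORIES = ["Text", "Title", "List", "Table", "Figure"]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(
    deep_visual: Sequence[float],
    shallow_visual: Sequence[float],
    text: Sequence[float],
) -> List[float]:
    """Fuse three per-block feature vectors by concatenation.

    Assumption: concatenation of normalized vectors stands in for
    whatever fusion the paper actually uses.
    """
    fused: List[float] = []
    for part in (deep_visual, shallow_visual, text):
        fused.extend(l2_normalize(part))
    return fused


def classify(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> str:
    """Return the category with the highest linear score w . x + b.

    Assumption: a linear classifier with one weight row per category.
    """
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return CATEGORIES[best]
```

In this sketch, each block localized in the first stage would be passed through three separate extractors, and the resulting vectors handed to `fuse_features` before classification. Normalizing each vector first keeps one feature source from dominating the others purely because of its scale.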

Original language: English
Title of host publication: PRICAI 2021
Subtitle of host publication: Trends in Artificial Intelligence - 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Proceedings
Editors: Duc Nghia Pham, Thanaruk Theeramunkong, Guido Governatori, Fenrong Liu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 308-322
Number of pages: 15
ISBN (Print): 9783030891879
State: Published - 2021
Externally published: Yes
Event: 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021 - Virtual, Online
Duration: 8 Nov 2021 to 12 Nov 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13031 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021
City: Virtual, Online
Period: 8/11/21 to 12/11/21

Keywords

  • Document layout analysis
  • Fusion of visual and text
  • PubLayNet
  • VTLayout
