TWT: Table with Written Text for Controlled Data-to-Text Generation

  • Tongliang Li
  • , Lei Fang
  • , Jian Guang Lou
  • , Zhoujun Li*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Large pre-trained neural models have recently shown remarkable progress in text generation. In this paper, we propose to generate text conditioned on the structured data (table) and a prefix (the written text) by leveraging the pretrained models. We present a new data-to-text dataset, Table with Written Text (TWT), by repurposing two existing datasets: ToTTo and TabFact. TWT contains both factual and logical statements that are faithful to the structured data, aiming to serve as a useful benchmark for controlled text generation. Compared with existing data-to-text task settings, TWT is more intuitive, the prefix (usually provided by the user) controls the topic of the generated text. Existing methods usually output hallucinated text that is not faithful on TWT. Therefore, we design a novel approach with table-aware attention visibility and copy mechanism over the table. Experimental results show that our approach outperforms state-of-the-art methods under both automatic and human evaluation metrics.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publicationEMNLP 2021
EditorsMarie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
PublisherAssociation for Computational Linguistics (ACL)
Pages1244-1254
Number of pages11
ISBN (Electronic)9781955917100
DOIs
StatePublished - 2021
Event2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: 7 Nov 202111 Nov 2021

Publication series

NameFindings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Conference

Conference2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Country/TerritoryDominican Republic
CityPunta Cana
Period7/11/2111/11/21

Fingerprint

Dive into the research topics of 'TWT: Table with Written Text for Controlled Data-to-Text Generation'. Together they form a unique fingerprint.

Cite this