TY - GEN
T1 - Improving grammatical error correction with machine translation pairs
AU - Zhou, Wangchunshu
AU - Ge, Tao
AU - Mu, Chang
AU - Xu, Ke
AU - Wei, Furu
AU - Zhou, Ming
N1 - Publisher Copyright: © 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - We propose a novel data synthesis method to generate diverse error-corrected sentence pairs for improving grammatical error correction (GEC), which is based on a pair of machine translation models (e.g., Chinese-English) of different qualities (i.e., poor and good). The poor translation model resembles an ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammaticality, while the good translation model generally generates fluent and grammatically correct translations. With the pair of translation models, we can generate unlimited numbers of poor-good English sentence pairs from text in the source language (e.g., Chinese) of the translators. Our approach can generate various error-corrected patterns and nicely complements the other data synthesis approaches for GEC. Experimental results demonstrate that the data generated by our approach can effectively help a GEC model improve its performance and approach the state-of-the-art single-model performance on the BEA-19 and CoNLL-14 benchmark datasets.
AB - We propose a novel data synthesis method to generate diverse error-corrected sentence pairs for improving grammatical error correction (GEC), which is based on a pair of machine translation models (e.g., Chinese-English) of different qualities (i.e., poor and good). The poor translation model resembles an ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammaticality, while the good translation model generally generates fluent and grammatically correct translations. With the pair of translation models, we can generate unlimited numbers of poor-good English sentence pairs from text in the source language (e.g., Chinese) of the translators. Our approach can generate various error-corrected patterns and nicely complements the other data synthesis approaches for GEC. Experimental results demonstrate that the data generated by our approach can effectively help a GEC model improve its performance and approach the state-of-the-art single-model performance on the BEA-19 and CoNLL-14 benchmark datasets.
UR - https://www.scopus.com/pages/publications/85106151136
M3 - Conference contribution
AN - SCOPUS:85106151136
T3 - Findings of the Association for Computational Linguistics: EMNLP 2020
SP - 318
EP - 328
BT - Findings of the Association for Computational Linguistics: EMNLP 2020
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020
Y2 - 16 November 2020 through 20 November 2020
ER -