TY - GEN
T1 - LARQ
T2 - 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020
AU - Zhou, Huiyang
AU - Liu, Haoyan
AU - Yan, Zhao
AU - Cao, Yunbo
AU - Li, Zhoujun
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system mainly consists of two key components, a retrieval model and a ranking model, which search for similar questions and select the most relevant one, respectively. In this paper, we propose LARQ, Learning to Ask and Rewrite Questions, a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation approaches, we take advantage of a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, incorporating the QG model as an augmentation of the indexed collection significantly improves the response rate of CQA systems. We incorporate LARQ into an online CQA system and the Bank Question (BQ) Corpus to evaluate the enhancements for both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).
AB - Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system mainly consists of two key components, a retrieval model and a ranking model, which search for similar questions and select the most relevant one, respectively. In this paper, we propose LARQ, Learning to Ask and Rewrite Questions, a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation approaches, we take advantage of a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, incorporating the QG model as an augmentation of the indexed collection significantly improves the response rate of CQA systems. We incorporate LARQ into an online CQA system and the Bank Question (BQ) Corpus to evaluate the enhancements for both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).
KW - Community Question Answering
KW - Data augmentation
KW - Question generation
UR - https://www.scopus.com/pages/publications/85093110279
U2 - 10.1007/978-3-030-60457-8_26
DO - 10.1007/978-3-030-60457-8_26
M3 - Conference contribution
AN - SCOPUS:85093110279
SN - 9783030604561
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 318
EP - 330
BT - Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Proceedings
A2 - Zhu, Xiaodan
A2 - Zhang, Min
A2 - Hong, Yu
A2 - He, Ruifang
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 14 October 2020 through 18 October 2020
ER -