TY - GEN
T1 - Detecting Duplicate Questions in Stack Overflow via Deep Learning Approaches
AU - Wang, Liting
AU - Zhang, Li
AU - Jiang, Jing
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - Stack Overflow is a popular question-and-answer website for software programming. Different users often ask the same question in different ways, resulting in a large number of duplicate questions on Stack Overflow. Typically, high-reputation users manually analyze and mark duplicate questions, which is time-consuming and inefficient. An automatic approach to duplicate question detection is therefore needed. We first investigate the application of deep learning models to software engineering tasks. Then, three deep learning models (CNN, RNN and LSTM) are applied to determine whether they are effective for the duplicate question detection task on Stack Overflow. In this paper, we explore three deep learning approaches, DQ-CNN, DQ-RNN and DQ-LSTM, based on CNN, RNN and LSTM respectively, to detect duplicate questions. The effectiveness of DQ-CNN, DQ-RNN and DQ-LSTM is evaluated on six different question groups. The experimental results show that DQ-LSTM outperforms DupPredictor, Dupe, DupPredictorRep-T and DupeRep in terms of recall-rate@5, recall-rate@10 and recall-rate@20 for every question group except Ruby.
KW - CNN
KW - LSTM
KW - RNN
KW - Stack Overflow
KW - duplicate questions
UR - https://www.scopus.com/pages/publications/85078147343
U2 - 10.1109/APSEC48747.2019.00074
DO - 10.1109/APSEC48747.2019.00074
M3 - Conference contribution
AN - SCOPUS:85078147343
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 506
EP - 513
BT - Proceedings - 2019 26th Asia-Pacific Software Engineering Conference, APSEC 2019
PB - IEEE Computer Society
T2 - 26th Asia-Pacific Software Engineering Conference, APSEC 2019
Y2 - 2 December 2019 through 5 December 2019
ER -