
Hierarchical and Pairwise Document Embedding for Plagiarism Detection

  • Ruitong Zhang
  • Lianzhong Liu
  • Jiaofu Zhang
  • Zihang Huang
  • Caiwei Yang
  • Liangxuan Zhao
  • Tongge Xu*
  • *Corresponding author for this work
  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The rapid development of the Internet, especially the application of search engines and machine translation, makes it easier to copy texts. Most existing text plagiarism detection methods are not capable of dealing with the increasing number of plagiarism sources and the increasingly ambiguous plagiarized texts. In this paper, we pay attention to the task of large-scale text deduplication, and propose a multi-level distributed text computing model, which improves the checking speed through multi-level latent semantic analysis, and combines BERT to judge plagiarized text more accurately. In order to further verify the model, we also combined the latest fuzzy plagiarism technology to construct a three-level data set. The experimental results show that our model performs well when plagiarism data increases and plagiarism ambiguity increases.

Original language: English
Title of host publication: Advanced Data Mining and Applications - 16th International Conference, ADMA 2020, Proceedings
Editors: Xiaochun Yang, Chang-Dong Wang, Md. Saiful Islam, Zheng Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 148-156
Number of pages: 9
ISBN (Print): 9783030653897
DOI
Publication status: Published - 2020
Event: 16th International Conference on Advanced Data Mining and Applications, ADMA 2020 - Foshan, China
Duration: 12 Nov 2020 to 14 Nov 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12447 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Advanced Data Mining and Applications, ADMA 2020
Country/Territory: China
City: Foshan
Period: 12/11/20 to 14/11/20
