TY - GEN
T1 - EFwork
T2 - 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2023
AU - Chen, Chen
AU - Xia, Chunhe
AU - Wang, Tianbo
AU - Lin, Wanshuang
AU - Zhao, Yuan
AU - Li, Yang
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Malware Knowledge Graph (MKG) serves as an essential auxiliary tool for malware detection and analysis. However, the construction of MKG faces several challenges, such as inadequate dataset quality, incomplete entity feature extraction, and the limitations imposed by deep learning techniques. To address these issues, we present an Efficient Framework for constructing a malware knowledge graph (EFwork). Firstly, we build a High-Quality Dataset (HQDataset) and introduce a metric for data quality assessment based on knowledge coverage, timeliness, and density. Subsequently, we develop a Named Entity Recognition (NER) model that extracts character features, part-of-speech features, and word features from the data, leveraging deep learning models to identify malware-related entities. Finally, we implement a rule-based filtering mechanism, utilizing a comprehensive Rule Database to eliminate entities that do not conform to predefined rules. Experimental results show that our HQDataset demonstrates superior data quality when compared to other open-source datasets. Furthermore, our NER model combined with our Rule Database outperforms existing models, achieving improvements of 0.67%, 0.74%, and 0.69% in Precision, Recall, and F1-Score, respectively.
KW - Dataset Quality
KW - Knowledge Graph
KW - Malware
KW - Named Entity Recognition
UR - https://www.scopus.com/pages/publications/85195482833
U2 - 10.1109/TrustCom60117.2023.00171
DO - 10.1109/TrustCom60117.2023.00171
M3 - Conference contribution
AN - SCOPUS:85195482833
T3 - Proceedings - 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom/BigDataSE/CSE/EUC/iSCI 2023
SP - 1258
EP - 1265
BT - Proceedings - 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom/BigDataSE/CSE/EUC/iSCI 2023
A2 - Hu, Jia
A2 - Min, Geyong
A2 - Wang, Guojun
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2023 through 3 November 2023
ER -