TY - JOUR
T1 - Toward Generating Communication Graph Datasets for Botnet Detection in Autonomous Systems
AU - Yan, Yuhao
AU - Lang, Bo
AU - Meng, Xiaoyuan
AU - Xiao, Nan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Botnet is one of the main threats to cybersecurity because of its concealment and hazardous nature, especially in autonomous systems (ASs), such as campus networks. Graph-based detection methods are attracting increasing attention due to their ability to find and use the topological features of botnets. However, constructing or obtaining a botnet dataset is always difficult, and almost all existing public datasets suffer from extreme imbalances and poor authenticity, which makes training graph-based detection models challenging. To address these problems, we propose a role-based multistage growth method for generating AS botnet datasets, which is scalable and efficient. Our method generates a background communication graph based on complex network theory, models botnet behaviors by building a state machine, and generates the traffic of botnets. The experimental results show that our method can effectively restore the AS communication graph, and the generated datasets can significantly improve the performance of various graph-based detection models. Our generated dataset is available at https://github.com/Yebmoon/Botnet-graph-dataset.
AB - Botnet is one of the main threats to cybersecurity because of its concealment and hazardous nature, especially in autonomous systems (ASs), such as campus networks. Graph-based detection methods are attracting increasing attention due to their ability to find and use the topological features of botnets. However, constructing or obtaining a botnet dataset is always difficult, and almost all existing public datasets suffer from extreme imbalances and poor authenticity, which makes training graph-based detection models challenging. To address these problems, we propose a role-based multistage growth method for generating AS botnet datasets, which is scalable and efficient. Our method generates a background communication graph based on complex network theory, models botnet behaviors by building a state machine, and generates the traffic of botnets. The experimental results show that our method can effectively restore the AS communication graph, and the generated datasets can significantly improve the performance of various graph-based detection models. Our generated dataset is available at https://github.com/Yebmoon/Botnet-graph-dataset.
KW - Botnet detection
KW - complex networks
KW - data augmentation
KW - dataset generation
KW - graph neural network pretraining
UR - https://www.scopus.com/pages/publications/85203635523
U2 - 10.1109/TIFS.2024.3453172
DO - 10.1109/TIFS.2024.3453172
M3 - Article
AN - SCOPUS:85203635523
SN - 1556-6013
VL - 19
SP - 7908
EP - 7923
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -