TY - JOUR
T1 - Enhancing encrypted traffic analysis via source APIs
T2 - A robust approach for malicious traffic detection
AU - Lin, Wanshuang
AU - Xia, Chunhe
AU - Wang, Tianbo
AU - Liu, Mengyao
AU - Li, Yang
N1 - Publisher Copyright: © 2025
PY - 2025/9
Y1 - 2025/9
N2 - The widespread adoption of encryption protocols has increased the complexity of detecting malicious Android traffic. By randomizing payload content, encryption obscures semantically explicit features in network traffic, thereby concealing its behavioral intent. Although existing methods mitigate this issue by expanding feature sets or extracting spatiotemporal patterns, they do not fundamentally reconstruct the original payload semantics. In this paper, we propose RATD, a detection model that enhances encrypted traffic representation by introducing the semantics of source APIs. This approach leverages the correlation between system API calls made prior to traffic transmission (referred to as source APIs) and the behavioral intent within encrypted traffic, thereby compensating for semantic loss. First, we construct API-traffic association samples by monitoring network connection APIs. Then, we transform the API sequences into graphs and apply a Graph Convolutional Network (GCN) to learn their structural and semantic representations. These features are fused with the corresponding traffic features through a multi-source encoder module. Finally, to address the challenge of limited data availability in real-world deployment, we introduce a representation enhancement module to improve the model's robustness in scenarios with missing data. Experimental results show that RATD significantly outperforms state-of-the-art models across multiple datasets. In particular, in scenarios with missing API data, the accuracy of our model decreases by at most 2.9%, showing stronger environmental adaptability.
AB - The widespread adoption of encryption protocols has increased the complexity of detecting malicious Android traffic. By randomizing payload content, encryption obscures semantically explicit features in network traffic, thereby concealing its behavioral intent. Although existing methods mitigate this issue by expanding feature sets or extracting spatiotemporal patterns, they do not fundamentally reconstruct the original payload semantics. In this paper, we propose RATD, a detection model that enhances encrypted traffic representation by introducing the semantics of source APIs. This approach leverages the correlation between system API calls made prior to traffic transmission (referred to as source APIs) and the behavioral intent within encrypted traffic, thereby compensating for semantic loss. First, we construct API-traffic association samples by monitoring network connection APIs. Then, we transform the API sequences into graphs and apply a Graph Convolutional Network (GCN) to learn their structural and semantic representations. These features are fused with the corresponding traffic features through a multi-source encoder module. Finally, to address the challenge of limited data availability in real-world deployment, we introduce a representation enhancement module to improve the model's robustness in scenarios with missing data. Experimental results show that RATD significantly outperforms state-of-the-art models across multiple datasets. In particular, in scenarios with missing API data, the accuracy of our model decreases by at most 2.9%, showing stronger environmental adaptability.
KW - API sequence
KW - Cascaded residual autoencoder
KW - Encrypted traffic detection
KW - HTTPS
UR - https://www.scopus.com/pages/publications/105005939645
U2 - 10.1016/j.cose.2025.104529
DO - 10.1016/j.cose.2025.104529
M3 - Article
AN - SCOPUS:105005939645
SN - 0167-4048
VL - 156
JO - Computers and Security
JF - Computers and Security
M1 - 104529
ER -