TY - GEN
T1 - Data Coverage for Guided Fuzzing
AU - Wang, Mingzhe
AU - Liang, Jie
AU - Zhou, Chijin
AU - Wu, Zhiyong
AU - Fu, Jingzhou
AU - Su, Zhuo
AU - Liao, Qing
AU - Gu, Bin
AU - Wu, Bodong
AU - Jiang, Yu
N1 - Publisher Copyright:
© USENIX Security Symposium 2024.All rights reserved.
PY - 2024
Y1 - 2024
N2 - Code coverage is crucial for fuzzing. It helps fuzzers identify areas of a program that have not been explored, which are often the most likely to contain bugs. However, code coverage only reflects a small part of a program's structure. Many crucial program constructs, such as constraints, automata, and Turing-complete domain-specific languages, are embedded in a program as constant data. Since this data cannot be effectively reflected by code coverage, it remains a major challenge for modern fuzzing practices. To address this challenge, we propose data coverage for guided fuzzing. The idea is to detect novel constant data references and maximize their coverage. However, the widespread use of constant data can significantly impact fuzzing throughput if not handled carefully. To overcome this issue, we optimize for real-world fuzzing practices by classifying data access according to semantics and designing customized collection strategies. We also develop novel storage and utilization techniques for improved fuzzing efficiency. Finally, we enhance libFuzzer with data coverage and submit it to Google's FuzzBench for evaluation. Our approach outperforms many state-of-the-art fuzzers and achieves the best coverage score in the experiment. Furthermore, we have discovered 28 previously-unknown bugs on OSS-Fuzz projects that were well-fuzzed using code coverage.
AB - Code coverage is crucial for fuzzing. It helps fuzzers identify areas of a program that have not been explored, which are often the most likely to contain bugs. However, code coverage only reflects a small part of a program's structure. Many crucial program constructs, such as constraints, automata, and Turing-complete domain-specific languages, are embedded in a program as constant data. Since this data cannot be effectively reflected by code coverage, it remains a major challenge for modern fuzzing practices. To address this challenge, we propose data coverage for guided fuzzing. The idea is to detect novel constant data references and maximize their coverage. However, the widespread use of constant data can significantly impact fuzzing throughput if not handled carefully. To overcome this issue, we optimize for real-world fuzzing practices by classifying data access according to semantics and designing customized collection strategies. We also develop novel storage and utilization techniques for improved fuzzing efficiency. Finally, we enhance libFuzzer with data coverage and submit it to Google's FuzzBench for evaluation. Our approach outperforms many state-of-the-art fuzzers and achieves the best coverage score in the experiment. Furthermore, we have discovered 28 previously-unknown bugs on OSS-Fuzz projects that were well-fuzzed using code coverage.
UR - https://www.scopus.com/pages/publications/85204976970
M3 - 会议稿件
AN - SCOPUS:85204976970
T3 - Proceedings of the 33rd USENIX Security Symposium
SP - 2511
EP - 2526
BT - Proceedings of the 33rd USENIX Security Symposium
PB - USENIX Association
T2 - 33rd USENIX Security Symposium, USENIX Security 2024
Y2 - 14 August 2024 through 16 August 2024
ER -