Abstract
Code semantic learning is the basis of many program analysis tasks, and researchers have devoted considerable effort over the years to building robust and effective code representation models. One line of work introduces code structure into the representations. To further improve robustness, approaches based on compiler intermediate representations (IRs) have been proposed. However, these IR-based models incur heavy computational cost and memory overhead, so representing program semantics both effectively and efficiently remains a challenge. To this end, we propose EECS, an effective and efficient code semantic representation approach based on compiler IRs and a hybrid attention mechanism. For the input representation, to address the unbounded vocabulary size of IR, we propose a variable identification strategy that assigns each register variable a new ID reflecting its relative position. We also extract the data flow information among code blocks. We then build a hierarchical multi-layer Transformer encoder that captures both the data dependency information and the code semantics through a hybrid attention mechanism. To help EECS learn code semantics and functionality better, we jointly optimize three objectives during training. Experimental results on three code semantic understanding tasks show that EECS outperforms state-of-the-art techniques, demonstrating its capability for understanding program semantics.
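The variable identification idea can be illustrated with a minimal sketch. The abstract does not give the exact scheme EECS uses, so everything here is an assumption made for illustration: the LLVM-style register pattern, the `<VARk>` token format, and the function name `normalize_registers` are hypothetical. The sketch renames each register by its order of first appearance, which keeps the vocabulary bounded by the number of distinct registers in a snippet rather than by the set of all possible register names.

```python
import re

# Assumed pattern for LLVM-style local registers such as %5 or %x.addr.
REGISTER = re.compile(r"%[A-Za-z0-9._]+")


def normalize_registers(ir_lines):
    """Rename each register to <VARk>, where k is its order of first appearance.

    This is an illustrative stand-in, not the scheme described in the paper.
    """
    ids = {}
    normalized = []
    for line in ir_lines:
        def rename(match):
            name = match.group(0)
            if name not in ids:
                ids[name] = len(ids)  # next unused relative ID
            return f"<VAR{ids[name]}>"

        normalized.append(REGISTER.sub(rename, line))
    return normalized, ids


example = [
    "%7 = load i32, i32* %x.addr",
    "%8 = add nsw i32 %7, 1",
    "store i32 %8, i32* %x.addr",
]
normalized, mapping = normalize_registers(example)
for line in normalized:
    print(line)
```

Under this assumed scheme, two snippets that differ only in their register names map to the same token sequence, which is one plausible way to keep the IR vocabulary finite.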
| Original language | English |
|---|---|
| Article number | 172101 |
| Journal | Science China Information Sciences |
| Volume | 68 |
| Issue number | 7 |
| State | Published - Jul 2025 |
Keywords
- artificial intelligence
- code semantic learning
- compiler intermediate representation
- data dependency modeling
- software engineering
Fingerprint
Dive into the research topics of 'Learning to represent code semantics'. Together they form a unique fingerprint.