TY - GEN
T1 - iSMELL
T2 - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
AU - Wu, Di
AU - Mu, Fangwen
AU - Shi, Lin
AU - Guo, Zhaoqiang
AU - Liu, Kui
AU - Zhuang, Weiguang
AU - Zhong, Yuqi
AU - Zhang, Li
N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/10/27
Y1 - 2024/10/27
N2 - Detecting and refactoring code smells is challenging, laborious, and sustaining. Although large language models have demonstrated potential in identifying various types of code smells, they also have limitations such as input-output token restrictions, difficulty in accessing repository-level knowledge, and performing dynamic source code analysis. Existing learning-based methods or commercial expert toolsets have advantages in handling complex smells. They can analyze project structures and contextual information in-depth, access global code repositories, and utilize advanced code analysis techniques. However, these toolsets are often designed for specific types and patterns of code smells and can only address fixed smells, lacking flexibility and scalability. To resolve that problem, we propose iSMELL, an ensemble approach that employs various code smell detection toolsets via Mixture of Experts (MoE) architecture for comprehensive code smell detection, and enhances the LLMs with the detection results from expert toolsets for refactoring those identified code smells. First, we train a MoE model that, based on input code vectors, outputs the most suitable expert tool for identifying each type of smell. Then, we select the recommended toolsets for code smell detection and obtain their results. Finally, we equip the prompts with the detection results from the expert toolsets, thereby enhancing the refactoring capability of LLMs for code with existing smells, enabling them to provide different solutions based on the type of smell. We evaluate our approach on detecting and refactoring three classical and complex code smells, i.e., Refused Bequest, God Class, and Feature Envy. The results show that, by adopting seven expert code smell toolsets, iSMELL achieved an average F1 score of 75.17% on code smell detection, outperforming LLMs baselines by an increase of 35.05% in F1 score. We further evaluate the code refactored by the enhanced LLM. The quantitative and human evaluation results show that iSMELL could improve code quality metrics and conduct satisfactory refactoring toward the identified code smells. We believe that our proposed solution could provide new insights into better leveraging LLMs and existing approaches to resolving complex software tasks.
AB - Detecting and refactoring code smells is challenging, laborious, and sustaining. Although large language models have demonstrated potential in identifying various types of code smells, they also have limitations such as input-output token restrictions, difficulty in accessing repository-level knowledge, and performing dynamic source code analysis. Existing learning-based methods or commercial expert toolsets have advantages in handling complex smells. They can analyze project structures and contextual information in-depth, access global code repositories, and utilize advanced code analysis techniques. However, these toolsets are often designed for specific types and patterns of code smells and can only address fixed smells, lacking flexibility and scalability. To resolve that problem, we propose iSMELL, an ensemble approach that employs various code smell detection toolsets via Mixture of Experts (MoE) architecture for comprehensive code smell detection, and enhances the LLMs with the detection results from expert toolsets for refactoring those identified code smells. First, we train a MoE model that, based on input code vectors, outputs the most suitable expert tool for identifying each type of smell. Then, we select the recommended toolsets for code smell detection and obtain their results. Finally, we equip the prompts with the detection results from the expert toolsets, thereby enhancing the refactoring capability of LLMs for code with existing smells, enabling them to provide different solutions based on the type of smell. We evaluate our approach on detecting and refactoring three classical and complex code smells, i.e., Refused Bequest, God Class, and Feature Envy. The results show that, by adopting seven expert code smell toolsets, iSMELL achieved an average F1 score of 75.17% on code smell detection, outperforming LLMs baselines by an increase of 35.05% in F1 score. We further evaluate the code refactored by the enhanced LLM. The quantitative and human evaluation results show that iSMELL could improve code quality metrics and conduct satisfactory refactoring toward the identified code smells. We believe that our proposed solution could provide new insights into better leveraging LLMs and existing approaches to resolving complex software tasks.
UR - https://www.scopus.com/pages/publications/85212444025
U2 - 10.1145/3691620.3695508
DO - 10.1145/3691620.3695508
M3 - Conference contribution
AN - SCOPUS:85212444025
T3 - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
SP - 1345
EP - 1357
BT - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -