TY - GEN
T1 - Understanding Language Selection in Multi-language Software Projects on GitHub
AU - Li, Wen
AU - Meng, Na
AU - Li, Li
AU - Cai, Haipeng
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - There are hundreds of programming languages available for software development today. As a result, modern software is increasingly developed in multiple languages. In this context, there is an urgent need for automated tools for multi-language software quality assurance. To that end, it is useful to first understand how languages are chosen by developers in multi-language software projects. One intuitive perspective towards the understanding would be to explore the potential functionality relevance of those choices. With a plethora of publicly hosted multi-language software projects available on GitHub, we were able to obtain thousands of popular, relevant repositories across 10 years from 2010 to 2019 to enable the exploration. We start by estimating the functionality domain of each project through topic modeling, followed by studying the statistical correlation between these domains and language selection over all the sample projects through association mining. We proceed with an evolutionary characterization of these projects to provide a longitudinal view of how the association has changed over the years. Our findings offer useful insights into the rationale behind developers' choices of language combinations in multi-language software construction.
KW - evolution
KW - functionality relevance
KW - language selection
KW - Multi-language software
UR - https://www.scopus.com/pages/publications/85115721694
U2 - 10.1109/ICSE-Companion52605.2021.00119
DO - 10.1109/ICSE-Companion52605.2021.00119
M3 - Conference contribution
AN - SCOPUS:85115721694
T3 - Proceedings - International Conference on Software Engineering
SP - 256
EP - 257
BT - Proceedings - 2021 IEEE/ACM 43rd International Conference on Software Engineering
PB - IEEE Computer Society
T2 - 43rd IEEE/ACM International Conference on Software Engineering: Companion Proceedings, ICSE-Companion 2021
Y2 - 25 May 2021 through 28 May 2021
ER -