TY - GEN
T1 - Efficient customer incident triage via linking with system incidents
AU - Gu, Jiazhen
AU - Wen, Jiaqi
AU - Wang, Zijian
AU - Zhao, Pu
AU - Luo, Chuan
AU - Kang, Yu
AU - Zhou, Yangfan
AU - Yang, Li
AU - Sun, Jeffrey
AU - Xu, Zhangwei
AU - Qiao, Bo
AU - Li, Liqun
AU - Lin, Qingwei
AU - Zhang, Dongmei
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/11/8
Y1 - 2020/11/8
N2 - In cloud service systems, customers will report the service issues they have encountered to cloud service providers. Despite many issues can be handled by the support team, sometimes the customer issues can not be easily solved, thus raising customer incidents. Quick troubleshooting of a customer incident is critical. To this end, a customer incident should be assigned to its responsible team accurately in a timely manner. Our industrial experiences show that linking customer incidents with detected system incidents can help the customer incident triage. In particular, our empirical study on 7 real cloud service systems shows that with the additional information about the system incidents (i.e., incident reports generated by system monitors), the triage time of customer incidents can be accelerated 13.1× on average. Based on this observation, in this paper, we propose LinkCM, a learning based approach to automatically link customer incidents to monitor reported system incidents. LinkCM incorporates a novel learning-based model that effectively extracts related information from two resources, and a transfer learning strategy is proposed to help LinkCM achieve better performance without huge amount of data. The experimental results indicate that LinkCM is able to achieve accurate link prediction. Furthermore, case studies are presented to demonstrate how LinkCM can help the customer incident triage procedure in real production cloud service systems.
AB - In cloud service systems, customers will report the service issues they have encountered to cloud service providers. Despite many issues can be handled by the support team, sometimes the customer issues can not be easily solved, thus raising customer incidents. Quick troubleshooting of a customer incident is critical. To this end, a customer incident should be assigned to its responsible team accurately in a timely manner. Our industrial experiences show that linking customer incidents with detected system incidents can help the customer incident triage. In particular, our empirical study on 7 real cloud service systems shows that with the additional information about the system incidents (i.e., incident reports generated by system monitors), the triage time of customer incidents can be accelerated 13.1× on average. Based on this observation, in this paper, we propose LinkCM, a learning based approach to automatically link customer incidents to monitor reported system incidents. LinkCM incorporates a novel learning-based model that effectively extracts related information from two resources, and a transfer learning strategy is proposed to help LinkCM achieve better performance without huge amount of data. The experimental results indicate that LinkCM is able to achieve accurate link prediction. Furthermore, case studies are presented to demonstrate how LinkCM can help the customer incident triage procedure in real production cloud service systems.
KW - Cloud Service Systems
KW - Customer Issue Triage
KW - Transfer Learning
UR - https://www.scopus.com/pages/publications/85097159592
U2 - 10.1145/3368089.3417061
DO - 10.1145/3368089.3417061
M3 - 会议稿件
AN - SCOPUS:85097159592
T3 - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 1296
EP - 1307
BT - ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Devanbu, Prem
A2 - Cohen, Myra
A2 - Zimmermann, Thomas
PB - Association for Computing Machinery, Inc
T2 - 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Y2 - 8 November 2020 through 13 November 2020
ER -