Efficient customer incident triage via linking with system incidents

  • Jiazhen Gu*
  • Jiaqi Wen*
  • Zijian Wang*
  • Pu Zhao
  • Chuan Luo
  • Yu Kang
  • Yangfan Zhou
  • Li Yang
  • Jeffrey Sun
  • Zhangwei Xu
  • Bo Qiao
  • Liqun Li
  • Qingwei Lin
  • Dongmei Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In cloud service systems, customers report the service issues they encounter to cloud service providers. Although many issues can be handled by the support team, some customer issues cannot be easily solved and are raised as customer incidents. Quick troubleshooting of a customer incident is critical. To this end, a customer incident should be assigned to its responsible team accurately and in a timely manner. Our industrial experience shows that linking customer incidents with detected system incidents can help customer incident triage. In particular, our empirical study on 7 real cloud service systems shows that with additional information about the system incidents (i.e., incident reports generated by system monitors), the triage of customer incidents can be accelerated by 13.1× on average. Based on this observation, we propose LinkCM, a learning-based approach that automatically links customer incidents to monitor-reported system incidents. LinkCM incorporates a novel learning-based model that effectively extracts related information from the two resources, together with a transfer learning strategy that helps LinkCM achieve better performance without a huge amount of data. The experimental results indicate that LinkCM is able to achieve accurate link prediction. Furthermore, case studies are presented to demonstrate how LinkCM can help the customer incident triage procedure in real production cloud service systems.

Original language: English
Title of host publication: ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Prem Devanbu, Myra Cohen, Thomas Zimmermann
Publisher: Association for Computing Machinery, Inc
Pages: 1296-1307
Number of pages: 12
ISBN (Electronic): 9781450370431
State: Published - 8 Nov 2020
Externally published: Yes
Event: 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 - Virtual, Online, United States
Duration: 8 Nov 2020 → 13 Nov 2020

Publication series

Name: ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference: 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
Country/Territory: United States
City: Virtual, Online
Period: 8/11/20 → 13/11/20

Keywords

  • Cloud Service Systems
  • Customer Issue Triage
  • Transfer Learning
