
Reliable computing service in massive-scale systems through rapid low-cost failover

  • Renyu Yang*
  • Yang Zhang
  • Peter Garraghan
  • Yihui Feng
  • Jin Ouyang
  • Jie Xu
  • Zhuo Zhang
  • Chao Li

  *Corresponding author for this work

Research output: Contribution to journal, article, peer-reviewed

Abstract

Large-scale distributed systems deployed as Cloud datacenters are capable of provisioning service to consumers with diverse business requirements. Providers face pressure to provision uninterrupted, reliable services while reducing operational costs in the face of significant software and hardware failures. A widely adopted means to achieve this goal is using redundant system components to implement user-transparent failover, yet its effectiveness must be balanced carefully against the overhead it incurs when deployed, an important practical consideration for complex large-scale systems. Failover techniques developed for Cloud systems often suffer serious limitations, including mandatory restart, which leads to poor cost-effectiveness, and a sole focus on crash failures that omits other important types such as timing failures and simultaneous failures. This paper addresses these limitations by presenting a new approach to user-transparent failover for massive-scale systems. The approach uses soft-state inference to achieve rapid failure recovery and avoid unnecessary restart, with minimal system resource overhead. It also copes with different failures, including correlated and simultaneous events. The proposed approach was implemented, deployed and evaluated within the Fuxi system, the underlying resource management system used within Alibaba Cloud. Results demonstrate that our approach tolerates complex failure scenarios while incurring at worst 228.5 microseconds of instance overhead and 1.71 percent additional CPU usage.

Original language: English
Pages (from-to): 969-983
Number of pages: 15
Journal: IEEE Transactions on Services Computing
Volume: 10
Issue number: 6
DOI
Publication status: Published - 1 Nov 2017
