
KPAMA: A Kubernetes based tool for Mitigating ML system Aging

  • Wenjie Ding
  • Zhihao Liu
  • Xuhui Lu
  • Xiaoting Du
  • Zheng Zheng*
  • *Corresponding author for this work
  • Beihang University
  • Beijing University of Posts and Telecommunications

Research output: Contribution to journal, Article, peer-reviewed

Abstract

As machine learning (ML) systems evolve and see wider use, their user bases and system sizes grow, a trend made especially evident by the widespread adoption of large language models. Infrastructure such as cloud services and computing hardware has become foundational to the ML system environment and is increasingly used to support continuous training and inference services. Nevertheless, increased data volume, computational complexity, and extended run times have been shown to challenge the stability, efficiency, and availability of ML systems, precipitating system aging. To address this issue, we develop KPAMA, a novel solution that leverages Kubernetes, the leading container orchestration platform, to enhance the autoscaling of computing workflows and resources and thereby mitigate system aging. KPAMA employs a hybrid model to predict key aging metrics and uses decision and anti-oscillation algorithms to autoscale system resources. Our experiments indicate that KPAMA markedly mitigates system aging and enhances task reliability compared with the standard Horizontal Pod Autoscaler and with systems without scaling capabilities.
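
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the proportional replica rule documented for the Kubernetes Horizontal Pod Autoscaler, combined with a simple damping step. It is not KPAMA's decision or anti-oscillation algorithm, which the abstract does not specify. The names `desired_replicas` and `StabilizedScaler`, the window length, and the idea of passing a predicted metric value are illustrative assumptions.

```python
import math
from collections import deque


def desired_replicas(current: int, metric: float, target: float,
                     tolerance: float = 0.1) -> int:
    """HPA-style proportional rule: ceil(current * metric / target)."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the replica count unchanged.
        return current
    return max(1, math.ceil(current * ratio))


class StabilizedScaler:
    """Hypothetical damping step (an assumption, not KPAMA's method).

    Acts on the highest recommendation seen in the last `window` calls,
    so scale-ups apply at once while scale-downs wait until every recent
    recommendation agrees.
    """

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def decide(self, current: int, predicted_metric: float, target: float) -> int:
        self.recent.append(desired_replicas(current, predicted_metric, target))
        return max(self.recent)


# Usage: a metric that flips between high and low on every step.
scaler = StabilizedScaler(window=3)
replicas = 2
history = []
for metric in [75.0, 25.0, 75.0, 25.0]:
    replicas = scaler.decide(replicas, metric, target=50.0)
    history.append(replicas)
```

With the alternating input above, the raw rule would swing the replica count up and down on each step, whereas the damped scaler holds the count through the low readings. Choosing the maximum over a window trades slower scale-down for fewer oscillations.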

Original language: English
Article number: 112389
Journal: Journal of Systems and Software
Volume: 226
DOI
Publication status: Published - Aug 2025
