Past-Future Scheduler for LLM Serving under SLA Guarantees

  • Ruihao Gong
  • Shihao Bai
  • Siyu Wu
  • Yunqian Fan
  • Zaijun Wang
  • Xiuhong Li
  • Hailong Yang*
  • Xianglong Liu*
  • *Corresponding author for this work
  • Beihang University
  • SenseTime Group Limited
  • Peking University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The exploration and application of Large Language Models (LLMs) is thriving. To reduce deployment costs, continuous batching has become an essential feature in current service frameworks. The effectiveness of continuous batching relies on an accurate estimate of the memory requirements of requests. However, due to the diversity in request output lengths, existing frameworks tend to adopt aggressive or conservative schedulers, which often result in significant overestimation or underestimation of memory consumption. Consequently, they suffer from harmful request evictions or prolonged queuing times, failing to achieve satisfactory throughput under strict Service Level Agreement (SLA) guarantees (a.k.a. goodput), across various LLM application scenarios with differing input-output length distributions. To address this issue, we propose a novel Past-Future scheduler that precisely estimates the peak memory resources required by the running batch via considering the historical distribution of request output lengths and calculating memory occupancy at each future time point. It adapts to applications with all types of input-output length distributions, balancing the trade-off between request queuing and harmful evictions, thereby consistently achieving better goodput. Furthermore, to validate the effectiveness of the proposed scheduler, we developed a high-performance LLM serving framework, LightLLM, that implements the Past-Future scheduler. Compared to existing aggressive or conservative schedulers, LightLLM demonstrates superior goodput, achieving up to 2-3× higher goodput than other schedulers under heavy loads. LightLLM is open source to boost the research in such direction (https://github.com/ModelTC/lightllm).
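The idea of estimating peak memory from past output lengths and future occupancy can be illustrated with a minimal sketch. This is not the LightLLM implementation: the names `Request`, `predict_output_len`, `peak_future_tokens`, and `can_admit` are hypothetical, memory is counted in KV-cache tokens, every running request is assumed to generate one token per step, and the prediction rule (a median over longer historical outputs) is a stand-in for whatever the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_len: int        # prompt tokens already held in the KV cache
    generated: int         # output tokens generated so far
    predicted_output: int  # assumed total output length (hypothetical field)


def predict_output_len(history, generated):
    """Past: guess a total output length from historical output lengths.

    Assumption for illustration: take the median of historical lengths
    that exceed what the request has already generated.
    """
    longer = sorted(h for h in history if h > generated)
    if not longer:
        return generated + 1  # no evidence left; assume one more token
    return longer[len(longer) // 2]


def peak_future_tokens(batch):
    """Future: peak token occupancy of the batch over upcoming decode steps.

    A request holds prompt_len + generated + t tokens at step t and frees
    everything once it finishes. Occupancy grows between finish events,
    so the peak can only occur at some request's finish step.
    """
    remaining = [max(r.predicted_output - r.generated, 0) for r in batch]
    current = [r.prompt_len + r.generated for r in batch]
    peak = 0
    for t in sorted(set(remaining)):
        occupancy = sum(c + t for c, rem in zip(current, remaining) if rem >= t)
        peak = max(peak, occupancy)
    return peak


def can_admit(batch, new_request, capacity):
    """Admit a queued request only if the predicted peak fits in memory."""
    return peak_future_tokens(batch + [new_request]) <= capacity


# Usage: one short-lived large request and one long-lived small request.
batch = [Request(100, 0, 2), Request(10, 0, 200)]
peak = peak_future_tokens(batch)  # 210
```

In this example the predicted peak is 210 tokens, while summing each request's final size would give 312, because the first request has released its memory long before the second one reaches full length. A scheduler that reserves 312 tokens would queue requests unnecessarily; one that only looks at the current 110 tokens risks evictions later.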

Original language: English
Title of host publication: ASPLOS 2025 - Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: Association for Computing Machinery
Pages: 798-813
Number of pages: 16
ISBN (electronic): 9798400710797
DOI
Publication status: Published - 30 Mar 2025
Event: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025 - Rotterdam, Netherlands
Duration: 30 Mar 2025 to 3 Apr 2025

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume: 2

Conference

Conference: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025
Country/Territory: Netherlands
City: Rotterdam
Period: 30/03/25 to 3/04/25
