
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU

Research output: Contribution to journal › Article › peer-review

Abstract

Although GPUs have become indispensable in data centers, meeting Quality of Service (QoS) targets when tasks are consolidated on a GPU is extremely challenging. Previous works mostly rely on static task or resource scheduling and cannot handle QoS violations at runtime. In addition, existing works fail to exploit the computing characteristics of batch tasks, and thus miss opportunities to reduce power consumption while improving GPU utilization. To address these problems, we propose a new runtime mechanism, SMQoS, that dynamically adjusts resource allocation to meet the QoS of latency-sensitive (LS) tasks and determines the optimal resource allocation for batch tasks to improve GPU utilization and power efficiency. We implement the proposed mechanism on both a simulator (SMQoS) and real GPU hardware (RH-SMQoS). The experimental results show that both SMQoS and RH-SMQoS achieve better QoS for LS tasks and higher throughput for batch tasks than state-of-the-art works. With a hardware extension, SMQoS can further reduce power consumption by power gating idle computing resources.
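
The runtime adjustment described in the abstract can be pictured as a feedback loop. The following is a minimal sketch and not the paper's algorithm: it assumes resources are counted in streaming multiprocessors (the abstract does not state the granularity), and the names `Allocation`, `adjust_allocation`, `slack`, and `step` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    ls_sms: int     # SMs reserved for the latency-sensitive (LS) task (assumed unit)
    batch_sms: int  # SMs left over for batch tasks


def adjust_allocation(current_ls_sms, total_sms, measured_latency_ms,
                      target_latency_ms, slack=0.9, step=1):
    """One step of a hypothetical QoS feedback loop (illustrative only)."""
    ls = current_ls_sms
    if measured_latency_ms > target_latency_ms:
        # QoS target missed: move resources from batch tasks to the LS task.
        ls += step
    elif measured_latency_ms < slack * target_latency_ms:
        # Comfortable headroom: release resources back to batch tasks.
        ls -= step
    # Keep at least one SM for the LS task and never exceed the device total.
    ls = max(1, min(total_sms, ls))
    return Allocation(ls_sms=ls, batch_sms=total_sms - ls)
```

Calling `adjust_allocation` once per monitoring interval with the latest measured LS latency would grow the LS share after a violation and shrink it again when latency falls well below the target. The `slack` band is a design choice in this sketch to avoid oscillating around the target.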

Original language: English
Article number: 102958
Journal: Parallel Computing
Volume: 113
State: Published - Oct 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Dynamic resource management
  • Graphics processing units
  • Power efficiency
  • Quality of service
  • Throughput
