
Switch-Assistant Loss Recovery for RDMA Transport Control

  • Beihang University
  • Beijing University of Posts and Telecommunications
  • Zhongguancun Laboratory
  • Nanjing University of Aeronautics and Astronautics
  • Tsinghua University

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

RoCEv2 (RDMA over Converged Ethernet version 2) is the canonical method for deploying RDMA in Ethernet-based datacenters. Traditionally, RoCEv2 runs over a lossless network, which is achieved by enabling Priority Flow Control (PFC) within the network. However, as datacenters scale up, PFC's side effects, such as head-of-line blocking, congestion spreading, and pause frame storms, are amplified. Datacenter operators can no longer tolerate these problems and are therefore seeking PFC alternatives for RDMA networks. Rather than aiming for a lossless RDMA network, we handle packet loss effectively to support RDMA over Ethernet. In this paper, we propose Switch-Assistant Loss Recovery (SLR), a switch building block that enhances RoCEv2's loss recovery. Specifically, SLR-enabled switches send loss notifications to request fast retransmissions. To cooperate with go-back-N retransmission, SLR generates loss notifications only when expected packets (i.e., in-order packets expected by receivers) are dropped, and then filters out unexpected packets, which can avoid timeouts and prevent exacerbating congestion. Further, we adapt SLR to multi-bottleneck scenarios by inferring expected packets among multiple switch views. We implement SLR prototypes on commodity programmable switches. Evaluations show that SLR reduces the 99.9th-percentile FCT slowdown by up to 21.6× compared to PFC and other state-of-the-art schemes.
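The single-switch behavior described in the abstract (notify only when an expected packet is dropped, then filter unexpected packets) can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `SlrSwitchSketch`, the per-flow expected-sequence table, the tail-drop queue, and the return labels are all hypothetical, and the multi-bottleneck inference mentioned in the abstract is not modeled.

```python
class SlrSwitchSketch:
    """Toy model of one switch port (hypothetical, for illustration only).

    Assumptions not taken from the paper: a tail-drop queue counted in
    packets, per-flow packet sequence numbers (PSNs) starting at 0, and a
    list that stands in for loss notifications sent toward the sender.
    """

    def __init__(self, queue_capacity):
        self.queue_capacity = queue_capacity
        self.queue_len = 0
        self.expected = {}        # flow_id -> next in-order PSN (switch view)
        self.notifications = []   # (flow_id, psn) pairs requesting retransmission

    def on_packet(self, flow_id, psn):
        expected = self.expected.get(flow_id, 0)

        # A PSN beyond the expected one means an earlier packet was lost.
        # A go-back-N receiver would discard it anyway, so filter it here
        # instead of letting it consume queue space.
        if psn > expected:
            return "filtered"

        # Queue is full: the packet is dropped.
        if self.queue_len >= self.queue_capacity:
            if psn == expected:
                # Only the loss of the expected packet triggers a notification.
                self.notifications.append((flow_id, psn))
                return "dropped_notified"
            return "dropped"

        # Packet is admitted; advance the expected PSN if it was in order.
        self.queue_len += 1
        if psn == expected:
            self.expected[flow_id] = expected + 1
        return "forwarded"

    def dequeue(self):
        # One packet leaves the queue and frees a slot.
        if self.queue_len > 0:
            self.queue_len -= 1
```

In this toy model, with a queue capacity of 2, packets 0 and 1 of a flow are forwarded, packet 2 is dropped with a single notification, and packet 3 is filtered; once a slot frees up, the retransmitted packet 2 is forwarded and in-order delivery resumes.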

Original language: English
Pages (from-to): 1-16
Number of pages: 16
Journal: IEEE/ACM Transactions on Networking
DOI
Publication status: Published - 2023
