Scalable network scheduling in software
Said Mohamed Tawfik Issa, Ahmed Mohamed
MetadataShow full item record
Network scheduling determines the relative ordering and priority of different flows or packets with respect to some ranking function that is mandated by a scheduling policy. It is the core component in many recent innovations to optimize network performance and utilization. The increasing scale of cloud applications is driving the need to scale network scheduling to millions of flows and to apply complex policies. This requires software network stacks to handle a large number of flows, ranked according to operator-defined policies, and going over different types of networks. This dissertation describes three network scheduling systems that allow network operator to deploy scheduling policies efficiently in software, taking into account the diversity of the requirements of modern networks. First, we develop a system that enforces packet shaping, the most basic scheduling policy efficiently at end hosts. The challenge of packet shaping at scale is optimizing the CPU and memory overhead of the rate limiting system. We present a framework that scales to tens of thousands of rate limiting policies and flows per server, built from the synthesis of three key ideas: i) a single queue shaper using time as the basis for releasing packets, ii) fine-grained, just-in-time freeing of resources in higher layers coupled to actual packet departures, and iii) one shaper per CPU core, with lock-free coordination. Second, we develop a system that provides efficient and programmable packet schedulers in software. Unlike non-work-conserving shapers, programmable schedulers are challenging to implement efficiently. We overcome this challenge by exploiting underlying features of packet ranking; namely, packet ranks are integers and, at any point in time, fall within a limited range of values. To support programmability, we introduce novel programming abstractions to express scheduling policies that cannot be captured by current, state-of-the-art scheduler programming models. Third, we propose a single congestion control approach that is designed to handle datacenter LAN traffic and WAN traffic simultaneously. This work is motivated by our observation that customization of congestion control algorithms causes the WAN traffic to negatively impact the performance of the LAN traffic. This customization is based on the differences between WAN and LAN characteristics. However, it overlooks that both types of traffic compete at the near-source bottlenecks where the two types of traffic overlap. We propose deploying two interacting congestion control loops: one handles near-source congestion leveraging local network feedback, while the other handles congestion on the rest of the path using conventional LAN and WAN congestion control schemes.