Mitigating interconnect and end host congestion in modern networks
MetadataShow full item record
One of the most critical building blocks of the Internet is the mechanism to mitigate network congestion. While existing congestion control approaches have served their purpose well in the last decades, the last few years saw a significant increase in new applications and user demand, stressing the network infrastructure to the extent that new ways of handling congestion are required. This dissertation identifies the congestion problems caused by the increased scale of the network usage, both in inter-AS connects and on end hosts in data centers, and presents abstractions and frameworks that allow for improved solutions to mitigate congestion. To mitigate inter-AS congestion, we develop Unison, a framework that allows an ISP to jointly optimize its intra-domain routes and inter-domain routes, in collaboration with content providers. The basic idea is to provide the ISP operator and the neighbors of the ISP with an abstraction of the ISP network in the form of a virtual switch (vSwitch). Unison allows the ISP to provide hints to its neighbors, suggesting alternative routes that can improve their performance. We investigate how the vSwitch abstraction can be used to maximize the throughput of the ISP. To mitigate end-host congestion in data center networks, we develop a backpressure mechanism for queuing architecture in congested end hosts to cope with tens of thousands of flows. We show that current end-host mechanisms can lead to high CPU utilization, high tail latency, and low throughput in cases of congestion of egress traffic. We introduce the design, implementation, and evaluation of zero-drop networking (zD) stack, a new architecture for handling congestion of scheduled buffers. Besides queue overflow, another cause of congestion is CPU resource exhaustion. The CPU cost of processing packets in networking stacks, however, has not been fully investigated in the literature. Much of the focus of the community has been on scaling servers in terms of aggregate traffic intensity, but bottlenecks caused by the increasing number of concurrent flows have received little attention. We conduct a comprehensive analysis on the CPU cost of processing packets and identify the root cause that leads to high CPU overhead and degraded performance in terms of throughput and RTT. Our work highlights considerations beyond packets per second for the design of future stacks that scale to millions of flows.