Taming latency in data center applications
Kumar, Mohan Kumar
A new breed of low-latency I/O devices, such as emerging remote memory access and high-speed Ethernet NICs, is becoming ubiquitous in current data centers. For example, large data center operators such as Amazon, Facebook, Google, and Microsoft are already migrating their networks to 100G. However, with these faster I/O devices, the overhead incurred by system software, such as the protocol stack and synchronous operations, becomes dominant. This dissertation approaches the problem by redesigning the protocol stack to provide an interface for latency-sensitive operations, and by redesigning synchronous operations such as TLB shootdown in operating systems and consensus in distributed systems.
First, the dissertation presents an extensible protocol stack, XPS, to address the software overhead incurred in protocol stacks such as TCP and UDP. XPS provides abstractions that allow an application-defined, latency-sensitive operation to run immediately after protocol processing (called the fast path) in various protocol stacks: a commodity OS protocol stack (e.g., Linux), a user-space protocol stack (e.g., mTCP), and recent SmartNICs. For all other operations, XPS retains the popular, well-understood socket interface. XPS's approach is practical: rather than proposing a new OS or removing the socket interface completely, its goal is to provide stack extensions for latency-sensitive operations and to use the existing socket layer for everything else.
Second, the dissertation provides a lazy, asynchronous mechanism to address the system software overhead incurred by a synchronous operation, TLB shootdown. The key idea of the lazy shootdown mechanism, called LATR, is to use lazy memory reclamation and lazy page table unmap to perform an asynchronous TLB shootdown. By handling TLB shootdowns lazily, LATR can eliminate the performance overheads associated with inter-processor interrupt (IPI) mechanisms as well as the time spent waiting for acknowledgments from remote cores. Because the mechanism is asynchronous, LATR provides an eventually consistent solution.
Finally, the dissertation untangles the logically coupled consensus mechanism from the application, which alleviates the overhead incurred by consensus algorithms such as Multi-Paxos and Viewstamped Replication (VR). Through physical isolation, the resulting system, DYAD, prevents the consensus component from competing with the application for system resources, which improves application performance. To provide physical isolation, DYAD defines the abstraction needed from the SmartNIC and the operations performed on the application running on the host processor. With the resulting consensus mechanism, the host processor handles only client requests in the normal case, and the messages needed for consensus are handled on the SmartNIC.
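The lazy shootdown idea summarized above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not LATR's implementation: the class and method names (`LazyShootdownSim`, `PendingInvalidation`, `tick`) are hypothetical, the periodic per-core sweep is an assumed trigger, and the model covers only lazy memory reclamation (the page table entry is removed immediately, so lazy page table unmap is not modeled). It shows the two properties the abstract names: the unmapping core sends no interrupts and waits for no acknowledgments, and remote TLBs become consistent eventually.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """A simulated CPU core with a private TLB (virtual page -> frame)."""
    core_id: int
    tlb: dict = field(default_factory=dict)


@dataclass
class PendingInvalidation:
    """A lazily published shootdown request (hypothetical structure)."""
    vpage: int
    frame: int
    remaining: set  # ids of cores that have not yet invalidated vpage


class LazyShootdownSim:
    def __init__(self, num_cores):
        self.cores = [Core(i) for i in range(num_cores)]
        self.page_table = {}   # vpage -> frame
        self.pending = []      # published but unfinished invalidations
        self.free_frames = []  # frames that are safe to reuse

    def map_page(self, vpage, frame):
        self.page_table[vpage] = frame

    def access(self, core_id, vpage):
        """Translate vpage on a core, filling its TLB on a miss."""
        core = self.cores[core_id]
        if vpage not in core.tlb:
            core.tlb[vpage] = self.page_table[vpage]  # KeyError if unmapped
        return core.tlb[vpage]

    def unmap_page(self, core_id, vpage):
        """Unmap without interrupts: invalidate locally, publish work for others."""
        frame = self.page_table.pop(vpage)
        self.cores[core_id].tlb.pop(vpage, None)
        others = {c.core_id for c in self.cores if c.core_id != core_id}
        if others:
            # Return immediately; nobody is interrupted or waited on.
            self.pending.append(PendingInvalidation(vpage, frame, others))
        else:
            self.free_frames.append(frame)

    def tick(self, core_id):
        """Assumed periodic per-core sweep that applies pending invalidations."""
        still_pending = []
        for inv in self.pending:
            if core_id in inv.remaining:
                self.cores[core_id].tlb.pop(inv.vpage, None)
                inv.remaining.discard(core_id)
            if inv.remaining:
                still_pending.append(inv)
            else:
                # Lazy reclamation: the frame is reused only after every
                # core has dropped its possibly stale translation.
                self.free_frames.append(inv.frame)
        self.pending = still_pending
```

For example, after `sim = LazyShootdownSim(3)`, `sim.map_page(0x10, frame=7)`, accesses from cores 0 and 1, and `sim.unmap_page(0, 0x10)`, core 1 still holds a stale translation and frame 7 is not yet reusable. Only once `sim.tick(1)` and `sim.tick(2)` have both run does frame 7 appear in `sim.free_frames`. Deferring reuse of the frame is what makes the window of stale entries tolerable in this model.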