A virtualized quality of service packet scheduler accelerator
Chuang, Kangtao Kendall
Resource virtualization is emerging as a technology to enable the management and sharing of hardware resources, including multicore processors and accelerators such as Digital Signal Processors (DSP), Graphics Processing Units (GPU), and Field Programmable Gate Arrays (FPGA). Accelerators present unique problems for virtualization and sharing due to their specialized architectures and interaction modes. This thesis explores and proposes solutions for the virtualized operation of high-performance, quality of service (QoS) packet scheduling accelerators. It concentrates specifically on the challenges of meeting 10Gbps Ethernet wire speeds. The packet scheduling accelerator is realized in an FPGA and implements the ShareStreams-V architecture. ShareStreams-V implements the Dynamic Window-Constrained Scheduler (DWCS) algorithm and virtualizes the previous ShareStreams architecture. The original ShareStreams architecture, implemented on Xilinx Virtex-I and Virtex-II FPGAs, was able to schedule 128 streams at 10Gbps Ethernet throughput for 1500-byte packets. ShareStreams-V provides both hardware and software extensions to enable a single implementation to host isolated, independent virtual schedulers. Four methods for virtualization of the packet scheduler accelerator are presented: coarse- and fine-grained temporal partitioning, spatial partitioning, and dynamic spatial partitioning. In addition to increasing the utilization of the scheduler, sharing the physical scheduler across multiple virtual schedulers among multiple processes can increase its decision throughput. This leads to the hypothesis for this work: virtualization of a quality of service packet scheduler accelerator through dynamic spatial partitioning is an effective and efficient approach to accelerator virtualization that supports scalable decision throughput across multiple processes. ShareStreams-V was synthesized targeting a Xilinx Virtex-4 FPGA.
While sharing among four processes, designs that supported up to 16, 32, and 64 total streams were able to reach 10Gbps Ethernet scheduling throughput for 64-byte packets. When sharing among 32 processes, a scheduler supporting 64 total streams was able to reach the same throughput. An access API presents the virtual scheduler abstraction to individual processes, allowing each process to allocate, deallocate, update, and control the virtual scheduler allocated to it. In practice, the bottleneck for the test system is the software-to-hardware interface. Effective future implementations are anticipated to use a tightly coupled interconnect between the host CPU and the accelerator.
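To give a sense of what 10Gbps wire speed demands of a scheduler at the two packet sizes mentioned above, the following minimal sketch estimates the required scheduling decisions per second. It is an illustration and not taken from the thesis. It assumes one scheduling decision per transmitted frame, treats the stated packet size as the full Ethernet frame length, and adds the standard per-frame wire overhead of 8 bytes of preamble and start-of-frame delimiter plus a 12-byte interframe gap. The function names are hypothetical.

```python
# Sketch: scheduling decisions per second needed to keep a 10Gbps
# Ethernet link busy. Assumptions (not from the thesis): one decision
# per frame, and the stated packet size is the full frame length.

PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def decisions_per_second(frame_bytes, link_bps=10_000_000_000):
    """Frames (and hence decisions) per second at full line rate."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def decision_time_ns(frame_bytes, link_bps=10_000_000_000):
    """Time budget per decision, in nanoseconds."""
    return 1e9 / decisions_per_second(frame_bytes, link_bps)


if __name__ == "__main__":
    for size in (64, 1500):
        rate = decisions_per_second(size)
        budget = decision_time_ns(size)
        print(f"{size:5d}-byte frames: {rate:,.0f} decisions/s, "
              f"{budget:.1f} ns per decision")
```

Under these assumptions, 64-byte frames require roughly 14.9 million decisions per second (about 67 ns each), compared with roughly 0.82 million per second for 1500-byte frames, which is about an 18-fold increase in required decision throughput.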