Compiler Optimizations for Multithreaded Multicore Network Processors
MetadataShow full item record
Network processors are new types of multithreaded multicore processors geared towards achieving both fast processing speed and flexibility of programming. The architecture of network processors considers many special properties for packet processing, including multiple threads, multiple processor cores on the same chip, special functional units, simplified ISA and simplified pipeline, etc. The architectural peculiarities of network processors raise new challenges for compiler design and optimization. Due to very high clocking speeds, the CPU memory gap on such processors is huge, making registers extremely precious. Moreover, the register file is split into two banks, and for any ALU instruction, the two source operands must come from different banks. We present and compare three different approaches to do register allocation and bank assignment. We also address the problem of sharing registers across threads in order to maximize the utilization of hardware resources. The context switches on the IXP network processor only happen when long latency operations are encountered. As a result, context switches are highly frequent. Therefore, the designer of the IXP network processor decided to make context switches extremely lightweight, i.e. only the program counter(PC) is stored together with the context. Since registers are not saved and restored during context switches, it becomes difficult to share registers across threads. For a conventional processor, each thread can assume that it can use the entire register file, because registers are always part of the context. However, with lightweight context switch, each thread must take a separate piece of the register file, making register usage inefficient. Programs executing on network processors typically have runtime constraints. Scheduling of multiple programs sharing a CPU must be orchestrated by the OS and the hardware using certain sharing policies. Real time applications demand a real time aware OS kernel to meet their specified deadlines. However, due to stringent performance requirements on network processors, neither OS nor hardware mechanisms is typically feasible. In this work, we demonstrate that a compiler approach could achieve some of the OS scheduling and real time scheduling functionalities without introducing a hefty overhead.