Hardware acceleration for conservative parallel discrete event simulation on multi-core systems
Lynch, Elizabeth Whitaker
Multi-core architectures are becoming more common, and core counts continue to increase. Six- and eight-core chips, such as Intel Gulftown, are currently in production, and many-core chips with dozens of cores, such as the Intel Teraflops 80-core chip, are projected within the next five years. However, adding more cores often does not improve application performance. It would be desirable to take advantage of the multi-core environment to speed up parallel discrete event simulation. The current bottleneck for many parallel simulations is time synchronization. This is especially true for simulations of wireless networks and on-chip networks, which have low lookahead. Message passing is another common simulation bottleneck. To address time synchronization, we have designed hardware at a functional level that performs the time synchronization for parallel discrete event simulation asynchronously and in just a few clock cycles, eliminating the need for global communication through message passing and for lock contention on shared memory. This hardware, the Global Synchronization Unit, consists of three register files, each sized to the number of cores, and is accessed using five new atomic instructions. To reduce the simulation overhead of message passing, we have also designed two independent pieces of hardware at a functional level, the Atomic Shared Heap and Atomic Message Passing, which can be used to perform lock-free, zero-copy message passing on a multi-core system. The impact of these specialized hardware units on the performance of parallel discrete event simulation is assessed and compared to traditional shared-memory techniques.
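To make the time synchronization problem concrete, the following is a minimal software sketch of the bound that a conservative simulator must compute before a logical process may safely execute an event. It is not the Global Synchronization Unit design, whose register contents and instructions are not described here. The names `LogicalProcess`, `lbts`, and `is_safe` are hypothetical, and the sketch assumes that no messages are in transit and that every lookahead is positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalProcess:
    """Hypothetical per-core simulation state (an assumption, not from the thesis)."""
    next_event_time: float  # timestamp of the earliest unprocessed local event
    lookahead: float        # minimum delay added to any message this process sends


def lbts(lps: List[LogicalProcess]) -> float:
    """Lower bound on the timestamp of any message that can still be generated.

    Assumes no messages are in transit, so the bound is simply the minimum
    of (next event time + lookahead) over all logical processes.
    """
    return min(lp.next_event_time + lp.lookahead for lp in lps)


def is_safe(lps: List[LogicalProcess], me: int) -> bool:
    """True if process `me` may execute its next event without risking a
    later-arriving message with a smaller timestamp."""
    return lps[me].next_event_time < lbts(lps)


if __name__ == "__main__":
    # Three logical processes, one per core, with small lookahead values.
    lps = [
        LogicalProcess(next_event_time=10.0, lookahead=2.0),
        LogicalProcess(next_event_time=11.0, lookahead=5.0),
        LogicalProcess(next_event_time=13.0, lookahead=1.0),
    ]
    print("LBTS:", lbts(lps))
    for i in range(len(lps)):
        print(f"process {i} safe to proceed:", is_safe(lps, i))
```

In a shared-memory implementation, the minimum in `lbts` is typically a global reduction guarded by locks or barriers, and with low lookahead it must be recomputed very often. That repeated reduction is the cost the abstract proposes to move into hardware.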