Integration and Evaluation of Cache Coherence Protocols for Multiprocessor SoCs
MetadataShow full item record
System-on-a-chip (SoC) designs is characterized by heavy reuse of IP blocks to satisfy specific computing needs for target applications, reduce overall design cost, and expedite time-to-market. To meet their performance goal and cost constraint, SoC designers integrate multiple, sometimes heterogeneous, processor IPs to perform particular functions. This design approach is called Multiprocessor SoC (MPSoC). In this thesis, I investigated generic methodologies for enabling efficient communication among heterogeneous processors and quantified the efficiency of coherence traffic. Hardware techniques for two main MPSoC architectures were studied: Integration of cache coherence protocols for shared-bus-based MPSoCs and Cache coherence support for non-shared-bus-based MPSoCs. In the shared-bus-based MPSoCs, the integration techniques guarantee data consistency among incompatible coherence protocols. An integrated protocol will contain common states from these coherence protocols. A snoop-hit buffer and region-based cache coherence were also proposed to further enhance the coherence performance. For the non-shared-bus-based MPSoCs, bypass and bookkeeping approaches were proposed to maintain coherence in a new cache coherence-enforced memory controller. The simulations based on micro-benchmark and RTOS kernel showed the benefits of my methodologies over a generic software solution. This thesis also evaluated and quantified the efficiency of coherence traffic based on a novel emulation platform using FPGA. The proposed technique can completely isolate the intrinsic delay of the coherence traffic to demonstrate the impact of coherence traffic on system performance. Unlike previous evaluation methods, this technique eliminated non-deterministic factors in measurements such as bus arbitration delay and stall in the pipelined bus. The experimental results showed that the cache-to-cache transfer in the Intel server system is less efficient than the main memory access.