Bridging Processor and Memory Performance in ILP Processors via Data-Remapping
Rabbah, Rodric Michel
Palem, Krishna V.
Current system design trends continue to magnify the disparity between processor and memory performance. As microprocessors increasingly outperform the memory systems supporting them, it is ever more important to bridge the performance gap and help translate the promise of Moore's law into overall performance delivered to end applications. This gap between processor and memory is further exacerbated in modern processors with high levels of instruction-level parallelism (ILP), especially for data-intensive applications. In these processors, increased demands for data delivery lead to concomitant needs for higher memory bandwidth and larger caches. In this paper we provide a fast compile-time data-remapping technique that helps bridge the gap between the ILP processor and its memory system by enhancing the spatial locality of data access. Our strategy is the first automatic approach applicable to pointer-intensive dynamic applications, for which existing optimizations are mostly inadequate. We demonstrate an average performance improvement of 27% for several data-intensive applications. This is attributed to enhanced data locality, which lowers the demand on the bandwidth between cache levels as well as between the cache subsystem and main memory. We also show that with increasing levels of ILP and fixed memory bandwidth, our remapping technique enables very high levels of performance with smaller cache sizes. For example, as much as a factor of 15 reduction in multi-level cache sizes can be tolerated without a loss in performance. Although we use cycle-accurate simulators to detail the benefits of our remapping, we also measure 24% performance improvements on the Intel Pentium II and III processors, and a 9% improvement on the Sun UltraSparc-II.
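The abstract does not describe the remapping algorithm itself, so the following is only a minimal sketch of the general idea it relies on: storing the instances of a frequently accessed field next to each other so that each cache line fetched carries more useful data. The 64-byte line size, the 64-byte record, the 8-byte field, and all function names are assumptions made for illustration and are not taken from the paper.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size, not a figure from the paper


def lines_touched(addresses, line_bytes=CACHE_LINE_BYTES):
    """Count the distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_bytes for addr in addresses})


def interleaved_addresses(n_records, record_bytes, field_offset):
    """Addresses of one field when each record is stored whole, one after another."""
    return [i * record_bytes + field_offset for i in range(n_records)]


def remapped_addresses(n_records, field_bytes, base=0):
    """Addresses of the same field when all of its instances are stored contiguously."""
    return [base + i * field_bytes for i in range(n_records)]


# Hypothetical workload: a traversal that reads one 8-byte field of 1024 records.
n = 1024
before = lines_touched(interleaved_addresses(n, record_bytes=64, field_offset=8))
after = lines_touched(remapped_addresses(n, field_bytes=8))
print(before, after)
```

In this toy model the traversal touches one cache line per record in the original layout and one line per eight records after regrouping, which is the kind of reduction in cache-to-memory traffic the abstract attributes to improved spatial locality. It says nothing about how the paper's compile-time technique chooses or applies a remapping.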