Bridging Processor and Memory Performance in ILP Processors via Data-Remapping

Please use this identifier to cite or link to this item: http://hdl.handle.net/1853/6568

Title: Bridging Processor and Memory Performance in ILP Processors via Data-Remapping
Author: Rabbah, Rodric Michel ; Palem, Krishna V.
Abstract: Current system design trends continue to magnify the disparity between processor and memory performance. Thus, as microprocessors perform increasingly better than the memory systems supporting them, it is ever more important to bridge the performance gap to help translate the promise of Moore's law into overall performance delivered to the end applications. This gap in performance between the processor and the memory is further exacerbated in the context of modern processors with high levels of instruction-level parallelism (ILP), especially for data-intensive applications. In these processors, increased demands for data delivery lead to concomitant needs for higher memory bandwidth and cache sizes. In this paper we provide a fast compile-time data-remapping technique which helps in bridging the gap between the ILP processor and its memory system, by enhancing the spatial locality of data access. Our strategy is the first automatic approach applicable to pointer-intensive dynamic applications for which existing optimizations are mostly inadequate. We demonstrate an average performance improvement of 27% for several data-intensive applications. This is attributed to enhanced data locality, resulting in lowered demand on the bandwidth between cache levels, as well as between the cache subsystem and main memory. We also show that with increasing levels of ILP and fixed memory bandwidth, our remapping technique enables very high levels of performance with smaller cache sizes. For example, as much as a factor of 15 reduction in multi-level caches can be tolerated without a loss in performance. Although we use cycle-accurate simulators to detail the benefits of our remapping, we also measure 24% performance improvements for the Intel Pentium II and III processors, and a 9% yield on the Sun UltraSparc-II.
Type: Technical Report
URI: http://hdl.handle.net/1853/6568
Date: 2001
Relation: CC Technical Report; GIT-CC-01-14
Publisher: Georgia Institute of Technology
Subject: Performance; Instruction level parallelism; Data locality

All materials in SMARTech are protected under U.S. Copyright Law and all rights are reserved, unless otherwise specifically indicated on or in the materials.

Files in this item

File: GIT-CC-01-14.pdf (318.4 KB, PDF)
