Active management of Cache resources
MetadataShow full item record
This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of challenges relates to the memory bottleneck. As the focus shifts from scaling processor frequency to scaling the number of cores, performance growth demands increasing die area. Scaling the number of cores also places a concurrent area demand in the form of larger caches. While on-chip caches occupy 50-60% of area and consume 20-30% of energy expended on-chip, their performance and energy efficiencies are less than 15% and 1% respectively for a range of benchmarks! The second set of challenges is posed by transistor leakage and process variation (inter-die and intra-die) at future technology nodes. Leakage power is anticipated to increase exponentially and sharply lower defect-free yield with successive technology generations. For performance scaling to continue, cache efficiencies have to improve significantly. This thesis proposes and evaluates a broad family of such improvements. This dissertation first contributes a model for cache efficiencies and finds them to be extremely low - performance efficiencies less than 15% and energy efficiencies in the order of 1%. Studying the sources of inefficiency leads to a framework for efficiency improvement based on two interrelated strategies. The approach for improving energy efficiency primarily relies on sizing the cache to match the application memory footprint during a program phase while powering down all remaining cache sets. Importantly, the sized is fully functional with no references to inactive sets. Improving performance efficiency primarily relies on cache shaping, i.e., changing the placement function and thereby the manner in which memory shares the cache. Sizing and shaping are applied at different phase of the design cycle: i) post-manufacturing & offline, ii) at compile-time, and at iii) run-time. This thesis proposes and explores techniques at each phase collectively realizing a repertoire of techniques for future memory system designers. The techniques use a combination of HW-SW techniques and are demonstrated to provide substantive improvements with modest overheads.