A Framework for Data Prefetching using Off-line Training of Markovian Predictors
Wong, Weng Fai
Palem, Krishna V.
Data prefetching is an important technique for alleviating the memory bottleneck. Proposed solutions range from purely software approaches, which insert prefetch instructions identified through program analysis, to purely hardware mechanisms. The success of these techniques depends on the nature of the application. The need for innovative approaches is growing rapidly with the introduction of applications, such as object-oriented applications, that exhibit dynamically changing memory access behavior. In this paper, we propose a novel framework for the use of data prefetchers that are trained off-line. In particular, we propose two techniques for building small prediction tables off-line, together with the hardware support needed to deploy them at runtime. Our first technique is an adaptation of the Hidden Markov Model, which has been used successfully to find hidden patterns in many diverse areas, including molecular biology, speech and fingerprint recognition, and a wide range of other recognition problems. Our second technique, the Window Markov Predictor, seeks to identify relationships between miss addresses within a fixed window. Sample traces of applications are fed into these off-line learning schemes to find hidden memory access patterns, from which prediction models are constructed. Once built, the predictor models are loaded into a data prefetching unit in the CPU at the appropriate point during runtime to drive the prefetching. We propose a general architecture for this process and report the results of our experiments, comparing them against other hardware prefetching schemes. On average, using a table of about 8KB, our proposed method achieved a prediction accuracy of about 68% and boosted performance by about 37% on the benchmarks we tested.
Furthermore, we believe our proposed framework is amenable to other predictors and can be implemented as a phase of a profiling, optimizing compiler.
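To make the idea of off-line training over a fixed window of miss addresses more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a miss trace is simply a list of integer addresses, counts which addresses follow each miss within a fixed window, and keeps only the most frequent entries so the table stays small. The function names (`train_window_predictor`, `prefetch_candidates`) and all parameters (`window`, `max_entries`, `successors_per_entry`) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def train_window_predictor(miss_trace, window=4, max_entries=1024,
                           successors_per_entry=2):
    """Build a small prediction table off-line from a sample miss trace.

    Illustrative assumption: an address that often appears within `window`
    misses after another address is a good prefetch candidate for it.
    """
    counts = defaultdict(Counter)
    for i, addr in enumerate(miss_trace):
        # Look at the next `window` misses that follow this one.
        for follower in miss_trace[i + 1:i + 1 + window]:
            if follower != addr:
                counts[addr][follower] += 1

    # Rank trigger addresses by how many followers were observed, and keep
    # only the top `max_entries` so the table fits a fixed hardware budget.
    ranked = sorted(counts.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)
    table = {}
    for addr, followers in ranked[:max_entries]:
        table[addr] = [a for a, _ in
                       followers.most_common(successors_per_entry)]
    return table


def prefetch_candidates(table, miss_addr):
    """At runtime, look up the addresses to prefetch for a given miss."""
    return table.get(miss_addr, [])


if __name__ == "__main__":
    # Hypothetical sample trace of miss addresses.
    trace = [0x100, 0x140, 0x900, 0x100, 0x140, 0x500, 0x100, 0x140]
    table = train_window_predictor(trace, window=2, successors_per_entry=1)
    print([hex(a) for a in prefetch_candidates(table, 0x100)])
```

In this sketch the trained table plays the role of the predictor model that would be loaded into the prefetching unit; `max_entries` stands in for a hardware table-size budget, and how a real design would encode and size its entries is left open.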