Fully Distributed Register Files for Heterogeneous Clustered Microarchitectures
Conventional processors use a central register file and a bypass network to deliver operands to and from the functional units, an organization that does not scale to a large number of functional units. As more functional units are integrated into a processor, the number of register file ports grows linearly, while the register file's area, access delay, and energy consumption grow even faster. The bypass network scales in a similar manner. In this dissertation, a fully distributed register file organization is presented that overcomes this limitation by relying on small register files with fewer ports and on localized operand bypasses. Unlike in other clustered microarchitectures, each cluster contains a small single-issue functional unit coupled with a small local register file. The processor uses several clusters, which need not be identical. All register files are connected through a register transfer network that supports multicast communication. Techniques to support distributed register file operation are presented for both dynamically and statically scheduled processors. These include the eager and multicast register transfer mechanisms in the dynamic approach and the global data routing with multicasting algorithm in the static approach. Although this organization requires additional cycles to execute a program, this cost is offset by significant savings in area, operand access time, and energy consumption. Because of the higher operating frequency and more efficient hardware implementation, overall performance can be improved. Additionally, the fully distributed register file organization is applied to an ILP-SIMD processing element, the major building block of a massively parallel media processor array. The results show a reduction in die area, which can be used to implement additional processing elements. Consequently, performance improves through a higher degree of data parallelism in a larger processor array.
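The port-scaling argument can be made concrete with a first-order area model. The sketch below is an illustration, not the dissertation's own model. It assumes that each single-issue functional unit needs two read ports and one write port, that register-cell area grows with the square of the port count (each port adds roughly one wordline and one bitline to every cell), and that each distributed register file carries two extra ports for the register transfer network (a hypothetical figure). The bypass and transfer networks themselves are not modeled, and all function names are hypothetical.

```python
"""First-order register-file area model (hypothetical, for illustration only).

Assumes cell area grows with the square of the port count, since each port
adds roughly one wordline and one bitline to every storage cell.
"""


def ports_per_fu(reads: int = 2, writes: int = 1) -> int:
    """Ports one single-issue functional unit needs (assumed 2 read, 1 write)."""
    return reads + writes


def central_rf_area(num_fus: int, num_regs: int) -> int:
    """Relative area of one shared register file serving every functional unit."""
    ports = num_fus * ports_per_fu()
    return num_regs * ports ** 2


def distributed_rf_area(num_fus: int, num_regs: int, transfer_ports: int = 2) -> int:
    """Relative area of one small register file per single-issue cluster.

    Each cluster holds num_regs // num_fus registers and has its local
    functional-unit ports plus `transfer_ports` (hypothetical) ports that
    connect it to the register transfer network.
    """
    regs_per_cluster = num_regs // num_fus
    ports = ports_per_fu() + transfer_ports
    return num_fus * regs_per_cluster * ports ** 2


if __name__ == "__main__":
    regs = 128  # total registers, assumed equal in both organizations
    for n in (2, 4, 8, 16):
        c = central_rf_area(n, regs)
        d = distributed_rf_area(n, regs)
        print(f"{n:2d} FUs: central={c:>8d} distributed={d:>6d} ratio={c / d:6.1f}")
```

Under these assumptions the central-to-distributed area ratio is 9n²/25 for n functional units, so it grows quadratically with the machine width, while the distributed organization stays constant for a fixed register count. The model deliberately omits the additional execution cycles that register transfers cost, which the abstract notes must be paid back by the area, access-time, and energy savings.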
In summary, the fully distributed register file architecture allows future processors to scale to a large number of functional units. This is especially desirable in high-throughput processors such as wide-issue and multithreaded processors. Moreover, localized communication is highly desirable in the transition to future deep-submicron technologies, because long wires are a critical problem in processes with extremely small feature sizes.