dc.contributor.author: Woo, Dong Hyuk
dc.contributor.author: Fryman, Joshua Bruce
dc.contributor.author: Knies, Allan D.
dc.contributor.author: Eng, Marsha
dc.contributor.author: Lee, Hsien-Hsin Sean
dc.date.accessioned: 2008-02-22T20:17:10Z
dc.date.available: 2008-02-22T20:17:10Z
dc.date.issued: 2007-05
dc.identifier.uri: http://hdl.handle.net/1853/20069
dc.description.abstract: As power constraints, complexity, and design verification cost make it difficult to improve single-stream performance, the parallel computing paradigm is taking its place among mainstream high-volume architectures. Most current commercial designs focus on MIMD-style CMPs built with rather complex single cores. While such designs provide a degree of generality, they may not be the most efficient way to build processors for applications with inherently scalable parallelism. These designs have proven to work well for certain classes of applications, such as transaction processing, but they have driven the development of new languages and complex architectural features. Instead of building MIMD-CMPs for all workloads, we propose an alternative parallel on-die many-core architecture called POD, based on a large SIMD PE array. POD helps to address the key challenges of on-chip communication bandwidth, area limitations, and energy consumed by routers by factoring out features necessary for MIMD machines and focusing on architectures that match many scalable workloads. In this paper, we evaluate and quantify the advantages of the POD architecture, which bases its ISA on a commercially relevant CISC architecture, and show that it can be as efficient as more specialized array processors based on one-off ISAs. Our single-chip POD is capable of best-in-class scalar performance and up to 1.5 TFLOPS of single-precision floating-point arithmetic. Our experimental results show that in some application domains, our architecture can achieve nearly linear speedup on a large number of SIMD PEs, and that this speedup is much larger than the maximum speedup that MIMD-CMPs on the same die size can achieve. Furthermore, owing to synchronized computation and communication, our results show that POD can efficiently suppress energy consumption with the novel communication method in our interconnection network. [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Georgia Institute of Technology [en_US]
dc.relation.ispartofseries: CERCS; GIT-CERCS-07-09 [en_US]
dc.subject: Communication [en_US]
dc.subject: Cores [en_US]
dc.subject: Parallel on-die [en_US]
dc.subject: Processors [en_US]
dc.subject: Programming [en_US]
dc.title: POD: A Parallel-On-Die Architecture [en_US]
dc.type: Technical Report [en_US]