
dc.contributor.author: Hong, Sunpyo [en_US]
dc.date.accessioned: 2013-01-17T22:01:31Z
dc.date.available: 2013-01-17T22:01:31Z
dc.date.issued: 2012-11-12 [en_US]
dc.identifier.uri: http://hdl.handle.net/1853/45922
dc.description.abstract: The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key idea of the model is to quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), which refers to the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes an application type. Combining these metrics with architectural and application parameters yields a prediction of overall application performance. The model uses statically available parameters such as instruction-mix information and input-data size, and its prediction error is 13.3% for the GPU-computing benchmarks. Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Results obtained with the proposed integrated power and performance (IPP) framework show that different optimization points exist for a GPU architecture depending on the application type. The work shows that activating fewer cores saves 10.99% of run-time energy consumption for the bandwidth-limited benchmarks, and it projects energy savings of 25.8% when core-level power gating is employed. Finally, the model is shifted to a throughput-based formulation using OpenCL in order to target a wider variety of processors. First, multiple performance-related outputs are predicted, including upper-bound and lower-bound values. Second, the model parameters are used to classify an application into one of several categories, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the predicted bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, the model makes trade-off analysis over architectural and application parameters straightforward, which provides more insight into improving energy efficiency. Future computer systems will contain hundreds of heterogeneous cores, so a workload must be scheduled to an efficient core or distributed across both types of cores. Preliminary work that uses the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Because no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As another extension, the relationship between speed-up and energy efficiency is derived mathematically. Finally, future research ideas are presented on how programmers, compilers, and runtimes can use the model on future heterogeneous systems. [en_US]
dc.publisher: Georgia Institute of Technology [en_US]
dc.subject: Model [en_US]
dc.subject: Power [en_US]
dc.subject: Energy [en_US]
dc.subject: GPGPU [en_US]
dc.subject: GPU [en_US]
dc.subject: Analytical model [en_US]
dc.subject: Performance [en_US]
dc.subject.lcsh: Graphics processing units
dc.subject.lcsh: Computer architecture
dc.subject.lcsh: Energy consumption
dc.title: Modeling performance and power for energy-efficient GPGPU computing [en_US]
dc.type: Dissertation [en_US]
dc.description.degree: PhD [en_US]
dc.contributor.department: Electrical and Computer Engineering [en_US]
dc.description.advisor: Committee Chair: Hyesoon Kim; Committee Member: Milos Prvulovic; Committee Member: Moinuddin Qureshi; Committee Member: Richard Vuduc; Committee Member: Sudhakar Yalamanchili [en_US]

