Neural Acceleration for GPU Throughput Processors
General-purpose computing on graphics processing units (GPGPU) accelerates the execution of diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution, which provides an opportunity to improve the performance and efficiency of GPGPUs. Recent work has shown significant gains from neural approximate acceleration for CPU workloads. This work studies the effectiveness of neural approximate acceleration for GPU workloads. Because directly applying CPU neural accelerators to GPUs incurs high area overhead, we define a low-overhead neurally accelerated architecture for GPGPUs that enables scalable integration of neural acceleration across the large number of GPU cores. We also devise a mechanism that controls the tradeoff between the quality of results and the benefits of neural acceleration. We evaluate this design on a modern GPU architecture using a diverse set of benchmarks. Compared to the baseline GPGPU architecture, cycle-accurate simulation shows a 2.4× average speedup and a 2.8× average energy reduction with 10% quality loss across all benchmarks. The quality-control mechanism retains a 1.9× average speedup and a 2.1× energy reduction while limiting the quality degradation to 2.5%. These benefits come at approximately 1.2% area overhead.
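The quality-control tradeoff described above can be illustrated with a minimal sketch. All names here (`precise`, `neural_approx`, `run`, `invocation_rate`) are hypothetical illustrations, not the paper's architecture or API: the idea is simply that routing only a fraction of work items to the fast approximate path lowers the aggregate quality degradation at the cost of some speedup.

```python
import random

def precise(x):
    # Exact (slow) computation: a stand-in kernel for illustration.
    return x * x

def neural_approx(x):
    # Stand-in for a neural approximation with ~1% relative error
    # (assumed error model, for illustration only).
    return x * x * 1.01

def run(inputs, invocation_rate):
    """Send each input to the approximate path with probability
    invocation_rate; return outputs and average relative error."""
    outputs, exact = [], []
    for x in inputs:
        exact.append(precise(x))
        if random.random() < invocation_rate:
            outputs.append(neural_approx(x))
        else:
            outputs.append(precise(x))
    err = sum(abs(o - e) / abs(e) for o, e in zip(outputs, exact)) / len(inputs)
    return outputs, err

random.seed(0)
xs = [float(i) for i in range(1, 101)]
_, err_full = run(xs, invocation_rate=1.0)  # always approximate
_, err_half = run(xs, invocation_rate=0.5)  # throttled for quality
print(err_full, err_half)
```

Lowering `invocation_rate` is one dial a quality-control mechanism can turn: fewer approximate invocations means less quality loss but also fewer cycles saved.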