OPTIMIZED SCHEDULING AND RESOURCE ALLOCATION FOR THREAD PARALLEL ARCHITECTURES
Abstract
Performance characteristics of irregular programs on parallel architectures were studied. Results indicated significant overheads from thread divergence and register utilization on the GPU and sub-optimal thread migration patterns on the EMU. Compiler and architecture optimizations addressing these inefficiencies were designed and implemented, and performance data were collected. These optimizations included instruction and thread scheduling, as well as resource allocation techniques. Findings showed the potential for significant performance improvements for irregular programs executing on the GPU or the EMU. Further analysis revealed both positive and negative effects of other compiler phases and program characteristics on the performance impact of the proposed optimizations.
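As context for the thread-divergence overhead mentioned above, the following is a minimal illustrative sketch (not taken from this work) of a CUDA kernel in which data-dependent branching causes threads within a warp to take different paths and serialize execution; the kernel and variable names are hypothetical.

#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: threads in the same warp branch on the data they read,
// so the warp executes both paths serially (thread divergence overhead).
__global__ void divergent_kernel(const int *input, int *output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (input[i] % 2 == 0) {
        output[i] = input[i] * 2;   // even-valued elements take this path
    } else {
        output[i] = input[i] + 1;   // odd-valued elements take this path
    }
}

int main(void) {
    const int n = 1 << 20;
    int *in, *out;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i;   // alternating even/odd values force divergence
    divergent_kernel<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[3] = %d\n", out[3]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}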