Performance debugging support for many-core processors project
In recent years, the number of cores available on a processor has increased rapidly, while the performance of an individual core has improved much more slowly. As a result, achieving large performance improvements for applications now requires programmers to exploit the increased core count. This is often very challenging, and many parallel applications suffer from performance bugs caused by scalability limiters, which prevent performance from improving as much as it should as cores are added. Since core counts are expected to keep increasing for the foreseeable future, addressing scalability limiters is important for developing software that will perform better on future hardware. This project, jointly funded by SRC and NSF, investigated software and hardware mechanisms that automate significant parts of this performance and scalability debugging effort, giving programmers accurate and actionable feedback about the scalability limiters present in their code. Scalability limiters are mostly caused by resource-related bottlenecks and by insufficient exposed parallelism in the application. The main resource-related bottleneck is excessive cache misses, while insufficient parallelism mostly manifests as threads waiting to complete a synchronization operation, such as acquiring a lock (lock contention) or waiting at a barrier (load imbalance).
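As a rough illustration of the two synchronization symptoms described above, the following minimal Python sketch times how long each thread waits to acquire a shared lock and how long it waits at a barrier after doing an unequal share of work. It is not the project's mechanism: the thread count, the sleep-based "work", and the names `worker` and `run` are hypothetical, chosen only to make lock contention and load imbalance visible as waiting time.

```python
import threading
import time

# Hypothetical illustration only: sleep() stands in for real work, and the
# thread count and durations are arbitrary assumptions.
NUM_THREADS = 4
start_line = threading.Barrier(NUM_THREADS)  # release all threads together
lock = threading.Lock()
phase_end = threading.Barrier(NUM_THREADS)   # end-of-phase barrier
lock_wait = [0.0] * NUM_THREADS
barrier_wait = [0.0] * NUM_THREADS


def worker(tid: int) -> None:
    start_line.wait()

    # Phase 1: every thread needs the same lock; time spent blocked here
    # is lock contention.
    t0 = time.perf_counter()
    with lock:
        lock_wait[tid] = time.perf_counter() - t0
        time.sleep(0.02)  # simulated critical-section work

    # Phase 2: thread `tid` gets (tid + 1) units of work, so faster threads
    # idle at the barrier until the slowest arrives (load imbalance).
    time.sleep(0.05 * (tid + 1))
    t0 = time.perf_counter()
    phase_end.wait()
    barrier_wait[tid] = time.perf_counter() - t0


def run() -> tuple[list[float], list[float]]:
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock_wait, barrier_wait


if __name__ == "__main__":
    locks, barriers = run()
    for tid in range(NUM_THREADS):
        print(f"thread {tid}: lock wait {locks[tid]:.3f}s, "
              f"barrier wait {barriers[tid]:.3f}s")
```

In a typical run, the thread that acquires the lock last should show the largest lock wait, and thread 0 (least work) should show the largest barrier wait while the most heavily loaded thread waits almost not at all. Attributing idle time to a specific lock or barrier in this way is the kind of actionable feedback the abstract describes, although the project's actual mechanisms are not shown here.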