Evaluating Scalability of Multi-threaded Applications on a Many-core Platform
Abstract
Multicore processors have been effective in scaling application
performance by dividing computation among multiple
threads running in parallel. However, application performance
does not necessarily improve as more cores are
added. Performance can be limited by several
bottlenecks, including contention for shared resources
such as caches and memory. In this paper, we perform a scalability analysis of parallel
applications on a 64-threaded Intel Nehalem-EX-based
system. We find that applications that scale well on a small
number of cores exhibit poor scalability on a large number
of cores. Using hardware performance counters, we show
that many performance-limited applications are limited by
memory bandwidth on many-core platforms and exhibit improved
scalability when provisioned with higher memory
bandwidth. Regulating the number of threads used and
applying dynamic voltage and frequency scaling to
memory-bandwidth-limited benchmarks can yield significant
energy savings.
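
The saturation effect described in the abstract can be illustrated with a deliberately simple model. The sketch below is not the report's methodology: it assumes perfectly parallel compute and a single shared memory channel, and every name and number in it (`modeled_runtime`, `compute_seconds`, `bytes_moved`, the bandwidth values) is hypothetical and chosen only for illustration.

```python
def modeled_runtime(threads, compute_seconds, bytes_moved, bandwidth_bytes_per_s):
    """Runtime under an assumed model: compute divides evenly across threads,
    while memory traffic is served by one shared bandwidth limit."""
    compute_time = compute_seconds / threads
    memory_time = bytes_moved / bandwidth_bytes_per_s
    # Whichever resource takes longer bounds the runtime.
    return max(compute_time, memory_time)


def speedup(threads, compute_seconds, bytes_moved, bandwidth_bytes_per_s):
    """Speedup of the modeled runtime relative to a single thread."""
    base = modeled_runtime(1, compute_seconds, bytes_moved, bandwidth_bytes_per_s)
    return base / modeled_runtime(threads, compute_seconds, bytes_moved, bandwidth_bytes_per_s)


def parallel_efficiency(threads, compute_seconds, bytes_moved, bandwidth_bytes_per_s):
    """Speedup divided by thread count; 1.0 means ideal scaling."""
    return speedup(threads, compute_seconds, bytes_moved, bandwidth_bytes_per_s) / threads


if __name__ == "__main__":
    # Hypothetical workload: 64 s of single-thread compute, 40 GB of memory traffic.
    for bandwidth in (10e9, 20e9):
        for threads in (1, 8, 16, 32, 64):
            s = speedup(threads, 64.0, 40e9, bandwidth)
            print(f"bandwidth={bandwidth:.0e} B/s threads={threads:2d} speedup={s:.1f}")
```

Under these assumed numbers, speedup stops growing at 16 threads with 10 GB/s of bandwidth and at 32 threads with 20 GB/s. Threads beyond the saturation point add no speedup in this model, which is the intuition behind limiting the thread count for bandwidth-bound workloads.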
Collections
- CERCS Technical Reports