Model, predict, and mitigate scalability bottlenecks for parallel application on many-core processors
Ensuring that parallel applications continue to scale on many-core processors is challenging because of the complex relationship between the parallelism available in an application and the limited shared on-chip resources. Two main bottlenecks that limit the scalability of parallel applications are synchronization and memory bandwidth. In this thesis, we propose MiSAR, a minimalistic synchronization accelerator (MSA) that supports all three commonly used synchronization primitives (locks, barriers, and condition variables), and a novel overflow management unit (OMU) that dynamically manages its (very) limited hardware synchronization resources. The OMU allows safe and efficient dynamic transitions between hardware (MSA) and software synchronization implementations. Together, the MSA and OMU reduce the impact of synchronization latency on the scaling of parallel applications. We also propose a new performance model that captures the program characteristics of multi-threaded applications, allowing it to use few-threaded runs with small input sets to predict the performance of many-threaded runs with large input sets. The model considers the effects of increasing memory bandwidth demand and workload imbalance, as well as the increase in lock contention. Results show that the model can accurately predict the parallel speedup of an application as the thread count increases and can identify the bottlenecks that limit its scalability.