Performance understanding and tuning of iterative computation using profiling techniques
Ozarde, Sarang Anil
Most applications spend a significant amount of time in the iterative parts of a computation. They typically repeat the same set of operations with different values, which depend either on the inputs or on values calculated in previous iterations. Although loops capture some iterative behavior, in many cases such behavior is spread over the whole program, sometimes through recursion. Understanding the iterative behavior of a computation can be very useful for fine-tuning it. In this thesis, we present a profiling-based framework to understand and improve the performance of iterative computation. We capture the state of iterations in two aspects: (1) algorithmic state and (2) program state. We demonstrate the applicability of our framework for capturing algorithmic state by applying it to SAT solvers, and for capturing program state by applying it to a variety of benchmarks exhibiting completely parallelizable loops. Further, we show that such a performance characterization can be used successfully to improve the performance of the underlying application.

Many high-performance combinatorial optimization applications involve SAT solving. A variety of SAT solvers have been developed that employ different data structures and different propagation methods for converging on a fixed point to generate a satisfying solution. Debugging and tuning the performance of SAT solvers for a given domain is an important problem encountered in practice. Unfortunately, not much work has been done to quantify the iterative efficiency of SAT solvers. In this work, we develop quantifiable measures for calculating the convergence efficiency of SAT solvers. We capture the algorithmic state of the application by tracking the assignment of variables in each iteration. A compact representation of the profile data is developed to track the rate of progress and convergence.
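As a rough illustration of tracking variable assignments per iteration, the following minimal Python sketch counts how many variables change between consecutive snapshots and records when each variable last changed. The snapshot format, the function names, and both measures are assumptions made for this example; they are not the thesis's actual profile representation or metric definitions.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot format: variable id -> True / False / None (unassigned).
Assignment = Dict[int, Optional[bool]]


def changed_per_iteration(snapshots: List[Assignment]) -> List[int]:
    """Count the variables whose value differs from the previous snapshot."""
    counts = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        variables = set(prev) | set(cur)
        counts.append(sum(1 for v in variables if prev.get(v) != cur.get(v)))
    return counts


def stabilization_iteration(snapshots: List[Assignment]) -> Dict[int, int]:
    """Map each variable to the index of the last snapshot at which it changed."""
    last_change: Dict[int, int] = {}
    for i, (prev, cur) in enumerate(zip(snapshots, snapshots[1:]), start=1):
        for v in set(prev) | set(cur):
            if prev.get(v) != cur.get(v):
                last_change[v] = i
    return last_change


# Illustrative data only: four snapshots of a three-variable problem.
example_snapshots: List[Assignment] = [
    {1: None, 2: None, 3: None},
    {1: True, 2: None, 3: False},
    {1: True, 2: False, 3: True},
    {1: True, 2: False, 3: True},
]
```

In this sketch, a falling count of changed variables suggests that the run is settling, and variables with a late last change are the ones that stabilize slowly.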
The novelty of this approach is that it is independent of the specific strategies used in individual solvers, yet it gives key insights into the "progress" and "convergence behavior" of the solver for the specific implementation at hand. An analysis tool is written to interpret the profile data and extract metrics such as average convergence rate, efficiency of iteration, and variable stabilization. Finally, using this system, we present a study of four well-known SAT solvers that compares their iterative efficiency on random as well as industrial benchmarks. Using the framework, we identify iterative inefficiencies that lead to slow convergence, and we show how to fine-tune the solvers by adapting the key steps.

We also show that a similar profile data representation can easily be applied to loops in general to capture their program state. One of the key attributes of the program state inside loops is their branch behavior. We demonstrate the applicability of the framework by profiling completely parallelizable loops (those with no cross-iteration dependence) and storing the branching behavior of each iteration. The branch behavior across a group of iterations is important when forming thread warps from parallel loops for efficient execution on GPUs. We show how some loops can be effectively parallelized on GPUs using this information.
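To make the link between per-iteration branch behavior and warp formation concrete, here is a minimal Python sketch that groups loop iterations with identical branch outcomes so that each group can be assigned to the same warp. The signature encoding (a tuple of taken/not-taken flags per iteration), the function name, and the grouping policy are assumptions for illustration, not the method used in the thesis; the default warp size of 32 matches current NVIDIA GPUs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs

# Hypothetical profile format: one tuple of branch outcomes per loop iteration.
BranchSignature = Tuple[bool, ...]


def group_by_branch_signature(
    signatures: List[BranchSignature], warp_size: int = WARP_SIZE
) -> List[List[int]]:
    """Pack iteration indices into warps so each warp shares one signature."""
    buckets: Dict[BranchSignature, List[int]] = defaultdict(list)
    for iteration, signature in enumerate(signatures):
        buckets[signature].append(iteration)

    warps: List[List[int]] = []
    for signature in sorted(buckets):  # sorted for a deterministic order
        iterations = buckets[signature]
        # Split each bucket into chunks of at most warp_size iterations.
        for start in range(0, len(iterations), warp_size):
            warps.append(iterations[start:start + warp_size])
    return warps
```

For example, with `warp_size=2` and the signatures `[(True,), (False,), (True,), (False,)]`, the sketch yields `[[1, 3], [0, 2]]`: iterations that take the same path end up together, so no warp mixes both branch directions.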