Extracting Data-Level Parallelism from Sequential Programs for SIMD Execution
Baumstark, Lewis Benton, Jr.
MetadataShow full item record
The goal of this research is to retarget multimedia programs written in sequential languages (e.g., C) to architectures with data-parallel execution capabilities. Image processing algorithms often have a high potential for data-level parallelism, but the artifacts imposed by the sequential programming language (e.g., loops, pointer variables) can obscure the parallelism and prohibit generation of efficient parallel code. This research presents a program representation and recognition approach for generating a data parallel program specification from sequential source code and retargeting it to data parallel execution mechanisms. The representation is based on an extension of the multi-dimensional synchronous dataflow model of computation. A partial recognition approach identifies and transforms only those program elements that hinder parallelization while leaving other computational elements intact. This permits flexibility in the types of programs that can be retargeted, while avoiding the complexity of complete program recognition. This representation and recognition process is implemented in the PARRET system, which is used to extract the high-level specification of a set of image-processing programs. From this specification, code is generated for Intels SSE2 instruction set and for the SIMPil processor. The results demonstrate that PARRET can exploit, given sufficient parallel resources, the maximum available parallelism in the retargeted applications. Similarly, the results show PARRET can also exploit parallelism on architectures with hardware-limited parallel resources. It is desirable to estimate potential parallelism before undertaking the expensive process of reverse engineering and retargeting. The goal is to narrow down the search space to a select set of loops which have a high likelihood of being data-parallel. This work also presents a hybrid static/dynamic approach, called DLPEST, for estimating the data-level parallelism in sequential program loops. We demonstrate the correctness of the DLPESTs estimates, show that estimates for programs of 25 to 5000 lines of code can be performed in under 10 minutes and that estimation time scales sub-linearly with input program size.