Chainsaw Before Scalpel: Dependency-Based Pre-processing for Program Reduction
Abstract
Program reduction techniques, which aim to minimize the size of a program, have many applications, including software debloating, debugging, and optimization in general. Therefore, these techniques have been extensively studied for decades. Past work in this area has typically focused on either performing larger, more effective edits (e.g., Delta Debugging, Hierarchical Delta Debugging) or reducing the search space based on a language grammar (e.g., Perses, C-Reduce). Most of these techniques had a primary goal of minimal output size, with reduction speed only as a secondary goal.
We propose Chainsaw, a novel approach that improves existing techniques by offloading a subset of the reduction to a pre-processing step. Since Chainsaw does not need to be as thorough as existing reducers, this creates an opportunity to take a new approach which can benefit overall end-to-end performance. Our key insight is that in practical application, a considerable amount of input code is not needed, and dependency analysis enables effective and fast identification of this removable code. This dependency analysis is both general, thus easily applicable to different languages, and inexpensive, thus amenable to a speedy pre-processing step. Such analysis can enable the higher-fidelity techniques previously developed to skip a significant quantity of work and produce better results more quickly.
We also present a prototype tool based on our approach. Our tool finds unused sections of code by analyzing the dependencies between items in the input text and is straightforward to implement. We leverage existing analysis tooling via the Language Server Protocol to easily identify dependencies. Our initial results are promising and show that our approach is extremely fast and can yield up to twofold end-to-end speed improvement when used as a pre-processor with existing state-of-the-art techniques.