Developer-Centric Automated Debugging
Software debugging is an expensive activity that is responsible for a significant part of software maintenance cost. In particular, locating faulty code (i.e., fault localization) is one of the most challenging parts of software debugging. In recent years, researchers have proposed many techniques that aim at fully automating the task of fault localization. Although these techniques have been shown to be effective in reducing the amount of code developers need to inspect to locate faults, there is growing evidence that they provide developers with limited help in realistic debugging scenarios. I believe that a practical automated debugging technique should have human developers at the center of the debugging process rather than trying to completely replace them. In this dissertation, I present three of my techniques that support developer-centric automated debugging.
First, I present ENLIGHTEN, an interactive, feedback-driven fault localization technique. ENLIGHTEN supports and automates developers’ debugging workflow as follows. It 1) uses traditional statistical fault localization (SFL) to formulate an initial hypothesis of where the fault may be; 2) identifies a relevant subset of the execution that can help support or refute the formulated hypothesis; 3) presents the developer with a query about the identified execution subset in the form of a correctness question about the input-output relation of the partial execution; 4) refines its hypothesis of the fault by using the developer’s feedback; and 5) repeats these steps until the fault is found.
Second, I discuss my work on improving the accuracy of dynamic dependence analysis, which is a powerful tool for developers to investigate program behavior in an interactive debugging setting and a foundation that many automated debugging techniques leverage to model dynamic execution semantics. I present my finding that existing dynamic dependence analysis techniques can miss the cause-effect relations between faults and the observed failures if the faulty program states propagate via incorrect computation of memory addresses. To address this limitation, I define the concept of potential memory-address dependence, which explicitly represents this type of causal relation, and describe an algorithm that computes it.
Third, I present TESSERACT, a technique that improves the scalability of dynamic dependence analysis in the context of interactive debugging. Many existing dependence-based debugging techniques have been shown to work well on short executions, but fail to scale to larger ones. TESSERACT has the potential to address this limitation by utilizing a record-and-replay system to efficiently recreate the failing execution, break it down into small time slices, and analyze these slices in a parallelized, on-demand fashion.
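The first step of ENLIGHTEN relies on traditional statistical fault localization to form an initial hypothesis. As a rough illustration of what such a technique computes, the sketch below ranks statements by suspiciousness using the Ochiai formula, one commonly used SFL metric. The choice of Ochiai, the function name `ochiai_ranking`, and the toy coverage data are assumptions made for illustration; the abstract does not say which SFL formula ENLIGHTEN uses.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank statements from most to least suspicious (hypothetical helper).

    coverage: dict mapping test name -> set of executed statement ids
    outcomes: dict mapping test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # Count failing and passing tests that executed this statement.
        failed_cov = sum(
            1 for t, cov in coverage.items() if stmt in cov and not outcomes[t]
        )
        passed_cov = sum(
            1 for t, cov in coverage.items() if stmt in cov and outcomes[t]
        )
        # Ochiai: failed(s) / sqrt(total_failed * (failed(s) + passed(s)))
        denom = math.sqrt(total_failed * (failed_cov + passed_cov))
        scores[stmt] = failed_cov / denom if denom else 0.0
    # Highest score first; ties broken by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (assumed): statement 4 is executed only by the failing test.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 3}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking = ochiai_ranking(coverage, outcomes)
```

In this toy run, statement 4 receives the highest score because only the failing test executes it. In an interactive setting such as the one described above, a ranking like this would serve only as the starting hypothesis, which developer feedback then refines.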