Statistical causal analysis for fault localization
Baah, George Kofi
MetadataShow full item record
The ubiquitous nature of software demands that software is released without faults. However, software developers inadvertently introduce faults into software during development. To remove the faults in software, one of the tasks developers perform is debugging. However, debugging is a difficult, tedious, and time-consuming process. Several semi-automated techniques have been developed to reduce the burden on the developer during debugging. These techniques consist of experimental, statistical, and program-structure based techniques. Most of the debugging techniques address the part of the debugging process that relates to finding the location of the fault, which is referred to as fault localization. The current fault-localization techniques have several limitations. Some of the limitations of the techniques include (1) problems with program semantics, (2) the requirement for automated oracles, which in practice are difficult if not impossible to develop, and (3) the lack of theoretical basis for addressing the fault-localization problem. The thesis of this dissertation is that statistical causal analysis combined with program analysis is a feasible and effective approach to finding the causes of software failures. The overall goal of this research is to significantly extend the state of the art in fault localization. To extend the state-of-the-art, a novel probabilistic model that combines program-analysis information with statistical information in a principled manner is developed. The model known as the probabilistic program dependence graph (PPDG) is applied to the fault-localization problem. The insights gained from applying the PPDG to fault localization fuels the development of a novel theoretical framework for fault localization based on established causal inference methodology. The development of the framework enables current statistical fault-localization metrics to be analyzed from a causal perspective. The analysis of the metrics show that the metrics are related to each other thereby allowing the unification of the metrics. Also, the analysis of metrics from a causal perspective reveal that the current statistical techniques do not find the causes of program failures instead the techniques find the program elements most associated with failures. However, the fault-localization problem is a causal problem and statistical association does not imply causation. Several empirical studies are conducted on several software subjects and the results (1) confirm our analytical results, (2) demonstrate the efficacy of our causal technique for fault localization. The results demonstrate the research in this dissertation significantly improves on the state-of-the-art in fault localization.