Effective fault localization techniques for concurrent software
Park, Sang Min
MetadataShow full item record
Multicore and Internet cloud systems have been widely adopted in recent years and have resulted in the increased development of concurrent programs. However, concurrency bugs are still difficult to test and debug for at least two reasons. Concurrent programs have large interleaving space, and concurrency bugs involve complex interactions among multiple threads. Existing testing solutions for concurrency bugs have focused on exposing concurrency bugs in the large interleaving space, but they often do not provide debugging information for developers to understand the bugs. To address the problem, this thesis proposes techniques that help developers in debugging concurrency bugs, particularly for locating the root causes and for understanding them, and presents a set of empirical user studies that evaluates the techniques. First, this thesis introduces a dynamic fault-localization technique, called Falcon, that locates single-variable concurrency bugs as memory-access patterns. Falcon uses dynamic pattern detection and statistical fault localization to report a ranked list of memory-access patterns for root causes of concurrency bugs. The overall Falcon approach is effective: in an empirical evaluation, we show that Falcon ranks program fragments corresponding to the root-cause of the concurrency bug as "most suspicious" almost always. In principle, such a ranking can save a developer's time by allowing him or her to quickly hone in on the problematic code, rather than having to sort through many reports. Others have shown that single- and multi-variable bugs cover a high fraction of all concurrency bugs that have been documented in a variety of major open-source packages; thus, being able to detect both is important. Because Falcon is limited to detecting single-variable bugs, we extend the Falcon technique to handle both single-variable and multi-variable bugs, using a unified technique, called Unicorn. Unicorn uses online memory monitoring and offline memory pattern combination to handle multi-variable concurrency bugs. The overall Unicorn approach is effective in ranking memory-access patterns for single- and multi-variable concurrency bugs. To further assist developers in understanding concurrency bugs, this thesis presents a fault-explanation technique, called Griffin, that provides more context of the root cause than Unicorn. Griffin reconstructs the root cause of the concurrency bugs by grouping suspicious memory accesses, finding suspicious method locations, and presenting calling stacks along with the buggy interleavings. By providing additional context, the overall Griffin approach can provide more information at a higher-level to the developer, allowing him or her to more readily diagnose complex bugs that may cross file or module boundaries. Finally, this thesis presents a set of empirical user studies that investigates the effectiveness of the presented techniques. In particular, the studies compare the effectiveness between a state-of-the-art debugging technique and our debugging techniques, Unicorn and Griffin. Among our findings, the user study shows that while the techniques are indistinguishable when the fault is relatively simple, Griffin is most effective for more complex faults. This observation further suggests that there may be a need for a spectrum of tools or interfaces that depend on the complexity of the underlying fault or even the background of the user.