Dissertation Defense

Automating the Detection and Correction of Failures in Modern Persistent Memory Systems

Ian Neal
Ph.D. Candidate
WHERE:
3725 Beyster Building

Hybrid Event: Zoom

Abstract: Writing crash-consistent persistent memory (PM) applications is challenging for developers: an untimely program crash can corrupt or lose data if the application does not carefully order its updates to PM, and testing every possible crash for data consistency is intractable. Furthermore, crash-consistency bugs are difficult to debug and repair manually, and a correct fix can take a developer weeks or months.
This dissertation explores software techniques that automate difficult and time-consuming PM development tasks. We study PM system design, bugs, and bug fixes and observe that we can automatically provide scalable, high-coverage bug detection and correction by approximating the reasoning developers perform as they build their applications. Based on this insight, we first explore automated detection and correction of PM application bugs caused by the misuse of platform-specific PM primitives. We develop a testing technique that prioritizes program paths that heavily modify PM, as these paths are more likely to misuse PM. We implement this technique in Agamotto, which we use to find 84 new bugs while incurring no false positives. We then develop a technique for generating fixes for platform-specific PM bugs that are guaranteed to be correct, and we implement it in a compiler tool, Hippocrates.
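The path-prioritization idea can be illustrated with a minimal sketch. This is not Agamotto's implementation: the `Path` record, its `pm_writes` field, and the `prioritize` function are hypothetical names chosen for illustration, and the sketch assumes each candidate path has already been annotated with the number of PM-modifying operations it performs.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical record for one candidate program path."""
    name: str
    pm_writes: int  # assumed: number of PM-modifying operations on this path


def prioritize(paths):
    """Yield paths in descending order of PM modifications.

    Ties are broken by input order so the result is deterministic.
    """
    # heapq is a min-heap, so negate the count to pop the largest first.
    heap = [(-p.pm_writes, i, p) for i, p in enumerate(paths)]
    heapq.heapify(heap)
    while heap:
        _, _, path = heapq.heappop(heap)
        yield path


# Hypothetical candidate paths with made-up counts.
candidates = [
    Path("read_only_lookup", 0),
    Path("log_append", 7),
    Path("metadata_update", 3),
]
order = [p.name for p in prioritize(candidates)]
print(order)
```

Under these assumptions, the path that writes to PM most often is explored first and the read-only path last, so a limited testing budget is spent where PM misuse is most likely.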
Second, this dissertation explores automated bug detection for application-specific PM crash-consistency bugs. We develop a technique that automatically identifies groups of PM program behaviors that are likely to result in the same crash-consistency bugs and tests only one behavior from each group, which preserves high testing accuracy while improving efficiency by eliminating redundant tests of functionally similar behaviors. We implement this technique in Squint, a model-checking tool that selectively tests groups of PM program behaviors identified from a dynamic program trace, which we use to find 108 PM crash-consistency bugs. In sum, these tools have been used to find and fix over two hundred PM bugs in real-world PM systems, demonstrating both the need for such tools and the efficacy of the tools presented in this dissertation.
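The idea of testing one representative per group of similar behaviors can be sketched as follows. This is an illustrative assumption, not Squint's actual grouping criterion: the trace format, the `pick_representatives` helper, and the use of an identical PM operation sequence as the equivalence signature are all hypothetical.

```python
from collections import OrderedDict


def pick_representatives(behaviors, signature):
    """Group behaviors by signature and keep the first one in each group."""
    groups = OrderedDict()
    for b in behaviors:
        groups.setdefault(signature(b), []).append(b)
    return [members[0] for members in groups.values()]


# Hypothetical trace: (behavior id, sequence of PM operations observed).
trace = [
    ("b1", ("store", "flush", "fence")),
    ("b2", ("store", "fence")),
    ("b3", ("store", "flush", "fence")),
    ("b4", ("store", "fence")),
]

# Assumed equivalence: behaviors with identical operation sequences are
# treated as likely to expose the same crash-consistency bugs.
reps = pick_representatives(trace, signature=lambda b: b[1])
tested = [b[0] for b in reps]
print(tested)
```

Here four recorded behaviors collapse into two groups, so only two crash-consistency tests would run instead of four. The quality of such a scheme depends entirely on how well the chosen signature predicts which behaviors share bugs.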

Organizer

CSE Graduate Programs Office

Faculty Host

Prof. Baris Kasikci