Failure Immunity: Teaching Systems To Defend Against Failures

George Candea – Researcher, Swiss Federal Institute of Technology in Lausanne (EPFL)


Failure immunity is a property by which programs, once afflicted by a given bug, develop resistance against future manifestations of that and similar bugs. We are developing techniques that enable programs to gain such immunity automatically, without assistance from programmers or users. In this talk, I will focus on Dimmunix, a tool that gives deadlock immunity to both Java and C/C++ programs. We evaluated this approach on several real systems, including MySQL, JBoss, SQLite, Apache ActiveMQ, Limewire, and the Java JDK, and demonstrated its effectiveness against real, reported deadlock bugs, while incurring only modest performance overhead and scaling to 1024 threads. While Dimmunix offers users an attractive way to cope with elusive deadlocks, I will also discuss how it helps programmers fix the underlying bugs and how it can be used to vaccinate program populations against discovered bugs. Finally, I will briefly present other types of failure immunity that we are currently pursuing.


George Candea has been an assistant professor at EPFL (Swiss Federal Institute of Technology in Lausanne) since Fall 2006, where he established the Dependable Systems Lab. His research focuses on techniques for improving systems’ dependability along with programmers’ productivity; besides failure immunity, current projects include deadlock debuggers, automated reverse-engineering, and automated fault injection. In 2005, George co-founded Aster Data Systems, which is currently a 50-person Silicon Valley company developing a high-performance, large-scale cluster-based database. He was named one of the Top 35 Young Technology Innovators by the MIT Technology Review in 2005. George received his PhD in computer science from Stanford University (2005) and his BS (1997) and MEng (1998) in computer science from the Massachusetts Institute of Technology.