Diagnosing performance problems in distributed systems is time-consuming and difficult. The root cause may lie in any one of the system's numerous components or subcomponents or, worse, may result from interactions among them. Clearly, new problem diagnosis techniques are needed.
In this talk, I will describe request-flow comparison, a new technique for automatically localizing the sources of performance changes in distributed systems. It builds on the key insight that such changes often manifest as mutations in the paths requests take through the distributed system (e.g., the components they visit and the functions they access) or in their timing. Exposing these mutations and showing how they differ from previous behaviour localizes the source of the problem and focuses developer effort where it is needed. I will present case studies of using request-flow comparison to diagnose real, previously unsolved problems in real distributed systems, as well as research on how best to present its results to developers.
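To make the idea of path and timing mutations concrete, here is a minimal Python sketch. It is an illustration only, not the method presented in the talk: the request representation (a path of component names plus a latency), the function names, and the fixed `slowdown_ratio` threshold are all assumptions made for this example, and a real system would likely rely on richer traces and statistical tests.

```python
from collections import defaultdict
from statistics import mean


def group_by_path(requests):
    """Map each distinct path (a tuple of component names) to its latencies."""
    groups = defaultdict(list)
    for path, latency_ms in requests:
        groups[tuple(path)].append(latency_ms)
    return groups


def compare_request_flows(before, after, slowdown_ratio=1.5):
    """Compare two periods of request flows (hypothetical, simplified).

    Returns two sorted lists of paths:
      structural: paths seen in `after` that never occurred in `before`
      timing:     paths seen in both periods whose mean latency grew by
                  more than `slowdown_ratio` (an assumed fixed threshold)
    """
    b, a = group_by_path(before), group_by_path(after)
    structural = sorted(p for p in a if p not in b)
    timing = sorted(
        p for p in a
        if p in b and mean(a[p]) > slowdown_ratio * mean(b[p])
    )
    return structural, timing


# Made-up example data: (path, latency in milliseconds).
before = [
    (("frontend", "cache"), 5.0),
    (("frontend", "cache"), 6.0),
    (("frontend", "db"), 20.0),
]
after = [
    (("frontend", "cache"), 5.5),
    (("frontend", "db"), 45.0),
    (("frontend", "cache", "db"), 30.0),
]

structural, timing = compare_request_flows(before, after)
print("new paths:", structural)
print("slower paths:", timing)
```

In this made-up data, the path through `frontend`, `cache`, and `db` appears only in the second period, while the `frontend` to `db` path exists in both periods but has become slower. Either kind of mutation points a developer at the components involved.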
Raja Sambasivan is a Postdoctoral Research Fellow at Carnegie Mellon University. His current research focuses on ways to avoid inter-domain routing problems. His PhD research focused on techniques for automating problem diagnosis in distributed systems and cloud-computing environments.