SCS Undergraduate Thesis Topics

Keith Bare
Priya Narasimhan
Online Fingerpointing: Just-in-Time Problem Diagnosis for Distributed Systems

Distributed systems are growing in both size and complexity. This growth makes it increasingly difficult for system administrators to determine which component failed when a system failure occurs. Existing tools and algorithms can diagnose problems, but they rely on offline analysis. This work explores online failure diagnosis, which operates while the distributed system under observation is running. A framework for online fingerpointing is presented and evaluated.

