Improving the Reliability of Commodity Operating Systems
Date and Time
Tuesday, April 26, 2005 - 4:00pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Michael Swift, from University of Washington
Host
Kai Li
Despite decades of research in fault tolerance, commodity operating systems, such as Windows and Linux, continue to crash. In this talk, I will describe a new reliability subsystem for operating systems that prevents the most common cause of crashes, device driver failures, without requiring changes to drivers themselves. To date, the subsystem has been used in Linux to prevent system crashes in the presence of driver failures, recover failed drivers transparently to the OS and applications, and update drivers "on the fly" without requiring a system reboot after installation. Measurements show that the system is extremely effective at protecting the OS from driver failures, while imposing little runtime overhead.