Fault tolerance in computing

Prof. Lorenzo Strigini
Centre for Software Reliability, City University London

16 hours, 4 credits

October 5 - October 8, 2010

Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni, Largo Lucio Lazzarino, meeting room

Contact: Prof. Cinzia Bernardeschi

   

Aims

Fault tolerance, that is, the clever use of redundancy, is one of the organising principles for achieving dependability and resilience in all systems. Fault tolerance techniques are well established in some areas of computing, and many off-the-shelf building blocks routinely include some fault tolerance mechanisms. Yet the philosophy of fault tolerance, and knowledge of its design patterns and tricks, are not widespread among those who could take advantage of them, especially in the design of applications and of complex hardware-software-human systems. Specific technical communities (e.g., in various safety-critical applications of embedded computers) have consolidated techniques and practices for redundant design, diversity and so on. However, attempts to improve these practices, or to apply the same principles outside these specialised communities, often lead to controversy (e.g., in the security community) arising from the lack of a common language for dealing with the basic issues in fault tolerance.

These lectures aim to: