Dependable Computing and Fault Tolerance

The IEEE Technical Committee on Dependable Computing and Fault Tolerance
IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance

Dependable Computing and Fault Tolerance News

Submission Guidelines

  • E-mail submissions with a specific request for inclusion to Chuck Weinstock.
  • We now accept formatted entries. Please keep HTML tags to a minimum.
  • We may not be able to post your submission for at least a week, so please plan accordingly.

Submitting Items for Publication

To submit an item for publication on the FTTC mailing list, simply send it to fttc@dependability.org. Submissions are moderated, but unless yours is rejected, it will be sent to the list within a day or so.

2014 Laprie Award Winners

In 2011, the IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance created the award in honor of the late Jean-Claude Laprie. It recognizes outstanding papers that have significantly influenced the theory and/or practice of dependable computing. For 2014, the award committee decided to recognize three seminal papers in the award’s impact categories:

B. Randell, "System Structure for Software Fault Tolerance", IEEE Transactions on Software Engineering, vol. SE-1, no. 2, 1975, pp. 220-232.

J.H. Wensley, L. Lamport, J. Goldberg, M.W. Green, K.N. Levitt, P.M. Melliar-Smith, R.E. Shostak, C.B. Weinstock, "SIFT: The Design and Analysis of a Fault-Tolerant Computer for Aircraft Control", Proceedings of the IEEE, vol. 66, no. 10, 1978, pp. 1240-1255.

H. Kopetz, G. Bauer, "The Time-Triggered Architecture", Proceedings of the IEEE, vol. 91, no. 1, 2003, pp. 112-126.

Citations

Randell's paper "System Structure for Software Fault Tolerance" laid the foundations for several decades of research into computing systems capable of tolerating residual design faults in software. Until then, fault-tolerant computing had been concerned only with physical faults affecting computer hardware. The paper introduced the concept of redundancy of design (now called design diversity or design dissimilarity), in which multiple software components of independent design operate as a redundant set, in a way analogous to standby sparing in hardware. It also introduced the concepts of recovery blocks, checkpointing and recovery, and the domino effect, all of which became commonplace terms in decades of research on fault tolerance.
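
As an illustration of the recovery block idea described above, here is a minimal sketch in Python. It is not taken from Randell's paper: the names recovery_block, primary_sort, alternate_sort and make_acceptance_test are hypothetical, and the checkpoint is modelled simply as a deep copy of the input. A deliberately faulty primary is tried first; when its result fails the acceptance test, the state is restored from the checkpoint and an independently written alternate is tried.

```python
import copy


class RecoveryBlockError(Exception):
    """Raised when every alternate fails the acceptance test."""


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order, starting each one from the checkpoint.

    Hypothetical helper for illustration: the checkpoint is a deep copy of
    `state`, so a failed alternate cannot corrupt what the next one sees.
    """
    checkpoint = copy.deepcopy(state)  # establish the recovery point
    for alternate in alternates:
        candidate = copy.deepcopy(checkpoint)  # roll back to the checkpoint
        try:
            result = alternate(candidate)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(result):
            return result
    raise RecoveryBlockError("all alternates failed the acceptance test")


def primary_sort(items):
    # Deliberately faulty primary (a residual design fault): drops an element.
    return sorted(items)[:-1]


def alternate_sort(items):
    # Independently designed alternate: a plain insertion sort.
    out = []
    for x in items:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def make_acceptance_test(original):
    # Accept a result only if it is ordered and has the original length.
    def acceptance_test(result):
        ordered = all(a <= b for a, b in zip(result, result[1:]))
        return ordered and len(result) == len(original)
    return acceptance_test


data = [3, 1, 2]
result = recovery_block(
    data, [primary_sort, alternate_sort], make_acceptance_test(data)
)
print(result)
```

In this sketch the acceptance test is intentionally cheap and partial (it checks ordering and length, not that the result is a permutation of the input), which reflects the usual trade-off between the cost of the test and its coverage.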

Wensley, Lamport, Goldberg, Green, Levitt, Melliar-Smith, Shostak and Weinstock pioneered the notion of Software-Implemented Fault Tolerance in their famous paper "SIFT: The Design and Analysis of a Fault-Tolerant Computer for Aircraft Control". The SIFT system made breakthroughs in fundamental theory and algorithms for achieving reliable distributed system operation in the presence of Byzantine failure modes, focusing specifically on the key problems of clock synchronization and consensus. The team developed and demonstrated the first software-based implementation of a fault-tolerant computer using these algorithms, and was among the first to create extensive analytical proofs of correctness of its algorithms. The impact of this work goes far beyond this implementation: its groundbreaking conceptual framework spawned an entirely new area of distributed systems theory and underlies many existing fault-tolerant computer designs.

Kopetz and Bauer's paper on the Time-Triggered Architecture described a design pattern for dependable real-time computing that has had an outstanding impact on industry. Research prototypes developed at the Vienna University of Technology were refined into commercial products by a spin-off company that now supplies hardware and software products to major actors in the aerospace, automotive, railway, robotics and electrical energy industries. Indeed, the time-triggered approach is the most commonly used fault-tolerance approach in current critical real-time system architectures. Instantiations of the approach are notably deployed in the Airbus A380 and the Boeing 787.