skip to main content
research-article
Open access

Fault-tolerant algorithms for tick-generation in asynchronous logic: Robust pulse generation

Published: 08 September 2014 Publication History

Abstract

Today’s hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduces transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations.
This article is intended to be part of an attempt striving to bridge over this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will allow to build an ultra-robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel distributed, Byzantine fault-tolerant, probabilistically self-stabilizing pulse synchronization protocol, called FATAL, that can be implemented using standard asynchronous digital logic: Correct FATAL nodes are guaranteed to generate pulses (i.e., unnumbered clock ticks) in a synchronized way, despite a certain fraction of nodes being faulty. FATAL uses randomization only during stabilization and, despite the strict limitations introduced by hardware designs, offers optimal resilience and smaller complexity than all existing protocols. Finally, we show how to leverage FATAL to efficiently generate synchronized, self-stabilizing, high-frequency clocks.

References

[1]
Ajtai, M., Komlós, J., and Szemerédi, E. 1983. An O(n log n) sorting network. In Proceedings of the 15th Symposium on Theory of Computing (STOC). 1--9.
[2]
Attiya, H. and Censor, K. 2008. Lower bounds for randomized consensus under a weak adversary. In Proceedings of the 27th Symposium on Principles of Distributed Computing (PODC). 315--324.
[3]
Ben-Or, M., Dolev, D., and Hoch, E. N. 2008. Fast self-stabilizing byzantine tolerant digital clock synchronization. In Proceedings of the 27th Symposium on Principles of Distributed Computing (PODC). 385--394.
[4]
Bhamidipati, R., Zaidi, A., Makineni, S., Low, K., Chen, R., Liu, K.-Y., and Dalgrehn, J. 2002. Challenges and methodologies for implementing high-performance network processors. Intel Technol. J. 6, 3, 83--92.
[5]
Biaz, S. and Welch, J. 2001. Closed form bounds for clock synchronization under simple uncertainty assumptions. Inf. Process. Lett. 80, 3, 151--157.
[6]
Chapiro, D. M. 1984. Globally-asynchronous locally-synchronous systems. Ph.D. thesis, Stanford University.
[7]
Constantinescu, C. 2003. Trends and challenges in VLSI circuit reliability. IEEE Micro 23, 4, 14--19.
[8]
Daliot, A. and Dolev, D. 2006. Self-stabilizing byzantine pulse synchronization. Comput. Res. Repository abs/cs/0608092.
[9]
Daliot, A., Dolev, D., and Parnas, H. 2003. Self-stabilizing pulse synchronization inspired by biological pacemaker networks. In Proceedings of the SSS. 32--48.
[10]
Dike, C. and Burton, E. 1999. Miller and noise effects in a synchronizing flip-flop. IEEE J. Solid-State Circ. SC-34, 6, 849--855.
[11]
Dolev, D. 1982. The Byzantine generals strike again. J. Algor. 3, 14--30.
[12]
Dolev, D., Fuegger, M., Lenzen, C., Posch, M., Schmid, U., and Steininger, A. 2014. Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip. J. Comput. Syst. Sci. 80, 4, 860--900.
[13]
Dolev, D. and Hoch, E. 2007. Byzantine self-stabilizing pulse in a bounded-delay model. In Proceedings of the 9th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). Vol. 4280, 350--362.
[14]
Dolev, D., Lynch, N. A., Pinter, S. S., Stark, E. W., and Weihl, W. E. 1986. Reaching approximate agreement in the presence of faults. J. ACM 33, 499--516.
[15]
Dolev, S. and Welch, J. L. 2004. Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51, 5, 780--799.
[16]
Fischer, M. J. and Lynch, N. A. 1982. A lower bound for the time to assure interactive consistency. Inf. Process. Lett. 14, 183--186.
[17]
Fischer, M. J., Lynch, N. A., and Merritt, M. 1985. Easy impossibility proofs for distributed consensus problems. In Proceedings of the 4th Conference of Principles of Distributed Computing (PODC). 59--70.
[18]
Friedman, E. G. 2001. Clock distribution networks in synchronous digital integrated circuits. Proc. IEEE 89, 5, 665--692.
[19]
Fuchs, G., Függer, M., and Steininger, A. 2009. On the threat of metastability in an asynchronous fault-tolerant clock generation scheme. In Proceedings of the 15th Symposium on Asynchronous Circuits and Systems (ASYNC) (Chapel Hill, N. Carolina). 127--136.
[20]
Függer, M. 2010. Analysis of on-chip fault-tolerant distributed algorithms. Ph.D. thesis, Technische Universität Wien, Institut für Technische Informatik.
[21]
Függer, M., Dielacher, A., and Schmid, U. 2010. How to speed-up fault-tolerant clock generation in VLSI systems-on-chip via pipelining. In Proceedings of the 8th European Dependable Computing Conference (EDCC). 230--239.
[22]
Függer, M. and Schmid, U. 2012. Reconciling fault-tolerant distributed computing and systems-on-chip. Distrib. Comput. 24, 6, 323--355.
[23]
Függer, M., Schmid, U., Fuchs, G., and Kempf, G. 2006. Fault-tolerant distributed clock generation in VLSI systems-on-chip. In Proceedings of the 6th European Dependable Computing Conference (EDCC). 87--96.
[24]
Gadlage, M. J., Eaton, P. H., Benedetto, J. M., Carts, M., Zhu, V., and Turflinger, T. L. 2006. Digital device error rate trends in advanced CMOS technologies. IEEE Trans. Nuclear Sci. 53, 6, 3466--3471.
[25]
Hoch, E., Dolev, D., and Daliot, A. 2006. Self-stabilizing byzantine digital clock synchronization. In Proceedings of the 8th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS’06). Vol. 4280, 350--362.
[26]
International Technology Roadmap for Semiconductors 2012. International Technology Roadmap for Semiconductors. http://www.itrs.net.
[27]
Kinniment, D. J., Bystrov, A., and Yakovlev, A. V. 2002. Synchronization circuit performance. IEEE J. Solid-State Circ. SC-37, 2, 202--209.
[28]
Kuhn, F., Lenzen, C., Locher, T., and Oshman, R. 2010. Optimal gradient clock synchronization in dynamic networks. In Proceedings of the 29th Symposium on Principles of Distributed Computing (PODC).
[29]
Lenzen, C., Locher, T., and Wattenhofer, R. 2010. Tight bounds for clock synchronization. In J. ACM 57, 2.
[30]
Lundelius, J. and Lynch, N. 1984. An upper and lower bound for clock synchronization. Inf. Control 62, 2--3, 190--204.
[31]
Malekpour, M. 2006. A Byzantine-fault tolerant self-stabilizing protocol for distributed clock synchronization systems. In Proceedings of the 9th Conference on Stabilization, Safety, and Security of Distributed Systems (SSS). 411--427.
[32]
Malekpour, M. 2009. A self-stabilizing Byzantine-fault-tolerant clock synchronization protocol. Tech. rep., TM-2009-215758, NASA.
[33]
Marino, L. 1981. General theory of metastable operation. IEEE Trans. Comput. C-30, 2, 107--115.
[34]
Metra, C., Francescantonio, S., and Mak, T. 2004. Implications of clock distribution faults and issues with screening them during manufacturing testing. IEEE Trans. Comput. 53, 5, 531--546.
[35]
Pease, M., Shostak, R., and Lamport, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 228--234.
[36]
Polzer, T., Handl, T., and Steininger, A. 2009. A metastability-free multi-synchronous communication scheme for SoCs. In Proceedings of the 11th Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). 578--592.
[37]
Portmann, C. L. and Meng, T. H. Y. 1995. Supply noise and CMOS synchronization errors. IEEE J. Solid-State Circ. SC-30, 9, 1015--1017.
[38]
Restle, P., McNamara, T., Webber, D., Camporese, P., Eng, K., Jenkins, K., Allen, D., Rohn, M., Quaranta, M., Boerstler, D., Alpert, C., Carter, C., Bailey, R., Petrovick, J., Krauter, B., and McCredie, B. 2001. A clock distribution network for microprocessors. IEEE J. Solid-State Circ. 36, 5, 792--799.
[39]
Semiat, Y. and Ginosar, R. 2003. Timing measurements of synchronization circuits. In Proceedings of the 9th Symposium on Asynchronous Circuits and Systems (ASYNC). 68--77.
[40]
Srikanth, T. K. and Toueg, S. 1987. Optimal clock synchronization. J. ACM 34, 3, 626--645.
[41]
Sundaresan, K., Allen, P., and Ayazi, F. 2006. Process and temperature compensation in a 7-MHz CMOS clock oscillator. IEEE J. Solid-State Circ. 41, 2, 433--442.
[42]
Teehan, P., Greenstreet, M., and Lemieux, G. 2007. A survey and taxonomy of GALS design styles. IEEE Des. Test Comput. 24, 5, 418--428.
[43]
Widder, J. and Schmid, U. 2009. The theta-model: Achieving synchrony without clocks. Distrib. Comput. 22, 1, 29--47.

Cited By

View all

Index Terms

  1. Fault-tolerant algorithms for tick-generation in asynchronous logic: Robust pulse generation

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image Journal of the ACM
        Journal of the ACM  Volume 61, Issue 5
        August 2014
        171 pages
        ISSN:0004-5411
        EISSN:1557-735X
        DOI:10.1145/2668245
        Issue’s Table of Contents
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 08 September 2014
        Accepted: 01 December 2013
        Revised: 01 May 2013
        Received: 01 March 2012
        Published in JACM Volume 61, Issue 5

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. Byzantine faults
        2. Clock synchronization
        3. linear convergence time
        4. metastability
        5. multi-synchronous GALS
        6. self-stabilization

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)76
        • Downloads (Last 6 weeks)9
        Reflects downloads up to 28 Dec 2024

        Other Metrics

        Citations

        Cited By

        View all

        View Options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Login options

        Full Access

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media