DOI: 10.1145/3299874.3317986
research-article

Crash Skipping: A Minimal-Cost Framework for Efficient Error Recovery in Approximate Computing Environments

Published: 13 May 2019 Publication History

Abstract

We present a lightweight technique to minimize error recovery costs in approximate computing environments. We take advantage of the key observation that if an application crashes in a "non-critical" region of its execution, then skipping the crash and allowing execution to continue often results in "acceptable" output, due to the inherent fault tolerance of approximate applications. By skipping application crashes, the program is given a chance to recover from an error on its own, without expending computing power on error recovery. The system-level support required to implement our Crash Skipping technique imposes negligible overhead. Experimental results from representative approximate applications demonstrate that our technique is effective, resulting in successful error recovery for 56% of application crash cases on average, with a maximum recovery rate of 81%. By combining our technique with application restart, we obtain an approximately 33% improvement in performance/energy consumption compared to recovering from crashes by restart alone. This benefit is comparable to what can be achieved using aggressive checkpointing techniques, but without the significant costs in system design and complexity that such techniques impose.
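
The core idea of skipping a crash in a non-critical region can be illustrated with a toy analogy. The sketch below is not the authors' mechanism, which the abstract describes as system-level support. It only mimics the behaviour at application level in Python, assuming a hypothetical function `mean_of_ratios` in which a division by zero stands in for a crash caused by a corrupted value.

```python
def mean_of_ratios(pairs, skip_crashes):
    """Average a/b over all pairs; a zero divisor stands in for a crash.

    Hypothetical illustration only: the name, the inputs, and the use of
    ZeroDivisionError as the "crash" are assumptions, not the paper's design.
    """
    total = 0.0
    count = 0
    for a, b in pairs:
        try:
            total += a / b
            count += 1
        except ZeroDivisionError:
            if not skip_crashes:
                # Conventional behaviour: the run aborts and must be restarted.
                raise
            # Crash skipping: drop the faulting operation and keep going.
    return total / count if count else 0.0


# 1,000 well-formed inputs whose ratio is 2.0, with one corrupted entry.
pairs = [(2.0 * k, float(k)) for k in range(1, 1001)]
pairs[500] = (pairs[500][0], 0.0)  # injected fault

result = mean_of_ratios(pairs, skip_crashes=True)
print(result)
```

With `skip_crashes=True` the corrupted element is simply left out of the average, so the output stays close to the fault-free result. With `skip_crashes=False` the same input aborts the run, which corresponds to the restart-based recovery the abstract compares against.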

      Published In

      GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
      May 2019
      562 pages
      ISBN:9781450362528
      DOI:10.1145/3299874
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. approximate computing
      2. reliability
      3. robust systems

      Qualifiers

      • Research-article

      Conference

      GLSVLSI '19: Great Lakes Symposium on VLSI 2019
      May 9 - 11, 2019
      Tysons Corner, VA, USA

      Acceptance Rates

      Overall Acceptance Rate 312 of 1,156 submissions, 27%
