DOI: 10.1007/978-3-031-29576-8_8
Article

Benchmarking Considerations for Trustworthy and Responsible AI (Panel)

Published: 28 March 2023

Abstract

The continuing growth of Artificial Intelligence (AI) adoption across enterprises and governments around the world has fueled the demand for trustworthy AI systems and applications. The need ranges from so-called Explainable or Interpretable AI to Responsible AI, driven by the underlying demand for greater confidence in deploying AI as part of enterprise IT. AI-based use cases, both internal to organizations and external, customer- and user-facing, are increasingly expected to meet these demands. This paper describes the need for and definitions of trustworthiness and responsibility in AI systems, summarizes currently popular AI benchmarks, and deliberates on the challenges and opportunities in assessing and benchmarking the Trustworthy and Responsible aspects of AI systems and applications.

References

[1]
Bourrasset, C., et al.: Requirements for an enterprise AI benchmark. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking for the Era of Artificial Intelligence, pp. 71–81. Springer, Cham (2019)
[3]
Mattson, P., et al.: MLPerf training benchmark. Proc. Mach. Learn. Syst. 2, 336–349 (2020)
[4]
Reddi, V.J., et al.: MLPerf inference benchmark. arXiv preprint arXiv:1911.02549 (2019)
[6]
Transaction Processing Performance Council: TPC Express Benchmark™ AI - Full Disclosure Report (2022)
[7]
Bommasani, R., et al.: On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2022)
[8]
Hodak, M., Ellison, D., Dholakia, A.: Benchmarking AI inference: where we are in 2020. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking, pp. 93–102. Springer, Cham (2021)
[9]
Hodak, M., Ellison, D., Dholakia, A.: Everyone is a winner: interpreting MLPerf inference benchmark results. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking, pp. 50–61. Springer, Cham (2022)
[10]
Coleman, C.A., et al.: DAWNBench: an end-to-end deep learning benchmark and competition. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017) (2017)
[11]
Arrieta, A.B., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. arXiv preprint arXiv:1910.10045v2 (2019)
[12]
Adali, T., Guido, R.C., Ho, T.K., Müller, K.R., Strother, S.: Interpretability, reproducibility and replicability [Guest editorial]. IEEE Signal Process. Mag. 39, 5–7 (2022)
[13]
National Academies of Sciences, Engineering, and Medicine: Reproducibility and Replicability in Science. National Academies Press, Washington, DC (2019)
[14]
European Union High-Level Independent Group on Artificial Intelligence: Assessment List for Trustworthy AI (2020). https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment
[15]
Linux Foundation AI & Data Trusted AI Committee, Principles Working Group: Linux Foundation AI & Data's Principles for Trusted AI (2021). https://lfaidata.foundation/blog/2021/02/08/lf-ai-data-announces-principles-for-trusted-ai/
[17]
Nielsen, I.E., Dera, D., Rasool, G., Ramachandran, R.P., Bouaynaya, N.C.: Robust explainability. IEEE Signal Process. Mag. 39, 73–84 (2022)
[18]
Bravo-Rocca, G., Liu, P., Guitart, J., Dholakia, A., Ellison, D., Hodak, M.: Human-in-the-loop online multi-agent approach to increase trustworthiness in ML models through trust scores and data augmentation. In: IEEE COMPSAC (2022)
[19]
Jiang, H., Kim, B., Guan, M., Gupta, M.: To trust or not to trust a classifier. In: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada (2018)
[20]
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805v2 (2019)
[21]
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)
[22]
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)
[23]
Jacovi, A., Goldberg, Y.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: Proceedings of ACL, pp. 4198–4205 (2020)
[24]
Bibal, A., et al.: Is attention explanation? An introduction to the debate. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 3889–3900 (2022)


Published In

Performance Evaluation and Benchmarking: 14th TPC Technology Conference, TPCTC 2022, Sydney, NSW, Australia, September 5, 2022, Revised Selected Papers
Sep 2022
158 pages
ISBN:978-3-031-29575-1
DOI:10.1007/978-3-031-29576-8

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 28 March 2023

Author Tags

  1. Artificial Intelligence
  2. Benchmarks
  3. Trustworthy
  4. Responsible
  5. Explainable
  6. Interpretable

Qualifiers

  • Article
