- Article, September 2024
Benchmarking Large Language Models: Opportunities and Challenges
Performance Evaluation and Benchmarking, Pages 77–89. https://doi.org/10.1007/978-3-031-68031-1_6
Abstract: With the exponentially growing popularity of Large Language Models (LLMs) and LLM-based applications like ChatGPT and Bard, the Artificial Intelligence (AI) community of developers and users is in need of representative benchmarks to enable careful ...
- Article, September 2024
Benchmarking Generative AI Performance Requires a Holistic Approach
Performance Evaluation and Benchmarking, Pages 34–43. https://doi.org/10.1007/978-3-031-68031-1_3
Abstract: The recent focus in AI on Large Language Models (LLMs) has brought the topic of trustworthy AI to the forefront. Along with the excitement of human-level performance, the Generative AI systems enabled by LLMs have raised many concerns about ...
- Article, March 2023
Benchmarking Considerations for Trustworthy and Responsible AI (Panel)
Performance Evaluation and Benchmarking, Pages 110–119. https://doi.org/10.1007/978-3-031-29576-8_8
Abstract: Continuing growth of Artificial Intelligence (AI) adoption across enterprises and governments around the world has fueled the demand for trustworthy AI systems and applications. The need ranges from the so-called Explainable or Interpretable AI to ...
- Article, March 2023
More the Merrier: Comparative Evaluation of TPCx-AI and MLPerf Benchmarks for AI
Performance Evaluation and Benchmarking, Pages 67–77. https://doi.org/10.1007/978-3-031-29576-8_5
Abstract: With AI systems and solutions increasingly being deployed across many industries, measuring the performance of AI workloads remains a priority in computing. TPCx-AI is a new benchmarking suite developed by TPC to address this need. It provides an ...
- Article, August 2021
Everyone is a Winner: Interpreting MLPerf Inference Benchmark Results
Performance Evaluation and Benchmarking, Pages 50–61. https://doi.org/10.1007/978-3-030-94437-7_4
Abstract: The MLPerf Inference benchmark suite version 1.0 was recently released. It is the third release, and the version number, along with the minor changes from the previous version, indicates the maturity of the suite. With 33 benchmarks and almost 2,000 results, it ...