DOI: 10.1145/3311790.3397341
research-article
Open access

Cluster Usage Policy Enforcement Using Slurm Plugins and an HTTP API

Published: 26 July 2020

Abstract

Managing and limiting cluster resource usage is a critical task for computing clusters with a large number of users. By enforcing usage limits, cluster managers are able to ensure fair availability for all users, bill users accordingly, and prevent the abuse of cluster resources. As this is such a common problem, there are naturally many existing solutions. However, to allow for greater control over usage accounting and submission behavior in Slurm, we present a system composed of: a web API which exposes accounting data; Slurm plugins that communicate with a REST-like HTTP implementation of that API; and client tools that use it to report usage. Key advantages of our system include a customizable resource accounting formula based on job parameters, preemptive blocking of user jobs at submission time, project-level and user-level resource limits, and support for the development of other web and command-line clients that query the extensible web API. We deployed this system on Berkeley Research Computing’s institutional cluster, Savio, allowing us to automatically collect and store accounting data, and thereby easily enforce our cluster usage policy.
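The combination of a job-parameter-based accounting formula and submission-time blocking described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `JobRequest` fields, the per-partition rates, the cost formula (rate × CPUs × requested hours), and the `Allowance` type. In the described system the usage figures would presumably be fetched from the web API by a Slurm plugin; here they are passed in directly so the example stays self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class JobRequest:
    """Job parameters known at submission time (hypothetical field names)."""
    user: str
    project: str
    partition: str
    num_cpus: int
    time_limit_hours: Decimal


@dataclass(frozen=True)
class Allowance:
    """A usage limit and the amount already consumed, in service units."""
    limit: Decimal
    used: Decimal

    def can_afford(self, cost: Decimal) -> bool:
        return self.used + cost <= self.limit


# Hypothetical rates in service units per CPU-hour; a site would define its own.
PARTITION_RATES = {"standard": Decimal("1.00"), "bigmem": Decimal("2.50")}


def estimate_cost(job: JobRequest) -> Decimal:
    """Worst-case cost, assuming the job runs for its full requested time."""
    rate = PARTITION_RATES[job.partition]
    return rate * job.num_cpus * job.time_limit_hours


def check_submission(
    job: JobRequest, user: Allowance, project: Allowance
) -> tuple[bool, str]:
    """Reject the job if its estimated cost would exceed either limit."""
    cost = estimate_cost(job)
    if not user.can_afford(cost):
        return False, f"user {job.user} would exceed their limit"
    if not project.can_afford(cost):
        return False, f"project {job.project} would exceed its limit"
    return True, "ok"


job = JobRequest("alice", "proj_a", "bigmem", 4, Decimal("10"))
allowed, reason = check_submission(
    job,
    user=Allowance(limit=Decimal("500"), used=Decimal("450")),
    project=Allowance(limit=Decimal("1000"), used=Decimal("0")),
)
print(estimate_cost(job), allowed, reason)
```

Charging the full requested time limit at submission is one possible design choice: it guarantees a running job can never push an account over its limit, at the price of needing a later adjustment once the actual run time is known.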

Supplemental Material

MP4 File
Presentation video


Published In

PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
July 2020
556 pages
ISBN: 9781450366892
DOI: 10.1145/3311790
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Django
  2. Slurm
  3. account management
  4. accounting
  5. application programming interface
  6. high performance computing
  7. plugin

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '20

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
