DOI: 10.1145/3491418.3530767
research-article
Open access

Phoenix: The Revival of Research Computing and the Launch of the New Cost Model at Georgia Tech

Published: 08 July 2022

Abstract

Originating from partnerships formed by central IT and researchers supporting their own clusters, the traditional condominium and dedicated-cluster models for research computing are appealing and prevalent among emerging centers throughout academia. In 2008, the Georgia Institute of Technology (GT) launched a campus strategy to centralize the hosting of computing resources across multiple science and engineering disciplines under a group of expert support personnel, and in 2009 the Partnership for an Advanced Computing Environment (PACE) was formed. Due to increases in scale over the past decade, however, the initial models created challenges for the research community, systems administrators, and GT’s leadership. In 2020, GT launched a strategic initiative to revitalize research computing through a refresh of the infrastructure and computational resources, in parallel with a migration to a new state-of-the-art datacenter, Coda, followed by a transition to a new consumption-based cost model. These efforts have resulted in an overall increase in cluster utilization, access to more hardware, a decrease in queue wait times, a reduction in resource provisioning times, and an increase in return on investment, suggesting that such a model is highly advantageous for academic research computing centers. Presented here are the methods employed in making the change to the new cost model, data supporting these claims, and the ongoing improvements to continue meeting the needs of the GT research community, whose research is accelerated by the new cost model and the Phoenix cluster, which ranked #277 on the November 2020 Top500 list.
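
To make the consumption-based model concrete, the minimal sketch below (in Python) shows how a monthly charge is typically computed from metered usage under such a model. The rates, free-tier allocation, and job fields here are hypothetical illustrations and are not PACE's actual billing rules.

# Minimal sketch of a consumption-based chargeback calculation.
# All rates, the free tier, and the job fields are hypothetical and chosen
# only to illustrate the general model, not PACE's actual billing rules.

CPU_HOUR_RATE = 0.01      # dollars per CPU-hour (assumed)
GPU_HOUR_RATE = 0.50      # dollars per GPU-hour (assumed)
FREE_CPU_HOURS = 10_000   # assumed subsidized monthly allocation per PI

def monthly_charge(jobs):
    """Meter a PI's completed jobs and price the usage beyond the free tier."""
    cpu_hours = sum(j["cores"] * j["walltime_hours"] for j in jobs)
    gpu_hours = sum(j.get("gpus", 0) * j["walltime_hours"] for j in jobs)
    billable_cpu = max(0.0, cpu_hours - FREE_CPU_HOURS)
    return billable_cpu * CPU_HOUR_RATE + gpu_hours * GPU_HOUR_RATE

jobs = [
    {"cores": 128, "walltime_hours": 48.0},            # CPU-only job
    {"cores": 24, "gpus": 2, "walltime_hours": 12.0},  # GPU job
]
print(f"Charge this month: ${monthly_charge(jobs):.2f}")

Unlike the condominium and dedicated-cluster models, where a researcher's cost is fixed at hardware purchase regardless of use, charges under a sketch like this track actual consumption, which is the property that connects spending to the utilization and return-on-investment gains described above.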

Cited By

  • ICE 2.0: Restructuring and Growing an Instructional HPC Cluster. In Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (2023), 591–597. https://doi.org/10.1145/3624062.3624131
  • Semi-Automatic Hybrid Software Deployment Workflow in a Research Computing Center. In Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good (2023), 68–74. https://doi.org/10.1145/3569951.3593607


Information

Published In

PEARC '22: Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You
July 2022
455 pages
ISBN:9781450391610
DOI:10.1145/3491418
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC
  2. cluster
  3. cost model
  4. datacenter

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '22

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

