DOI: 10.1145/2834899.2834905
research-article

Performance and energy efficiency analysis of 64-bit ARM using GAMESS

Published: 15 November 2015

Abstract

Power efficiency is one of the key challenges facing the HPC co-design community, sparking interest in the ARM processor architecture as a low-power, high-efficiency alternative to the high-powered systems that dominate today. Recent advances in the ARM architecture, including the introduction of 64-bit support, have only fueled more interest in ARM. While ARM-based clusters have proven useful for data-server applications, their viability for HPC applications requires an in-depth analysis of on-node and inter-node performance. To that end, as a co-design exercise, the viability of a commercially available 64-bit ARM cluster (HP Moonshot) is investigated in terms of performance and energy efficiency with the widely used quantum chemistry package GAMESS. The performance and energy efficiency metrics are also compared to a conventional x86 Intel Ivy Bridge system. A 2:1 Moonshot core to Ivy Bridge core performance ratio is observed for the GAMESS calculation types considered. Doubling the number of cores to complete the execution faster on the 64-bit ARM cluster leads to better energy efficiency compared to the Ivy Bridge system; i.e., a 32-core execution of a GAMESS calculation has approximately the same performance and better energy-to-solution than a 16-core execution of the same calculation on the Ivy Bridge system.
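The energy-to-solution comparison in the abstract follows directly from E = P · t (energy equals average power times runtime). The sketch below illustrates the arithmetic behind the claim; the power and runtime figures are hypothetical placeholders, not measurements from the paper:

```python
# Energy-to-solution: E = P * t.
# All power/runtime numbers below are hypothetical, chosen only to
# illustrate how a lower-power 32-core ARM run that matches the
# runtime of a 16-core x86 run wins on energy-to-solution.

def energy_to_solution(avg_power_watts, runtime_seconds):
    """Return total energy consumed (joules) at a given average power."""
    return avg_power_watts * runtime_seconds

# Hypothetical scenario consistent with the paper's 2:1 per-core
# performance ratio: twice the ARM cores give roughly equal runtime.
arm_energy = energy_to_solution(avg_power_watts=120.0, runtime_seconds=1000.0)
x86_energy = energy_to_solution(avg_power_watts=180.0, runtime_seconds=1000.0)

# At equal runtime, the lower-power system has the better energy-to-solution.
assert arm_energy < x86_energy
print(f"ARM: {arm_energy / 1000:.0f} kJ, x86: {x86_energy / 1000:.0f} kJ")
```

The same relation also shows the trade-off the paper probes: a slower run at lower power only wins on energy if the power savings outweigh the runtime increase.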



Published In

Co-HPC '15: Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing
November 2015
61 pages
ISBN:9781450339926
DOI:10.1145/2834899

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SC15

Acceptance Rates

Co-HPC '15 paper acceptance rate: 7 of 13 submissions (54%)
