Article

Effects of communication latency, overhead, and bandwidth in a cluster architecture

Authors: Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

Pages 85 - 97

https://doi.org/10.1145/264107.264146

Published: 01 May 1997

Abstract

This work provides a systematic study of the impact of communication performance on parallel applications in a high-performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
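A rough way to see why overhead matters so much: overhead is time the processor itself spends sending or receiving a message, so it cannot be hidden by overlapping communication with computation, whereas latency in principle can. The Python sketch below is a back-of-envelope model built on that observation; it is not the authors' methodology or code. The names `predicted_runtime`, `slowdown`, `t0_s`, `m`, and `delta_o_us` are hypothetical, and the model assumes that each message charges one overhead to its sender and one to its receiver, that the busiest processor sends and receives about `m` messages, and that nothing else about its execution changes. Under those assumptions run time grows by 2·m·Δo, which is linear in the added overhead, the same form as the dependence reported above.

```python
"""Back-of-envelope overhead-sensitivity model (hypothetical sketch, not the paper's code).

Assumption: every message costs its sender one overhead `o` and its receiver one
overhead `o`, and the busiest processor both sends and receives about `m`
messages, so raising o by delta_o adds roughly 2 * m * delta_o to its run time.
"""


def predicted_runtime(t0_s: float, m: int, delta_o_us: float) -> float:
    """Run time (seconds) after adding delta_o_us microseconds of overhead per message.

    t0_s        -- measured run time at the baseline overhead, in seconds
    m           -- messages sent (and, by assumption, received) by the busiest processor
    delta_o_us  -- added per-message overhead, in microseconds
    """
    added_s = 2 * m * delta_o_us * 1e-6  # one send overhead + one receive overhead per message
    return t0_s + added_s


def slowdown(t0_s: float, m: int, delta_o_us: float) -> float:
    """Ratio of predicted to baseline run time; linear in delta_o_us by construction."""
    return predicted_runtime(t0_s, m, delta_o_us) / t0_s


if __name__ == "__main__":
    # Hypothetical application: 1 s baseline, 100,000 messages on the busiest processor.
    t0, m = 1.0, 100_000
    # Sweep the added overhead over a 100 us span, like the abstract's 3 -> 103 us range.
    for d in (0, 25, 50, 75, 100):
        print(f"delta_o = {d:3d} us  ->  predicted slowdown {slowdown(t0, m, d):5.1f}x")
```

With the hypothetical 1 s baseline and 100,000 messages, each 25 µs step adds 5 s, so the predicted slowdown rises in equal increments from 1x to 21x. Measured curves can deviate where added overhead also shifts synchronization delays or load balance; the sketch ignores such effects.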

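Cited By

Kolomvatsos K, Anagnostopoulos C, Koziri M, Loukopoulos T (2020). Proactive & Time-Optimized Data Synopsis Management at the Edge. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. Online publication date: 2020.
https://doi.org/10.1109/TKDE.2020.3021377
Kolomvatsos K (2020). A Proactive Uncertainty Driven Model for Data Synopses Management in Pervasive Applications. 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1266-1273. Online publication date: Dec 2020.
https://doi.org/10.1109/HPCC-SmartCity-DSS50907.2020.00164
Endo W, Taura K (2018). Parallelized Software Offloading of Low-Level Communication with User-Level Threads. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 289-298. Online publication date: 28 Jan 2018.
https://dl.acm.org/doi/10.1145/3149457.3149475
Wagle B, Kellar S, Serio A, Kaiser H (2018). Methodology for Adaptive Active Message Coalescing in Task Based Runtime Systems. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1133-1140. Online publication date: May 2018.
https://doi.org/10.1109/IPDPSW.2018.00173
Virtualized I/O (2016). Attaining High Performance Communications, pp. 261-282. Online publication date: 19 Apr 2016.
https://doi.org/10.1201/b10249-17
Boeres C, Rebello V (2016). Towards Optimal Static Task Scheduling for Realistic Machine Models: Theory and Practice. The International Journal of High Performance Computing Applications, 17:2, pp. 173-189. Online publication date: 26 Jul 2016.
https://doi.org/10.1177/1094342003017002007
Wang Q, Cherkasova L, Li J, Volos H (2016). Interconnect Emulator for Aiding Performance Analysis of Distributed Memory Applications. Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering, pp. 75-83. Online publication date: 12 Mar 2016.
https://dl.acm.org/doi/10.1145/2851553.2851574
Soares T, dos Santos R, Lobosco M (2016). A Parallel Model for Heterogeneous Cluster. Algorithms and Architectures for Parallel Processing, pp. 76-90. Online publication date: 19 Nov 2016.
https://doi.org/10.1007/978-3-319-49956-7_6
Getmanskiy V, Chalyshev V, Kryzhanovsky D, Lopatin I, Leksikov E (2015). Optimizing Processes Mapping for Tasks with Non-uniform Data Exchange Run on Cluster with Different Interconnects. High Performance Computing, pp. 231-239. Online publication date: 20 Jun 2015.
https://doi.org/10.1007/978-3-319-20119-1_17
Simakov N, White J, DeLeon R, Ghadersohi A, Furlani T, Jones M, Gallo S, Patra A (2015). Application kernels. Concurrency and Computation: Practice & Experience, 27:17, pp. 5238-5260. Online publication date: 10 Dec 2015.
https://dl.acm.org/doi/10.1002/cpe.3564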

Published In

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

June 1997

350 pages

ISBN:0897919017

DOI:10.1145/264107

Chairmen: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

ACM SIGARCH Computer Architecture News Volume 25, Issue 2
Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
May 1997
349 pages
ISSN:0163-5964
DOI:10.1145/384286
Editors: Andrew R. Pleszkun (Univ. of Colorado-Boulder, CO), Trevor Mudge (Univ. of Michigan)

Copyright © 1997 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA97

Sponsor:

SIGARCH

ISCA97: International Symposium on Computer Architecture

June 1 - 4, 1997

Denver, Colorado, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Article Metrics

Total Citations: 146
Total Downloads: 819
Downloads (last 12 months): 159
Downloads (last 6 weeks): 17

Reflects downloads up to 09 Jan 2025

