DOI: 10.1145/264107.264146

Effects of communication latency, overhead, and bandwidth in a cluster architecture

Published: 01 May 1997

Abstract

This work provides a systematic study of the impact of communication performance on parallel applications in a high-performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to bring cluster communication performance up to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
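The linear dependence on overhead described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the model or the measurement method used in the study: it assumes every message costs a fixed amount of processor overhead at both the sender and the receiver, that this time is not overlapped with computation, and that the message count does not change as overhead grows. The function names, the factor of two, and all numbers in the usage loop are hypothetical and chosen only for illustration.

```python
def added_overhead_time_s(messages_per_proc, added_overhead_us,
                          overhead_events_per_message=2):
    """Extra time (seconds) one processor spends on added per-message overhead.

    Assumption: each message is charged overhead_events_per_message times
    (by default once at the sender and once at the receiver).
    """
    return messages_per_proc * overhead_events_per_message * added_overhead_us * 1e-6


def predicted_slowdown(base_runtime_s, messages_per_proc, added_overhead_us,
                       overhead_events_per_message=2):
    """Predicted runtime divided by the baseline runtime under the toy model."""
    extra = added_overhead_time_s(messages_per_proc, added_overhead_us,
                                  overhead_events_per_message)
    return (base_runtime_s + extra) / base_runtime_s


if __name__ == "__main__":
    # Hypothetical workload: 10 s baseline, one million messages per processor.
    for added_us in (0, 25, 50, 100):
        factor = predicted_slowdown(10.0, 1_000_000, added_us)
        print(f"added overhead {added_us:>3} us -> slowdown {factor:.1f}x")
```

Under these assumptions the slowdown grows as a straight line in the added overhead, and its slope is set by how many messages the application sends per unit of baseline runtime. That is one way to read why communication-intensive applications would be the most sensitive to overhead.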

Published In

ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture
June 1997
350 pages
ISBN:0897919017
DOI:10.1145/264107
  • ACM SIGARCH Computer Architecture News, Volume 25, Issue 2
    Special Issue: Proceedings of the 24th annual international symposium on Computer architecture (ISCA '97)
    May 1997
    349 pages
    ISSN:0163-5964
    DOI:10.1145/384286

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA97

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
