DOI: 10.1145/1216919.1216936

A practical FPGA-based framework for novel CMP research

Published: 18 February 2007

Abstract

Chip-multiprocessors (CMPs) are quickly gaining momentum in all segments of computing. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development. To address this challenge, it is necessary to co-develop new CMP architectures with novel programming models. Currently, architecture research relies on software simulators, which are too slow to facilitate interesting experiments with CMP software without using small datasets or significantly reducing the level of detail in the simulated models. An alternative to simulation is to exploit the rich capabilities of modern FPGAs to create FPGA-based platforms for novel CMP research. This paper presents ATLAS, the first prototype for CMPs with hardware support for Transactional Memory (TM), a technology aiming to simplify parallel programming. ATLAS uses the BEE2 multi-FPGA board to provide a system that has 8 PowerPC cores running at 100 MHz and that runs Linux. ATLAS provides significant benefits for CMP research, such as a 100x performance improvement over a software simulator and good visibility that helps with software tuning and architectural improvements. In addition to presenting and evaluating ATLAS, we share our observations about building an FPGA-based framework for CMP research. Specifically, we address issues such as overall performance, the challenges of mapping ASIC-style CMP RTL onto FPGAs, software support, the selection criteria for the base processor, and the challenges of using pre-designed IP libraries.

    Published In

    FPGA '07: Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
    February 2007
    248 pages
    ISBN:9781595936004
    DOI:10.1145/1216919
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. FPGA-based emulation
    2. chip multi-processor
    3. transactional memory

    Qualifiers

    • Article

    Conference

    FPGA07
    Acceptance Rates

    Overall Acceptance Rate 125 of 627 submissions, 20%
