Heterogeneous Element Processor

Last updated
Nameplate from the Denelcor HEP H1000 HEP nameplate.jpg
Nameplate from the Denelcor HEP H1000

The Heterogeneous Element Processor (HEP) [1] was introduced by Denelcor, Inc. in 1982. The HEP's architect was Burton Smith. The machine was designed to solve fluid dynamics problems for the Ballistic Research Laboratory. [2] A HEP system, as the name implies, was pieced together from many heterogeneous components -- processors, data memory modules, and I/O modules. The components were connected via a switched network.

A single processor, called a PEM (Process Execution Module), in a HEP system (up to sixteen PEMs could be connected) was rather unconventional; via a "program status word (PSW) queue" up to fifty processes could be maintained in hardware at once. The largest system ever delivered had 4 PEMs. The eight-stage instruction pipeline allowed instructions from eight different processes to proceed at once. In fact, only one instruction from a given process was allowed to be present in the pipeline at any point in time. Therefore, the full processor throughput of 10 MIPS could only be achieved when eight or more processes were active; no single process could achieve throughput greater than 1.25 MIPS. This type of multithreading processing classifies today the HEP as a barrel processor, while it was described as an MIMD pipelined processor [3] by its designers. The hardware implementation of the HEP PEM was emitter-coupled logic.

Processes were classified as either user-level or supervisor-level. User-level processes could create supervisor-level processes, which were used to manage user-level processes and perform I/O. Processes of the same class were required to be grouped into one of seven user tasks and seven supervisor tasks.

Each processor, in addition to the PSW queue and instruction pipeline, contained instruction memory, 2,048 64-bit general purpose registers and 4,096 constant registers. Constant registers were differentiated by the fact that only supervisor processes could modify their contents. The processors themselves contained no data memory; instead, data memory modules could be separately attached to the switched network.

The HEP memory consisted of completely separate instruction memory (up to 128 MBs) and data memory (up to 1 GB). Users saw 64-bit words, but in reality, data memory words were 72-bit with the extra bits used for state, see next paragraph, parity, tagging, and other uses.

The HEP implemented a type of mutual exclusion in which all registers and locations in data memory had associated "empty" and "full" states. Reading from a location set the state to "empty," while writing to it set the state to "full." A programmer could allow processes to halt after trying to read from an empty location or write to a full location, enforcing critical sections.

The switched network [4] between elements resembled, in many ways, a modern computer network. On the network were sets of nodes, each of which had three links. When a packet arrived at a node, it consulted a routing table and attempted to forward the packet closer to its destination. If a node became congested, any incoming packets were passed on without routing. Packets treated in such a manner had their priority level increased; when several packets vied for a single node, a packet with a higher priority level would be routed before ones with lower priority levels.

Another component of the switched network was the sO System, with its own memory and many individual DEC UNIBUS buses attached for disks and other peripherals. The system also had the ability to save the full/empty bits not normally visible directly. The initial IO System performance was shown to be woefully inadequate due to the high latency in starting the IO operations. Ron Natalie (from BRL) and Burton Smith designed a new system out of spare parts on napkins at a local steakhouse and put it into operation in the course of the ensuing week.

The HEP's primary application programming language was a unique Fortran variant. In time C, Pascal, and SISAL were added. The syntax of data variables using full-empty bits prepended '$' before their name. So 'A' would name a local variable, but $A would be a locking full-empty variable. Application dead-lock was thus possible. Problematic, failure to '$' could introduce unintended numerical inaccuracy.

The first HEP operating system was HEPOS. Mike Muuss was involved in a Unix port for the Ballistic Research Laboratory. HEPOS was not a Unix-like operating system.

Although it was known to have poor cost-performance, the HEP received attention due to what were, at the time, several revolutionary features. The HEP had the performance of a CDC 7600-class computer in the Cray-1 era. HEP systems were leased by the Ballistic Research Laboratory (four PEM system), Los Alamos, [1] the Argonne National Laboratory (single PEM), the National Security Agency and Shoko Ltd (Japan, 1 PEM). Germany's Messerschmitt (three PEMS system) is the only client who bought it. [5] Denelcor also delivered a two PEM system to the University of Georgia in exchange for them providing software assistance (the system had also been offered to the University of Maryland). [6] Messerschmitt was the only client to put the HEP into use for "real" applications; the other clients used it for experimenting with parallel algorithms. The BRL system was used to prepare a movie using the BRL-CAD software as its only real application. Faster and larger designs for HEP-2 and HEP-3 were started but never completed. The architectural concept would later be embodied with the code-name Horizon.

External image
Searchtool.svg A picture of the system installed at BRL is available on Mike Muss Computer History Archive.

See also

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

<span class="mw-page-title-main">CDC 6600</span> Mainframe computer by Control Data

The CDC 6600 was the flagship of the 6000 series of mainframe computer systems manufactured by Control Data Corporation. Generally considered to be the first successful supercomputer, it outperformed the industry's prior recordholder, the IBM 7030 Stretch, by a factor of three. With performance of up to three megaFLOPS, the CDC 6600 was the world's fastest computer from 1964 to 1969, when it relinquished that status to its successor, the CDC 7600.

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

<span class="mw-page-title-main">Multiple instruction, multiple data</span> Computing technique employed to achieve parallelism

In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processor cores that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.

<span class="mw-page-title-main">ILLIAC IV</span> First massively parallel computer

The ILLIAC IV was the first massively parallel computer. The system was originally designed to have 256 64-bit floating point units (FPUs) and four central processing units (CPUs) able to process 1 billion operations per second. Due to budget constraints, only a single "quadrant" with 64 FPUs and a single CPU was built. Since the FPUs all processed the same instruction – ADD, SUB etc. – in modern terminology, the design would be considered to be single instruction, multiple data, or SIMD.

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966 and extended in 1972. The classification system has stuck, and it has been used as a tool in the design of modern processors and their functionalities. Since the rise of multiprocessing central processing units (CPUs), a multiprogramming context has evolved as an extension of the classification system. Vector processing, covered by Duncan's taxonomy, is missing from Flynn's work because the Cray-1 was released in 1977: Flynn's second paper was published in 1972.

Alliant Computer Systems Corporation was a computer company that designed and manufactured parallel computing systems. Together with Pyramid Technology and Sequent Computer Systems, Alliant's machines pioneered the symmetric multiprocessing market. One of the more successful companies in the group, over 650 Alliant systems were produced over their lifetime. The company was hit by a series of financial problems and went bankrupt in 1992.

<span class="mw-page-title-main">CDC 7600</span> 1967 supercomputer

The CDC 7600 was designed by Seymour Cray to be the successor to the CDC 6600, extending Control Data's dominance of the supercomputer field into the 1970s. The 7600 ran at 36.4 MHz and had a 65 Kword primary memory using magnetic core and variable-size secondary memory. It was generally about ten times as fast as the CDC 6600 and could deliver about 10 MFLOPS on hand-compiled code, with a peak of 36 MFLOPS. In addition, in benchmark tests in early 1970 it was shown to be slightly faster than its IBM rival, the IBM System/360, Model 195. When the system was released in 1967, it sold for around $5 million in base configurations, and considerably more as options and features were added.

<span class="mw-page-title-main">CDC Cyber</span> Range of mainframe-class supercomputers

The CDC Cyber range of mainframe-class supercomputers were the primary products of Control Data Corporation (CDC) during the 1970s and 1980s. In their day, they were the computer architecture of choice for scientific and mathematically intensive computing. They were used for modeling fluid flow, material science stress analysis, electrochemical machining analysis, probabilistic analysis, energy and academic computing, radiation shielding modeling, and other applications. The lineup also included the Cyber 18 and Cyber 1000 minicomputers. Like their predecessor, the CDC 6600, they were unusual in using the ones' complement binary representation.

<span class="mw-page-title-main">Multiple instruction, single data</span> Parallel computing architecture

In computing, multiple instruction, single data (MISD) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault tolerance executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Applications for this architecture are much less common than MIMD and SIMD, as the latter two are often more appropriate for common data parallel techniques. Specifically, they allow better scaling and use of computational resources. However, one prominent example of MISD in computing are the Space Shuttle flight control computers.

In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism where-by multiple processors cooperate in the execution of a program in order to obtain results faster.

In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.

The Intel Personal SuperComputer was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2, and then the Intel iPSC/860.

<span class="mw-page-title-main">Cray XMT</span>

Cray XMT is a scalable multithreaded shared memory supercomputer architecture by Cray, based on the third generation of the Tera MTA architecture, targeted at large graph problems. Presented in 2005, it supersedes the earlier unsuccessful Cray MTA-2. It uses the Threadstorm3 CPUs inside Cray XT3 blades. Designed to make use of commodity parts and existing subsystems for other commercial systems, it alleviated the shortcomings of Cray MTA-2's high cost of fully custom manufacture and support. It brought various substantial improvements over Cray MTA-2, most notably nearly tripling the peak performance, and vastly increased maximum CPU count to 8,192 and maximum memory to 128 TB, with a data TLB of maximal 512 TB.

The FPS AP-120B was a 38-bit, pipeline-oriented array processor manufactured by Floating Point Systems. It was designed to be attached to a host computer such as a DEC PDP-11 as a fast number-cruncher. Data transfer was accomplished using direct memory access.

<span class="mw-page-title-main">OpenComRTOS</span> Real-time operating system

OpenComRTOS is a commercial network-centric, formally developed real-time operating system (RTOS), aimed mainly at the embedded system market.

Zero ASIC Corporation, formerly Adapteva, Inc., is a fabless semiconductor company focusing on low power many core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.

SUPRENUM was a German research project to develop a parallel computer from 1985 through 1990. It was a major effort which was aimed at developing a national expertise in massively parallel processing both at hardware and at software level.

Isra Vision Parsytec AG, a subsidiary of Isra Vision, was originally founded in 1985 as Parsytec in Aachen, Germany.

References

  1. 1 2 "Los Alamos experiences with the hep computer". Parallel MIMD Computation: HEP Supercomputer and Its Applications. The MIT Press. 1985-06-27. ISBN   978-0-262-25653-7 . Retrieved 2024-12-09.
  2. "The History of Computing at BRL".
  3. Parallel MIMD Computation: HEP Supercomputer and Its Applications. The MIT Press. 1985-06-27. ISBN   978-0-262-25653-7 . Retrieved 2024-12-09.
  4. Moore, James W. (1983). "The HEP parallel processor" (PDF). Los Alamos Science. 1983: 72–75. Retrieved 2024-12-09.
  5. Kulas, Mary C. (1987). Emerging Technologies Multi/ParaIlel Processing (Competitive Analysis). Digital Equipment Corporation (DEC) - Internal report. Retrieved 2024-12-09.
  6. Padua, David (2011). Encyclopedia of Parallel Computing, Volume 4. New York, NY: Springer Verlag.