In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.
RDMA supports zero-copy networking by enabling the network adapter to transfer data from the wire directly to application memory or from application memory directly to the wire, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. This reduces latency in message transfer.
However, this strategy presents several problems related to the fact that the target node is not notified of the completion of the request (single-sided communications).
As of 2018 RDMA had achieved broader acceptance as a result of implementation enhancements that enable good performance over ordinary networking infrastructure. [1] For example RDMA over Converged Ethernet (RoCE) now is able to run over either lossy or lossless infrastructure. In addition iWARP enables an Ethernet RDMA implementation at the physical layer using TCP/IP as the transport, combining the performance and latency advantages of RDMA with a low-cost, standards-based solution. [2] The RDMA Consortium and the DAT Collaborative [3] have played key roles in the development of RDMA protocols and APIs for consideration by standards groups such as the Internet Engineering Task Force and the Interconnect Software Consortium. [4]
Hardware vendors have started working on higher-capacity RDMA-based network adapters, with rates of 100 Gbit/s reported. [5] [6] Software vendors, such as IBM, [7] Red Hat and Oracle Corporation, support these APIs in their latest products, [8] and as of 2013 [update] engineers have started developing network adapters that implement RDMA over Ethernet. [9] Both Red Hat Enterprise Linux and Red Hat Enterprise MRG [10] have support for RDMA. Microsoft supports RDMA in Windows Server 2012 via SMB Direct. VMware ESXi also supports RDMA as of 2015.
Common RDMA implementations include the Virtual Interface Architecture, RDMA over Converged Ethernet (RoCE), InfiniBand, Omni-Path and iWARP.
Applications access control structures using well-defined APIs originally designed for the InfiniBand Protocol (although the APIs can be used for any of the underlying RDMA implementations). Using send and completion queues, applications perform RDMA operations by submitting work queue entries (WQEs) into the submission queue (SQ) and getting notified of responses from the completion queue (CQ). [11]
RDMA can transport data reliably or unreliably over the Reliably Connected (RC) and Unreliable Datagram (UD) transport protocols, respectively. The former has the benefit of preserving requests (no requests are lost), while the latter requires fewer queue pairs when handling multiple connections. This is due to the fact that UD is connection-less, allowing a single host to communicate with any other using a single queue. [12]
Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call system. NFS is an open IETF standard defined in a Request for Comments (RFC), allowing anyone to implement the protocol.
InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers.
In computer science, inter-process communication (IPC), also spelled interprocess communication, are the mechanisms provided by an operating system for processes to manage shared data. Typically, applications can use IPC, categorized as clients and servers, where the client requests data and the server responds to client requests. Many applications are both clients and servers, as commonly seen in distributed computing.
A network interface controller is a computer hardware component that connects a computer to a computer network.
Server Message Block (SMB) is a communication protocol used to share files, printers, serial ports, and miscellaneous communications between nodes on a network. On Microsoft Windows, the SMB implementation consists of two vaguely named Windows services: "Server" and "Workstation". It uses NTLM or Kerberos protocols for user authentication. It also provides an authenticated inter-process communication (IPC) mechanism.
Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. Message-oriented middleware is in contrast to streaming-oriented middleware where data is communicated as a sequence of bytes with no explicit message boundaries. Note that streaming protocols are almost always built above protocols using discrete messages such as frames (Ethernet), datagrams (UDP), packets (IP), cells (ATM), et al.
TCP offload engine (TOE) is a technology used in some network interface cards (NIC) to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as gigabit Ethernet and 10 Gigabit Ethernet, where processing overhead of the network stack becomes significant. TOEs are often used as a way to reduce the overhead associated with Internet Protocol (IP) storage protocols such as iSCSI and Network File System (NFS).
iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym.
The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). RDMA can be provided by the Transmission Control Protocol (TCP) with RDMA services (iWARP), which uses an existing Ethernet setup and therefore has lower hardware costs, RoCE, which does not need the TCP layer and therefore provides lower latency, or InfiniBand. iSER permits data to be transferred directly into and out of SCSI computer memory buffers without intermediate data copies and with minimal CPU involvement.
The OpenFabrics Alliance is a non-profit organization that promotes remote direct memory access (RDMA) switched fabric technologies for server and storage connectivity. These high-speed data-transport technologies are used in high-performance computing facilities, in research and various industries.
In computing the SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is generally available through e.g. the TCP/IP communication protocol.
A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN).
In computer science, memory virtualization decouples volatile random access memory (RAM) resources from individual systems in the data center, and then aggregates those resources into a virtualized memory pool available to any computer in the cluster. The memory pool is accessed by the operating system or applications running on top of the operating system. The distributed memory pool can then be utilized as a high-speed cache, a messaging layer, or a large, shared memory resource for a CPU or a GPU application.
The Linux-IOTarget (LIO) is an open-source Small Computer System Interface (SCSI) target implementation included with the Linux kernel.
Chelsio Communications is a privately held technology company headquartered in Sunnyvale, California with a design center in Bangalore, India. Early venture capital funding came from Horizons Ventures, Invesco, Investor Growth Capital, NTT Finance, Vendanta Capital, Abacus Capital Group, Pacesetter Capital Group, and New Enterprise Associates. A third round of funding raised $25 million in late 2004. LSI Corporation was added as investor in 2006 in the series D round. By January 2008, a $25M financing round was announced as series E. In 2009, an additional $17M was raised from previous investors plus Mobile Internet Capital.
RDMA over Converged Ethernet (RoCE) is a network protocol which allows remote direct memory access (RDMA) over an Ethernet network. There are multiple RoCE versions. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.
Mellanox Technologies Ltd. was an Israeli-American multinational supplier of computer networking products based on InfiniBand and Ethernet technology. Mellanox offered adapters, switches, software, cables and silicon for markets including high-performance computing, data centers, cloud computing, computer data storage and financial services.
RTP-MIDI is a protocol to transport MIDI messages within Real-time Transport Protocol (RTP) packets over Ethernet and WiFi networks. It is completely open and free, and is compatible both with LAN and WAN application fields. Compared to MIDI 1.0, RTP-MIDI includes new features like session management, device synchronization and detection of lost packets, with automatic regeneration of lost data. RTP-MIDI is compatible with real-time applications, and supports sample-accurate synchronization for each MIDI message.
SoftEther VPN is free open-source, cross-platform, multi-protocol VPN client and VPN server software, developed as part of Daiyuu Nobori's master's thesis research at the University of Tsukuba. VPN protocols such as SSL VPN, L2TP/IPsec, OpenVPN, and Microsoft Secure Socket Tunneling Protocol are provided in a single VPN server. It was released using the GPLv2 license on January 4, 2014. The license was switched to Apache License 2.0 on January 21, 2019.
SHMEM is a family of parallel programming libraries, providing one-sided, RDMA, parallel-processing interfaces for low-latency distributed-memory supercomputers. The SHMEM acronym was subsequently reverse engineered to mean "Symmetric Hierarchical MEMory”. Later it was expanded to distributed memory parallel computer clusters, and is used as parallel programming interface or as low-level interface to build partitioned global address space (PGAS) systems and languages. “Libsma”, the first SHMEM library, was created by Richard Smith at Cray Research in 1993 as a set of thin interfaces to access the CRAY T3D's inter-processor-communication hardware. SHMEM has been implemented by Cray Research, SGI, Cray Inc., Quadrics, HP, GSHMEM, IBM, QLogic, Mellanox, Universities of Houston and Florida; there is also open-source OpenSHMEM.
The demonstration will show Microsoft's Windows Server 2012 SMB Direct running at line-rate 40Gb using RDMA over Ethernet (iWARP).