In computing, a futex (short for "fast userspace mutex") is a kernel system call that programmers can use to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables.
A futex consists of a kernel-space wait queue that is attached to an atomic integer in userspace. Multiple processes or threads operate on the integer entirely in userspace (using atomic operations to avoid interfering with one another), and only resort to relatively expensive[ citation needed ] system calls to request operations on the wait queue (for example to wake up waiting processes, or to put the current process on the wait queue). A properly programmed futex-based lock will not use system calls except when the lock has contention; since most operations do not require arbitration between processes, this will not happen in most cases.
This section is missing information about FUTEX2 by Valve Corporation, mainly intended to mimic WaitForMultipleObjects in wine/proton “fsync”.(November 2021) |
Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnár (Red Hat), and Rusty Russell (IBM Linux Technology Center) originated the futex mechanism on Linux in 2002. [1] In the same year, discussions took place on a proposal to make futexes accessible via the file system by creating a special node in /dev
or /proc
. However, Linus Torvalds strongly opposed this idea and rejected any related patches. [2]
Futexes then appeared for the first time in version 2.5.7 of the Linux kernel development series; the semantics stabilized as of version 2.5.40, and futexes have been part of the Linux kernel mainline since the December 2003 release of 2.6.x stable kernel series.
Futex functionality has been implemented in Microsoft Windows since Windows 8 or Windows Server 2012 under the name WaitOnAddress. [3]
In 2013, Microsoft patented futex-related [4] WaitOnAddress and the patent was granted in 2014. [5]
In May 2014, the CVE system announced a vulnerability discovered in the Linux kernel's futex subsystem that allowed denial-of-service attacks or local privilege escalation. [6] [7]
In May 2015, the Linux kernel introduced a deadlock bug via Commit b0c29f79ecea that caused a hang in user applications. The bug affected many enterprise Linux distributions, including 3.x and 4.x kernels, and Red Hat Enterprise Linux version 5, 6 and 7, SUSE Linux 12 and Amazon Linux. [8]
Futexes have been implemented in OpenBSD since 2016. [9]
The futex mechanism is one of the core concepts of the Zircon kernel [10] in Google's Fuchsia operating system since at least April 2018. [11]
Apple implemented futex in iOS/iPadOS/tvOS 17.4, macOS 14.4, watchOS 10.4 and visionOS 1.1. [12]
Futexes have two basic operations, WAIT
and WAKE
.
WAIT(addr, val)
addr
is val
, puts the current thread to sleep.WAKE(addr, num)
num
number of threads waiting on the address addr
.For more advanced uses, there are a number of other operations, the most used being CMP_REQUEUE
and WAKE_OP
, which both function as more generic WAKE
operations. [13]
CMP_REQUEUE(old_addr, new_addr, num_wake, num_move, val)
old_addr
is val
, wakes num_wake
threads waiting on the address old_addr
, and enqueues num_move
threads waiting on the address old_addr
to now wait on the address new_addr
. This can be used to avoid the thundering herd problem on wake. [14] [15] WAKE_OP(addr1, addr2, num1, num2, op, op_arg, cmp, cmp_arg)
addr2
, perform op
with op_arg
on it, and store the result back to addr2
. Then it will wake num1
threads waiting on addr1
, and, if the previously read value from addr2
matches cmp_arg
using comparison cmp
, will wake num2
threads waiting on addr2
. This very flexible and generic wake mechanism is useful for implementing many synchronization primitives.In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multiprogramming or multitasking operating system. In a traditional CPU, each process - a program in execution - utilizes the various CPU registers to store data and hold the current state of the running process. However, in a multitasking operating system, the operating system switches between processes or threads to allow the execution of multiple processes simultaneously. For every switch, the operating system must save the state of the currently running process, followed by loading the next process state, which will run on the CPU. This sequence of operations that stores the state of the running process and the loading of the following running process is called a context switch.
An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.
XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.
In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
In computing, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows.
A network interface controller is a computer hardware component that connects a computer to a computer network.
In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structures.
In computer programming, a callback is a function that is stored as data and designed to be called by another function – often back to the original abstraction layer.
DTrace is a comprehensive dynamic tracing framework originally created by Sun Microsystems for troubleshooting kernel and application problems on production systems in real time. Originally developed for Solaris, it has since been released under the free Common Development and Distribution License (CDDL) in OpenSolaris and its descendant illumos, and has been ported to several other Unix-like systems.
In concurrent programming, a monitor is a synchronization construct that prevents threads from concurrently accessing a shared object's state and allows them to wait for the state to change. They provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. A monitor consists of a mutex (lock) and at least one condition variable. A condition variable is explicitly 'signalled' when the object's state is modified, temporarily passing the mutex to another thread 'waiting' on the conditional variable.
Netlink is a socket family used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets available on certain Unix-like operating systems, including its original incarnation as a Linux kernel interface, as well as in the form of a later implementation on FreeBSD. Similarly to the Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries. However, while the Unix domain sockets use the file system namespace, Netlink sockets are usually addressed by process identifiers (PIDs).
In computer science, a readers–writer is a synchronization primitive that solves one of the readers–writers problems. An RW lock allows concurrent access for read-only operations, whereas write operations require exclusive access. This means that multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data. When a writer is writing the data, all other writers and readers will be blocked until the writer is finished writing. A common use might be to control access to a data structure in memory that cannot be updated atomically and is invalid until the update is complete.
OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
Grand Central Dispatch, is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.
The Linux kernel is a free and open-source, UNIX-like kernel that is used in many computer systems worldwide. The kernel was created by Linus Torvalds in 1991 and soon after was adopted as the kernel for the GNU operating system (OS) which was created to be a free replacement for Unix.
nftables is a subsystem of the Linux kernel providing filtering and classification of network packets/datagrams/frames. It has been available since Linux kernel 3.13 released on 19 January 2014.
KSMBD is an open-source in-kernel CIFS/SMB server created by Namjae Jeon for the Linux kernel. Initially the goal is to provide improved file I/O performance, but the bigger goal is to have some new features which are much easier to develop and maintain inside the kernel and expose the layers fully. Directions can be attributed to sections where Samba is moving to a few modules inside the kernel to have features like Remote direct memory access (RDMA) to work with actual performance gain.
Fuchsia is an open-source capability-based operating system developed by Google. In contrast to Google's Linux-based operating systems such as ChromeOS and Android, Fuchsia is based on a custom kernel named Zircon. It publicly debuted as a self-hosted git repository in August 2016 without any official corporate announcement. After years of development, its official product launch was in 2021 on the first-generation Google Nest Hub, replacing its original Linux-based Cast OS.
io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read
/write
or aio_read
/aio_write
etc. for operations on data accessed by file descriptors.