In computer science, a tail call is a subroutine call performed as the final action of a procedure. [1] If the target of a tail call is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion (or tail-end recursion) is particularly useful, and is often easy to optimize in implementations.
Tail calls can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail-call elimination or tail-call optimization. Tail-call elimination allows procedure calls in tail position to be implemented as efficiently as goto statements, thus allowing efficient structured programming. In the words of Guy L. Steele, "in general, procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions." [2]
Not all programming languages require tail-call elimination. However, in functional programming languages, tail-call elimination is often guaranteed by the language standard, allowing tail recursion to use a similar amount of memory as an equivalent loop. The special case of tail-recursive calls, when a function calls itself, may be more amenable to call elimination than general tail calls. When the language semantics do not explicitly support general tail calls, a compiler can often still optimize sibling calls, or tail calls to functions which take and return the same types as the caller. [3]
When a function is called, the computer must "remember" the place it was called from, the return address, so that it can return to that location with the result once the call is complete. Typically, this information is saved on the call stack, a list of return locations in the order that the call locations were reached. For tail calls, there is no need to remember the caller – instead, tail-call elimination makes only the minimum necessary changes to the stack frame before passing it on, [4] and the tail-called function will return directly to the original caller. The tail call doesn't have to appear lexically after all other statements in the source code; it is only important that the calling function return immediately after the tail call, returning the tail call's result if any, since the calling function is bypassed when the optimization is performed.
For non-recursive function calls, this is usually an optimization that saves only a little time and space, since there are not that many different functions available to call. When dealing with recursive or mutually recursive functions where recursion happens through tail calls, however, the stack space and the number of returns saved can grow to be very significant, since a function can call itself, directly or indirectly, creating a new call stack frame each time. Tail-call elimination often reduces asymptotic stack space requirements from linear, or O(n), to constant, or O(1). Tail-call elimination is thus required by the standard definitions of some programming languages, such as Scheme, [5] [6] and languages in the ML family among others. The Scheme language definition formalizes the intuitive notion of tail position exactly, by specifying which syntactic forms allow having results in tail context. [7] Implementations allowing an unlimited number of tail calls to be active at the same moment, thanks to tail-call elimination, can also be called 'properly tail recursive'. [5]
Besides space and execution efficiency, tail-call elimination is important in the functional programming idiom known as continuation-passing style (CPS), which would otherwise quickly run out of stack space.
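As a rough illustration of why CPS depends on tail calls, the following JavaScript sketch writes factorial in continuation-passing style. The names factorialCPS, k and identity are illustrative and do not come from any library. Every call in it, both to factorialCPS and to the continuation k, is in tail position.

```javascript
// Factorial in continuation-passing style (illustrative sketch).
// `k` is the continuation: a function that receives the result.
function factorialCPS(n, k) {
  if (n === 0) {
    return k(1); // tail call to the continuation
  }
  // Tail call to factorialCPS; the pending multiplication is captured in
  // a new continuation (a closure on the heap) instead of a stack frame.
  return factorialCPS(n - 1, (result) => k(n * result));
}

const identity = (x) => x;
console.log(factorialCPS(5, identity)); // 120
```

Many JavaScript engines do not eliminate tail calls, so a large n would still exhaust the stack here. Under guaranteed tail-call elimination the same code would run in constant stack space, although the chain of continuations would still use heap space proportional to n.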
A tail call can be located just before the syntactical end of a function:
    function foo(data) {
        a(data);
        return b(data);
    }
Here, both a(data) and b(data) are calls, but b is the last thing the procedure executes before returning and is thus in tail position. However, not all tail calls are necessarily located at the syntactical end of a subroutine:
    function bar(data) {
        if (a(data)) {
            return b(data);
        }
        return c(data);
    }
Here, the calls to b and c are both in tail position, because each is the last action of its respective branch, even though the call to b is not syntactically at the end of bar's body.
In this code:
    function foo1(data) {
        return a(data) + 1;
    }
    function foo2(data) {
        var ret = a(data);
        return ret;
    }
    function foo3(data) {
        var ret = a(data);
        return (ret == 0) ? 1 : ret;
    }
the call to a(data) is in tail position in foo2, but it is not in tail position either in foo1 or in foo3, because control must return to the caller to allow it to inspect or modify the return value before returning it.
The following program is an example in Scheme: [8]
    ;; factorial : number -> number
    ;; to calculate the product of all positive
    ;; integers less than or equal to n.
    (define (factorial n)
      (if (= n 0)
          1
          (* n (factorial (- n 1)))))
This is not written in a tail-recursive style, because the multiplication function ("*") is in the tail position. This can be compared to:
    ;; factorial : number -> number
    ;; to calculate the product of all positive
    ;; integers less than or equal to n.
    (define (factorial n)
      (fact-iter 1 n))
    (define (fact-iter product n)
      (if (= n 0)
          product
          (fact-iter (* product n)
                     (- n 1))))
This program assumes applicative-order evaluation. The inner procedure fact-iter calls itself last in the control flow. This allows an interpreter or compiler to reorganize the execution which would ordinarily look like this: [8]
    call factorial (4)
     call fact-iter (1 4)
      call fact-iter (4 3)
       call fact-iter (12 2)
        call fact-iter (24 1)
        return 24
       return 24
      return 24
     return 24
    return 24
into the more efficient variant, in terms of both space and time:
    call factorial (4)
     call fact-iter (1 4)
     replace arguments with (4 3)
     replace arguments with (12 2)
     replace arguments with (24 1)
     return 24
    return 24
This reorganization saves space because no state except for the calling function's address needs to be saved, either on the stack or on the heap, and the call stack frame for fact-iter is reused for the intermediate results storage. This also means that the programmer need not worry about running out of stack or heap space for extremely deep recursions. In typical implementations, the tail-recursive variant will be substantially faster than the other variant, but only by a constant factor.
Some programmers working in functional languages will rewrite recursive code to be tail recursive so they can take advantage of this feature. This often requires addition of an "accumulator" argument (product in the above example) to the function.
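The accumulator rewrite can be sketched in JavaScript on a different function, the sum of the integers from 1 to n. The function names sumTo and sumToAcc are illustrative only.

```javascript
// Sum of the integers 1..n, written two ways (illustrative sketch).

// Not tail recursive: the addition happens after the recursive call returns.
function sumTo(n) {
  if (n === 0) return 0;
  return n + sumTo(n - 1);
}

// Tail recursive: the running total travels in the accumulator `acc`,
// so nothing remains to be done after the recursive call.
function sumToAcc(n, acc = 0) {
  if (n === 0) return acc;
  return sumToAcc(n - 1, acc + n);
}

console.log(sumTo(10), sumToAcc(10)); // 55 55
```

Both versions compute the same value; only the second leaves its recursive call in tail position, which is what a tail-call-eliminating implementation needs in order to reuse the frame.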
Tail recursion modulo cons is a generalization of tail-recursion optimization introduced by David H. D. Warren [9] in the context of compilation of Prolog, seen as an explicitly set once language. It was described (though not named) by Daniel P. Friedman and David S. Wise in 1974 [10] as a LISP compilation technique. As the name suggests, it applies when the only operation left to perform after a recursive call is to prepend a known value in front of the list returned from it (or to perform a constant number of simple data-constructing operations, in general). This call would thus be a tail call save for ("modulo") the said cons operation. But prefixing a value at the start of a list on exit from a recursive call is the same as appending this value at the end of the growing list on entry into the recursive call, thus building the list as a side effect, as if in an implicit accumulator parameter. The following Prolog fragment illustrates the concept:
    % Prolog, tail recursive modulo cons:
    partition([], _, [], []).
    partition([X|Xs], Pivot, [X|Rest], Bigs) :-
      X @< Pivot, !,
      partition(Xs, Pivot, Rest, Bigs).
    partition([X|Xs], Pivot, Smalls, [X|Rest]) :-
      partition(Xs, Pivot, Smalls, Rest).

    -- In Haskell, guarded recursion:
    partition [] _ = ([], [])
    partition (x:xs) p
            | x < p     = (x:a, b)
            | otherwise = (a, x:b)
      where
        (a, b) = partition xs p

    % Prolog, with explicit unifications:
    %     non-tail recursive translation:
    partition([], _, [], []).
    partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs],
     (  X @< Pivot
     -> partition(Xs,Pivot,Rest,Bigs), Smalls=[X|Rest]
     ;  partition(Xs,Pivot,Smalls,Rest), Bigs=[X|Rest]
     ).

    % Prolog, with explicit unifications:
    %     tail-recursive translation:
    partition([], _, [], []).
    partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs],
     (  X @< Pivot
     -> Smalls=[X|Rest], partition(Xs,Pivot,Rest,Bigs)
     ;  Bigs=[X|Rest], partition(Xs,Pivot,Smalls,Rest)
     ).
Thus in tail-recursive translation such a call is transformed into first creating a new list node and setting its first field, and then making the tail call with the pointer to the node's rest field as argument, to be filled recursively. The same effect is achieved when the recursion is guarded under a lazily evaluated data constructor, which is automatically achieved in lazy programming languages like Haskell.
The following fragment defines a recursive function in C that duplicates a linked list (with some equivalent Scheme and Prolog code as comments, for comparison):
    #include <stdlib.h>  /* for malloc */

    typedef struct list {
        void *value;
        struct list *next;
    } list;

    list *duplicate(const list *ls)
    {
        list *head = NULL;

        if (ls != NULL) {
            list *p = duplicate(ls->next);
            head = malloc(sizeof *head);
            head->value = ls->value;
            head->next = p;
        }
        return head;
    }

    ;; in Scheme,
    (define (duplicate ls)
      (if (not (null? ls))
        (cons (car ls)
              (duplicate (cdr ls)))
        '()))

    %% in Prolog,
    duplicate([X|Xs],R):-
      duplicate(Xs,Ys),
      R=[X|Ys].
    duplicate([],[]).
In this form the function is not tail recursive, because control returns to the caller after the recursive call duplicates the rest of the input list. Even if it were to allocate the head node before duplicating the rest, it would still need to plug in the result of the recursive call into the next field after the call. [lower-alpha 1] So the function is almost tail recursive. Warren's method pushes the responsibility of filling the next field into the recursive call itself, which thus becomes a tail call. [lower-alpha 2] Using a sentinel head node to simplify the code:
    #include <stdlib.h>  /* for malloc */

    typedef struct list {
        void *value;
        struct list *next;
    } list;
    void duplicate_aux(const list *ls, list *end);

    list *duplicate(const list *ls)
    {
        list head;

        duplicate_aux(ls, &head);
        return head.next;
    }

    void duplicate_aux(const list *ls, list *end)
    {
        if (ls != NULL) {
            end->next = malloc(sizeof *end);
            end->next->value = ls->value;
            duplicate_aux(ls->next, end->next);
        } else {
            end->next = NULL;
        }
    }

    ;; in Scheme,
    (define (duplicate ls)
      (let ((head (list 1)))
        (let dup ((ls  ls)
                  (end head))
          (cond
            ((not (null? ls))
             (set-cdr! end (list (car ls)))
             (dup (cdr ls) (cdr end)))))
        (cdr head)))

    %% in Prolog,
    duplicate([X|Xs],R):-
       R=[X|Ys],
       duplicate(Xs,Ys).
    duplicate([],[]).
The callee now appends to the end of the growing list, rather than have the caller prepend to the beginning of the returned list. The work is now done on the way forward from the list's start, before the recursive call which then proceeds further, instead of backward from the list's end, after the recursive call has returned its result. It is thus similar to the accumulating parameter technique, turning a recursive computation into an iterative one.
Characteristically for this technique, a parent frame is created on the execution call stack, which the tail-recursive callee can reuse as its own call frame if the tail-call optimization is present.
The tail-recursive implementation can now be converted into an explicitly iterative implementation, as an accumulating loop:
    #include <stdlib.h>  /* for malloc */

    typedef struct list {
        void *value;
        struct list *next;
    } list;

    list *duplicate(const list *ls)
    {
        list head, *end;
        end = &head;
        while (ls != NULL) {
            end->next        = malloc(sizeof *end);
            end->next->value = ls->value;
            ls = ls->next;
            end = end->next;
        }
        end->next = NULL;
        return head.next;
    }

    ;; in Scheme,
    (define (duplicate ls)
      (let ((head (list 1)))
        (do ((end head (cdr end))
             (ls  ls   (cdr ls)))
            ((null? ls) (cdr head))
          (set-cdr! end (list (car ls))))))

    %% in Prolog,
    %% N/A
In a paper delivered to the ACM conference in Seattle in 1977, Guy L. Steele summarized the debate over the GOTO and structured programming, and observed that procedure calls in the tail position of a procedure can be best treated as a direct transfer of control to the called procedure, typically eliminating unnecessary stack manipulation operations. [2] Since such "tail calls" are very common in Lisp, a language where procedure calls are ubiquitous, this form of optimization considerably reduces the cost of a procedure call compared to other implementations. Steele argued that poorly-implemented procedure calls had led to an artificial perception that the GOTO was cheap compared to the procedure call. Steele further argued that "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions", with the machine code stack manipulation instructions "considered an optimization (rather than vice versa!)". [2] Steele cited evidence that well-optimized numerical algorithms in Lisp could execute faster than code produced by then-available commercial Fortran compilers because the cost of a procedure call in Lisp was much lower. In Scheme, a Lisp dialect developed by Steele with Gerald Jay Sussman, tail-call elimination is guaranteed to be implemented in any interpreter. [11]
Tail recursion is important to some high-level languages, especially functional and logic languages and members of the Lisp family. In these languages, tail recursion is the most commonly used way (and sometimes the only way available) of implementing iteration. The language specification of Scheme requires that tail calls are to be optimized so as not to grow the stack. Tail calls can be made explicitly in Perl, with a variant of the "goto" statement that takes a function name: goto &NAME; [12]
However, for language implementations which store function arguments and local variables on a call stack (which is the default implementation for many languages, at least on systems with a hardware stack, such as the x86), implementing generalized tail-call optimization (including mutual tail recursion) presents an issue: if the size of the callee's activation record is different from that of the caller, then additional cleanup or resizing of the stack frame may be required. For these cases, optimizing tail recursion remains trivial, but general tail-call optimization may be harder to implement efficiently.
For example, in the Java virtual machine (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack). [13] [14] As a result, functional languages such as Scala that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.
The GCC, LLVM/Clang, and Intel compiler suites perform tail-call optimization for C and other languages at higher optimization levels or when the -foptimize-sibling-calls option is passed. [15] [16] [17] Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both functions are either the same, or require the same amount of total storage space on the call stack. [18]
Various implementation methods are available.
Tail calls are often optimized by interpreters and compilers of functional programming and logic programming languages to more efficient forms of iteration. For example, Scheme programmers commonly express while loops as calls to procedures in tail position and rely on the Scheme compiler or interpreter to substitute the tail calls with more efficient jump instructions. [19]
For compilers generating assembly directly, tail-call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-assembly language (in fact, this is valid x86 assembly):
    foo:
      call B
      call A
      ret
Tail-call elimination replaces the last two lines with a single jump instruction:
    foo:
      call B
      jmp  A
After subroutine A completes, it will then return directly to the return address of foo, omitting the unnecessary ret statement.
Typically, the subroutines being called need to be supplied with parameters. The generated code thus needs to make sure that the call frame for A is properly set up before jumping to the tail-called subroutine. For instance, on platforms where the call stack does not just contain the return address, but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, for the code:
    function foo(data1, data2)
        B(data1)
        return A(data2)
(where data1 and data2 are parameters) a compiler might translate that as: [lower-alpha 3]
    foo:
      mov  reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
      push reg            ; put data1 on stack where B expects it
      call B              ; B uses data1
      pop                 ; remove data1 from stack
      mov  reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
      push reg            ; put data2 on stack where A expects it
      call A              ; A uses data2
      pop                 ; remove data2 from stack.
      ret
A tail-call optimizer could then change the code to:
    foo:
      mov  reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
      push reg            ; put data1 on stack where B expects it
      call B              ; B uses data1
      pop                 ; remove data1 from stack
      mov  reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
      mov  [sp+data1],reg ; put data2 where A expects it
      jmp  A              ; A uses data2 and returns immediately to caller.
This code is more efficient both in terms of execution speed and use of stack space.
Since many Scheme compilers use C as an intermediate target code, the tail recursion must be encoded in C without growing the stack, even if the C compiler does not optimize tail calls. Many implementations achieve this by using a device known as a trampoline, a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to tail-call another, instead of calling it directly and then returning the result, it returns the address of the function to be called and the call parameters back to the trampoline (from which it was called itself), and the trampoline takes care of calling this function next with the specified parameters. This ensures that the C stack does not grow and iteration can continue indefinitely.
It is possible to implement trampolines using higher-order functions in languages that support them, such as Groovy, Visual Basic .NET and C#. [20]
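The trampoline idea can be sketched with higher-order functions in JavaScript. In this minimal version, which is an assumption-laden illustration rather than how any particular Scheme compiler does it, a function that wants to make a tail call returns a zero-argument function (a thunk) describing that call, and the hypothetical trampoline helper keeps invoking thunks until a non-function value comes back.

```javascript
// A minimal trampoline (illustrative sketch; all names are hypothetical).
// A "bounce" is a zero-argument function describing the next call to make.
function trampoline(first) {
  let result = first;
  // Keep calling until a non-function value (the final answer) comes back.
  while (typeof result === "function") {
    result = result();
  }
  return result;
}

// Mutually recursive functions that return thunks instead of tail-calling.
function isEven(n) {
  return n === 0 ? true : () => isOdd(n - 1);
}
function isOdd(n) {
  return n === 0 ? false : () => isEven(n - 1);
}

console.log(trampoline(isEven(100000))); // true
```

Each bounce returns to the loop before the next call is made, so the stack depth stays constant however long the chain of mutual tail calls is. One limitation of this simple design is that a final result which is itself a function would be mistaken for another bounce; a fuller version would wrap bounces in a distinct tagged object.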
Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, Chicken, uses a technique first described by Henry Baker from an unpublished suggestion by Andrew Appel, [21] in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are garbage-collected using the Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building." [21] The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code: continuation-passing style.
Tail recursion can be related to the while statement, an explicit iteration, for instance by transforming
    procedure foo(x)
        if p(x)
            return bar(x)
        else
            return foo(baz(x))
into
    procedure foo(x)
        while true
            if p(x)
                return bar(x)
            else
                x ← baz(x)
where x may be a tuple involving more than one variable: if so, care must be taken in implementing the assignment statement x ← baz(x) so that dependencies are respected. One may need to introduce auxiliary variables or use a swap construct.
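The care needed with tuple assignment can be illustrated with Euclid's algorithm, sketched here in JavaScript under the assumption of non-negative integer inputs. The names gcdRec and gcdLoop are illustrative. The tail call passes the pair (b, a % b), so the loop form must update both variables simultaneously.

```javascript
// Tail-recursive greatest common divisor (Euclid's algorithm).
function gcdRec(a, b) {
  if (b === 0) return a;
  return gcdRec(b, a % b); // tail call with the tuple (b, a % b)
}

// Loop form: x is the pair (a, b), and x ← (b, a % b) must be simultaneous.
function gcdLoop(a, b) {
  while (true) {
    if (b === 0) return a;
    // Destructuring evaluates the whole right-hand side first, so the old
    // value of `a` is still used to compute `a % b`.
    [a, b] = [b, a % b];
  }
}

// A naive sequential update such as `a = b; b = a % b;` would be wrong:
// after `a = b`, the expression `a % b` is `b % b`, which is 0.

console.log(gcdRec(48, 18), gcdLoop(48, 18)); // 6 6
```

Destructuring assignment plays the role of the swap construct here; in a language without it, an auxiliary variable holding the old value of a would serve the same purpose.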
More generally,
    procedure foo(x)
        if p(x)
            return bar(x)
        else if q(x)
            return baz(x)
        ...
        else if r(x)
            return foo(qux(x))
        ...
        else
            return foo(quux(x))
can be transformed into
    procedure foo(x)
        while true
            if p(x)
                return bar(x)
            else if q(x)
                return baz(x)
            ...
            else if r(x)
                x ← qux(x)
            ...
            else
                x ← quux(x)
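A concrete instance of this multi-branch pattern, sketched in JavaScript, counts the steps the Collatz iteration takes to reach 1 from a positive integer n. The example and its function names are illustrative and are not taken from the text above. One branch returns a result, and two branches tail-call the function with different arguments.

```javascript
// Tail-recursive form: one returning branch, two self tail calls.
function collatzStepsRec(n, steps = 0) {
  if (n === 1) return steps;
  else if (n % 2 === 0) return collatzStepsRec(n / 2, steps + 1);
  else return collatzStepsRec(3 * n + 1, steps + 1);
}

// Loop form: each self tail call becomes an update of the loop variables,
// and the returning branch is left unchanged.
function collatzStepsLoop(n) {
  let steps = 0;
  while (true) {
    if (n === 1) return steps;
    else if (n % 2 === 0) { n = n / 2; steps = steps + 1; }
    else { n = 3 * n + 1; steps = steps + 1; }
  }
}

console.log(collatzStepsRec(6), collatzStepsLoop(6)); // 8 8
```

Here the two updated variables do not depend on each other's new values, so they can be assigned one after the other without auxiliary variables.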
For instance, this Julia program gives a non-tail recursive definition factorial of the factorial:
    function factorial(n)
        if n == 0
            return 1
        else
            return n * factorial(n - 1)
        end
    end
Indeed, n * factorial(n - 1) wraps the call to factorial. But it can be transformed into a tail-recursive definition by adding an argument a called an accumulator. [8]
This Julia program gives a tail-recursive definition of the factorial, using a two-argument method of factorial:
    function factorial(n::Integer, a::Integer)
        if n == 0
            return a
        else
            return factorial(n - 1, n * a)
        end
    end

    function factorial(n::Integer)
        return factorial(n, 1)
    end
This Julia program gives an iterative definition fact_iter of the factorial:
    function fact_iter(n::Integer, a::Integer)
        while n > 0
            a = n * a
            n = n - 1
        end
        return a
    end

    function factorial(n::Integer)
        return fact_iter(n, one(n))
    end
Some languages offer explicit constructs for tail calls or tail recursion: Clojure has the recur special form; [22] Kotlin has the tailrec modifier for functions; [30] Perl has goto &NAME; [32] and Scala has the @tailrec annotation, which makes it a compilation error if the function is not tail recursive. [39]
[lower-alpha 1] For example:
    if (ls != NULL) {
        head = malloc(sizeof *head);
        head->value = ls->value;
        head->next = duplicate(ls->next);
    }
[lower-alpha 2] For example:
    if (ls != NULL) {
        head = malloc(sizeof *head);
        head->value = ls->value;
        duplicate(ls->next, &(head->next));
    }
[lower-alpha 3] The call instruction first pushes the current code location onto the stack and then performs an unconditional jump to the code location indicated by the label. The ret instruction first pops a code location off the stack, then performs an unconditional jump to the retrieved code location.