Loop-invariant code motion

Last updated

In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion (also called hoisting or scalar promotion) is a compiler optimization that performs this movement automatically.

Contents

Example

In the following code sample, two optimizations can be applied.

inti=0;while(i<n){x=y+z;a[i]=6*i+x*x;++i;}

Although the calculation x = y + z and x * x is loop-invariant, precautions must be taken before moving the code outside the loop. It is possible that the loop condition is false (for example, if n holds a negative value), and in such case, the loop body should not be executed at all. One way of guaranteeing correct behaviour is using a conditional branch outside of the loop. Evaluating the loop condition can have side effects, so an additional evaluation by the if construct should be compensated by replacing the while loop with a do {} while . If the code used do {} while in the first place, the whole guarding process is not needed, as the loop body is guaranteed to execute at least once.

inti=0;if(i<n){x=y+z;intconstt1=x*x;do{a[i]=6*i+t1;++i;}while(i<n);}

This code can be optimized further. For example, strength reduction could remove the two multiplications inside the loop (6*i and a[i]), and induction variable elimination could then elide i completely. Since 6 * i must be in lock step with i itself, there is no need to have both.

Invariant code detection

Usually, a reaching definitions analysis is used to detect whether a statement or expression is loop invariant.

For example, if all reaching definitions for the operands of some simple expression are outside of the loop, the expression can be moved out of the loop.

Recent work using data-flow dependence analysis [1] allows to detect not only invariant commands but larger code fragments such as an inner loop. The analysis also detects quasi-invariants of arbitrary degrees, that is commands or code fragments that become invariant after a fixed number of iterations of the loop body.

Benefits

Loop-invariant code which has been hoisted out of a loop is executed less often, providing a speedup. Another effect of this transformation is allowing constants to be stored in registers and not having to calculate the address and access the memory (or cache line) at each iteration.

However, if too many variables are created, there will be high register pressure, especially on processors with few registers, like the 32-bit x86. If the compiler runs out of registers, some variables will be spilled. To counteract this, the inverse optimization can be performed, rematerialization.

See also

Further reading

Related Research Articles

In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations.

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect. It is typically CPU and memory intensive. In practice, factors such as available memory and a programmer's willingness to wait for compilation limit the optimizations that a compiler might provide. Research indicates that some optimization problems are NP-complete, or even undecidable.

Standard ML (SML) is a general-purpose, high-level, modular, functional programming language with compile-time type checking and type inference. It is popular for writing compilers, for programming language research, and for developing theorem provers.

Constant folding and constant propagation are related compiler optimizations used by many modern compilers. An advanced form of constant propagation known as sparse conditional constant propagation can more accurately propagate constants and simultaneously remove dead code.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without changing the source code, while macro expansion occurs prior to compilation, and results in different text that is then processed by the compiler.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">For loop</span> Control flow statement for repeated execution

In computer science, a for-loop or for loop is a control flow statement for specifying iteration. Specifically, a for-loop functions by running a section of code repeatedly until a certain condition has been satisfied.

In compiler construction, strength reduction is a compiler optimization where expensive operations are replaced with equivalent but less expensive operations. The classic example of strength reduction converts strong multiplications inside a loop into weaker additions – something that frequently occurs in array addressing.(Cooper, Simpson & Vick 1995, p. 1)

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification of the programming language in which the source code is written. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In computer science, a loop invariant is a property of a program loop that is true before each iteration. It is a logical assertion, sometimes checked with a code assertion. Knowing its invariant(s) is essential in understanding the effect of a loop.

Loop splitting is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range.

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler. On modern processors, loop unrolling is often counterproductive, as the increased code size can cause more cache misses; cf. Duff's device.

In computer programming, a nested function is a named function that is defined within another, enclosing, block and is lexically scoped within the enclosing block – meaning it is only callable by name within the body of the enclosing block and can use identifiers declared in outer blocks, including outer functions. The enclosing block is typically, but not always, another function.

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.

Automatic parallelization, also auto parallelization, or autoparallelization refers to converting sequential code into multi-threaded and/or vectorized code in order to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. Fully automatic parallelization of sequential programs is a challenge because it requires complex program analysis and the best approach may depend upon parameter values that are not known at compilation time.

Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions :

In computer science, an induction variable is a variable that gets increased or decreased by a fixed amount on every iteration of a loop or is a linear function of another induction variable.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In compiler theory, a greatest common divisor test is the test used in study of loop optimization and loop dependence analysis to test the dependency between loop statements.

Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.

References

  1. Moyen, Jean-Yves; Rubiano, Thomas; Seiller, Thomas (2017). "Loop Quasi-Invariant Chunk Detection". Automated Technology for Verification and Analysis. Lecture Notes in Computer Science. Vol. 10482. pp. 91–108. doi:10.1007/978-3-319-68167-2_7. ISBN   978-3-319-68166-5.