Memoization

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts (and for purposes other than speed gains), such as in simple mutually recursive descent parsing. [1] It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling. [2]

Etymology

The term memoization was coined by Donald Michie in 1968 [3] and is derived from the Latin word memorandum ('to be remembered'), usually truncated as memo in American English, and thus carries the meaning of 'turning [the results of] a function into something to be remembered'. While memoization might be confused with memorization (because they are etymological cognates), memoization has a specialized meaning in computing.

Overview

A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters. The set of remembered associations may be a fixed-size set controlled by a replacement algorithm or a fixed set, depending on the nature of the function and its use. A function can only be memoized if it is referentially transparent; that is, only if calling the function has exactly the same effect as replacing that function call with its return value. (Special case exceptions to this restriction exist, however.) While related to lookup tables, since memoization often uses such tables in its implementation, memoization populates its cache of results transparently on the fly, as needed, rather than in advance.
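
As a minimal sketch of a fixed-size set of remembered associations controlled by a replacement algorithm, the following Python example uses the standard library's functools.lru_cache, which evicts the least recently used entry once the cache is full. The function square and the calls list are hypothetical names chosen for illustration only.

```python
from functools import lru_cache

calls = []  # records every argument that actually reaches the function body


@lru_cache(maxsize=2)  # remember at most two results; evict the least recently used
def square(n: int) -> int:
    calls.append(n)
    return n * n


square(2)  # computed
square(2)  # served from the cache
square(3)  # computed
square(4)  # computed; the cache is full, so the entry for 2 is evicted
square(2)  # computed again, because its entry was evicted
```

The calls list shows which invocations were real computations; square.cache_info() reports the corresponding hit and miss counts.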

Memoized functions are optimized for speed in exchange for a higher use of computer memory space. The time/space "cost" of algorithms has a specific name in computing: computational complexity. All functions have a computational complexity in time (i.e. they take time to execute) and in space.

Although a space–time tradeoff occurs (i.e., space used is speed gained), this differs from some other optimizations that involve a space–time tradeoff, such as strength reduction, in that memoization is a run-time rather than compile-time optimization. Moreover, strength reduction potentially replaces a costly operation such as multiplication with a less costly operation such as addition, and the resulting savings can be highly machine-dependent (non-portable across machines), whereas memoization is a more machine-independent, cross-platform strategy.

Consider the following pseudocode function to calculate the factorial of n:

    function factorial (n is a non-negative integer)
        if n is 0 then
            return 1 [by the convention that 0! = 1]
        else
            return factorial(n – 1) times n [recursively invoke factorial with the parameter 1 less than n]
        end if
    end function

For every integer n such that n ≥ 0, the final result of the function factorial is invariant; if invoked as x = factorial(3), the result is such that x will always be assigned the value 6. The non-memoized implementation above, given the nature of the recursive algorithm involved, would require n + 1 invocations of factorial to arrive at a result, and each of these invocations, in turn, has an associated cost in the time it takes the function to return the value computed. Depending on the machine, this cost might be the sum of:

  1. The cost to set up the function call stack frame.
  2. The cost to compare n to 0.
  3. The cost to subtract 1 from n.
  4. The cost to set up the recursive call stack frame. (As above.)
  5. The cost to multiply the result of the recursive call to factorial by n.
  6. The cost to store the return result so that it may be used by the calling context.

In a non-memoized implementation, every top-level call to factorial includes the cumulative cost of steps 2 through 6 proportional to the initial value of n.

A memoized version of the factorial function follows:

    function factorial (n is a non-negative integer)
        if n is 0 then
            return 1 [by the convention that 0! = 1]
        else if n is in lookup-table then
            return lookup-table-value-for-n
        else
            let x = factorial(n – 1) times n [recursively invoke factorial with the parameter 1 less than n]
            store x in lookup-table in the nth slot [remember the result of n! for later]
            return x
        end if
    end function

In this particular example, if factorial is first invoked with 5, and then invoked later with any value less than or equal to five, those return values will also have been memoized, since factorial will have been called recursively with the values 5, 4, 3, 2, 1, and 0, and the return values for each of those will have been stored. If it is then called with a number greater than 5, such as 7, only 2 recursive calls will be made (7 and 6), and the value for 5! will have been stored from the previous call. In this way, memoization allows a function to become more time-efficient the more often it is called, thus resulting in eventual overall speed-up.
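
The behaviour described above can be checked with a small Python sketch of the memoized pseudocode. The names lookup_table and computed are assumptions of this sketch; computed exists only to record which values were actually calculated rather than looked up.

```python
lookup_table = {}  # maps n to n! for every n calculated so far
computed = []      # records each n whose factorial was actually calculated


def factorial(n: int) -> int:
    if n == 0:
        return 1                    # by the convention that 0! = 1
    if n in lookup_table:
        return lookup_table[n]      # remembered result, no recursion needed
    computed.append(n)
    x = factorial(n - 1) * n        # recurse with the parameter 1 less than n
    lookup_table[n] = x             # remember n! for later calls
    return x


factorial(5)  # calculates 5, 4, 3, 2 and 1
factorial(7)  # calculates only 7 and 6; 5! comes from the table
```

After both calls, computed holds 5, 4, 3, 2, 1, 7, 6 in that order, which matches the two extra recursive calls described in the text.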

Some other considerations

Functional programming

Memoization is heavily used in compilers for functional programming languages, which often use call by name evaluation strategy. To avoid overhead with calculating argument values, compilers for these languages heavily use auxiliary functions called thunks to compute the argument values, and memoize these functions to avoid repeated calculations.
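
A memoized thunk can be sketched in Python as a zero-argument function that computes its value on first use and returns the stored value afterwards. This is an illustrative sketch only, not how any particular compiler implements thunks; memo_thunk, expensive_argument and use_twice are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def memo_thunk(compute: Callable[[], T]) -> Callable[[], T]:
    """Wrap a delayed computation so that it runs at most once."""
    cell: List[T] = []  # stays empty until the thunk is first forced

    def force() -> T:
        if not cell:
            cell.append(compute())  # first use: evaluate and remember
        return cell[0]              # later uses: return the remembered value

    return force


evaluations = []  # records each real evaluation of the argument


def expensive_argument() -> int:
    evaluations.append("evaluated")
    return 6 * 7


def use_twice(arg: Callable[[], int]) -> int:
    # The callee refers to its argument twice, as under call by name.
    return arg() + arg()


result = use_twice(memo_thunk(expensive_argument))
```

Although use_twice forces its argument twice, expensive_argument runs only once; a thunk that is never forced is never evaluated at all.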

Automatic memoization

While memoization may be added to functions internally and explicitly by a computer programmer in much the same way the above memoized version of factorial is implemented, referentially transparent functions may also be automatically memoized externally. [1] The techniques employed by Peter Norvig have application not only in Common Lisp (the language in which his paper demonstrated automatic memoization), but also in various other programming languages. Applications of automatic memoization have also been formally explored in the study of term rewriting [4] and artificial intelligence. [5]

In programming languages where functions are first-class objects (such as Lua, Python, or Perl [6] ), automatic memoization can be implemented by replacing (at run-time) a function with its calculated value once a value has been calculated for a given set of parameters. The function that does this value-for-function-object replacement can generically wrap any referentially transparent function. Consider the following pseudocode (where it is assumed that functions are first-class values):

    function memoized-call (F is a function object parameter)
        if F has no attached array values then
            allocate an associative array called values;
            attach values to F;
        end if;

        if F.values[arguments] is empty then
            F.values[arguments] = F(arguments);
        end if;

        return F.values[arguments];
    end function

In order to call an automatically memoized version of factorial using the above strategy, rather than calling factorial directly, code invokes memoized-call(factorial(n)). Each such call first checks to see if a holder array has been allocated to store results, and if not, attaches that array. If no entry exists at the position values[arguments] (where arguments are used as the key of the associative array), a real call is made to factorial with the supplied arguments. Finally, the entry in the array at the key position is returned to the caller.
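
A rough Python rendering of this strategy follows. It assumes, for this sketch only, that the function object and its arguments are passed separately to memoized_call, and that the arguments are hashable so they can serve as a dictionary key.

```python
def memoized_call(f, *args):
    # Attach an associative array to the function object on first use.
    if not hasattr(f, "values"):
        f.values = {}
    # Make a real call only when no entry exists for these arguments.
    if args not in f.values:
        f.values[args] = f(*args)
    return f.values[args]


def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)


first = memoized_call(factorial, 5)   # real call; the result is stored
second = memoized_call(factorial, 5)  # answered from factorial.values
```

Only the outer call is memoized in this sketch: the recursive calls inside factorial go to factorial directly, so factorial.values ends up with a single entry, for the argument 5.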

The above strategy requires explicit wrapping at each call to a function that is to be memoized. In those languages that allow closures, memoization can be effected implicitly via a functor factory that returns a wrapped memoized function object in a decorator pattern. In pseudocode, this can be expressed as follows:

    function construct-memoized-functor (F is a function object parameter)
        allocate a function object called memoized-version;

        let memoized-version(arguments) be
            if self has no attached array values then [self is a reference to this object]
                allocate an associative array called values;
                attach values to self;
            end if;

            if self.values[arguments] is empty then
                self.values[arguments] = F(arguments);
            end if;

            return self.values[arguments];
        end let;

        return memoized-version;
    end function

Rather than call factorial, a new function object memfact is created as follows:

 memfact = construct-memoized-functor(factorial)

The above example assumes that the function factorial has already been defined before the call to construct-memoized-functor is made. From this point forward, memfact(n) is called whenever the factorial of n is desired. In languages such as Lua, more sophisticated techniques exist which allow a function to be replaced by a new function with the same name, which would permit:

 factorial = construct-memoized-functor(factorial)

Essentially, such techniques involve attaching the original function object to the created functor and forwarding calls to the original function being memoized via an alias when a call to the actual function is required (to avoid endless recursion), as illustrated below:

    function construct-memoized-functor (F is a function object parameter)
        allocate a function object called memoized-version;

        let memoized-version(arguments) be
            if self has no attached array values then [self is a reference to this object]
                allocate an associative array called values;
                attach values to self;
                allocate a new function object called alias;
                attach alias to self; [for later ability to invoke F indirectly]
                self.alias = F;
            end if;

            if self.values[arguments] is empty then
                self.values[arguments] = self.alias(arguments); [not a direct call to F]
            end if;

            return self.values[arguments];
        end let;

        return memoized-version;
    end function

(Note: Some of the steps shown above may be implicitly managed by the implementation language and are provided for illustration.)
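
In Python, a closure can play the role of both the attached array and the alias, so the functor factory can be sketched as below. This is one possible rendering under that assumption, not a transcription of the pseudocode; the values attribute is exposed on the returned function only so that the cache can be inspected.

```python
from functools import wraps


def construct_memoized_functor(f):
    values = {}  # associative array captured by the closure

    @wraps(f)
    def memoized_version(*args):
        if args not in values:
            values[args] = f(*args)  # f still refers to the original function
        return values[args]

    memoized_version.values = values  # exposed for inspection only
    return memoized_version


def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)


# Rebind the name so that every caller, including the recursive call inside
# the original function body, now reaches the memoized version.
factorial = construct_memoized_functor(factorial)
result = factorial(5)
```

Because the name factorial is rebound, the recursive calls are memoized as well, and the cache holds entries for 0 through 5 after the single call above. The closure variable f keeps the original function reachable, which is what prevents endless recursion.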

Parsers

When a top-down parser tries to parse an ambiguous input with respect to an ambiguous context-free grammar (CFG), it may need an exponential number of steps (with respect to the length of the input) to try all alternatives of the CFG in order to produce all possible parse trees. This eventually would require exponential memory space. Memoization was explored as a parsing strategy in 1991 by Peter Norvig, who demonstrated that an algorithm similar to the use of dynamic programming and state-sets in Earley's algorithm (1970), and tables in the CYK algorithm of Cocke, Younger and Kasami, could be generated by introducing automatic memoization to a simple backtracking recursive descent parser to solve the problem of exponential time complexity. [1] The basic idea in Norvig's approach is that when a parser is applied to the input, the result is stored in a memotable for subsequent reuse if the same parser is ever reapplied to the same input.

Richard Frost and Barbara Szydlowski also used memoization to reduce the exponential time complexity of parser combinators, describing the result as a memoizing purely functional top-down backtracking language processor. [7] Frost showed that basic memoized parser combinators can be used as building blocks to construct complex parsers as executable specifications of CFGs. [8] [9]

Memoization was again explored in the context of parsing in 1995 by Mark Johnson and Jochen Dörre. [10] [11] In 2002, it was examined in considerable depth by Bryan Ford in the form called packrat parsing. [12]

In 2007, Frost, Hafiz and Callaghan described a top-down parsing algorithm that uses memoization to avoid redundant computations, accommodating any form of ambiguous CFG in polynomial time (Θ(n^4) for left-recursive grammars and Θ(n^3) for non-left-recursive grammars). Their top-down parsing algorithm also requires only polynomial space for potentially exponentially many ambiguous parse trees, by means of 'compact representation' and 'local ambiguities grouping'. Their compact representation is comparable with Tomita's compact representation of bottom-up parsing. [13] Their use of memoization is not limited to retrieving previously computed results when a parser is applied to the same input position repeatedly (which is essential for the polynomial time requirement); it is specialized to perform the following additional tasks:

Frost, Hafiz and Callaghan also described the implementation of the algorithm in PADL'08 as a set of higher-order functions (called parser combinators) in Haskell, which enables the construction of directly executable specifications of CFGs as language processors. The ability of their polynomial algorithm to accommodate 'any form of ambiguous CFG' with top-down parsing is vital for syntax and semantics analysis in natural language processing. The X-SAIGA site has more about the algorithm and implementation details.

While Norvig increased the power of the parser through memoization, the augmented parser still had the same time complexity as Earley's algorithm, which demonstrates the use of memoization for something other than speed optimization. Johnson and Dörre [11] demonstrate another such non-speed-related application of memoization: the use of memoization to delay linguistic constraint resolution to a point in a parse where sufficient information has been accumulated to resolve those constraints. By contrast, in the speed optimization application of memoization, Ford demonstrated that memoization could guarantee that parsing expression grammars could parse in linear time even those languages that resulted in worst-case backtracking behavior. [12]

Consider the following grammar:

    S → (A c) | (B d)
    A → X (a|b)
    B → X b
    X → x [X]

(Notation note: In the above example, the production S → (A c) | (B d) reads: "An S is either an A followed by a c or a B followed by a d." The production X → x [X] reads "An X is an x followed by an optional X.")

This grammar generates one of the following three variations of string: xac, xbc, or xbd (where x here is understood to mean one or more x's). Next, consider how this grammar, used as a parse specification, might effect a top-down, left-to-right parse of the string xxxxxbd:

The rule A will recognize xxxxxb (by first descending into X to recognize one x, and again descending into X until all the x's are consumed, and then recognizing the b), and then return to S, and fail to recognize a c. The next clause of S will then descend into B, which in turn again descends into X and recognizes the x's by means of many recursive calls to X, and then a b, and returns to S and finally recognizes a d.

The key concept here is inherent in the phrase again descends into X. The process of looking forward, failing, backing up, and then retrying the next alternative is known in parsing as backtracking, and it is primarily backtracking that presents opportunities for memoization in parsing. Consider a function RuleAcceptsSomeInput(Rule, Position, Input), where the parameters are as follows:

  1. Rule is the name of the rule under consideration.
  2. Position is the offset currently under consideration in the input.
  3. Input is the input under consideration.

Let the return value of the function RuleAcceptsSomeInput be the length of the input accepted by Rule, or 0 if that rule does not accept any input at that offset in the string. In a backtracking scenario with such memoization, the parsing process is as follows:

When the rule A descends into X at offset 0, it memoizes the length 5 against that position and the rule X. After the first alternative of S has failed at d, B, rather than descending again into X, queries the memoization engine for position 0 against rule X and is returned a length of 5. This saves it from actually descending into X again, and it carries on as if it had descended into X as many times as before.

In the above example, one or many descents into X may occur, allowing for strings such as xxxxxxxxxxxxxxxxbd. In fact, there may be any number of x's before the b. While the call to S must recursively descend into X as many times as there are x's, B will never have to descend into X at all, since the return value of RuleAcceptsSomeInput(X, 0, xxxxxxxxxxxxxxxxbd) will be 16 (in this particular case).
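
The walk-through above can be reproduced with a small backtracking recognizer in Python. This is a sketch under stated assumptions rather than a general parser: only rule X is memoized, the memo table is keyed on (rule name, position), a return value of 0 means that the rule accepts no input at that offset, and the names recognize, rule_x, rule_a, rule_b, rule_s and the stats counters are hypothetical.

```python
def recognize(text):
    """Recognize text against S -> (A c) | (B d); return (accepted, stats)."""
    memo = {}                           # (rule name, position) -> accepted length
    stats = {"computed": 0, "hits": 0}  # bookkeeping for rule X only

    def lit(ch, pos):
        # Length 1 if the literal ch is found at pos, otherwise 0.
        return 1 if pos < len(text) and text[pos] == ch else 0

    def rule_x(pos):
        # X -> x [X], memoized against the position.
        key = ("X", pos)
        if key in memo:
            stats["hits"] += 1
            return memo[key]
        stats["computed"] += 1
        n = lit("x", pos)
        if n:
            n += rule_x(pos + n)        # the optional trailing X
        memo[key] = n
        return n

    def rule_a(pos):
        # A -> X (a|b)
        n = rule_x(pos)
        if not n:
            return 0
        m = lit("a", pos + n) or lit("b", pos + n)
        return n + m if m else 0

    def rule_b(pos):
        # B -> X b
        n = rule_x(pos)
        if not n:
            return 0
        m = lit("b", pos + n)
        return n + m if m else 0

    def rule_s(pos):
        # S -> (A c) | (B d): try each alternative in order, backtracking on failure.
        for rule, last in ((rule_a, "c"), (rule_b, "d")):
            n = rule(pos)
            if n and lit(last, pos + n):
                return n + 1
        return 0

    n = rule_s(0)
    return n > 0 and n == len(text), stats


accepted, stats = recognize("xxxxxbd")
```

For xxxxxbd, the first alternative descends into X at offsets 0 through 5 (the last descent fails on the b), and the second alternative is then answered by a single memo-table hit at offset 0.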

Those parsers that make use of syntactic predicates are also able to memoize the results of predicate parses, thereby reducing such constructions as:

    S → (A)? A
    A → /* some rule */

to one descent into A.

If a parser builds a parse tree during a parse, it must memoize not only the length of the input that matches at some offset against a given rule, but also must store the sub-tree that is generated by that rule at that offset in the input, since subsequent calls to the rule by the parser will not actually descend and rebuild that tree. For the same reason, memoized parser algorithms that generate calls to external code (sometimes called a semantic action routine) when a rule matches must use some scheme to ensure that such rules are invoked in a predictable order.

Since not every grammar will need backtracking or predicate checks, even with a parser capable of backtracking or syntactic predicates, the overhead of storing each rule's parse results against every offset in the input (and storing the parse tree if the parsing process does that implicitly) may actually slow down a parser. This effect can be mitigated by explicit selection of those rules the parser will memoize. [14]

References

  1. Norvig, Peter (1991). "Techniques for Automatic Memoization with Applications to Context-Free Parsing". Computational Linguistics. 17 (1): 91–98.
  2. Warren, David S. (1992-03-01). "Memoing for logic programs". Communications of the ACM. 35 (3): 93–111. doi:10.1145/131295.131299. ISSN 0001-0782.
  3. Michie, Donald (1968). "'Memo' Functions and Machine Learning" (PDF). Nature. 218 (5136): 19–22. Bibcode:1968Natur.218...19M. doi:10.1038/218019a0. S2CID 4265138.
  4. Hoffman, Berthold (1992). "Term Rewriting with Sharing and Memoïzation". In Kirchner, H.; Levi, G. (eds.). Algebraic and Logic Programming: Third International Conference, Proceedings, Volterra, Italy, 2–4 September 1992. Lecture Notes in Computer Science. Vol. 632. Berlin: Springer. pp. 128–142. doi:10.1007/BFb0013824. ISBN 978-3-540-55873-6.
  5. Mayfield, James; et al. (1995). "Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems" (PDF). Proceedings of the Eleventh IEEE Conference on Artificial Intelligence for Applications (CAIA '95). pp. 87–93. doi:10.1109/CAIA.1995.378786. hdl:11603/12722. ISBN 0-8186-7070-3. S2CID 8963326.
  6. "Bricolage: Memoization".
  7. Frost, Richard; Szydlowski, Barbara (1996). "Memoizing Purely Functional Top-Down Backtracking Language Processors". Sci. Comput. Program. 27 (3): 263–288. doi:10.1016/0167-6423(96)00014-7.
  8. Frost, Richard (1994). "Using Memoization to Achieve Polynomial Complexity of Purely Functional Executable Specifications of Non-Deterministic Top-Down Parsers". SIGPLAN Notices. 29 (4): 23–30. doi:10.1145/181761.181764. S2CID 10616505.
  9. Frost, Richard (2003). "Monadic Memoization towards Correctness-Preserving Reduction of Search". Canadian Conference on AI 2003. Lecture Notes in Computer Science. Vol. 2671. pp. 66–80. doi:10.1007/3-540-44886-1_8. ISBN 978-3-540-40300-5.
  10. Johnson, Mark (1995). "Memoization of Top-Down Parsing". Computational Linguistics. 21 (3): 405–417. arXiv:cmp-lg/9504016. Bibcode:1995cmp.lg....4016J.
  11. Johnson, Mark; Dörre, Jochen (1995). "Memoization of Coroutined Constraints". Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, Massachusetts. arXiv:cmp-lg/9504028.
  12. Ford, Bryan (2002). Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking (Master's thesis). Massachusetts Institute of Technology. hdl:1721.1/87310.
  13. Tomita, Masaru (1985). Efficient Parsing for Natural Language. Boston: Kluwer. ISBN 0-89838-202-5.
  14. Acar, Umut A.; et al. (2003). "Selective Memoization". Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 15–17 January 2003. Vol. 38. New Orleans, Louisiana. pp. 14–25. arXiv:1106.0447. doi:10.1145/640128.604133.