Mixing (mathematics)

Repeated application of the baker's map to points colored red and blue, initially separated. The baker's map is mixing, shown by the red and blue points being completely mixed after several iterations.

In mathematics, mixing is an abstract concept originating from physics: the attempt to describe the irreversible thermodynamic process of mixing in the everyday world, e.g. mixing paint, mixing drinks, or industrial mixing.

The concept appears in ergodic theory—the study of stochastic processes and measure-preserving dynamical systems. Several different definitions for mixing exist, including strong mixing, weak mixing and topological mixing, with the last not requiring a measure to be defined. Some of the different definitions of mixing can be arranged in a hierarchical order; thus, strong mixing implies weak mixing. Furthermore, weak mixing (and thus also strong mixing) implies ergodicity: that is, every system that is weakly mixing is also ergodic (and so one says that mixing is a "stronger" condition than ergodicity).

Informal explanation

The mathematical definition of mixing aims to capture the ordinary everyday process of mixing, such as mixing paints, drinks, cooking ingredients, industrial process mixing, smoke in a smoke-filled room, and so on. To provide the mathematical rigor, such descriptions begin with the definition of a measure-preserving dynamical system, written as $(X, \mathcal{A}, \mu, T)$.

The set $X$ is understood to be the total space to be filled: the mixing bowl, the smoke-filled room, etc. The measure $\mu$ is understood to define the natural volume of the space $X$ and of its subspaces. The collection of subspaces is denoted by $\mathcal{A}$, and the size of any given subset $A \subset X$ is $\mu(A)$; the size is its volume. Naively, one could imagine $\mathcal{A}$ to be the power set of $X$; this doesn't quite work, as not all subsets of a space have a volume (famously, the Banach–Tarski paradox). Thus, conventionally, $\mathcal{A}$ consists of the measurable subsets, that is, the subsets that do have a volume. It is usually taken to be the Borel σ-algebra: the collection of subsets that can be constructed from the open sets by taking countable intersections, countable unions and set complements; these can always be taken to be measurable.

The time evolution of the system is described by a map $T: X \to X$. Given some subset $A \subset X$, its image $T(A)$ will in general be a deformed version of $A$: it is squashed or stretched, folded or cut into pieces. Mathematical examples include the baker's map and the horseshoe map, both inspired by bread-making. The set $T(A)$ must have the same volume as $A$; the squashing/stretching does not alter the volume of the space, only its distribution. Such a system is "measure-preserving" (area-preserving, volume-preserving).

A formal difficulty arises when one tries to reconcile the volume of sets with the need to preserve their size under a map. The problem arises because, in general, several different points in the domain of a function can map to the same point in its range; that is, there may be $x \ne y$ with $T(x) = T(y)$. Worse, a single point $x \in X$ has no size. These difficulties can be avoided by working with the inverse map $T^{-1}: \mathcal{A} \to \mathcal{A}$; it will map any given subset $A \subset X$ to the parts that were assembled to make it: these parts are $T^{-1}(A) \in \mathcal{A}$. It has the important property of not "losing track" of where things came from. More strongly, it has the important property that any (measure-preserving) map $\mathcal{A} \to \mathcal{A}$ is the inverse of some map $X \to X$. The proper definition of a volume-preserving map is one for which $\mu(A) = \mu(T^{-1}(A))$, because $T^{-1}(A)$ describes all the pieces that $A$ came from.
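The role of the inverse map can be checked on a concrete example. The sketch below assumes the doubling map $T(x) = 2x \bmod 1$ on the unit interval with Lebesgue measure (an illustrative choice, not one singled out by the text above); the helper names `preimage_doubling` and `length` are hypothetical. It shows that the preimage of an interval has the same length as the interval, while the forward image does not.

```python
from fractions import Fraction

def preimage_doubling(a, b):
    """Preimage of [a, b) under T(x) = 2x mod 1, as two disjoint intervals."""
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

def length(intervals):
    """Total Lebesgue measure of a list of disjoint half-open intervals."""
    return sum(hi - lo for lo, hi in intervals)

a, b = Fraction(1, 8), Fraction(3, 8)
pieces = preimage_doubling(a, b)

print(pieces)           # the two pieces that T assembles into [a, b)
print(length(pieces))   # same measure as [a, b), namely b - a
print(2 * (b - a))      # the forward image T([a, b)) = [1/4, 3/4) is twice as long
```

The forward image doubles the length, so $\mu(T(A)) = \mu(A)$ fails for this map, whereas $\mu(T^{-1}(A)) = \mu(A)$ holds; this is why the definition is phrased with preimages.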

One is now interested in studying the time evolution of the system. If a set $A \in \mathcal{A}$ eventually visits all of $X$ over a long period of time (that is, if $\cup_{k=1}^{n} T^k(A)$ approaches all of $X$ for large $n$), the system is said to be ergodic. If every set $A$ behaves in this way, the system is a conservative system, placed in contrast to a dissipative system, where some subsets $A$ wander away, never to be returned to. An example would be water running downhill: once it has run down, it will never come back up again. The lake that forms at the bottom of this river can, however, become well-mixed. The Hopf decomposition states that every measure space with a non-singular transformation can be split into two parts: the conservative part and the dissipative part.

Mixing is a stronger statement than ergodicity. Mixing asks for this ergodic property to hold between any two sets $A, B$, and not just between some set $A$ and $X$. That is, a system is said to be (topologically) mixing if, given any two sets $A, B \in \mathcal{A}$, there is an integer $N$ such that, for all $n > N$, one has $T^n(A) \cap B \ne \varnothing$. Here, $\cap$ denotes set intersection and $\varnothing$ is the empty set.

The above definition of topological mixing should be enough to provide an informal idea of mixing (it is equivalent to the formal definition, given below). However, it made no mention of the volume of $A$ and $B$, and, indeed, there is another definition that explicitly works with the volume. Several, actually; one has both strong mixing and weak mixing; they are inequivalent, although a strong mixing system is always weakly mixing. The measure-based definitions are not compatible with the definition of topological mixing: there are systems which are one, but not the other. The general situation remains cloudy: for example, given three sets $A, B, C \in \mathcal{A}$, one can define 3-mixing. As of 2020, it is not known if 2-mixing implies 3-mixing. (If one thinks of ergodicity as "1-mixing", then it is clear that 1-mixing does not imply 2-mixing; there are systems that are ergodic but not mixing.)

The concept of strong mixing is made in reference to the volume of a pair of sets. Consider, for example, a set $A$ of colored dye that is being mixed into a cup of some sort of sticky liquid, say, corn syrup, or shampoo, or the like. Practical experience shows that mixing sticky fluids can be quite hard: there is usually some corner of the container where it is hard to get the dye mixed into. Pick as set $B$ that hard-to-reach corner. The question of mixing is then, can $A$, after a long enough period of time, not only penetrate into $B$ but also fill $B$ with the same proportion as it does elsewhere?

One phrases the definition of strong mixing as the requirement that

$$\lim_{n\to\infty} \mu\left(T^{-n}A \cap B\right) = \mu(A)\mu(B).$$

The time parameter $n$ serves to separate $A$ and $B$ in time, so that one is mixing $A$ while holding the test volume $B$ fixed. The product $\mu(A)\mu(B)$ is a bit more subtle. Imagine that the volume $B$ is 10% of the total volume, and that the volume of dye $A$ will also be 10% of the grand total. If $A$ is uniformly distributed, then it is occupying 10% of $B$, which itself is 10% of the total, and so, in the end, after mixing, the part of $A$ that is in $B$ is 1% of the total volume. That is, $\mu\left(\text{after-mixing}(A) \cap B\right) = \mu(A)\mu(B)$. This product of volumes bears more than a passing resemblance to the product rule for independent events in probability (and, through conditional probability, to Bayes' theorem); this is not an accident, but rather a consequence of the fact that measure theory and probability theory are the same theory: they share the same axioms (the Kolmogorov axioms), even as they use different notation.
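The 10% times 10% arithmetic can be illustrated numerically. The following is a rough Monte Carlo sketch, assuming the baker's map on the unit square and taking both $A$ and $B$ to be the vertical strip $x < 0.1$ (these particular sets, the sample size and the seed are illustrative assumptions). It estimates $\mu(T^{-n}A \cap B)$ as the fraction of uniformly sampled points that start in $B$ and land in $A$ after $n$ steps.

```python
import random

def baker(x, y):
    """One step of the baker's map on the unit square."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def estimate_overlap(n_steps, n_samples=100_000, seed=0):
    """Monte Carlo estimate of mu(T^{-n}A intersect B) for A = B = {x < 0.1}."""
    rng = random.Random(seed)
    hits = 0
    for _sample in range(n_samples):
        x, y = rng.random(), rng.random()
        in_b = x < 0.1                    # the point starts in the test set B
        for _step in range(n_steps):
            x, y = baker(x, y)
        in_a = x < 0.1                    # after n steps it lies in A
        hits += in_b and in_a
    return hits / n_samples

for n in (0, 1, 2, 12):
    print(n, estimate_overlap(n))         # starts near 0.1, ends near 0.01
```

With no mixing ($n = 0$) the overlap is simply $\mu(A \cap B) = 0.1$; after a dozen steps the estimate settles near $\mu(A)\mu(B) = 0.01$, up to sampling noise.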

The reason for using $T^{-n}A$ instead of $T^{n}A$ in the definition is a bit subtle, but it follows from the same reasons why $T^{-1}A$ was used to define the concept of a measure-preserving map. When looking at how much dye got mixed into the corner $B$, one wants to look at where that dye "came from" (presumably, it was poured in at the top, at some time in the past). One must be sure that every place it might have "come from" eventually gets mixed into $B$.

Mixing in dynamical systems

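Let $(X, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system, with $T$ being the time-evolution or shift operator. The system is said to be strong mixing if, for any $A, B \in \mathcal{A}$, one has

$$\lim_{n\to\infty} \mu\left(A \cap T^{-n}B\right) = \mu(A)\mu(B).$$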

For shifts parametrized by a continuous variable instead of a discrete integer $n$, the same definition applies, with $T^{-n}$ replaced by $T_g$, with $g$ being the continuous-time parameter.
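For some maps the quantity $\mu(A \cap T^{-n}B)$ can be computed exactly. The sketch below assumes the doubling map $T(x) = 2x \bmod 1$ with Lebesgue measure and sets that are unions of dyadic intervals, each encoded by a binary word of a fixed length `depth` (the encoding and the function name `measure_intersection` are illustrative assumptions). Membership of $x$ in $A$ depends on the first `depth` binary digits of $x$, and membership of $T^n x$ in $B$ depends on digits $n+1, \dots, n+\text{depth}$.

```python
from fractions import Fraction

def measure_intersection(A, B, n, depth):
    """Exact mu(A intersect T^{-n}B) for T(x) = 2x mod 1.

    A and B are sets of binary words of length `depth`; a word w stands for
    the dyadic interval of points whose binary expansion starts with w.
    """
    total = depth + n
    count = 0
    for k in range(2 ** total):
        word = format(k, f"0{total}b")      # first `total` binary digits of x
        in_a = word[:depth] in A            # x lies in A
        in_tb = word[n:n + depth] in B      # T^n x lies in B
        count += in_a and in_tb
    return Fraction(count, 2 ** total)

A = {"00"}   # the interval [0, 1/4)
B = {"00"}   # the same interval
for n in range(5):
    print(n, measure_intersection(A, B, n, depth=2))
```

Here $\mu(A)\mu(B) = 1/16$. The overlap starts at $1/4$, drops to $1/8$, and equals $1/16$ exactly for every $n \ge 2$, because the two sets then constrain disjoint blocks of digits.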

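A dynamical system is said to be weak mixing if one has

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \left|\mu\left(A \cap T^{-k}B\right) - \mu(A)\mu(B)\right| = 0.$$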

In other words, $T$ is strong mixing if $\mu(A \cap T^{-n}B) - \mu(A)\mu(B) \to 0$ in the usual sense, weak mixing if

$$\left|\mu(A \cap T^{-n}B) - \mu(A)\mu(B)\right| \to 0$$

in the Cesàro sense, and ergodic if $\mu(A \cap T^{-n}B) \to \mu(A)\mu(B)$ in the Cesàro sense. Hence, strong mixing implies weak mixing, which implies ergodicity. However, the converse is not true: there exist ergodic dynamical systems which are not weakly mixing, and weakly mixing dynamical systems which are not strongly mixing. The Chacon system was historically the first example given of a system that is weak-mixing but not strong-mixing. [1]

Theorem. Weak mixing implies ergodicity.

Proof. If the action of the map decomposes into two components $A, B$, then we have $\mu(T^{-n}(A) \cap B) = \mu(A \cap B) = \mu(\varnothing) = 0$, so weak mixing implies $\mu(A)\mu(B) = 0$, so one of $A, B$ has zero measure, and the other one has full measure.
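The step from weak mixing to $\mu(A)\mu(B) = 0$ can be spelled out as follows. This is a sketch under the assumptions that $\mu$ is a probability measure, that $A$ is an invariant set ($T^{-1}A = A$), and that $B = X \setminus A$ is its complement.

```latex
\begin{align*}
  \mu\left(T^{-k}(A) \cap B\right) &= \mu(A \cap B) = 0
     && \text{for every } k \ge 0, \\
  \frac{1}{n} \sum_{k=0}^{n-1}
     \left| \mu\left(T^{-k}(A) \cap B\right) - \mu(A)\mu(B) \right|
     &= \mu(A)\mu(B) = \mu(A)\bigl(1 - \mu(A)\bigr)
     && \text{for every } n \ge 1.
\end{align*}
```

Weak mixing forces the left-hand side of the second line to tend to zero, while the right-hand side does not depend on $n$; hence $\mu(A)(1 - \mu(A)) = 0$ and $\mu(A) \in \{0, 1\}$.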

Covering families

Given a topological space, such as the unit interval (whether it has its end points or not), we can construct a measure on it by starting with the open sets, then taking their unions, complements, unions, complements, and so on to infinity, to obtain all the Borel sets. Next, we define a measure on the Borel sets, then add in all the subsets of measure-zero sets ("negligible sets"). This is how we obtain the Lebesgue measure and the Lebesgue measurable sets.

In most applications of ergodic theory, the underlying space is almost-everywhere isomorphic to an open subset of some $\mathbb{R}^n$, and so it is a Lebesgue measure space. Verifying strong mixing can be simplified if we only need to check a smaller set of measurable sets.

A covering family is a set of measurable sets such that any open set is a disjoint union of sets in it. Compare this with a base in topology, which is less restrictive, as it allows non-disjoint unions.

Theorem. For Lebesgue measure spaces, if $T$ is measure-preserving, and $\lim_{n} \mu(T^{-n}(A) \cap B) = \mu(A)\mu(B)$ for all $A, B$ in a covering family, then $T$ is strong mixing.

Proof. Extend the mixing equation from all $A, B$ in the covering family, to all open sets by disjoint union, to all closed sets by taking the complement, to all measurable sets by using the regularity of Lebesgue measure to approximate any set with open and closed sets. Thus, $\lim_{n} \mu(T^{-n}(A) \cap B) = \mu(A)\mu(B)$ for all measurable $A, B$.

L2 formulation

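The properties of ergodicity, weak mixing and strong mixing of a measure-preserving dynamical system can also be characterized by the average of observables. By von Neumann's ergodic theorem, ergodicity of a dynamical system $(X, \mathcal{A}, \mu, T)$ is equivalent to the property that, for any function $f \in L^2(X, \mu)$, the sequence $(f \circ T^n)_{n \ge 0}$ converges strongly and in the sense of Cesàro to $\int_X f \, d\mu$, i.e.,

$$\lim_{N\to\infty} \left\| \frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n - \int_X f \, d\mu \right\|_{L^2(X, \mu)} = 0.$$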

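A dynamical system $(X, \mathcal{A}, \mu, T)$ is weakly mixing if, for any functions $f$ and $g \in L^2(X, \mu)$,

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \left| \int_X f \circ T^n \cdot g \, d\mu - \int_X f \, d\mu \cdot \int_X g \, d\mu \right| = 0.$$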

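A dynamical system $(X, \mathcal{A}, \mu, T)$ is strongly mixing if, for any function $f \in L^2(X, \mu)$, the sequence $(f \circ T^n)_{n \ge 0}$ converges weakly to $\int_X f \, d\mu$, i.e., for any function $g \in L^2(X, \mu)$,

$$\lim_{n\to\infty} \int_X f \circ T^n \cdot g \, d\mu = \int_X f \, d\mu \cdot \int_X g \, d\mu.$$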

Since the system is assumed to be measure preserving, this last line is equivalent to saying that the covariance satisfies $\lim_{n\to\infty} \operatorname{Cov}(f \circ T^n, g) = 0$, so that the random variables $f \circ T^n$ and $g$ become orthogonal as $n$ grows. Actually, since this works for any function $g$, one can informally see mixing as the property that the random variables $f \circ T^n$ and $g$ become independent as $n$ grows.
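The decay of covariances can be observed numerically. The sketch below assumes the doubling map $T(x) = 2x \bmod 1$ on $[0, 1)$ with Lebesgue measure and the observables $f(x) = g(x) = x$ (illustrative choices; the function name `covariance` and the grid size are assumptions). It approximates $\operatorname{Cov}(f \circ T^n, g)$ with a midpoint rule.

```python
def covariance(n, grid=4096):
    """Midpoint-rule estimate of Cov(f o T^n, g) for T(x) = 2x mod 1, f(x) = g(x) = x."""
    total = 0.0
    for j in range(grid):
        x = (j + 0.5) / grid              # midpoint of the j-th grid cell
        tx = (2 ** n * x) % 1.0           # T^n x
        total += (tx - 0.5) * (x - 0.5)   # both observables have mean 1/2
    return total / grid

for n in range(6):
    print(n, covariance(n), 2.0 ** -n / 12)
```

For this particular choice a Fourier-series computation gives $\operatorname{Cov}(f \circ T^n, g) = 2^{-n}/12$, so the covariance halves at every step; the second printed column shows that reference value.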

Products of dynamical systems

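Given two measured dynamical systems $(X, \mu, T)$ and $(Y, \nu, S)$, one can construct a dynamical system $(X \times Y, \mu \otimes \nu, T \times S)$ on the Cartesian product by defining $(T \times S)(x, y) = (T(x), S(y))$. We then have the following characterizations of weak mixing: [2]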

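Proposition. A dynamical system $(X, \mu, T)$ is weakly mixing if and only if, for any ergodic dynamical system $(Y, \nu, S)$, the system $(X \times Y, \mu \otimes \nu, T \times S)$ is also ergodic.
Proposition. A dynamical system $(X, \mu, T)$ is weakly mixing if and only if $(X^2, \mu \otimes \mu, T \times T)$ is also ergodic. If this is the case, then $(X^2, \mu \otimes \mu, T \times T)$ is also weakly mixing.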
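The second proposition gives a practical way to see that a system is not weakly mixing: exhibit a non-constant function that is invariant under the product map. The sketch below assumes a circle rotation $T(x) = x + \alpha \bmod 1$ with $\alpha = \sqrt{2} \bmod 1$ (an illustrative choice) and checks numerically that $h(x, y) = x - y \bmod 1$ does not change along an orbit of $T \times T$. All function names are hypothetical.

```python
import math

ALPHA = math.sqrt(2) % 1.0   # assumed irrational rotation angle

def rotate(x):
    """Circle rotation T(x) = x + alpha mod 1."""
    return (x + ALPHA) % 1.0

def product_map(x, y):
    """The product system (T x T)(x, y) = (T(x), T(y))."""
    return rotate(x), rotate(y)

def circle_distance(u, v):
    """Distance between two points of the circle R/Z."""
    d = abs(u - v) % 1.0
    return min(d, 1.0 - d)

def invariant(x, y):
    """h(x, y) = x - y mod 1, which T x T leaves unchanged."""
    return (x - y) % 1.0

x, y = 0.1, 0.35
h0 = invariant(x, y)
for _ in range(1000):
    x, y = product_map(x, y)
drift = circle_distance(invariant(x, y), h0)
print(drift)   # zero up to floating-point rounding
```

Since $h$ is a non-constant invariant function, $T \times T$ is not ergodic, and so the rotation is not weakly mixing, even though the rotation itself is ergodic when $\alpha$ is irrational.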

Generalizations

The definition given above is sometimes called strong 2-mixing, to distinguish it from higher orders of mixing. A strong 3-mixing system may be defined as a system for which

$$\lim_{m,n\to\infty} \mu\left(A \cap T^{-m}B \cap T^{-m-n}C\right) = \mu(A)\mu(B)\mu(C)$$

holds for all measurable sets $A, B, C$. We can define strong k-mixing similarly. A system which is strong k-mixing for all k = 2, 3, 4, ... is called mixing of all orders.
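One common way to write the general condition is sketched below. The symbols $A_1, \dots, A_k$ and $n_1, \dots, n_{k-1}$ are notation introduced here for illustration; for $k = 3$ the formula reduces to the 3-mixing condition with $m = n_1$ and $n = n_2$.

```latex
\[
  \lim_{n_1, \dots, n_{k-1} \to \infty}
  \mu\left( A_1 \cap T^{-n_1} A_2 \cap T^{-(n_1 + n_2)} A_3 \cap \cdots
            \cap T^{-(n_1 + \cdots + n_{k-1})} A_k \right)
  = \prod_{i=1}^{k} \mu(A_i)
\]
```

The condition is required to hold for all measurable sets $A_1, \dots, A_k$, with all the gaps $n_i$ tending to infinity independently.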

It is unknown whether strong 2-mixing implies strong 3-mixing. It is known that strong m-mixing implies ergodicity.

Examples

Irrational rotations of the circle, and more generally irreducible translations on a torus, are ergodic but neither strongly nor weakly mixing with respect to the Lebesgue measure.
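The behaviour of a rotation can be made concrete. The sketch below assumes the rotation $T(x) = x + \alpha \bmod 1$ with $\alpha = \sqrt{2} \bmod 1$ and the set $A = [0, 1/2)$ (illustrative choices). Two half-circle arcs offset by $\theta$ overlap in a set of length $1/2 - \min(\theta, 1 - \theta)$, which gives $\mu(A \cap T^{-n}A)$ in closed form.

```python
import math

ALPHA = math.sqrt(2) % 1.0   # assumed irrational rotation angle

def overlap(n):
    """mu(A intersect T^{-n}A) for A = [0, 1/2) and T(x) = x + ALPHA mod 1."""
    theta = (n * ALPHA) % 1.0
    return 0.5 - min(theta, 1.0 - theta)

N = 20_000
terms = [overlap(n) for n in range(N)]

cesaro_mean = sum(terms) / N                            # ergodicity: tends to 1/4
cesaro_abs_dev = sum(abs(t - 0.25) for t in terms) / N  # weak mixing would need 0
spread = max(terms[-1000:]) - min(terms[-1000:])        # strong mixing would need 0

print(cesaro_mean, cesaro_abs_dev, spread)
```

The Cesàro mean approaches $\mu(A)^2 = 1/4$, consistent with ergodicity. The averaged absolute deviation approaches $1/8$ rather than $0$, and the individual terms keep oscillating between nearly $0$ and nearly $1/2$, so neither the weak nor the strong mixing condition holds for this pair of sets.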

Many maps considered as chaotic are strongly mixing for some well-chosen invariant measure, including: the dyadic map, Arnold's cat map, horseshoe maps, Kolmogorov automorphisms, and the Anosov flow (the geodesic flow on the unit tangent bundle of compact manifolds of negative curvature.)

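The dyadic map is "shift to left in binary". In general, for any $n \in \{2, 3, \dots\}$, the "shift to left in base $n$" map $T(x) = nx \bmod 1$ is strongly mixing on the covering family $\left\{\left(\tfrac{k}{n^s}, \tfrac{k+1}{n^s}\right) \smallsetminus \mathbb{Q} : s \ge 0,\ 0 \le k < n^s\right\}$, therefore it is strongly mixing on $(0, 1) \smallsetminus \mathbb{Q}$, and therefore it is strongly mixing on $[0, 1]$.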

Similarly, for any finite or countable alphabet $\Sigma$, we can impose a discrete probability distribution on it, then consider the probability distribution on the "coin flip" space, where each "coin flip" can take results from $\Sigma$. We can either construct the singly-infinite space $\Sigma^{\mathbb{N}}$ or the doubly-infinite space $\Sigma^{\mathbb{Z}}$. In both cases, the shift map (one letter to the left) is strongly mixing, since it is strongly mixing on the covering family of cylinder sets. The baker's map is isomorphic to a shift map, so it is strongly mixing.
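The mixing property on cylinder sets can be verified exactly for a small example. The sketch below assumes a three-letter alphabet with hypothetical probabilities, the one-sided sequence space with product measure, and cylinder sets encoded as dictionaries from coordinate index to required symbol (all of these encodings and names are illustrative assumptions).

```python
from fractions import Fraction

# Hypothetical symbol probabilities on the alphabet {"a", "b", "c"}.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def cylinder_measure(constraints):
    """Product measure of the cylinder {x : x[i] == s for every (i, s)}."""
    result = Fraction(1)
    for symbol in constraints.values():
        result *= PROBS[symbol]
    return result

def shifted_intersection(A, B, n):
    """mu(A intersect T^{-n}B) for the left shift (Tx)[i] = x[i + 1].

    T^{-n}B constrains coordinate i + n wherever B constrains coordinate i.
    """
    combined = dict(A)
    for i, symbol in B.items():
        if combined.get(i + n, symbol) != symbol:
            return Fraction(0)            # contradictory constraints: empty set
        combined[i + n] = symbol
    return cylinder_measure(combined)

A = {0: "a", 1: "b"}   # sequences starting with "ab"
B = {0: "b", 1: "c"}   # sequences starting with "bc"
for n in range(5):
    print(n, shifted_intersection(A, B, n))
print(cylinder_measure(A) * cylinder_measure(B))
```

Once $n$ is at least the length of the word defining $A$, the two cylinders constrain disjoint coordinates, so the measure of the intersection is exactly $\mu(A)\mu(B)$; the limit in the definition of strong mixing is reached after finitely many steps.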

Topological mixing

A form of mixing may be defined without appeal to a measure, using only the topology of the system. A continuous map $f: X \to X$ is said to be topologically transitive if, for every pair of non-empty open sets $A, B \subset X$, there exists an integer $n$ such that

$$f^n(A) \cap B \ne \varnothing,$$

where $f^n$ is the $n$th iterate of $f$. In operator theory, a topologically transitive bounded linear operator (a continuous linear map on a topological vector space) is usually called a hypercyclic operator. A related idea is expressed by the wandering set.

Lemma: If $X$ is a complete metric space with no isolated point, then $f$ is topologically transitive if and only if there exists a hypercyclic point $x \in X$, that is, a point $x$ such that its orbit $\{f^n(x) : n \in \mathbb{N}\}$ is dense in $X$.

A system is said to be topologically mixing if, given non-empty open sets $A$ and $B$, there exists an integer $N$ such that, for all $n > N$, one has

$$f^n(A) \cap B \ne \varnothing.$$

For a continuous-time system, $f^n$ is replaced by the flow $\varphi_g$, with $g$ being the continuous parameter, with the requirement that a non-empty intersection hold for all $\Vert g \Vert > N$.
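The difference between "for some $n$" and "for all $n > N$" can be seen on the circle. The sketch below compares two maps, both illustrative assumptions: the doubling map $f(x) = 2x \bmod 1$, which stretches an arc until it covers the whole circle, and an irrational rotation, which only moves the arc around. Arcs are encoded as (start, length) pairs, and all names are hypothetical.

```python
from fractions import Fraction
import math

def arcs_intersect(start1, len1, start2, len2):
    """Whether the open arcs (start, start + len) on the circle R/Z overlap."""
    if len1 >= 1 or len2 >= 1:
        return True
    gap = (start2 - start1) % 1           # start of arc 2 relative to arc 1
    return gap < len1 or gap + len2 > 1

def doubling_image(start, length, n):
    """Image of an arc under n steps of f(x) = 2x mod 1."""
    return (2 ** n * start) % 1, min(2 ** n * length, 1)

def rotation_image(start, length, n, alpha):
    """Image of an arc under n steps of f(x) = x + alpha mod 1."""
    return (start + n * alpha) % 1, length

A = (Fraction(1, 7), Fraction(1, 32))     # open arc given as (start, length)
B = (Fraction(3, 4), Fraction(1, 16))
ALPHA = math.sqrt(2) % 1                  # assumed irrational rotation angle

doubling_hits = [arcs_intersect(*doubling_image(*A, n), *B) for n in range(60)]
rotation_hits = [arcs_intersect(*rotation_image(*A, n, ALPHA), *B) for n in range(1000)]

print(all(doubling_hits[5:]))                             # every n >= 5 intersects
print(any(rotation_hits[500:]), all(rotation_hits[500:])) # some n hit, but not all
```

For the doubling map the image of $A$ is the whole circle once $2^n$ times its length reaches 1, so $N = 4$ works for this pair of arcs. For the rotation, the image keeps returning to $B$ but also keeps leaving it, which matches transitivity without topological mixing.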

A system is said to be weakly topologically mixing if it has no non-constant continuous (with respect to the topology) eigenfunctions of the shift operator.

Topological mixing neither implies, nor is implied by either weak or strong mixing: there are examples of systems that are weak mixing but not topologically mixing, and examples that are topologically mixing but not strong mixing.

Mixing in stochastic processes

Let $(X_t)_{-\infty < t < \infty}$ be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The sequence space into which the process maps can be endowed with a topology, the product topology. The open sets of this topology are called cylinder sets. These cylinder sets generate a σ-algebra, the Borel σ-algebra; this is the smallest σ-algebra that contains the topology.

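Define a function $\alpha$, called the strong mixing coefficient, as

$$\alpha(s) = \sup\left\{ \left|\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)\right| : -\infty < t < \infty,\ A \in X_{-\infty}^{t},\ B \in X_{t+s}^{\infty} \right\}$$

for all $-\infty < s < \infty$. The symbol $X_a^b$, with $-\infty \le a \le b \le \infty$, denotes a sub-σ-algebra of the σ-algebra; it is the set of cylinder sets that are specified between times $a$ and $b$, i.e. the σ-algebra generated by $\{X_a, X_{a+1}, \ldots, X_b\}$.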

The process $(X_t)$ is said to be strongly mixing if $\alpha(s) \to 0$ as $s \to \infty$. That is to say, a strongly mixing process is such that, in a way that is uniform over all times $t$ and all events, the events before time $t$ and the events after time $t + s$ tend towards being independent as $s \to \infty$; more colloquially, the process, in a strong sense, forgets its history.
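A simple case where the loss of memory can be computed is a stationary two-state Markov chain in discrete time. The sketch below uses hypothetical transition probabilities `p` (from state 0 to 1) and `q` (from state 1 to 0) and measures the dependence between the single-time events $\{X_0 = i\}$ and $\{X_s = j\}$. This is only a lower bound on the strong mixing coefficient, since the supremum in its definition ranges over the larger σ-algebras of the whole past and the whole future.

```python
def two_state_dependence(p, q, s):
    """max over i, j of |P(X_0 = i, X_s = j) - pi_i * pi_j| for a stationary chain.

    p = P(0 -> 1) and q = P(1 -> 0) are assumed to lie strictly between 0 and 1.
    """
    pi = (q / (p + q), p / (p + q))        # stationary distribution
    step = ((1 - p, p), (q, 1 - q))        # one-step transition matrix
    power = ((1.0, 0.0), (0.0, 1.0))       # s-step matrix, starting from identity
    for _ in range(s):
        power = tuple(
            tuple(sum(power[i][k] * step[k][j] for k in range(2)) for j in range(2))
            for i in range(2)
        )
    return max(
        abs(pi[i] * power[i][j] - pi[i] * pi[j])
        for i in range(2)
        for j in range(2)
    )

for s in range(6):
    print(s, two_state_dependence(0.3, 0.2, s))
```

For this chain the dependence equals $\pi_0 \pi_1 |1 - p - q|^s$, so it decays geometrically in the time gap $s$; with $p = 0.3$ and $q = 0.2$ it halves at every step.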

Mixing in Markov processes

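Suppose $(X_t)$ is a stationary Markov process with stationary distribution $\mathbb{Q}$, and let $L^2(\mathbb{Q})$ denote the space of Borel-measurable functions that are square-integrable with respect to the measure $\mathbb{Q}$. Also let

$$\mathcal{E}_t \varphi(x) = \mathbb{E}\left[\varphi(X_t) \mid X_0 = x\right]$$

denote the conditional expectation operator on $L^2(\mathbb{Q})$. Finally, let

$$Z = \left\{ \varphi \in L^2(\mathbb{Q}) : \int \varphi \, d\mathbb{Q} = 0 \right\}$$

denote the space of square-integrable functions with mean zero.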

The ρ-mixing coefficients of the process $\{X_t\}$ are

$$\rho_t = \sup_{\varphi \in Z :\, \|\varphi\|_2 = 1} \left\| \mathcal{E}_t \varphi \right\|_2.$$

The process is called ρ-mixing if these coefficients converge to zero as $t \to \infty$, and "ρ-mixing with exponential decay rate" if $\rho_t < e^{-\delta t}$ for some $\delta > 0$. For a stationary Markov process, the coefficients $\rho_t$ may either decay at an exponential rate, or be always equal to one. [3]

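The α-mixing coefficients of the process $\{X_t\}$ are

$$\alpha_t = \sup_{\varphi \in Z :\, \|\varphi\|_\infty = 1} \left\| \mathcal{E}_t \varphi \right\|_1.$$

The process is called α-mixing if these coefficients converge to zero as $t \to \infty$; it is "α-mixing with exponential decay rate" if $\alpha_t < \gamma e^{-\delta t}$ for some $\delta > 0$, and it is α-mixing with a sub-exponential decay rate if $\alpha_t < \xi(t)$ for some non-increasing function $\xi$ satisfying

$$\frac{\ln \xi(t)}{t} \to 0$$

as $t \to \infty$. [3]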

The α-mixing coefficients are never larger than the ρ-mixing ones: $\alpha_t \le \rho_t$; therefore, if the process is ρ-mixing, it will necessarily be α-mixing too. However, when $\rho_t = 1$, the process may still be α-mixing, with a sub-exponential decay rate.
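Both coefficients can be evaluated by hand for a two-state chain. The sketch below makes several assumptions that should be read as such: $\rho_t$ is taken to be the supremum of $\|\mathcal{E}_t\varphi\|_2$ over mean-zero $\varphi$ with $\|\varphi\|_2 = 1$, $\alpha_t$ the supremum of $\|\mathcal{E}_t\varphi\|_1$ over mean-zero $\varphi$ with $\|\varphi\|_\infty = 1$, time is discrete, and `p`, `q` are hypothetical transition probabilities. With two states the mean-zero functions form a one-dimensional space, so each supremum is attained on a single direction and reduces to a ratio of norms.

```python
import math

def mixing_coefficients(p, q, t):
    """(rho_t, alpha_t) for the two-state chain with P(0 -> 1) = p, P(1 -> 0) = q."""
    pi = (q / (p + q), p / (p + q))       # stationary distribution Q
    phi = (p, -q)                         # spans the mean-zero functions under Q

    # E_t phi(x) = E[phi(X_t) | X_0 = x]: apply the transition matrix t times.
    f = phi
    for _ in range(t):
        f = ((1 - p) * f[0] + p * f[1], q * f[0] + (1 - q) * f[1])

    def norm1(g):
        return sum(w * abs(v) for w, v in zip(pi, g))

    def norm2(g):
        return math.sqrt(sum(w * v * v for w, v in zip(pi, g)))

    def norm_inf(g):
        return max(abs(v) for v in g)

    rho = norm2(f) / norm2(phi)           # normalise so that ||phi||_2 = 1
    alpha = norm1(f) / norm_inf(phi)      # normalise so that ||phi||_inf = 1
    return rho, alpha

for t in range(5):
    rho, alpha = mixing_coefficients(0.3, 0.2, t)
    print(t, rho, alpha, alpha <= rho)
```

In this example both coefficients decay like $|1 - p - q|^t$, and $\alpha_t \le \rho_t$ at every step, in line with the general inequality.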

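The β-mixing coefficients are given by

$$\beta_t = \int \sup_{0 \le \varphi \le 1} \left| \mathcal{E}_t \varphi(x) - \int \varphi \, d\mathbb{Q} \right| d\mathbb{Q}.$$

The process is called β-mixing if these coefficients converge to zero as $t \to \infty$; it is β-mixing with an exponential decay rate if $\beta_t < \gamma e^{-\delta t}$ for some $\delta > 0$, and it is β-mixing with a sub-exponential decay rate if $\beta_t \xi(t) \to 0$ as $t \to \infty$ for some non-increasing function $\xi$ satisfying

$$\frac{\ln \xi(t)}{t} \to 0$$

as $t \to \infty$. [3]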

A strictly stationary Markov process is β-mixing if and only if it is an aperiodic recurrent Harris chain. The β-mixing coefficients are always bigger than the α-mixing ones, so if a process is β-mixing it will also be α-mixing. There is no direct relationship between β-mixing and ρ-mixing: neither of them implies the other.

References

  1. Matthew Nicol and Karl Petersen (2009), "Ergodic Theory: Basic Examples and Constructions", Encyclopedia of Complexity and Systems Science, Springer. https://doi.org/10.1007/978-0-387-30440-3_177
  2. Manfred Einsiedler and Thomas Ward (2011), Ergodic Theory with a View Towards Number Theory, Springer, Theorem 2.36. ISBN 978-0-85729-020-5
  3. Chen, Hansen & Carrasco (2010)