We flesh out the abstract definition of the previous section with examples. The first set, dealing with phase transitions, is what motivated the original work, but the others followed with surprising relevance.
3.1. Phase Transitions
In the usual definition of phase transitions (a breakdown in analyticity) there is no such thing as supercooled water. For example, Isakov [9] proved that you cannot analytically continue the Ising model to a metastable phase (magnetization and magnetic field opposite to one another). However, if you look in handbooks [10] you can find the density of water at temperatures below 0 °C. The problem is that the usual definition does not deal with non-uniform approximations. Once you let the number of particles or the volume go to infinity, which you must do to get a breakdown in analyticity, somewhere in this vast volume a critical droplet [11] will form and you no longer have a pure metastable phase. However, for real water in finite volumes, even macroscopic volumes, a critical droplet can take a long time, aeons, to form (for water, short survival times require substantial supercooling, and only well below freezing does supercooling cease altogether [12]). This means that the time for formation of a critical droplet can be larger (in appropriate units) than the system size, and non-uniform limits are called for.
However, in stochastic dynamics a completely different definition is possible. As mentioned earlier, there is always an eigenvalue 1, and (for irreducible matrices) it is unique. What about the next few? A sign of a first-order transition between two phases is that the next eigenvalue, $\lambda_1$, is very close to 1 while all others are much smaller. This applies also to metastable phases. Moreover, the corresponding left eigenvector is nearly constant on the states in X belonging to each phase, making its value a discriminator between the two phases; hence the term "observable." We will see shortly why all this is true.
It is simpler though to start with a Brownian motion example, namely a random walk on a potential surface. Take as the potential a surface with four minima, as in Figure 1. The space $X$ is the 15-by-15 array of points on which the walk takes place (so $R$ is 225-by-225). The stochastic process usually goes downhill (to lower values of the potential), but with some probability (dependent on the temperature) will go uphill. Step sizes are one and periodic boundary conditions are used. For low temperature it is clear what the result will be. The walker stays in one minimum for a long time, then every once in a while will move to one of the others (usually not crossing the center), where it also stays a long time.
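To make this concrete, here is a minimal sketch of how such an R can be assembled. The grid size, temperature and the two-cosine potential are placeholders of my own (the surface of Figure 1 is not reproduced here), and the downhill-always, uphill-suppressed rule is one standard choice consistent with the rules just stated.

L = 15;  Temp = 0.15;                          % grid size and temperature (placeholders)
[gx, gy] = meshgrid(0:L-1, 0:L-1);
V = cos(2*pi*gx/L) + cos(2*pi*gy/L);           % stand-in potential, not the one of Figure 1
V = V(:);  N = L^2;                            % one index per state, 1..225
R = zeros(N);                                  % column-stochastic: R(to, from)
for s = 1:N
    [r, c] = ind2sub([L L], s);
    nbrs = [sub2ind([L L], mod(r,L)+1, c), sub2ind([L L], mod(r-2,L)+1, c), ...
            sub2ind([L L], r, mod(c,L)+1), sub2ind([L L], r, mod(c-2,L)+1)];
    for t = nbrs                               % unit steps, periodic boundaries
        R(t,s) = 0.25*min(1, exp(-(V(t)-V(s))/Temp));   % downhill freely, uphill with a Boltzmann penalty
    end
end
R = R + diag(1 - sum(R));                      % the diagonal makes every column sum equal 1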
The matrix $R$ implements the rules just stated and can be diagonalized to yield eigenvalues and left eigenvectors. What is important is how close the eigenvalues are to 1, and for the example given the relevant numbers are the first five differences $1-\lambda_k$ (for $k=1,\dots,5$). Note the sharp increase in $1-\lambda_k$ at $k=4$. As we shall see, this suggests a 3-dimensional plot of the first 3 (non-trivial) left eigenvectors. This is shown in Figure 2.
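A sketch of the corresponding computation, continuing the listing above (the eigensolver call, the sorting and the marker-size convention are my own choices):

[W, D] = eig(R.');                               % left eigenvectors of R = right eigenvectors of R'
[lam, order] = sort(real(diag(D)), 'descend');   % lam(1) = 1 is the trivial eigenvalue
W = real(W(:, order));
disp(1 - lam(2:6))                               % the first five gaps 1 - lambda_k
[Vr, Dr] = eig(R);                               % stationary distribution p0 is the
[~, i0] = max(real(diag(Dr)));                   % right eigenvector for lambda = 1
p0 = abs(real(Vr(:, i0)));  p0 = p0/sum(p0);
scatter3(W(:,2), W(:,3), W(:,4), 400*p0 + 5, 'filled')   % the OR; marker size tracks p0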
This is a perfect tetrahedron with the points having the largest probability shown with the largest markers. These correspond to the minima of the potential. Points near them have slightly lower probability, while the peaks of Figure 1, though not zero, have very small probability. (For the example given, each of the minima has probability 0.07, while the probability of the peak in the center is far smaller.)
The next pair of figures (Figure 3) requires a bit of analysis, but lies behind the designation "observable representation" (OR). The upper figure shows the values of the first three (non-trivial) eigenvectors, and it is a mess. (The temperature is lower than for Figure 2 to accentuate features.) However, confining attention to only the "big" probability states, in this case all those whose probability exceeds a fixed cutoff, one has the lower graph. Remarkably, each eigenvector takes only a few values, and any triplet of values characterizes a minimum (or a phase). This is shown in Table 1. That table is like a code: each eigenvector is nearly constant on a phase, but each phase has a unique characterization using those constants. For this reason the leading left eigenvectors are called the "observables."
Note that there are points that are not at the vertices of the tetrahedron. These points have low probability and are not part of any phase. However, they have another interesting property. Suppose you give their barycentric coordinates (see Appendix B) with respect to the extremal points, namely those points that form the vertices of the tetrahedron. Barycentric coordinates must add to 1. However, so do probabilities. In fact, the barycentric coordinate with respect to any given extremum (phase) is just the probability that a walker starting from the given point ends up in that particular phase. (This is a theorem. See below.)
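Stated compactly, in notation introduced here only for this preview (the proof below works with the one-dimensional version): write $z(y)$ for the OR position of a point $y$ and $z_\alpha$ for the vertices of the simplex. The claim is
$$z(y)\;\approx\;\sum_\alpha W_\alpha(y)\,z_\alpha,\qquad \sum_\alpha W_\alpha(y)\approx 1,$$
where $W_\alpha(y)$ is the probability that a walker started at $y$ ends up in the phase belonging to vertex $\alpha$; the $W_\alpha(y)$ are thus the barycentric coordinates of $y$.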
There can also be surprises in the OR. By "surprise" I mean a property that I do not know how to prove. It turns out that the tetrahedral form of the OR is retained even for higher temperatures, even though the theorems that have been proved until now do not establish this. Then, suddenly, the figure becomes something altogether different. This is shown in Figure 4. The change in the figures corresponds to a tiny relative temperature change ($\Delta T/T$). Below the changeover temperature the OR is recognizably a tetrahedron; from there on it remains a ball. There is no recognizable change in the probability distribution nor in the spectrum (which no longer guarantees anything), but the change takes place in as short a temperature interval as I have patience to compute.
Phase transitions, whether between stable or metastable phases, have the same properties. The general notion follows from the eigenvalues of the matrix, $R$, of transition probabilities. The criterion is that the first few non-trivial eigenvalues must be close to 1, while those following them drop off quickly. (For now I consider the case of detailed balance, so all eigenvalues are real.) Suppose $1-\lambda_k$ is small for $1\le k\le n$, while $1-\lambda_k$ is much larger for $k>n$. Then the OR is a simplex of $n+1$ extrema in Euclidean $n$-space. Moreover, points deep in the interior of the simplex (having barycentric coordinates with no nearly-1 component) have much lower probability ($p_0$
is small for those points). A way to look at this is to use the spectral expansion. Let $T$ (a time) be sufficiently large that $\lambda_k^T$ is small for $k>n$ but $\lambda_k^T$ is still close to 1 for $k\le n$. Then $R^T$ reduces, to good approximation, to its slow spectral terms alone. On this time scale the dynamics is fully equilibrated within each phase, and all that is left is the hopping between phases.
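Written out (with $p_k$ my notation for the right eigenvector belonging to $\lambda_k$, normalized so that $\sum_x A_k(x)\,p_k(x)=1$; the notation of the original display may differ), this reads
$$R^T(x,y)=\sum_k\lambda_k^T\,p_k(x)\,A_k(y)\;\approx\;\sum_{k\le n}p_k(x)\,A_k(y),$$
so that acting with $R^T$ projects any initial distribution onto the slow modes: within each phase everything has relaxed, while the relative weights of the phases have hardly changed.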
A proof of the simplex property is a bit complicated and for a full proof the reader is referred to [13]. (Be warned that the symbol "m" is used differently in that paper than in the argument below.) For the case of a single slow non-trivial eigenvalue (two phases, hence a one-dimensional OR) the proof is simpler, although even in this low dimension some preparation is needed. First, $\sum_x A_1(x)\,p_0(x)=0$ by the basic orthogonality relation. Since (as usual) we assume irreducibility, we know that $p_0$ is strictly positive, so $A_1$ must take both signs. It follows that there is a positive maximum value of $A_1$, call it $M$, and (one of) the point(s) where it takes that value, $x_M$. Similarly there is a negative minimum value, call it $m$, and define $x_m$ analogously. Now take as your initial point $x_M$; in other words, $p(x,0)=\delta_{x\,x_M}$. Evolve it for $T$ time steps. Clearly almost all points remain in the relevant phase, which follows from the spectral expansion. What we do know is that $\sum_x A_1(x)\,p(x,T)=\lambda_1^T M$, since each application of $R$ to the left leaves $A_1$ unchanged except for building up factors of $\lambda_1$, and $\sum_x A_1(x)\,p(x,0)=A_1(x_M)=M$ by definition. Call $p(x,T)$, which is the probability distribution after $T$ time steps starting from $x_M$, $p_M(x)$. Similarly the same can be said for $x_m$ and its corresponding $p_m$. It follows that
$$\sum_x A_1(x)\,p_M(x)=\lambda_1^T M \qquad\text{and}\qquad \sum_x A_1(x)\,p_m(x)=\lambda_1^T m. \tag{7}$$
From the equalities $M=\sum_x M\,p_M(x)$ and $m=\sum_x m\,p_m(x)$ (the distributions are normalized) we subtract Equation (7) to obtain
$$\left(1-\lambda_1^T\right)M=\sum_x\left[M-A_1(x)\right]p_M(x) \qquad\text{and}\qquad \left(1-\lambda_1^T\right)m=\sum_x\left[m-A_1(x)\right]p_m(x). \tag{8}$$
The quantity on the left is nearly zero. The contribution from eigenvalues beyond $\lambda_1$ is also negligible. It follows that to get the sums to be zero one or the other term in the respective products must be near zero. In other words, if $p_M(x)$ is not zero, i.e., if you are in the phase associated with $x_M$, then $A_1(x)$ is going to be very close to $M$. The same is true for $m$; both are (nearly) constant on their respective phases.
With this result we are able to define phases. As usual there is a certain arbitrariness involved. Suppose we pick a number $a$ and define the phase $\mathcal{X}_\mu$ to be the set of $x$ for which $A_1(x)$ lies within $a|\mu|$ of $\mu$ (where $\mu$ is either $M$ or $m$). The quantity $a$ is small, but not too small, since as we shall see it appears in a significant denominator.
Our next goal is to establish that there is little probability outside the phases; that is, that the weight $p_M$ (and likewise $p_m$) assigns to such points is small. Define $\bar{\mathcal{X}}_M$ to be the complement of $\mathcal{X}_M$. Then
$$\lambda_1^T M=\sum_x A_1(x)\,p_M(x)\;\le\; M\sum_{x\in\mathcal{X}_M}p_M(x)+(1-a)\,M\sum_{x\in\bar{\mathcal{X}}_M}p_M(x). \tag{9}$$
And since $\sum_x p_M(x)=1$ it follows that
$$\sum_{x\in\bar{\mathcal{X}}_M}p_M(x)\;\le\;\frac{1-\lambda_1^T}{a}. \tag{10}$$
The same relations hold with $m$ (minimum) replacing $M$ (maximum). Note that $\lambda_1^T$ may not be all that close to unity in some applications (for example in Figure 4) and (relatively) high probability $x$'s may occur at some distance from the extremum.
We next show that barycentric coordinates can serve as probabilities. For arbitrary $y$ we look at $\sum_x A_1(x)\,R^T(x,y)$. The fundamental equation for left eigenvectors can be written
$$\sum_x A_1(x)\,R^T(x,y)=\lambda_1^T A_1(y). \tag{11}$$
At this point we need to give some normalization to $A_1$, since until now the only demand was that it satisfy the eigenvalue equation. There are two natural normalizations. One can set $\max_x|A_k(x)|=1$ for all $k$, and this is the normalization we here adopt. A second normalization, particularly when detailed balance obtains (which is not demanded for what we now prove), is L$^2$ normalization, which will be described in detail in the next subsection. Either way we have
$$\lambda_1^T A_1(y)=\sum_{x\in\mathcal{X}_M}A_1(x)\,R^T(x,y)+\sum_{x\in\mathcal{X}_m}A_1(x)\,R^T(x,y)+\sum_{x\notin\mathcal{X}_M\cup\,\mathcal{X}_m}A_1(x)\,R^T(x,y), \tag{12}$$
and therefore, since little probability ends up outside the phases, the last sum contributes negligibly. Next, $A_1(x)$ within a given phase is replaced by the extremal value ($M$ or $m$) for that phase, with an error of order $a$. It follows that
$$\lambda_1^T A_1(y)\;\approx\; M\sum_{x\in\mathcal{X}_M}R^T(x,y)+m\sum_{x\in\mathcal{X}_m}R^T(x,y). \tag{13}$$
Two points should be noted about this equation. First, up to order $1-\lambda_1^T$, $\lambda_1^T$ is simply 1, and an error of this order already appears on the right hand side, there divided by $a$, a number equal to or less than one. So set the $\lambda_1^T$ on the left to one. Consider the sum $\sum_{x\in\mathcal{X}_M}R^T(x,y)$: it is just the probability that a point starting at $y$ ends up in phase $\mathcal{X}_M$. We give it a name: $W_M(y)$. Equation (13) becomes
$$A_1(y)\;\approx\; M\,W_M(y)+m\,W_m(y). \tag{14}$$
Now $A_1(y)$ is the position of the point $y$ in the (one-dimensional) OR. What Equation (14) says is that, up to the small errors just discussed, $A_1(y)$ can be expressed as a weighted sum of the extremal points, $M$ and $m$; that is, the quantities $W_M(y)$ and $W_m(y)$, which are non-negative and sum to nearly 1, are barycentric coordinates. Or to restate it, to a(n often) good approximation, the probability to enter one or another phase is given by the barycentric coordinates in the OR.
These results generalize to the usual, multidimensional OR; see [13] for details. To summarize: when there are $n$ non-trivial eigenvalues that are real and close to 1, with the others much smaller in absolute value, the OR is a simplex (with $n+1$ vertices). Phases are defined through proximity to the extrema of the simplex, and the barycentric coordinates of points within the simplex are the probabilities that, starting from that point, you will end in one or another phase. The proof relied on the fact that there is an intermediate time at which points within a phase have come to equilibrium, but the overall distribution of phase occupation still depends on the initial conditions.
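A numerical check of this statement for the grid-walk example is straightforward. The sketch below reuses R, W and p0 from the earlier listings; the identification of the phase index sets and the choice of intermediate time are my own and would need adjusting.

% phases{a} is assumed given: the indices of states lying near vertex a of the OR.
Tmid = 400;  RT = R^Tmid;                   % an intermediate time
OR = W(:, 2:4);                             % the three non-trivial observables
nph = numel(phases);
Went = zeros(size(OR,1), nph);              % Went(y,a): probability of ending in phase a
vert = zeros(nph, 3);                       % OR coordinates of the vertices
for a = 1:nph
    Went(:, a) = sum(RT(phases{a}, :), 1).';
    vert(a, :) = mean(OR(phases{a}, :), 1);
end
disp(max(max(abs(OR - Went*vert))))         % small when the simplex picture holds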
3.3. Rationale
In general, proximity in the OR reflects dynamical proximity. This was seen for phase transitions, for all sorts of Brownian motion and through the barycentric relation, which does not require detailed balance. On the other hand, if there is detailed balance a precise relation can be found.
We first generalize notation: let $p_y(x,t)$ be the probability distribution at time $t$ for points that were at the point $y$ at $t=0$. (This is a clear generalization of $p_M$ or $p_m$.) Clearly $p_y(x,t)=R^t(x,y)$.
For the case of detailed balance ($R(x,y)\,p_0(y)=R(y,x)\,p_0(x)$) we earlier saw that $\sqrt{p_0}$ played an important role. We now formalize that. Let $D$ be a diagonal $N$-by-$N$ matrix with $D_{xx}=\sqrt{p_0(x)}$. Then as observed earlier $C\equiv D^{-1}RD$ is symmetric and, by virtue of its reality, Hermitian. As such, $C$ has a complete set of eigenvalues and eigenvectors, $C\phi_k=\lambda_k\phi_k$, with the same set of $\lambda$'s as for $R$ and with the $\phi_k$ orthonormal, i.e., $\sum_x\phi_k(x)\,\phi_\ell(x)=\delta_{k\ell}$. Since $A_k=\phi_k/\sqrt{p_0}$ and $p_k=\phi_k\sqrt{p_0}$, it is clear that this normalization of eigenvectors is different from setting the maximum of $|A_k|$ to be unity.
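In code this takes a few lines to check (a sketch reusing R and p0 from the listings above; the variable names are mine):

Dm = diag(sqrt(p0));
C = Dm \ R * Dm;                        % C = D^{-1} R D
disp(norm(C - C', 'fro'))               % near zero when detailed balance holds
[Phi, Lam] = eig((C + C')/2);           % orthonormal eigenvectors phi_k
A = Phi ./ sqrt(p0);                    % A_k = phi_k / sqrt(p0): the L2-normalized observables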
The probability distribution $p_y(x,t)$ can be thought of as a cloud of values; for example, in the case of a phase transition, if $t$ is such that $\lambda_1^t$ is close to 1 but $\lambda_2^t$ is near zero, then the cloud will either cover an entire phase, or a weighted sum of the two of them, and will exclude (have small value at) points in between.
We define a distance between two such clouds,
$$\mathcal{D}_t(x,y)\;\equiv\;\sum_w\frac{\bigl|p_x(w,t)-p_y(w,t)\bigr|}{\sqrt{p_0(w)}}.$$
Note that, using the spectral expansion,
$$\mathcal{D}_t(x,y)=\sum_w\Bigl|\sum_k\lambda_k^t\,\phi_k(w)\bigl[A_k(x)-A_k(y)\bigr]\Bigr|. \tag{16}$$
The function of Equation (16) is of the form of an L$^1$ norm. The L$^1$ norm is always equal to or greater than the L$^2$ norm, and therefore
$$\mathcal{D}_t(x,y)\;\ge\;\left[\sum_w\Bigl(\sum_k\lambda_k^t\,\phi_k(w)\bigl[A_k(x)-A_k(y)\bigr]\Bigr)^2\right]^{1/2}.$$
This expression is of the form $\bigl[\sum_w\bigl(\sum_k c_k\phi_k(w)\bigr)^2\bigr]^{1/2}$, which because of the orthonormality of the $\phi$'s is $\bigl(\sum_k c_k^2\bigr)^{1/2}$. It follows that
$$\mathcal{D}_t(x,y)\;\ge\;\left[\sum_k\lambda_k^{2t}\bigl(A_k(x)-A_k(y)\bigr)^2\right]^{1/2}.$$
The sum on the right can be truncated at the $n$th eigenvalue, preserving the direction of the inequality. Since the magnitudes of the eigenvalues decrease monotonically (with index) we obtain, finally,
$$\mathcal{D}_t(x,y)\;\ge\;\lambda_n^t\left[\sum_{k=1}^{n}\bigl(A_k(x)-A_k(y)\bigr)^2\right]^{1/2}.$$
Up to the factor $\lambda_n^t$, the quantity on the right is the distance in the OR. Thus, two points $x$ and $y$ which are adjacent dynamically, meaning their clouds overlap, will be spatially adjacent in the OR.
3.4. Spin Glass
The spin glass has Hamiltonian
$$H=-\sum_{\langle ij\rangle}J_{ij}\,\sigma_i\sigma_j$$
and has served as a model for many phenomena, from glass to the brain. The coupling constants, $J_{ij}$, are typically random but quenched (meaning they are fixed for each calculation); they are usually taken to be either $\pm J$ or to have values in a normal distribution. The spins, $\sigma_i$, are $\pm1$. The pairs $\langle ij\rangle$ are usually nearest neighbors on a lattice of some sort. A model of the model is a mean field version known as the Sherrington-Kirkpatrick model (SK). The Hamiltonian is
$$H=-\sum_{i\ne j}J_{ij}\,\sigma_i\sigma_j,$$
where the sum is now over all $i$ and $j$ (different from each other). The states are now $x=(\sigma_1,\dots,\sigma_N)$ and the energy is $E(x)=H(x)$.
The transition probability for $x$ and $x'$ differing by the value of a single spin is chosen so as to satisfy detailed balance with respect to the Boltzmann weights. Then the diagonal of $R$ is adjusted to make all column sums unity. This transition probability matrix guarantees that the stationary probability distribution is just $e^{-\beta E}$ (suitably normalized), with $\beta=1/T$.
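As a check of the last statement (using, purely for illustration, a Metropolis-type single-spin-flip rule; the property does not depend on this particular choice), detailed balance with the Boltzmann weights follows from
$$R(x',x)\,e^{-\beta E(x)}=\frac{1}{N}\,\min\!\left(1,\,e^{-\beta[E(x')-E(x)]}\right)e^{-\beta E(x)}=\frac{1}{N}\,\min\!\left(e^{-\beta E(x)},\,e^{-\beta E(x')}\right),$$
which is symmetric in $x$ and $x'$. Summing the detailed-balance relation over $x'$ and using the unit column sums then shows that $e^{-\beta E}$, normalized, is indeed stationary.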
The SK model spin glass [15] offers a mixture of the previous situations. Yes, there are phases, but the spectrum of eigenvalues has many points near 1. This is because there are many phases, most of them metastable. The idea is that the system passes slowly through ever deeper (in energy) metastable phases on its way to the stationary probability distribution.
To diagonalize $R$ you first need to fix a matrix, $J$ (the set $\{J_{ij}\}$), of couplings; in this case each coupling was drawn once from a fixed random distribution. For $N$ spins there are $2^N$ different states, so that $R$ is a $2^N$-by-$2^N$ matrix. For the computer used (and the limited skill of the user) this meant that for most calculations $N$ was 12 or 13. Nevertheless, insight into the coordinate space and the progression of metastable phases could be gained.
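A sketch of the construction for small N follows. It is my own illustration: the coupling distribution, the single-spin-flip (Metropolis-type) rule and the parameter values are placeholders, not necessarily those used for Figure 10.

N = 10;  T = 0.4;  beta = 1/T;                         % placeholder size and temperature
J = triu(sign(randn(N)), 1);  J = (J + J')/sqrt(N);    % quenched random couplings
M = 2^N;
spins = 2*(dec2bin(0:M-1, N) - '0') - 1;               % every state as a row of +-1
E = -sum((spins*J).*spins, 2);                         % E(x) = -sum_{i~=j} J_ij s_i s_j
R = zeros(M);
for x = 1:M
    for i = 1:N                                        % single-spin-flip neighbors
        y = bitxor(x-1, bitshift(1, N-i)) + 1;         % flip spin i of state x
        R(y,x) = (1/N)*min(1, exp(-beta*(E(y)-E(x)))); % Metropolis-type rate
    end
end
R = R + diag(1 - sum(R));                              % column sums adjusted to 1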
First, each state would naturally be associated with a point on the unit $N$-cube, a not very helpful association. A measure of distance could be Hamming distance or Euclidean distance, neither of which would describe the underlying dynamics, which also depends on the matrix $J$. With the OR there is an embedding that respects the dynamics, so that proximity means that one state can easily become the other.
Having more than 4 phases means that three dimensions are not adequate to fully visualize the geometry. Nevertheless, one can find the convex hull in more than three dimensions and mark the relevant states (the vertices) on a 3-dimensional plot. (Actually, barring the possibility that this article will be published with a 3-dimensional printer, what you will see is a 2-dimensional plot of a 3-dimensional object.) For a particular choice of $J$ and $T$ this is shown in Figure 10.
What does the OR tell you about the dynamics? It provides extrema and a distance scale. By the procedure mentioned earlier (cf. the paragraph between Equations (8) and (9)) this allows us to define phases. (Further verification is that for the example in Figure 10, 40% of the points lie outside all phases, and yet their combined probability is small.) What about the flow between phases? For that it suffices to start a bunch of points in a phase and see where they end. This leads to an effective transition matrix between phases. In this case there is an element of randomness due to the routes of various points from the phases. (You could also add the transition probabilities from each point in a given phase.) The diagram of transitions (with the randomness) is illustrated in
Figure 11.
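A sketch of that procedure (reusing R and, as before, a cell array phases of phase index sets; the time T is again an intermediate value of my own choosing):

T = 200;  RT = R^T;
nph = numel(phases);  flow = zeros(nph);
for a = 1:nph
    pend = mean(RT(:, phases{a}), 2);        % average end distribution of phase a
    for b = 1:nph
        flow(b,a) = sum(pend(phases{b}));    % effective probability of a -> b after T steps
    end
end
disp(flow)                                   % columns sum to a bit less than 1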
The figure can be considered a test—or a reflection—of the hierarchical model. That is the picture of phases flowing into ever deeper metastable phases. Probabilistically there should be flow in both directions, but in a larger setup the flow to less likely phases would be quite small. In the illustration some of the wrong-direction flow happens to be large but this can be attributed to random paths. It is interesting though that, up to randomness, the OR distances (and resulting phases) reproduce the hierarchical model.
3.6. Complex Systems
Complexity has yet to be defined, but people will agree that linguistics, ecology and protein networks can qualify. An example of an application to each field is given in [17]. Occasionally the embedding is easy to understand and has the appearance of one of those I've already dealt with. However, there are also situations where deductions require a bit of knowledge of the system. Even in our earlier examples, if the number of extrema exceeded four, other tricks had to be used for visualization. Still, as one can see from the spin glass example, it is possible to find structure (e.g., hierarchical properties) in the system.
In the present article I’ll give an example of an OR that I do not understand. The results are significant, the system is arguably complex, but I do not understand what’s going on. Sometimes the OR brings surprises, and sometimes you (or I, anyway) do not know why the figure picks out special points.
I took the text of Melville's novel, Moby Dick ("Call me Ishmael. …"). The question was: if letter 337 was an "a," what would letter 338 be? In other words, I got the probability of ordered pairs, what is most likely to follow each letter. To make things easily doable I made several simplifications. I got rid of commas and other markings such as quotation marks, changed semicolons to periods and all capital letters to lower case, so in the end there were only 28 characters: the Roman alphabet (sans accents) plus a period and a space. Then I counted all ordered pairs and these became the first version of the transition matrix. However, the column sums were not all the same. Thus, if $R_{kj}$ counts the transitions from character #$j$ to character #$k$, the sums $\sum_k R_{kj}$ vary with $j$. Since all these numbers should be one, further measures must be taken, and they are not unique. In MATLAB notation, you would have choices:
R=R/max(sum(R)); followed by R=R+diag(1-sum(R)); or
R = R*diag(1./sum(R)); and many others.
If you choose the first method, you divide R by the maximum of its column sums and make it up in the diagonal terms. In the second scheme you divide each column by its sum. I prefer the first method because for equilibrium systems it gives the Boltzmann distribution as its stationary probability distribution. Also, in the present instance scheme #1 gives interesting results; scheme #2, as far as I know, does not.
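For completeness, here is a sketch of the whole construction using scheme #1. The file name and the exact cleanup steps are stand-ins of mine for the procedure described above.

txt = lower(fileread('mobydick.txt'));          % file name is a placeholder
txt = regexprep(txt, '\s+', ' ');               % all whitespace becomes a single space
txt = regexprep(txt, ';', '.');                 % semicolons become periods
txt = regexprep(txt, '[^a-z. ]', '');           % keep 26 letters, period and space
alphabet = ['a':'z' '.' ' '];                   % the 28 characters
[~, idx] = ismember(txt, alphabet);
C = accumarray([idx(2:end).' idx(1:end-1).'], 1, [28 28]);   % counts of (next, current)
R = C / max(sum(C));                            % scheme #1: divide by the largest column sum
R = R + diag(1 - sum(R));                       % absorb the deficit on the diagonal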
Those "interesting results" are illustrated in Figure 13. You can see that there is a well-formed tetrahedron, with the extremal points indicated. All the other points are gathered in that fuzzy image near the vertex at point #19. Now the first few eigenvalues decrease gradually, so there is no sudden dropoff after $\lambda_3$, but there is still a tetrahedron. However, the real surprise comes when the identity of the vertices is revealed. They are the letters "jqxz" (not ordered). These are the rarest letters in the book. One can also look at the 4-dimensional convex hull formed by the first 4 non-trivial eigenvectors, and one gets "jkqvxz", again rare letters. (There are 6 letters, rather than 5, because the convex hull is not a perfect simplex.) Why in 3 dimensions one gets a neat tetrahedron, and why the rare letters are at the vertices, I do not know. The OR is full of surprises, and this one has me baffled.
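The higher-dimensional hull mentioned above can be found numerically. A sketch (after diagonalizing this R as in the earlier listing, so that W holds its sorted left eigenvectors, and with alphabet as defined above):

OR4 = real(W(:, 2:5));                 % the first four non-trivial observables
K = convhulln(OR4);                    % facets of the 4-dimensional convex hull
verts = unique(K(:));                  % indices of the hull vertices
disp(alphabet(verts))                  % the characters sitting at the vertices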