Independence (probability theory)

Last updated August 22, 2024

Independence is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are independent, statistically independent, or stochastically independent^[1] if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.

When dealing with collections of more than two events, two notions of independence need to be distinguished. The events are called pairwise independent if any two events in the collection are independent of each other, while mutual independence (or collective independence) of events means, informally speaking, that each event is independent of any combination of other events in the collection. A similar notion exists for collections of random variables. Mutual independence implies pairwise independence, but not the other way around. In the standard literature of probability theory, statistics, and stochastic processes, independence without further qualification usually refers to mutual independence.

Definition

For events

Two events

Two events $A$ and $B$ are independent (often written as $A\perp B$ or $A\perp \!\!\!\perp B$ , where the latter symbol often is also used for conditional independence) if and only if their joint probability equals the product of their probabilities:^[2]^{: p. 29}^[3]^{: p. 10}

\mathrm {P} (A\cap B)=\mathrm {P} (A)\mathrm {P} (B)

(Eq.1)

If two independent events $A$ and $B$ both have non-zero probability, then $\mathrm {P} (A\cap B)>0$ , so $A\cap B\neq \emptyset$ : the events share outcomes in the sample space and are not mutually exclusive (mutually exclusive iff $A\cap B=\emptyset$ ). Why Eq.1 defines independence is made clear by rewriting it with the conditional probability $P(A\mid B)={\frac {P(A\cap B)}{P(B)}}$ , the probability that the event $A$ occurs given that the event $B$ has occurred or is assumed to have occurred (which requires $\mathrm {P} (B)\neq 0$ ):

\mathrm {P} (A\cap B)=\mathrm {P} (A)\mathrm {P} (B)\iff \mathrm {P} (A\mid B)={\frac {\mathrm {P} (A\cap B)}{\mathrm {P} (B)}}=\mathrm {P} (A).

and similarly

\mathrm {P} (A\cap B)=\mathrm {P} (A)\mathrm {P} (B)\iff \mathrm {P} (B\mid A)={\frac {\mathrm {P} (A\cap B)}{\mathrm {P} (A)}}=\mathrm {P} (B).

Thus, the occurrence of $B$ does not affect the probability of $A$ , and vice versa. In other words, $A$ and $B$ are independent of each other. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities are undefined if $\mathrm {P} (A)$ or $\mathrm {P} (B)$ is 0. Furthermore, the preferred definition makes clear by symmetry that when $A$ is independent of $B$ , $B$ is also independent of $A$ .

Odds

Stated in terms of odds, two events are independent if and only if the odds ratio of $A$ and $B$ is unity (1). Analogously with probability, this is equivalent to the conditional odds being equal to the unconditional odds:

O(A\mid B)=O(A){\text{ and }}O(B\mid A)=O(B),

or to the odds of one event, given the other event, being the same as the odds of the event, given the other event not occurring:

O(A\mid B)=O(A\mid \neg B){\text{ and }}O(B\mid A)=O(B\mid \neg A).

The odds ratio can be defined as

O(A\mid B):O(A\mid \neg B),

or symmetrically for odds of $B$ given $A$ , and thus is 1 if and only if the events are independent.
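The odds criterion can be checked on a small finite sample space. The following is a minimal sketch, assuming two fair dice with equally likely outcomes; the events `A` and `B` and the helper `odds` are illustrative choices, not part of the definition above.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, all equally likely (assumption).
omega = set(product(range(1, 7), repeat=2))

def odds(event, given):
    """Conditional odds O(event | given) on the equally likely space omega."""
    favourable = len(event & given)      # outcomes in `given` where `event` occurs
    unfavourable = len(given - event)    # outcomes in `given` where it does not
    return Fraction(favourable, unfavourable)

A = {w for w in omega if w[0] == 6}        # first die shows 6
B = {w for w in omega if w[1] % 2 == 0}    # second die is even

# Odds ratio O(A | B) : O(A | not B); it is 1 exactly when A and B are independent.
odds_ratio = odds(A, B) / odds(A, omega - B)
print(odds(A, omega), odds(A, B), odds_ratio)
```

Here the unconditional odds, the conditional odds given `B`, and the conditional odds given the complement of `B` all agree, so the odds ratio is 1.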

More than two events

A finite set of events $\{A_{i}\}_{i=1}^{n}$ is pairwise independent if every pair of events is independent^[4]—that is, if and only if for all distinct pairs of indices $m,k$ ,

\mathrm {P} (A_{m}\cap A_{k})=\mathrm {P} (A_{m})\mathrm {P} (A_{k})

(Eq.2)

A finite set of events is mutually independent if every event is independent of any intersection of the other events^[4]^[3]^{: p. 11}—that is, if and only if for every $k\leq n$ and for every k indices $1\leq i_{1}<\dots <i_{k}\leq n$ ,

\mathrm {P} \left(\bigcap _{j=1}^{k}A_{i_{j}}\right)=\prod _{j=1}^{k}\mathrm {P} (A_{i_{j}})

(Eq.3)

This is called the multiplication rule for independent events. It is not a single condition involving only the product of all the probabilities of all single events; it must hold true for all subsets of events.

For more than two events, a mutually independent set of events is (by definition) pairwise independent; but the converse is not necessarily true.^[2]^{: p. 30}
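The gap between Eq.2 and Eq.3 can be seen on a classic small example. The sketch below assumes two fair coin tosses and uses three illustrative events (first toss heads, second toss heads, both tosses equal); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin tosses, four equally likely outcomes (assumption).
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def factorizes(events):
    """True if P(intersection of events) equals the product of their probabilities."""
    intersection = set(omega)
    expected = Fraction(1)
    for event in events:
        intersection &= event
        expected *= prob(event)
    return prob(intersection) == expected

def pairwise_independent(events):
    # Eq.2: every pair must factorize.
    return all(factorizes(pair) for pair in combinations(events, 2))

def mutually_independent(events):
    # Eq.3: every sub-collection of size 2..n must factorize.
    return all(
        factorizes(subset)
        for k in range(2, len(events) + 1)
        for subset in combinations(events, k)
    )

A = {w for w in omega if w[0] == "H"}     # first toss is heads
B = {w for w in omega if w[1] == "H"}     # second toss is heads
C = {w for w in omega if w[0] == w[1]}    # both tosses agree
events = [A, B, C]
print(pairwise_independent(events), mutually_independent(events))
```

Each pair factorizes with probability 1/4, but the triple intersection also has probability 1/4 rather than 1/8, so the three events are pairwise but not mutually independent.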

Log probability and information content

Stated in terms of log probability, two events are independent if and only if the log probability of the joint event is the sum of the log probability of the individual events:

\log \mathrm {P} (A\cap B)=\log \mathrm {P} (A)+\log \mathrm {P} (B)

In information theory, negative log probability is interpreted as information content, and thus two events are independent if and only if the information content of the combined event equals the sum of information content of the individual events:

\mathrm {I} (A\cap B)=\mathrm {I} (A)+\mathrm {I} (B)

See Information content § Additivity of independent events for details.
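As a numerical illustration of the additivity of information content, the sketch below uses base-2 logarithms (bits) and probabilities taken from two fair dice; the events and the helper `information` are assumptions made for the example.

```python
import math

def information(p):
    """Information content in bits of an event with probability p > 0."""
    return -math.log2(p)

# Two fair dice: A = first die shows 6, B = second die shows 6, S = sum is 8.
p_a, p_b, p_s = 1 / 6, 1 / 6, 5 / 36
p_a_and_b = 1 / 36   # only the outcome (6, 6)
p_a_and_s = 1 / 36   # only the outcome (6, 2)

# Independent events: information of the joint event is the sum of the parts.
independent_case = math.isclose(information(p_a_and_b), information(p_a) + information(p_b))
# Dependent events: additivity fails.
dependent_case = math.isclose(information(p_a_and_s), information(p_a) + information(p_s))
print(independent_case, dependent_case)
```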

For real valued random variables

Two random variables

Two random variables $X$ and $Y$ are independent if and only if (iff) the elements of the $π$ -system generated by them are independent; that is to say, for every $x$ and $y$ , the events $\{X\leq x\}$ and $\{Y\leq y\}$ are independent events (as defined above in Eq.1 ). That is, $X$ and $Y$ with cumulative distribution functions $F_{X}(x)$ and $F_{Y}(y)$ , are independent iff the combined random variable $(X,Y)$ has a joint cumulative distribution function^[3]^{: p. 15}

F_{X,Y}(x,y)=F_{X}(x)F_{Y}(y)\quad {\text{for all }}x,y

(Eq.4)

or equivalently, if the probability densities $f_{X}(x)$ and $f_{Y}(y)$ and the joint probability density $f_{X,Y}(x,y)$ exist,

f_{X,Y}(x,y)=f_{X}(x)f_{Y}(y)\quad {\text{for all }}x,y.
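For discrete random variables with finite support, Eq.4 can be verified directly from a joint probability table. The sketch below is one way to do this under that assumption: because the cumulative distribution functions are step functions, it is enough to compare both sides at the support points. The function name `cdf_factorizes` and the two example tables are hypothetical.

```python
from fractions import Fraction
from itertools import product

def cdf_factorizes(joint):
    """Check F_{X,Y}(x, y) == F_X(x) * F_Y(y) at every pair of support points.

    `joint` maps (x, y) to P(X = x, Y = y) for a finite-support pair (X, Y).
    """
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})

    def F(x, y):
        return sum(p for (a, b), p in joint.items() if a <= x and b <= y)

    def F_X(x):
        return F(x, ys[-1])   # marginal CDF: let y reach the top of its support

    def F_Y(y):
        return F(xs[-1], y)

    return all(F(x, y) == F_X(x) * F_Y(y) for x, y in product(xs, ys))

quarter = Fraction(1, 4)
two_fair_bits = {(x, y): quarter for x in (0, 1) for y in (0, 1)}   # independent
copied_bit = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}       # Y equals X
print(cdf_factorizes(two_fair_bits), cdf_factorizes(copied_bit))
```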

More than two random variables

A finite set of $n$ random variables $\{X_{1},\ldots ,X_{n}\}$ is pairwise independent if and only if every pair of random variables is independent. Even if the set of random variables is pairwise independent, it is not necessarily mutually independent as defined next.

A finite set of $n$ random variables $\{X_{1},\ldots ,X_{n}\}$ is mutually independent if and only if for any sequence of numbers $\{x_{1},\ldots ,x_{n}\}$ , the events $\{X_{1}\leq x_{1}\},\ldots ,\{X_{n}\leq x_{n}\}$ are mutually independent events (as defined above in Eq.3 ). This is equivalent to the following condition on the joint cumulative distribution function $F_{X_{1},\ldots ,X_{n}}(x_{1},\ldots ,x_{n})$ . A finite set of $n$ random variables $\{X_{1},\ldots ,X_{n}\}$ is mutually independent if and only if^[3]^{: p. 16}

F_{X_{1},\ldots ,X_{n}}(x_{1},\ldots ,x_{n})=F_{X_{1}}(x_{1})\cdot \ldots \cdot F_{X_{n}}(x_{n})\quad {\text{for all }}x_{1},\ldots ,x_{n}

(Eq.5)

It is not necessary here to require that the probability distribution factorizes for all possible $k$ -element subsets as in the case for $n$ events. This is not required because e.g. $F_{X_{1},X_{2},X_{3}}(x_{1},x_{2},x_{3})=F_{X_{1}}(x_{1})\cdot F_{X_{2}}(x_{2})\cdot F_{X_{3}}(x_{3})$ implies $F_{X_{1},X_{3}}(x_{1},x_{3})=F_{X_{1}}(x_{1})\cdot F_{X_{3}}(x_{3})$ .

The measure-theoretically inclined may prefer to substitute events $\{X\in A\}$ for events $\{X\leq x\}$ in the above definition, where $A$ is any Borel set. That definition is exactly equivalent to the one above when the values of the random variables are real numbers. It has the advantage of working also for complex-valued random variables or for random variables taking values in any measurable space (which includes topological spaces endowed with appropriate σ-algebras).

For real valued random vectors

Two random vectors $\mathbf {X} =(X_{1},\ldots ,X_{m})^{\mathrm {T} }$ and $\mathbf {Y} =(Y_{1},\ldots ,Y_{n})^{\mathrm {T} }$ are called independent if^[5]^{: p. 187}

F_{\mathbf {X,Y} }(\mathbf {x,y} )=F_{\mathbf {X} }(\mathbf {x} )\cdot F_{\mathbf {Y} }(\mathbf {y} )\quad {\text{for all }}\mathbf {x} ,\mathbf {y}

(Eq.6)

where $F_{\mathbf {X} }(\mathbf {x} )$ and $F_{\mathbf {Y} }(\mathbf {y} )$ denote the cumulative distribution functions of $\mathbf {X}$ and $\mathbf {Y}$ and $F_{\mathbf {X,Y} }(\mathbf {x,y} )$ denotes their joint cumulative distribution function. Independence of $\mathbf {X}$ and $\mathbf {Y}$ is often denoted by $\mathbf {X} \perp \!\!\!\perp \mathbf {Y}$ . Written component-wise, $\mathbf {X}$ and $\mathbf {Y}$ are called independent if

F_{X_{1},\ldots ,X_{m},Y_{1},\ldots ,Y_{n}}(x_{1},\ldots ,x_{m},y_{1},\ldots ,y_{n})=F_{X_{1},\ldots ,X_{m}}(x_{1},\ldots ,x_{m})\cdot F_{Y_{1},\ldots ,Y_{n}}(y_{1},\ldots ,y_{n})\quad {\text{for all }}x_{1},\ldots ,x_{m},y_{1},\ldots ,y_{n}.

For stochastic processes

For one stochastic process

The definition of independence may be extended from random vectors to a stochastic process. For an independent stochastic process, the random variables obtained by sampling the process at any $n$ times $t_{1},\ldots ,t_{n}$ are required to be independent random variables, for any $n$ .^[6]^{: p. 163}

Formally, a stochastic process $\left\{X_{t}\right\}_{t\in {\mathcal {T}}}$ is called independent, if and only if for all $n\in \mathbb {N}$ and for all $t_{1},\ldots ,t_{n}\in {\mathcal {T}}$

F_{X_{t_{1}},\ldots ,X_{t_{n}}}(x_{1},\ldots ,x_{n})=F_{X_{t_{1}}}(x_{1})\cdot \ldots \cdot F_{X_{t_{n}}}(x_{n})\quad {\text{for all }}x_{1},\ldots ,x_{n}

(Eq.7)

where $F_{X_{t_{1}},\ldots ,X_{t_{n}}}(x_{1},\ldots ,x_{n})=\mathrm {P} (X(t_{1})\leq x_{1},\ldots ,X(t_{n})\leq x_{n})$ . Independence of a stochastic process is a property within a stochastic process, not between two stochastic processes.

For two stochastic processes

Independence of two stochastic processes is a property between two stochastic processes $\left\{X_{t}\right\}_{t\in {\mathcal {T}}}$ and $\left\{Y_{t}\right\}_{t\in {\mathcal {T}}}$ that are defined on the same probability space $(\Omega ,{\mathcal {F}},P)$ . Formally, two stochastic processes $\left\{X_{t}\right\}_{t\in {\mathcal {T}}}$ and $\left\{Y_{t}\right\}_{t\in {\mathcal {T}}}$ are said to be independent if for all $n\in \mathbb {N}$ and for all $t_{1},\ldots ,t_{n}\in {\mathcal {T}}$ , the random vectors $(X(t_{1}),\ldots ,X(t_{n}))$ and $(Y(t_{1}),\ldots ,Y(t_{n}))$ are independent,^[7]^{: p. 515} i.e. if

F_{X_{t_{1}},\ldots ,X_{t_{n}},Y_{t_{1}},\ldots ,Y_{t_{n}}}(x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{n})=F_{X_{t_{1}},\ldots ,X_{t_{n}}}(x_{1},\ldots ,x_{n})\cdot F_{Y_{t_{1}},\ldots ,Y_{t_{n}}}(y_{1},\ldots ,y_{n})\quad {\text{for all }}x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{n}

(Eq.8)

Independent σ-algebras

The definitions above ( Eq.1 and Eq.2 ) are both generalized by the following definition of independence for σ-algebras. Let $(\Omega ,\Sigma ,\mathrm {P} )$ be a probability space and let ${\mathcal {A}}$ and ${\mathcal {B}}$ be two sub-σ-algebras of $\Sigma$ . ${\mathcal {A}}$ and ${\mathcal {B}}$ are said to be independent if, whenever $A\in {\mathcal {A}}$ and $B\in {\mathcal {B}}$ ,

\mathrm {P} (A\cap B)=\mathrm {P} (A)\mathrm {P} (B).

Likewise, a finite family of σ-algebras $(\tau _{i})_{i\in I}$ , where $I$ is an index set, is said to be independent if and only if

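\forall \left(A_{i}\right)_{i\in I}\in \prod \nolimits _{i\in I}\tau _{i}\ :\ \mathrm {P} \left(\bigcap \nolimits _{i\in I}A_{i}\right)=\prod \nolimits _{i\in I}\mathrm {P} \left(A_{i}\right)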

and an infinite family of σ-algebras is said to be independent if all its finite subfamilies are independent.

The new definition relates to the previous ones very directly:

Two events are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by an event $E\in \Sigma$ is, by definition,

\sigma (\{E\})=\{\emptyset ,E,\Omega \setminus E,\Omega \}.

Two random variables $X$ and $Y$ defined over $\Omega$ are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable $X$ taking values in some measurable space $S$ consists, by definition, of all subsets of $\Omega$ of the form $X^{-1}(U)$ , where $U$ is any measurable subset of $S$ .

Using this definition, it is easy to show that if $X$ and $Y$ are random variables and $Y$ is constant, then $X$ and $Y$ are independent, since the σ-algebra generated by a constant random variable is the trivial σ-algebra $\{\varnothing ,\Omega \}$ . Probability zero events cannot affect independence so independence also holds if $Y$ is only Pr-almost surely constant.
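The link between independent events and independent σ-algebras can be checked by brute force on a finite space. The sketch below assumes two fair dice and builds the four-element σ-algebra generated by a single event; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely (assumption).
omega = frozenset(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def sigma_algebra_of(event):
    """The sigma-algebra generated by one event: {empty set, E, complement of E, Omega}."""
    return [frozenset(), event, omega - event, omega]

def independent_families(family_a, family_b):
    """True if P(A & B) == P(A) * P(B) for every A in family_a and B in family_b."""
    return all(prob(a & b) == prob(a) * prob(b) for a in family_a for b in family_b)

A = frozenset(w for w in omega if w[0] == 6)       # first die shows 6
B = frozenset(w for w in omega if w[1] == 6)       # second die shows 6
S = frozenset(w for w in omega if sum(w) == 8)     # the sum is 8

print(independent_families(sigma_algebra_of(A), sigma_algebra_of(B)))
print(independent_families(sigma_algebra_of(A), sigma_algebra_of(S)))
```

The first check succeeds because independence of `A` and `B` carries over to their complements; the second fails because `A` and `S` are not independent.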

Properties

Self-independence

Note that an event is independent of itself if and only if

\mathrm {P} (A)=\mathrm {P} (A\cap A)=\mathrm {P} (A)\cdot \mathrm {P} (A)\iff \mathrm {P} (A)=0{\text{ or }}\mathrm {P} (A)=1.

Thus an event is independent of itself if and only if it almost surely occurs or its complement almost surely occurs; this fact is useful when proving zero–one laws.^[8]

Expectation and covariance

If $X$ and $Y$ are statistically independent random variables, then the expectation operator $\operatorname {E}$ has the property^[9]^{: p. 10}

\operatorname {E} [X^{n}Y^{m}]=\operatorname {E} [X^{n}]\operatorname {E} [Y^{m}],

and the covariance $\operatorname {cov} [X,Y]$ is zero, as follows from

\operatorname {cov} [X,Y]=\operatorname {E} [XY]-\operatorname {E} [X]\operatorname {E} [Y].

The converse does not hold: two random variables with a covariance of 0 may still fail to be independent.

Similarly for two stochastic processes $\left\{X_{t}\right\}_{t\in {\mathcal {T}}}$ and $\left\{Y_{t}\right\}_{t\in {\mathcal {T}}}$ : If they are independent, then they are uncorrelated.^[10]^{: p. 151}
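A standard counterexample shows that zero covariance between two random variables does not imply independence. The sketch below assumes $X$ is uniform on $\{-1,0,1\}$ and $Y=X^{2}$ ; the variable names are illustrative.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2 (assumption).
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expectation(f):
    return sum(p * f(x, y) for (x, y), p in joint.items())

# cov[X, Y] = E[XY] - E[X] E[Y]
cov = expectation(lambda x, y: x * y) - expectation(lambda x, y: x) * expectation(lambda x, y: y)

# Independence would require P(X = 0, Y = 0) == P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))
print(cov, p_x0_y0, p_x0 * p_y0)
```

The covariance is 0, yet the joint probability 1/3 differs from the product 1/9, so $X$ and $Y$ are uncorrelated but dependent.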

Characteristic function

Two random variables $X$ and $Y$ are independent if and only if the characteristic function of the random vector $(X,Y)$ satisfies

\varphi _{(X,Y)}(t,s)=\varphi _{X}(t)\cdot \varphi _{Y}(s).

In particular the characteristic function of their sum is the product of their marginal characteristic functions:

\varphi _{X+Y}(t)=\varphi _{X}(t)\cdot \varphi _{Y}(t),

though the reverse implication is not true. Random variables that satisfy the latter condition are called subindependent.

Examples

Rolling dice

The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time are independent. By contrast, the event of getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trial is 8 are not independent.
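Both claims can be confirmed by enumerating the 36 equally likely outcomes of two rolls. This is a minimal sketch assuming a fair six-sided die; the helper `independent` simply tests Eq.1.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two rolls of a fair die (assumption: equally likely).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def independent(a, b):
    """Eq.1: P(A & B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_six = {w for w in omega if w[0] == 6}
second_six = {w for w in omega if w[1] == 6}
sum_eight = {w for w in omega if sum(w) == 8}

print(independent(first_six, second_six), independent(first_six, sum_eight))
```

For the second pair, the joint probability is 1/36 while the product of the individual probabilities is 5/216.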

Drawing cards

If two cards are drawn with replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are independent. By contrast, if two cards are drawn without replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are not independent, because a deck that has had a red card removed has proportionately fewer red cards.
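The difference between the two drawing schemes can be checked by enumeration. The sketch below assumes a standard 52-card deck with 26 red cards, represented by the integers 0 to 51 with the first 26 counted as red; this encoding is an illustrative choice.

```python
from fractions import Fraction

DECK = range(52)

def is_red(card):
    return card < 26   # assumption: cards 0..25 are red, 26..51 are black

def red_red_independent(draws):
    """Test Eq.1 for the events 'first card red' and 'second card red'."""
    n = len(draws)
    p_first = Fraction(sum(is_red(a) for a, _ in draws), n)
    p_second = Fraction(sum(is_red(b) for _, b in draws), n)
    p_both = Fraction(sum(is_red(a) and is_red(b) for a, b in draws), n)
    return p_both == p_first * p_second

with_replacement = [(a, b) for a in DECK for b in DECK]
without_replacement = [(a, b) for a in DECK for b in DECK if a != b]

print(red_red_independent(with_replacement), red_red_independent(without_replacement))
```

In both schemes each draw is red with probability 1/2, but without replacement the probability of two red cards is 25/102 rather than 1/4.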

Pairwise and mutual independence

Consider the two probability spaces shown. In both cases, $\mathrm {P} (A)=\mathrm {P} (B)=1/2$ and $\mathrm {P} (C)=1/4$ . The events in the first space are pairwise independent because $\mathrm {P} (A|B)=\mathrm {P} (A|C)=1/2=\mathrm {P} (A)$ , $\mathrm {P} (B|A)=\mathrm {P} (B|C)=1/2=\mathrm {P} (B)$ , and $\mathrm {P} (C|A)=\mathrm {P} (C|B)=1/4=\mathrm {P} (C)$ ; but the three events are not mutually independent. The events in the second space are both pairwise independent and mutually independent. To illustrate the difference, consider conditioning on two events. In the pairwise independent case, although any one event is independent of each of the other two individually, it is not independent of the intersection of the other two:

\mathrm {P} (A|BC)={\frac {\frac {4}{40}}{{\frac {4}{40}}+{\frac {1}{40}}}}={\tfrac {4}{5}}\neq \mathrm {P} (A)

\mathrm {P} (B|AC)={\frac {\frac {4}{40}}{{\frac {4}{40}}+{\frac {1}{40}}}}={\tfrac {4}{5}}\neq \mathrm {P} (B)

\mathrm {P} (C|AB)={\frac {\frac {4}{40}}{{\frac {4}{40}}+{\frac {6}{40}}}}={\tfrac {2}{5}}\neq \mathrm {P} (C)

In the mutually independent case, however,

\mathrm {P} (A|BC)={\frac {\frac {1}{16}}{{\frac {1}{16}}+{\frac {1}{16}}}}={\tfrac {1}{2}}=\mathrm {P} (A)

\mathrm {P} (B|AC)={\frac {\frac {1}{16}}{{\frac {1}{16}}+{\frac {1}{16}}}}={\tfrac {1}{2}}=\mathrm {P} (B)

\mathrm {P} (C|AB)={\frac {\frac {1}{16}}{{\frac {1}{16}}+{\frac {3}{16}}}}={\tfrac {1}{4}}=\mathrm {P} (C)

Triple-independence but no pairwise-independence

It is possible to create a three-event example in which

\mathrm {P} (A\cap B\cap C)=\mathrm {P} (A)\mathrm {P} (B)\mathrm {P} (C),

and yet no two of the three events are pairwise independent (and hence the set of events is not mutually independent).^[11] This example shows that mutual independence involves requirements on the products of probabilities of all combinations of events, not just the single events as in this example.
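One concrete construction with this property is sketched below. It is an illustrative example on eight equally likely outcomes, not necessarily the one given in the cited reference.

```python
from fractions import Fraction
from itertools import combinations

# Eight equally likely outcomes (assumption); the events are chosen for illustration.
omega = set(range(1, 9))

def prob(event):
    return Fraction(len(event), len(omega))

A = {1, 2, 3, 4}
B = {1, 5, 6, 7}
C = {1, 5, 6, 8}

# The triple product rule holds: P(A & B & C) = 1/8 = (1/2)**3.
triple_product_holds = prob(A & B & C) == prob(A) * prob(B) * prob(C)

# No pair satisfies Eq.1: the pairwise intersections have probability 1/8, 1/8 and 3/8.
independent_pairs = [
    prob(x & y) == prob(x) * prob(y) for x, y in combinations([A, B, C], 2)
]
print(triple_product_holds, independent_pairs)
```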

Conditional independence

For events

The events $A$ and $B$ are conditionally independent given an event $C$ when

$\mathrm {P} (A\cap B\mid C)=\mathrm {P} (A\mid C)\cdot \mathrm {P} (B\mid C)$ .

For random variables

Intuitively, two random variables $X$ and $Y$ are conditionally independent given $Z$ if, once $Z$ is known, the value of $Y$ does not add any additional information about $X$ . For instance, two measurements $X$ and $Y$ of the same underlying quantity $Z$ are not independent, but they are conditionally independent given $Z$ (unless the errors in the two measurements are somehow connected).

The formal definition of conditional independence is based on the idea of conditional distributions. If $X$ , $Y$ , and $Z$ are discrete random variables, then we define $X$ and $Y$ to be conditionally independent given $Z$ if

\mathrm {P} (X\leq x,Y\leq y\;|\;Z=z)=\mathrm {P} (X\leq x\;|\;Z=z)\cdot \mathrm {P} (Y\leq y\;|\;Z=z)

for all $x$ , $y$ and $z$ such that $\mathrm {P} (Z=z)>0$ . On the other hand, if the random variables are continuous and have a joint probability density function $f_{XYZ}(x,y,z)$ , then $X$ and $Y$ are conditionally independent given $Z$ if

f_{XY|Z}(x,y|z)=f_{X|Z}(x|z)\cdot f_{Y|Z}(y|z)

for all real numbers $x$ , $y$ and $z$ such that $f_{Z}(z)>0$ .

If discrete $X$ and $Y$ are conditionally independent given $Z$ , then

\mathrm {P} (X=x|Y=y,Z=z)=\mathrm {P} (X=x|Z=z)

for any $x$ , $y$ and $z$ with $\mathrm {P} (Z=z)>0$ . That is, the conditional distribution for $X$ given $Y$ and $Z$ is the same as that given $Z$ alone. A similar equation holds for the conditional probability density functions in the continuous case.
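The distinction between conditional and unconditional independence can be illustrated with a small discrete model. The sketch below assumes a fair choice $Z$ between two biased coins, followed by two flips $X$ and $Y$ of the chosen coin; the biases 1/4 and 3/4 are hypothetical. It tests factorization of probability mass functions, which for discrete variables is equivalent to the condition on $\{X\leq x\}$ and $\{Y\leq y\}$ .

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: Z picks one of two coins with probability 1/2 each;
# X and Y are two flips of the chosen coin (1 = heads).
HEADS_PROB = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def bernoulli(value, p):
    return p if value == 1 else 1 - p

joint = {
    (x, y, z): Fraction(1, 2) * bernoulli(x, HEADS_PROB[z]) * bernoulli(y, HEADS_PROB[z])
    for x, y, z in product((0, 1), repeat=3)
}

def prob(predicate):
    return sum(p for outcome, p in joint.items() if predicate(*outcome))

def conditionally_independent_given_z():
    """P(X=x, Y=y | Z=z) == P(X=x | Z=z) * P(Y=y | Z=z) for all x, y, z."""
    for x, y, z in product((0, 1), repeat=3):
        p_z = prob(lambda a, b, c: c == z)
        p_xy_z = prob(lambda a, b, c: (a, b, c) == (x, y, z)) / p_z
        p_x_z = prob(lambda a, b, c: a == x and c == z) / p_z
        p_y_z = prob(lambda a, b, c: b == y and c == z) / p_z
        if p_xy_z != p_x_z * p_y_z:
            return False
    return True

def unconditionally_independent():
    """P(X=x, Y=y) == P(X=x) * P(Y=y) for all x, y."""
    return all(
        prob(lambda a, b, c: a == x and b == y)
        == prob(lambda a, b, c: a == x) * prob(lambda a, b, c: b == y)
        for x, y in product((0, 1), repeat=2)
    )

print(conditionally_independent_given_z(), unconditionally_independent())
```

Once the coin is known, the flips factorize; without that knowledge, one flip carries information about which coin was chosen and therefore about the other flip.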

Independence can be seen as a special kind of conditional independence, since probability can be seen as a kind of conditional probability given no events.

History

Before 1933, independence in probability theory was defined verbally. For example, de Moivre gave the following definition: “Two events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other”.^[12] If there are n independent events, the probability that all of them happen was computed as the product of the probabilities of these n events. Apparently, there was a conviction that this formula was a consequence of the above definition. (Sometimes this was called the Multiplication Theorem.) Of course, a proof of this assertion cannot work without further, more formal, tacit assumptions.

The definition of independence, given in this article, became the standard definition (now used in all books) after it appeared in 1933 as part of Kolmogorov's axiomatization of probability.^[13] Kolmogorov credited it to S.N. Bernstein, and quoted a publication which had appeared in Russian in 1927.^[14]

Unfortunately, neither Bernstein nor Kolmogorov was aware of the work of Georg Bohlmann. Bohlmann had given the same definition for two events in 1901^[15] and for n events in 1908.^[16] In the latter paper, he studied his notion in detail. For example, he gave the first example showing that pairwise independence does not imply mutual independence. Even today, Bohlmann is rarely quoted. More about his work can be found in On the contributions of Georg Bohlmann to probability theory by Ulrich Krengel.^[17]

References

[1] Russell, Stuart; Norvig, Peter (2002). Artificial Intelligence: A Modern Approach. Prentice Hall. p. 478. ISBN 0-13-790395-2.
[2] Florescu, Ionut (2014). Probability and Stochastic Processes. Wiley. ISBN 978-0-470-62455-5.
[3] Gallager, Robert G. (2013). Stochastic Processes Theory for Applications. Cambridge University Press. ISBN 978-1-107-03975-9.
[4] Feller, W. (1971). "Stochastic Independence". An Introduction to Probability Theory and Its Applications. Wiley.
[5] Papoulis, Athanasios (1991). Probability, Random Variables and Stochastic Processes. McGraw-Hill. ISBN 0-07-048477-5.
[6] Hsu, Hwei Piao (1997). Theory and Problems of Probability, Random Variables, and Random Processes. McGraw-Hill. ISBN 0-07-030644-3.
[7] Lapidoth, Amos (8 February 2017). A Foundation in Digital Communication. Cambridge University Press. ISBN 978-1-107-17732-1.
[8] Durrett, Richard (1996). Probability: Theory and Examples (2nd ed.). p. 62.
[9] Jakeman, E. Modeling Fluctuations in Scattered Waves. ISBN 978-0-7503-1005-5.
[10] Park, Kun Il (2018). Fundamentals of Probability and Stochastic Processes with Applications to Communications. Springer. ISBN 978-3-319-68074-3.
[11] George, Glyn (November 2004). "Testing for the independence of three events". Mathematical Gazette 88: 568.
[12] Cited according to: Grinstead and Snell’s Introduction to Probability. In: The CHANCE Project. Version of July 4, 2006.
[13] Kolmogorov, Andrey (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung (in German). Berlin: Julius Springer. Translation: Kolmogorov, Andrey (1956). Foundations of the Theory of Probability (2nd ed.). New York: Chelsea. ISBN 978-0-8284-0023-7.
[14] Bernstein, S. N. (1927). Probability Theory (in Russian). Moscow. (4 editions, latest 1946.)
[15] Bohlmann, Georg (1901). "Lebensversicherungsmathematik". Encyklopädie der mathematischen Wissenschaften, Bd. I, Teil 2, Artikel I D 4b, 852–917.
[16] Bohlmann, Georg (1908). "Die Grundbegriffe der Wahrscheinlichkeitsrechnung in ihrer Anwendung auf die Lebensversicherung". Atti del IV. Congr. Int. dei Matem. Rom, Bd. III, 244–278.
[17] Krengel, Ulrich (2011). "On the contributions of Georg Bohlmann to probability theory". Electronic Journal for History of Probability and Statistics.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
