{{distinguish|Information science}}
{{Information theory}}
'''Information theory''' is the mathematical study of the [[quantification (science)|quantification]], [[data storage|storage]], and [[telecommunication|communication]] of [[information]].
A key measure in information theory is [[information entropy|entropy]]. Entropy quantifies the amount of uncertainty involved in the value of a [[random variable]] or the outcome of a [[random process]]. For example, identifying the outcome of a fair [[coin flip]] (which has two equally likely outcomes) provides less information (lower entropy, less uncertainty) than identifying the outcome from a roll of a [[dice|die]] (which has six equally likely outcomes).
Applications of fundamental topics of information theory include source coding/[[data compression]] (e.g. for [[ZIP (file format)|ZIP files]]), and channel coding/[[error detection and correction]] (e.g. for [[digital subscriber line|DSL]]). Its impact has been crucial to the success of the [[Voyager program|Voyager]] missions to deep space, the invention of the [[compact disc]], the feasibility of mobile phones and the development of the Internet.<ref name=":0" /> The theory has also found applications in other areas, including imaging system design,<ref>{{Cite arXiv|title=Universal evaluation and design of imaging systems using information estimation|eprint=2405.20559 |last1=Pinkard |first1=Henry |last2=Kabuli |first2=Leyla |last3=Markley |first3=Eric |last4=Chien |first4=Tiffany |last5=Jiao |first5=Jiantao |last6=Waller |first6=Laura |date=2024 |class=physics.optics }}</ref> [[epistemology]],<ref>{{Cite journal |last=Harms |first=William F. |date=1998 |title=The Use of Information Theory in Epistemology |url=https://www.jstor.org/stable/188281 |journal=Philosophy of Science |volume=65 |issue=3 |pages=472–501 |doi=10.1086/392657 |jstor=188281 |issn=0031-8248}}</ref> and even art creation.
==Overview==
Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was formalized in 1948 by Claude Shannon in a paper entitled ''[[A Mathematical Theory of Communication]]'', in which information is thought of as a set of possible messages, and the goal is to send these messages over a noisy channel, and to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the [[noisy-channel coding theorem]], showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.<ref name="Spikes" />
Coding theory is concerned with finding explicit methods, called ''codes'', for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and [[error-correction]] (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.{{cn|date=April 2024}}
A third class of information theory codes are cryptographic algorithms (both [[code (cryptography)|code]]s and [[cipher]]s). Concepts, methods and results from coding theory and information theory are widely used in cryptography and [[cryptanalysis]], such as the [[Ban (unit)|unit ban]].{{cn|date=April 2024}}
==Historical background==
{{Main|History of information theory}}
The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the ''[[Bell System Technical Journal]]'' in July and October 1948. He came to be known as the "father of information theory".<ref>{{Cite web |last=Horgan |first=John |date=2016-04-27 |title=Claude Shannon: Tinkerer, Prankster, and Father of Information Theory |url=https://spectrum.ieee.org/claude-shannon-tinkerer-prankster-and-father-of-information-theory |access-date=2023-09-30 |website=[[IEEE Spectrum]]}}</ref>
Prior to this paper, limited information-theoretic ideas had been developed at [[Bell Labs]], all implicitly assuming events of equal probability. [[Harry Nyquist]]'s 1924 paper, ''Certain Factors Affecting Telegraph Speed'', contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation {{math|1=''W'' = ''K'' log ''m''}} (recalling the [[Boltzmann constant]]), where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant. [[Ralph Hartley]]'s 1928 paper, ''Transmission of Information'', uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish one [[sequence of symbols]] from any other, thus quantifying information as {{math|1=''H'' = log ''S''<sup>''n''</sup> = ''n'' log ''S''}}, where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission. The unit of information was therefore the [[decimal digit]], which since has sometimes been called the [[Hartley (unit)|hartley]] in his honor as a unit or scale or measure of information. [[Alan Turing]] in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war [[Cryptanalysis of the Enigma|Enigma]] ciphers.{{cn|date=April 2024}}
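Hartley's quantification can be reproduced directly; the sketch below (the function name is illustrative, not from any source) computes {{math|1=''H'' = ''n'' log ''S''}} in decimal digits (hartleys), using base-10 logarithms as Hartley did:

```python
import math

def hartley_information(n: int, s: int) -> float:
    """Hartley's H = n * log10(S): a message of n symbols drawn from an
    alphabet of size S, measured in decimal digits (hartleys)."""
    return n * math.log10(s)

# A 3-symbol message over a 10-symbol alphabet carries 3 hartleys.
print(hartley_information(3, 10))  # 3.0
```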
Much of the mathematics behind information theory with events of different probabilities was developed for the field of [[thermodynamics]] by [[Ludwig Boltzmann]] and [[J. Willard Gibbs]].
In his revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon introduced for the first time the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion:
* the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
* the practical result of the [[Shannon–Hartley law]] for the channel capacity of a [[Gaussian channel]]; as well as
* the [[bit]]—a new way of seeing the most fundamental unit of information.{{cn|date=April 2024}}
==Quantities of information==
{{Unreferenced section|date=April 2024}}{{Main|Quantities of information}}
Information theory is based on [[probability theory]] and statistics, where [[Quantities of information|quantified information]] is usually described in terms of bits.
The choice of logarithmic base in the following formulae determines the [[units of measurement|unit]] of information entropy that is used.
In what follows, an expression of the form {{math|''p'' log ''p''}} is considered by convention to be equal to zero whenever {{math|1=''p'' = 0}}.
===Entropy of an information source===
The entropy of a source that emits a sequence of {{math|''N''}} symbols that are [[independent and identically distributed]] (iid) is {{math|''N'' ⋅ ''H''}} bits (per message of {{math|''N''}} symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length {{math|''N''}} will be less than {{math|''N'' ⋅ ''H''}}.
[[File:Binary entropy plot.svg|thumbnail|right|200px|The entropy of a [[Bernoulli trial]] as a function of success probability, often called the {{em|[[binary entropy function]]}}, {{math|''H''<sub>b</sub>(''p'')}}. The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.]]
If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted.
:<math> H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).</math>
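The entropy formula above can be evaluated with a few lines of code; the sketch below (the helper name is illustrative) skips zero-probability terms, matching the convention that {{math|1=0 log 0 = 0}}:

```python
import math

def entropy(pmf, base=2):
    """H(X) = -sum p(x) log p(x); terms with p(x) = 0 are skipped,
    following the convention that 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))         # fair coin: 1 bit
print(entropy([1.0, 0.0]))         # certain outcome: no information
# N iid symbols carry N * H bits, e.g. 1000 fair coin flips:
print(1000 * entropy([0.5, 0.5]))  # 1000 bits
```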
===Joint entropy===
The {{em|[[joint entropy]]}} of two discrete random variables {{math|''X''}} and {{math|''Y''}} is merely the entropy of their pairing: {{math|(''X'', ''Y'')}}.
For example, if {{math|(''X'', ''Y'')}} represents the position of a chess piece—{{math|''X''}} the row and {{math|''Y''}} the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.
:<math> H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y).</math>
Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use.
: <math> H(X|Y) = H(X,Y) - H(Y) .\,</math>
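Both forms of the conditional entropy, and the identity relating it to the joint entropy, can be checked numerically; the joint distribution below is a made-up example:

```python
import math

def H(pmf):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p_y = {0: 0.4, 1: 0.6}  # marginal of Y, summed from the joint

# Direct definition: H(X|Y) = -sum_{x,y} p(x,y) log p(x|y), with p(x|y) = p(x,y)/p(y).
H_direct = -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)
# Identity: H(X|Y) = H(X,Y) - H(Y).
H_identity = H(joint.values()) - H(p_y.values())
print(H_direct, H_identity)  # equal, as the identity requires
```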
===Mutual information (transinformation)===
''[[Mutual information]]'' measures the amount of information that can be obtained about one random variable by observing another.
:<math>I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}</math>
Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the [[posterior probability|posterior probability distribution]] of ''X'' given the value of ''Y'' and the [[prior probability|prior distribution]] on ''X'':
: <math>I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )].</math>
In other words, this is a measure of how much, on average, the probability distribution on ''X'' will change if we are given the value of ''Y''.
: <math>I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)).</math>
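Both expressions for the mutual information can be verified to agree on a concrete distribution; the joint distribution below is a made-up example:

```python
import math

def H(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Hypothetical joint distribution over two correlated binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}  # marginals, summed from the joint
p_y = {0: 0.5, 1: 0.5}

# Definition: I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ).
I_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items())
# Equivalent entropy form: I(X;Y) = H(X) + H(Y) - H(X,Y).
I_ent = H(p_x.values()) + H(p_y.values()) - H(joint.values())
print(I_def, I_ent)  # identical; a value of 0 would mean X and Y are independent
```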
===Kullback–Leibler divergence (information gain)===
The ''[[Kullback–Leibler divergence]]'' (or ''information divergence'', ''information gain'', or ''relative entropy'') is a way of comparing two distributions: a "true" [[probability distribution]] {{tmath|p(X)}}, and an arbitrary probability distribution {{tmath|q(X)}}. If we compress data in a manner that assumes {{tmath|q(X)}} is the distribution underlying some data, when, in reality, {{tmath|p(X)}} is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum necessary for compression.
:<math>D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.</math>
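The divergence, and its extra-bits interpretation, can be illustrated with a short sketch (the two distributions are made-up examples):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # the "true" distribution (a fair coin)
q = [0.9, 0.1]  # the distribution the compressor mistakenly assumes
print(kl_divergence(p, q))  # average extra bits per symbol paid for the wrong model
print(kl_divergence(q, p))  # differs: the divergence is not symmetric
```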
Although it is sometimes used as a 'distance metric', KL divergence is not a true [[Metric (mathematics)|metric]] since it is not symmetric and does not satisfy the [[triangle inequality]] (making it a semi-quasimetric).
Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number ''X'' is about to be drawn randomly from a discrete set with probability distribution {{tmath|p(x)}}.
===Directed Information===
'''[[Directed information]]''', <math>I(X^n\to Y^n) </math>, is an information theory measure that quantifies the [[information]] flow from the random process <math>X^n = \{X_1,X_2,\dots,X_n\}</math> to the random process <math>Y^n = \{Y_1,Y_2,\dots,Y_n\}</math>. It is defined as
:<math>I(X^n\to Y^n) \triangleq \sum_{i=1}^n I(X^i;Y_i|Y^{i-1})</math>,
where <math>I(X^{i};Y_i|Y^{i-1})</math> is the [[conditional mutual information]] <math>I(X_1,X_2,...,X_{i};Y_i|Y_1,Y_2,...,Y_{i-1})</math>.
In contrast to ''mutual'' information, ''directed'' information is not symmetric. The quantity <math>I(X^n\to Y^n) </math> measures the information bits that are transmitted causally from <math>X^n</math> to <math>Y^n</math>. Directed information has many applications in problems where [[causality]] plays an important role, such as the [[channel capacity|capacity of channels]] with feedback,<ref name=massey>{{citation |last1=Massey|first1=James|title=Causality, Feedback and Directed Information|journal=Proc. 1990 Int. Symp. on Information Theory and Its Applications (ISITA-90)|year=1990|pages=303–305}}</ref>
and in [[real-time control]] communication settings,<ref>{{cite journal|last1=Charalambous|first1=Charalambos D.|last2=Stavrou|first2=Photios A.|title=Directed Information on Abstract Spaces: Properties and Variational Equalities|journal=IEEE Transactions on Information Theory|date=August 2016|volume=62|issue=11|pages=6019–6052|doi=10.1109/TIT.2016.2604846|arxiv=1302.3971|s2cid=8107565}}</ref><ref>{{cite journal |last1=Tanaka |first1=Takashi |last2=Esfahani |first2=Peyman Mohajerin |last3=Mitter |first3=Sanjoy K. |title=LQG Control With Minimum Directed Information: Semidefinite Programming Approach |journal=IEEE Transactions on Automatic Control |date=January 2018 |volume=63 |issue=1 |pages=37–52 |doi=10.1109/TAC.2017.2709618|s2cid=1401958 |url=http://resolver.tudelft.nl/uuid:d9db1c11-fbfd-4c0c-b66f-f341b49fa61a |arxiv=1510.04214 |via=TU Delft Repositories |url-status=live |archive-url=https://web.archive.org/web/20240412014101/https://repository.tudelft.nl/islandora/object/uuid:d9db1c11-fbfd-4c0c-b66f-f341b49fa61a/datastream/OBJ/download |archive-date= Apr 12, 2024 }}</ref> and [[statistical physics]].<ref>{{cite journal |last1=Vinkler |first1=Dror A |last2=Permuter |first2=Haim H |last3=Merhav |first3=Neri |title=Analogy between gambling and measurement-based work extraction |journal=Journal of Statistical Mechanics: Theory and Experiment |date=20 April 2016 |volume=2016 |issue=4 |pages=043403 |doi=10.1088/1742-5468/2016/04/043403|arxiv=1404.6788 |bibcode=2016JSMTE..04.3403V |s2cid=124719237 }}</ref>
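The defining sum above can be evaluated by brute-force marginalization of a joint distribution over the two processes; the sketch below (the function and the test distribution are illustrative) recovers {{math|''n''}} bits for a noiseless identity channel driven by fair coin flips:

```python
import math
from collections import defaultdict

def directed_information(joint):
    """I(X^n -> Y^n) = sum_{i=1}^n I(X^i; Y_i | Y^{i-1}), in bits.
    `joint` maps (x_tuple, y_tuple) -> probability, both tuples of length n."""
    n = len(next(iter(joint))[0])
    total = 0.0
    for i in range(1, n + 1):
        p_xy = defaultdict(float)       # p(x^i, y^i)
        p_xy_prev = defaultdict(float)  # p(x^i, y^{i-1})
        p_y = defaultdict(float)        # p(y^i)
        p_y_prev = defaultdict(float)   # p(y^{i-1})
        for (x, y), p in joint.items():
            p_xy[(x[:i], y[:i])] += p
            p_xy_prev[(x[:i], y[:i - 1])] += p
            p_y[y[:i]] += p
            p_y_prev[y[:i - 1]] += p
        for (xi, yi), p in p_xy.items():
            if p > 0:
                num = p / p_xy_prev[(xi, yi[:-1])]  # p(y_i | x^i, y^{i-1})
                den = p_y[yi] / p_y_prev[yi[:-1]]   # p(y_i | y^{i-1})
                total += p * math.log2(num / den)
    return total

# Noiseless identity channel, n = 2, inputs are fair iid bits: Y_i = X_i.
joint = {((a, b), (a, b)): 0.25 for a in (0, 1) for b in (0, 1)}
print(directed_information(joint))  # 2.0 bits: one bit flows causally per step
```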
===Other quantities===
Other important information theoretic quantities include the [[Rényi entropy]] and the [[Tsallis entropy]] (generalizations of the concept of entropy), [[differential entropy]] (a generalization of quantities of information to continuous distributions), and the [[conditional mutual information]].
==Coding theory==
{{Unreferenced section|date=April 2024}}{{Main|Coding theory}}
[[File:CDSCRATCHES.jpg|thumb|right|A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error-correcting codes and thus can still be read even if they have minor scratches, using [[error detection and correction]].]]
Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
===Source theory===
Any process that generates successive messages can be considered a {{em|[[Communication source|source]]}} of information.
====Rate====<!-- This section is linked from [[Channel capacity]] -->
Information ''[[Entropy rate|rate]]'' is the average entropy per symbol.
:<math>r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);</math>
that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the ''average rate'' is
:<math>r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);</math>
that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.
Information rate is defined as
:<math>r = \lim_{n \to \infty} \frac{1}{n} I(X_1, X_2, \dots X_n;Y_1,Y_2, \dots Y_n);</math>
It is common in information theory to speak of the "rate" or "entropy" of a language.
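For a stationary [[Markov chain|Markov]] source, the conditional-entropy limit above reduces to a closed form: the entropy of the next symbol given the current state, averaged over the stationary distribution. A minimal sketch (the transition matrix is a made-up example):

```python
import math

# Hypothetical 2-state Markov source; row i of P holds p(next state | state i).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution mu (solving mu P = mu), found here by power iteration.
mu = [0.5, 0.5]
for _ in range(1000):
    mu = [sum(mu[i] * P[i][j] for i in range(2)) for j in range(2)]

# Entropy rate: conditional entropy of the next symbol given the current one.
rate = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
            for i in range(2) for j in range(2) if P[i][j] > 0)
print(rate)  # strictly below log2(2) = 1 bit: the source's memory reduces the rate
```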
===Channel capacity===
{{Main|Channel capacity}}
Communication over a channel is the primary motivation of information theory.
Consider the communications process over a discrete channel. A simple model of the process is shown below:
Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let {{math|''p''(''y''{{pipe}}''x'')}} be the [[conditional probability]] distribution function of ''Y'' given ''X''. We will consider {{math|''p''(''y''{{pipe}}''x'')}} to be an inherent fixed property of our communications channel (representing the nature of the ''[[Signal noise|noise]]'' of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of {{math|''f''(''x'')}}, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the ''[[Signal (electrical engineering)|signal]]'', we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the {{em|channel capacity}} and is given by:
:<math> C = \max_{f} I(X;Y).\! </math>
This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol): if ''R'' < ''C'', communication with arbitrarily small probability of error is achievable with suitable encoding, whereas if ''R'' > ''C'', the probability of error cannot be made arbitrarily small.
''[[Channel code|Channel coding]]'' is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
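The maximization defining ''C'' can be carried out numerically with the Blahut–Arimoto algorithm (a standard method not described in this article; sketched here under the assumption of a discrete memoryless channel). For a binary symmetric channel the result matches the closed form {{math|1=''C'' = 1 − ''H''<sub>b</sub>(''p'')}}:

```python
import math

def blahut_arimoto(W, iters=200):
    """Approximate C = max_f I(X;Y) in bits for a discrete memoryless channel.
    W[x][y] = p(y|x); each row of W must sum to 1."""
    nx, ny = len(W), len(W[0])
    f = [1.0 / nx] * nx  # input distribution, refined on each iteration
    for _ in range(iters):
        q = [sum(f[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0)) for x in range(nx)]
        Z = sum(f[x] * c[x] for x in range(nx))
        f = [f[x] * c[x] / Z for x in range(nx)]
    return math.log2(Z)  # lower bound that converges to the capacity

# Binary symmetric channel with crossover probability 0.1:
p = 0.1
C = blahut_arimoto([[1 - p, p], [p, 1 - p]])
Hb = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(C, 1 - Hb)  # both approximately 0.531 bits per channel use
```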
====Channels with memory and directed information====
In practice many channels have memory. Namely, at time <math> i </math> the channel is given by the conditional probability <math> P(y_i|x_i,x_{i-1},x_{i-2},\ldots,x_1,y_{i-1},y_{i-2},\ldots,y_1) </math>.
It is often more convenient to use the notation <math> x^i=(x_i,x_{i-1},x_{i-2},\ldots,x_1) </math>, in which case the channel becomes <math> P(y_i|x^i,y^{i-1}) </math>.
In such a case the capacity is given by the [[mutual information]] rate when there is no [[Feedback capacity|feedback]] available, and by the [[directed information]] rate whether or not feedback is available.
===Fungible information===
'''Fungible information''' is the [[information]] for which the means of [[encoding]] is not important.<ref>{{cite journal|last=Bartlett|first=Stephen D. |author2=Rudolph, Terry|author3-link=Robert Spekkens|author3=Spekkens, Robert W.|title=Reference frames, superselection rules, and quantum information|journal=Reviews of Modern Physics|volume=79|issue=2|date=April–June 2007|pages=555–606|doi=10.1103/RevModPhys.79.555|bibcode=2007RvMP...79..555B|arxiv = quant-ph/0610030 }}</ref> Classical information theorists and computer scientists are mainly concerned with information of this sort. It is sometimes referred to as speakable information.<ref>{{cite book | last = Peres | first = A.| title = Quantum Theory: Reconsideration of Foundations | publisher = Växjö University Press, Växjö, Sweden| year = 2002b|editor = A. Khrennikov|page=283 |author2=P. F. Scudo}}</ref>
==Applications to other fields==
===Intelligence uses and secrecy applications===
{{Unreferenced section|date=April 2024}}
Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the [[Ban (unit)|ban]], was used in the [[Ultra (cryptography)|Ultra]] project, breaking the German [[Enigma machine]] code and hastening the [[Victory in Europe Day|end of World War II in Europe]].
Information theory leads us to believe it is much more difficult to keep secrets than it might first appear.
[[Information theoretic security]] refers to methods such as the [[one-time pad]] that are not vulnerable to such brute force attacks.
===Pseudorandom number generation===
{{Unreferenced section|date=April 2024}}
[[Pseudorandom number generator]]s are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed [[cryptographically secure pseudorandom number generator]]s, but even they require [[random seed]]s external to the software to work as intended. These can be obtained via [[Extractor (mathematics)|extractors]], if done carefully. The measure of sufficient randomness in extractors is [[min-entropy]], a value related to Shannon entropy through [[Rényi entropy]]; Rényi entropy is also used in evaluating randomness in cryptographic systems.
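The gap between Shannon entropy and min-entropy is what makes the distinction matter for extractors; a short sketch (the distribution is a made-up example):

```python
import math

def shannon_entropy(pmf):
    """H = -sum p log2 p, in bits: average-case information content."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def min_entropy(pmf):
    """H_min = -log2(max p): the order-infinity Renyi entropy,
    governed entirely by the single most likely outcome."""
    return -math.log2(max(pmf))

# A biased source: extractors must be rated by the pessimistic measure.
pmf = [0.5, 0.125, 0.125, 0.125, 0.125]
print(shannon_entropy(pmf))  # 2.0 bits on average
print(min_entropy(pmf))      # only 1.0 bit of worst-case (guessing) randomness
```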
===Seismic exploration===
===Miscellaneous applications===
Information theory also has applications in the [[search for extraterrestrial intelligence]],<ref>{{Cite journal |last1=Doyle |first1=Laurance R. |author-link=Laurance Doyle |last2=McCowan |first2=Brenda |author-link2=Brenda McCowan |last3=Johnston |first3=Simon |last4=Hanser |first4=Sean F. |date=February 2011 |title=Information theory, animal communication, and the search for extraterrestrial intelligence |journal=[[Acta Astronautica]] |language=en |volume=68 |issue=3–4 |pages=406–417 |doi=10.1016/j.actaastro.2009.11.018|bibcode=2011AcAau..68..406D }}</ref> [[black hole information paradox|black holes]], [[bioinformatics]]{{cn|date=April 2024}}, and [[Gambling and information theory|gambling]].
==See also==
* [[Bayesian inference]]
* [[Communication theory]]
* [[Constructor theory]]
* [[Formal science]]
* [[Inductive probability]]
* [[Timeline of information theory]]
* [[Hubert Yockey|Yockey, H.P.]]
* [[Andrey Kolmogorov]]
===Theory===
* [[Variety (cybernetics)|Variety]]
* [[Hamming distance]]
* [[Perplexity]]
{{div col end}}
===Other journal articles===
{{refbegin|}}
* J. L. Kelly Jr., "A New Interpretation of Information Rate", ''Bell System Technical Journal'', Vol. 35, July 1956, pp. 917–26.
* R. Landauer, [https://archive.today/20071016185253/http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=615478 IEEE.org], "Information is Physical" ''Proc. Workshop on Physics and Computation PhysComp'92'' (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4.
* {{cite journal|last1=Landauer|first1=R.|year=1961|title=Irreversibility and Heat Generation in the Computing Process|url=http://www.research.ibm.com/journal/rd/441/landauerii.pdf|journal=IBM J. Res. Dev.|volume=5|issue=3|pages=183–191|doi=10.1147/rd.53.0183}}
* {{cite arXiv|last1=Timme|first1=Nicholas|last2=Alford|first2=Wesley|last3=Flecker|first3=Benjamin|last4=Beggs|first4=John M.|date=2012|title=Multivariate information measures: an experimentalist's perspective|eprint=1111.6857|class=cs.IT}}
* [[Robert McEliece|McEliece, R]]. ''The Theory of Information and Coding''. Cambridge, 2002. {{isbn|978-0521831857}}
* [[John R. Pierce|Pierce, JR]]. ''An Introduction to Information Theory: Symbols, Signals and Noise''. Dover (2nd Edition). 1961 (reprinted by Dover 1980).
* [[Fazlollah Reza|Reza, F]]. ''An Introduction to Information Theory''. New York: McGraw-Hill 1961. New York: Dover 1994. {{isbn|0-486-68210-2}}
* {{cite book |last1=Shannon |first1=Claude |author-link1=Claude Shannon |last2=Weaver |first2=Warren |author-link2=Warren Weaver |date=1949 |title=The Mathematical Theory of Communication |url=http://monoskop.org/images/b/be/Shannon_Claude_E_Weaver_Warren_The_Mathematical_Theory_of_Communication_1963.pdf |location=[[Urbana, Illinois]] |publisher=[[University of Illinois Press]] |lccn=49-11922 |isbn=0-252-72548-4}}
* Stone, JV. Chapter 1 of book [https://jamesstone.sites.sheffield.ac.uk/books/information-theory-2nd-edition "Information Theory: A Tutorial Introduction"], University of Sheffield, England, 2014. {{isbn|978-0956372857}}.