Algorithmic information theory

Algorithmic information theory is a field of study which attempts to capture the concept of complexity using tools from theoretical computer science. The chief idea is to define the complexity (also called descriptive complexity, Kolmogorov complexity or Kolmogorov–Chaitin complexity) of a string as the length of the shortest program which outputs that string. Strings that can be produced by short programs are considered to be not very complex. This notion is surprisingly deep and can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.
The field was developed by Andrey Kolmogorov, Ray Solomonoff and Gregory Chaitin starting in the late 1960s. There are several variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).
To formalize the above definition of complexity, one has to specify exactly what types of programs are allowed. Fortunately, it does not really matter: as we will see below, one could take a particular notation for Turing machines, or Lisp programs, or Pascal programs, or Java virtual machine bytecode.
In what follows, we measure the lengths of all objects in bits. To arrive at the definition of complexity, we first need a more elementary notion: the description of a binary sequence (string). A description of a binary sequence s is a program, itself written as a string of bits, which produces the sequence s as output. Considering all programs that generate the sequence s and choosing the shortest one, we obtain the minimal description of s, denoted d(s). (If more than one program has the minimal length, select as d(s) the lexicographically first among them.) The Kolmogorov complexity of s, written K(s), is
 <math>K(s) = |d(s)|.</math>
In other words, K(s) is the length of the minimal description of s.
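This definition can be made concrete with a toy model. The sketch below is an illustration only, not a real Kolmogorov-complexity computer: it takes "programs" to be short Python expressions over a small, arbitrarily chosen alphabet, and brute-forces the shortest one (lexicographically first among ties) that evaluates to a target string.

```python
from itertools import product

# Hypothetical toy alphabet for "programs"; real algorithmic information
# theory fixes a universal machine instead.
ALPHABET = "'*018"

def toy_minimal_description(target, max_len=6):
    """Return the shortest expression over ALPHABET that eval()s to the
    target string (lexicographically first among ties), or None if no
    expression of length <= max_len works."""
    for length in range(1, max_len + 1):
        # product over the sorted alphabet enumerates each length in
        # lexicographic order, so the first hit is the minimal description
        for chars in product(sorted(ALPHABET), repeat=length):
            program = "".join(chars)
            try:
                if eval(program) == target:
                    return program
            except Exception:
                continue  # most character combinations are not valid Python
    return None

# The 8-character string "00000000" has a 5-character description:
print(toy_minimal_description("00000000"))  # prints '0'*8
```

In this toy model the string "00000000" is compressible (its shortest description, the expression '0'*8, is shorter than the string itself), which mirrors the idea that regular strings have low complexity.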
We can now return to the problem of choosing a notation to write algorithms in. Let K_{1}(s) and K_{2}(s) be the complexities of the string s according to two different programming languages L_{1} and L_{2}. Then there is a constant c (which depends only on the languages chosen, not on s) such that
 K_{1}(s) ≤ K_{2}(s) + c
Here, c is the length in bits of an interpreter for L_{2} written in L_{1}. (One technical requirement is that it must be possible to embed arbitrary binary data into programs without delimiters, e.g. by providing such data on "standard input" and considering all bits read from this stream as part of the program.)
In the following, we will fix one definition and simply write K(s) for the complexity of the string s.
The first surprising result is that K(s) cannot be computed: there is no general algorithm which takes a string s as input and produces the number K(s) as output. The proof is a formalization of the amusing Berry paradox: "Let n be the smallest number that cannot be defined in fewer than twenty English words. Well, I just defined it in fewer than twenty English words."
It is however straightforward to compute upper bounds for K(s): simply compress the string s with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the resulting string's length.
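The compression idea can be sketched in a few lines of Python using zlib. The decompressor size below is a made-up placeholder constant, since the true overhead depends on the chosen language; only the shape of the bound matters.

```python
import os
import zlib

# Hypothetical size, in bits, of a zlib decompressor in the chosen language.
DECOMPRESSOR_BITS = 8 * 1000

def complexity_upper_bound(s: bytes) -> int:
    """Upper bound on K(s) in bits: the compressed string plus a fixed
    overhead for the decompressor program concatenated in front of it."""
    return 8 * len(zlib.compress(s)) + DECOMPRESSOR_BITS

# A highly regular string compresses far below its raw length in bits...
regular = b"01" * 50_000
print(complexity_upper_bound(regular) < 8 * len(regular))  # True

# ...while typical random bytes yield no useful bound this way.
random_data = os.urandom(100_000)
print(complexity_upper_bound(random_data) > 8 * len(random_data) - 8)  # True
```

Note that this only ever gives an upper bound: a string the compressor fails on may still have a short description of some other form.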
The next important result concerns the randomness of strings. Most strings are complex in the sense that they cannot be significantly compressed: K(s) is not much smaller than |s|, the length of s in bits. The precise statement is as follows: the probability that a random string of length n has complexity less than n − k is smaller than 2^{−k}. The proof is a counting argument: one counts the programs and the strings, and compares. This theorem is the justification for Mike Goldman's challenge in the comp.compression FAQ (http://www.faqs.org/faqs/compressionfaq/):
 I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant. Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state.
 With this offer, you can tune your algorithm to my data. You tell me the parameters of size in advance. All I get to do is arrange the bits within my file according to the dictates of my whim. As a processing fee, I will require an advance deposit of $100 from any contestant. This deposit is 100% refundable if you meet the challenge.
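The counting argument behind the challenge can be checked numerically for small n: there are 2^n strings of length n, but only 2^{n−k} − 1 bit strings shorter than n − k bits, and each such program describes at most one string.

```python
# Counting argument for incompressibility: there are 2**n binary strings
# of length n, but only 2**0 + 2**1 + ... + 2**(n-k-1) = 2**(n-k) - 1
# programs (bit strings) shorter than n - k bits.  Each program outputs at
# most one string, so the fraction of length-n strings with complexity
# below n - k is smaller than 2**-k.
def compressible_fraction_bound(n: int, k: int) -> float:
    num_strings = 2 ** n
    num_short_programs = 2 ** (n - k) - 1  # descriptions of length < n - k
    return num_short_programs / num_strings

for k in (1, 4, 8):
    bound = compressible_fraction_bound(20, k)
    print(f"k={k}: under {bound:.6f} of 20-bit strings save >= {k} bits")
    assert bound < 2 ** -k
```

Already for k = 8 fewer than 0.4% of strings can be shortened by a byte, which is why no contestant can win Goldman's bet on arbitrary data.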
Now for Chaitin's incompleteness result: though we know that most strings are complex in the above sense, the fact that a specific string is complex can never be proven (if the string's length is above a certain threshold). The precise formalization is as follows. Suppose we fix a particular consistent axiomatic system for the natural numbers, say Peano's axioms. Then there exists a constant L (which only depends on the particular axiomatic system and the choice of definition of complexity) such that there is no string s for which the statement
 K(s) ≥ L
can be proven within the axiomatic system (even though, as we know, the vast majority of those statements must be true). Again, the proof of this result is a formalization of Berry's paradox.
Similar ideas are used to prove the properties of Chaitin's constant.
The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace (http://www.csse.monash.edu.au/~dld/CSWallacePublications/) and D.M. Boulton in 1968. MML is Bayesian (it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (even for very hard problems, MML will converge to any underlying model) and efficiency (the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity) in 1999.
External links
 The Legacy of Andrei Nikolaevich Kolmogorov (http://www.kolmogorov.com/)
 Chaitin's online publications (http://www.cs.umaine.edu/~chaitin/)
 Solomonoff's IDSIA page (http://www.idsia.ch/~juergen/ray.html)
 Schmidhuber's generalizations of algorithmic information (http://www.idsia.ch/~juergen/kolmogorov.html)
 Li & Vitanyi's textbook (http://homepages.cwi.nl/~paulv/kolmogorov.html)
 Tromp's lambda calculus computer model offers a concrete definition of K() (http://homepages.cwi.nl/~tromp/cl/cl.html)
 Minimum Message Length and Kolmogorov Complexity (http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf) (by C.S. Wallace (http://www.csse.monash.edu.au/cgibin/pub_search?publication_type=0&year=&authors=wallace&title=) and D.L. Dowe (http://www.csse.monash.edu.au/~dld), Computer Journal, Vol. 42, No. 4, 1999).
 David Dowe (http://www.csse.monash.edu.au/~dld)'s Minimum Message Length (MML) (http://www.csse.monash.edu.au/~dld/MML.html) and Occam's razor (http://www.csse.monash.edu.au/~dld/Occam.html) pages.
 P. Grunwald, M. A. Pitt and I. J. Myung (ed.), Advances in Minimum Description Length: Theory and Applications (http://mitpress.mit.edu/catalog/item/default.asp?sid=4C100C6F225540FFA2ED02FC49FEBE7C&ttype=2&tid=10478), M.I.T. Press, April 2005, ISBN 0262072629.
 Kolmogorov Complexity (http://nms.lcs.mit.edu/~gch/kolmogorov.html) provides a simple explanation of Kolmogorov complexity.