|
Previous |
Main |
Next |
|
In This Article:
Algorithmic Information Theory
Gregory Chaitin[1], Ray Solomonoff, and Andrei Kolmogorov developed a different view of information from that of Shannon. Rather than considering the statistical ensemble of messages from an information source, algorithmic information theory looks at individual sequences of symbols. Here, information H(X) is defined as the minimum size of a program necessary to generate the sequence X.
This article highlights some of the main concepts of Algorithmic Information Theory. Knowledge of this subject can be useful when arguing with Creationists whose use of concepts from both Classical and Algorithmic Information Theory tends to be sloppy. This article is only a survey of the major ideas. For the mathematical proofs, see the cited material.
Turing Machines and the Halting Problem [Top]
Understanding Algebraic Information Theory requires knowledge of Turing Machines. Alan Turing[2] showed that it is possible to invent a single machine U capable of computing any computable sequence. Turing's notations are somewhat difficult. A more understandable description of a Turing Machine is provided by the on-line Stanford Encyclopedia of Philosophy[3].
A Turing machine is an abstract representation of a computing device. It consists of a read/write head that scans a (possibly infinite) one-dimensional (bi-directional) tape divided into squares, each of which is inscribed with a 0 or 1. Computation begins with the machine, in a given "state", scanning a square. It erases what it finds there, prints a 0 or 1, moves to an adjacent square, and goes into a new state. This behavior is completely determined by three parameters: (1) the state the machine is in, (2) the number on the square it is scanning, and (3) a table of instructions. The table of instructions specifies, for each state and binary input, what the machine should write, which direction it should move in, and which state it should go into. (E.g., "If in State 1 scanning a 0: print 1, move left, and go into State 3".) The table can list only finitely many states, each of which becomes implicitly defined by the role it plays in the table of instructions. These states are often referred to as the "functional states" of the machine.
Turing invented his theoretical machine to address the Entscheidungsproblem or decision problem; that is, whether it is possible in symbolic logic to find a general algorithm which decides for given first-order statements whether they are universally valid or not.
The modern roots of the Entscheidungsproblem go back to a 1900 speech by David Hilbert. [4] Hilbert had hoped it would be possible to develop a general algorithm for deciding whether a set of axioms were self-contradictory. Kurt Gödel [5] showed that, in any axiomatic system, some propositions exist that cannot be proved or disproved within the axioms of the system. This is known as Gödel's Incompleteness Theorem, and it demonstrated that Hilbert's Second Problem could not be solved.
Turing, building on the work of Gödel, proved that the Entscheidungsproblem was not solvable. Alonzo Church, independently of Turing, showed the same thing.
Turing used the halting problem for his proof. The halting problem
is a decision problem. It poses the question: given a description of
an algorithm and its initial conditions, can one show that the
algorithm will halt? An algorithm producing the digits of is an example of a
non-halting algorithm (
is an irrational and transcendental
number; it cannot be expressed as a polynomial with rational
coefficients and therefore has an infinite number of digits). Turing
proved that no general algorithm exists that can solve the halting
problem for all possible inputs, and hence that the
Entscheidungsproblem is unsolvable.
Kolmogorov Complexity [Top]
Given a universal Turing machine (UTM) U, the algorithmic information content, also called algorithmic complexity or Kolmogorov complexity H(X) of string X is defined as the length of the shortest program p on U producing string X. The shortest program capable of producing a string is called an elegant program. The program may also be denoted X*, where U(X*) = X, and can itself be considered a string. Given an elegant program X*,
where |X*| is the size of X*.
A string X is said to be algorithmically random if it X* cannot be expressed any shorter than X; that is if its algorithmic complexity is maximal. Such a string does not contain redundant data. It may also be shown that the proportion of each symbol in the string is approximately equal. If such a string represents a number between 0 and 1 (the unit interval), then the number is a normal number or Borel normal after the work of Emile Borel.
On the other hand, not all normal numbers are algorithmically random. Strings with redundant data are compressible. For example, the binary string 01010101010101010101... is easily compressed to "repeat 01 n times."
It may be shown that if a string is algorithmically random, it is also algorithmically random read backwards.
The joint information content H(X,Y) of strings X and Y is the size of the smallest program to produce both X and Y simultaneously.
The conditional or relative information content H(X|Y) is the size of the smallest program to produce X from a minimal program for Y. Note that H(X|Y) requires Y* and not Y. Chaitin showed that:
where O(1) is a constant representing computational overhead.
The mutual information content H(X : Y) is computed by any of the following:
Strings X and Y are algorithmically independent if
It must be stressed that, despite a similarity in notation and form to Classical Information Theory, Algorithmic Information Theory deals with individual objects, while Classical Information Theory deals with the statistical behavior of information sources.
Chaitin's Halting Probability:
(Omega) [Top]
Chaitin extended the work of Turing by defining the halting
probability. Considering programs on a universal Turing machine
U that require no input, we define P as the set of
all programs on U that halt. Let |p| be the length
of the bit string encoding program p, which is any element
of P. The halting probability or number is defined:
Again, the absolute value notation |p| denotes the size or length of the bit string p.
One way to think about is that, if you feed a UTM with a random binary string
generated by a Bernouli 1/2 process (such as tossing a fair coin) as
its program,
is
the probability that the UTM will halt.
As a probability, has a value between 0 and 1. Chaitin showed that
is not only
irrational and transcendental, but is an uncomputable real number. It
is algorithmically irreducible or algorithmically
random. No algorithm exists that will compute its digits.
[The Greek and mathematical symbols are courtesy of Karen Strom of the University of Massachussetts. They may be downloaded here and used with this attribution and without fee for non-profit purposes only.]
References [Top]
[1] Chaitin, G.J., Algorithmic Information Theory. Third Printing, Cambridge University Press, 1990. Available on-line.
[2] Turing, A.M., On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, ser. 2. vol. 42 (1936-7), pp.230-265; corrections, Ibid, vol 43 (1937) pp. 544-546. Available on-line.
[3] at the SEP, Editors, "Turing Machine", The Stanford Encyclopedia of Philosophy (Spring 2002 Edition), Edward N. Zalta (ed.), URL =http://plato.stanford.edu/archives/spr2002/entries/turing-machine/
[4] Hilbert, D., Mathematical Problems: Lecture Delivered before the International Congress of Mathematicians at Paris in 1900, translated by Maby Newson, Bulletin of the American Mathematical Society 8 (1902), pp. 437-479. , HTML version by D. Joyce available on-line.
[5] K. Gödel, K., On Formally Undecidable Propositions of Principia Mathematica and Related Systems. Monatshefte für Mathematik und Physik, 38 (1931), pp. 173-198. Translated in van Heijenoort: From Frege to Gödel. Harvard University Press, 1971.
|
Previous |
Main |
Next |
3/21/03