Brief History of Probability

Here is a brief history of probability from a Bayesian perspective with emphasis on computation, since computation has been the great enabler of Bayesian inference.

The word "probability" is derived from the word "probity". Prior to the 1600s, legal evidence had greater weight when it had probity, which was a measure of authority. A person with more authority had better evidence. Today, probability may be loosely defined as the chance that an event occurs.

Probability theory began in 1654 with two French mathematicians, Pierre de Fermat (1601-1665) and Blaise Pascal (1623-1662), who corresponded about a question of profitability in a popular dice game. Their exchange of letters contained the first fundamental principles of probability. The Dutch scientist Christiaan Huygens (1629-1695), a teacher of Leibniz (1646-1716), learned of this correspondence and published the first book on probability in 1657.

Reverend Thomas Bayes (1702-1761) developed what is now called Bayes' theorem, which was published posthumously in 1763. Bayes' theorem was the first expression of inverse probability, and is the basis of Bayesian inference. However, Bayes' theorem remained obscure after its introduction.
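For reference, the theorem can be stated in modern notation. The symbols below are an assumed convention for illustration (H for a hypothesis or cause, D for observed data), not the notation Bayes himself used:

```latex
\[
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
\]
```

Here P(H) is the prior probability of the hypothesis, P(D | H) is the probability of the data given the hypothesis, and P(H | D) is the posterior probability. "Inverse" refers to reasoning from observed data back to the causes that may have produced it.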

Unaware of Bayes, Pierre-Simon Laplace (1749-1827) independently developed Bayes' theorem and first published his version in 1774, eleven years after Bayes, in one of Laplace's first major works (Laplace, 1774, p.366-367). In 1812, Laplace introduced a host of new ideas and mathematical techniques in his book, Théorie Analytique des Probabilités. Before Laplace, probability theory was solely concerned with developing a mathematical analysis of games of chance. Laplace applied probabilistic ideas to many scientific and practical problems.

In 1814, Laplace published his "Essai philosophique sur les probabilités", which introduced a mathematical system of inductive reasoning based on probability. In it, Laplace developed the Bayesian interpretation of probability independently, and much more thoroughly than Bayes, so some "Bayesians" refer to Bayesian inference as Laplacian inference. In this same publication, Laplace developed the Laplace Approximation in a proof, and used it to approximate posterior moments.

Bayesian, or Laplacian, inference was widely used and taught in the 1800s, until it was attacked by Ronald A. Fisher (1890-1962) and Jerzy Neyman (1894-1981). Bayesian inference was then replaced by frequentist inference, mainly due to computational limitations.

Debates arose over whether probability is objective or subjective. In the early 1920s, John Maynard Keynes (1883-1946) proposed the idea that probability should be interpreted as a subjective degree of belief in a proposition. The subjective interpretation of probability was developed by Frank Plumpton Ramsey (1903-1930), Bruno de Finetti (1906-1985), Leonard Jimmie Savage (1917-1971), and others. The earlier approach of Laplace came to be considered objectivist, and was further developed by Harold Jeffreys (1891-1989).

Harold Jeffreys published his Theory of Probability in 1939, and is credited with the beginning of the revival of Bayesian inference. During World War II, Alan Turing (1912-1954) invented a Bayesian codebreaking technique termed Banburismus, to assist in decoding the Nazi Enigma machine. It was an early form of Bayesian networks used to infer information about the settings of the Enigma machine.

Richard T. Cox (1898-1991) demonstrated in 1946 that the rules of Bayesian inference have a well-formulated axiomatic basis, unlike frequentist inference, and may be derived from a simple set of desiderata. He showed that Bayesian inference is the only inferential approach that is logically consistent. In the 1950s, Leonard Jimmie Savage (1917-1971) further popularized subjective probability.

Taking a step back to 1906, Andrej Markov (1856-1922) introduced Markov chains. Stanislaw Ulam (1909-1984) and John von Neumann (1903-1957) developed the Monte Carlo method, which uses random numbers to solve numerical problems. The first publication appeared in the Journal of the American Statistical Association (ASA) in 1949, co-written by Nicholas Metropolis (1915-1999).

Metropolis and others introduced what would later be called the first Markov chain Monte Carlo (MCMC) algorithm, the Metropolis algorithm, in the Journal of Chemical Physics in 1953. MCMC later became a successful class of algorithms for sampling from probability distributions, and became crucial in the revival of Bayesian inference. The Metropolis algorithm was generalized to the Metropolis-Hastings algorithm by W. Keith Hastings, appearing in Biometrika in 1970.
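To give a sense of how little computation the idea requires, here is a minimal sketch of a random-walk Metropolis sampler in Python. It is an illustration under stated assumptions, not a reconstruction of the 1953 paper: the function names, the Gaussian proposal, the step size, and the standard normal target are all hypothetical choices made for this example.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density: log of the target density, known up to an additive constant.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a symmetric Gaussian step around the current point.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # On rejection the current point is repeated in the chain.
        samples.append(x)
    return samples


def standard_normal_log_density(x):
    # Log density of N(0, 1), dropping the normalizing constant.
    return -0.5 * x * x


draws = metropolis(standard_normal_log_density, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Because the acceptance step uses only a ratio of densities, the target needs to be known only up to a normalizing constant, which is the property that makes the approach useful for Bayesian posteriors.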

In 1971, Valentin Fedorovich (or Fyodorovich) Turchin invented the Gibbs sampler. Unaware of Turchin, brothers Stuart and Donald Geman independently introduced Gibbs sampling in 1984 as a special case of the Metropolis-Hastings algorithm for image restoration. The first successful generalized Bayesian software, BUGS (Bayesian inference Using Gibbs Sampling), began development in 1989 on UNIX.

Alan E. Gelfand and Adrian F.M. Smith generalized Gibbs sampling in 1990, introducing it broadly to the field of statistics. Likewise, the initialism MCMC gained popularity somewhere around 1990. A version of BUGS for Windows, called WinBUGS, was introduced in 1997.

In 2010, Statisticat, LLC. began development of LaplacesDemon, a complete environment for Bayesian inference. This free software package includes dozens of MCMC algorithms, Bayes factors, the Bayesian Bootstrap, disjoint HPD intervals, elicitation, plentiful examples, faster updating with large data sets, performance comparisons between MCMC samplers, iterative quadrature, Laplace Approximation, likelihood-free estimation, lowest posterior loss (LPL) intervals, a variety of marginal likelihood calculations, multimodality functions, parallelization of MCMC, posterior predictive checks, scoring of new data sets, test statistics such as the Durbin-Watson test for autocorrelation or the Jarque-Bera test for normality, validation, variational Bayes, etc.

References

Hastings W (1970). "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika, 57(1), 97-109.
Jeffreys H (1961). Theory of Probability. Third edition. Oxford University Press, Oxford, England.
Laplace P (1774). "Memoire sur la Probabilite des Causes par les Evenements." l'Academie Royale des Sciences, 6, 621-656. English translation by S.M. Stigler in 1986 as "Memoir on the Probability of the Causes of Events" in Statistical Science, 1(3), 359-378.
Laplace P (1812). Theorie Analytique des Probabilites. Courcier, Paris. Reprinted as "Oeuvres Completes de Laplace", 7, 1878-1912. Paris: Gauthier-Villars.
Laplace P (1814). "Essai Philosophique sur les Probabilites." English translation in Truscott, F.W. and Emory, F.L. (2007) from (1902) as "A Philosophical Essay on Probabilities". ISBN 1602063281, translated from the French 6th ed. (1840).
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953). "Equation of State Calculations by Fast Computing Machines." Journal of Chemical Physics, 21(6), 1087-1092.
Turchin VF (1971). "On the Computation of Multidimensional Integrals by the Monte Carlo Method." Theory of Probability and its Applications, 16(4), 720-724.
