Bayesian estimation
(Stephane Pauquet)
Statistical Inference
The ultimate goal of statistics is to provide an inference about a parameter $\theta$ given some observations $x$ related to $\theta$ through a probability distribution $f(x \mid \theta)$. Statistical inference is fundamentally an inversion process, since it aims at retrieving causes (the parameter) from effects (the observations) while taking into account the probabilistic nature of the model and the influence of purely random (i.e. unexplained) factors. In both its discrete and continuous versions, Bayes' theorem formalizes this inversion, as does the notion of the likelihood function $L(\theta \mid x)$, and as such is the unique coherent paradigm which respects the inversion perspective:
$$P(x) \times P(\theta \mid x) = P(\theta, x) = P(\theta) \times P(x \mid \theta)$$
General Framework for the Bayesian Statistical Inference
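Rearranging the terms in the first equation yields an expression for $P(\theta \mid x)$, the posterior probability of the parameter $\theta$ given the data at hand:
$$P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}$$
In the continuous case, with prior density $\pi(\theta)$, the posterior density is:
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta}$$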
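As a minimal sketch of this formula in the discrete case, the Python snippet below updates a prior over two candidate values of the parameter after observing some data. The coin example, the hypothesis names and all numbers are illustrative assumptions, not taken from the text.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem on a finite parameter set.

    prior[t] is P(theta = t); likelihood[t] is P(x | theta = t) for the observed x.
    """
    joint = {t: prior[t] * likelihood[t] for t in prior}  # P(theta) * P(x | theta)
    evidence = sum(joint.values())                        # P(x)
    return {t: joint[t] / evidence for t in joint}        # P(theta | x)


# Hypothetical example: is a coin fair (P(heads) = 0.5) or biased (P(heads) = 0.8)?
prior = {"fair": 0.5, "biased": 0.5}
# Observed data x: three heads in three tosses.
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}

post = posterior(prior, likelihood)
print(post)
```

The denominator P(x) only rescales the products so that the posterior probabilities sum to one; the relative weight of each hypothesis is set by prior times likelihood.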
The Bayesian Approach
Bayesian statistics provide a conceptually simple process for updating uncertainty in the light of evidence. Initial beliefs about some unknown quantity are represented by a prior distribution. Information in the data is expressed by the likelihood function. The prior distribution and the likelihood function are then combined to obtain the posterior distribution for the quantity of interest. The posterior distribution expresses our revised uncertainty in light of the data; in other words, it is an organized appraisal of the data in the light of previous experience.
Proposing a distribution for the unknown parameters of a statistical model may be characterized as a probabilization of uncertainty, i.e. as an axiomatic reduction from the notion of unknown to the notion of random (as opposed to the conventional dichotomous investigation of whether or not parameters are equal to zero). In this manner, the Bayesian approach allows us to make direct probability statements about unique and singular systems, whereas classical statistics, concerned with the long-run performance of inferential procedures (tests or confidence intervals) in some hypothetically infinite sequence of applications, can hardly be applied to any unique case.
Thus, the Bayesian paradigm induces a dramatic shift in the interpretation of probabilities and their associated random variables: whereas to a frequentist "probability" can only refer to the result of an infinite series of trials under identical conditions, a Bayesian interprets probabilities to refer to the observer's degree of belief.
The Prior Distribution:
3 interpretations of the prior distribution:
Since Bayesian inference is an iterative process, even the posterior probability distribution of a parameter can be used as a prior for a new set of experiments, should further refinement of an estimate or additional hypothesis testing be required. But if the prior dominates the likelihood, the experiment is likely to be irrelevant, since that implies the existence of more prior information than the subsequent testing can supply to influence posterior estimates.
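The iterative use of a posterior as the next prior can be sketched with a conjugate model. The Beta prior with binomial data used here is an assumption made for illustration (the text does not name a model), and all counts are hypothetical.

```python
def update_beta(a, b, successes, failures):
    """Posterior Beta parameters after binomial data, starting from Beta(a, b)."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


flat_prior = (1.0, 1.0)  # uniform prior on [0, 1]

# Two experiments analysed one after the other: the first posterior is the second prior.
after_first = update_beta(*flat_prior, successes=7, failures=3)
after_second = update_beta(*after_first, successes=2, failures=8)

# The same data analysed in a single batch gives the same posterior.
batch = update_beta(*flat_prior, successes=9, failures=11)

# A prior that dominates the likelihood: ten new observations barely move it.
strong_prior = (500.0, 500.0)
strong_post = update_beta(*strong_prior, successes=7, failures=3)
shift = beta_mean(*strong_post) - beta_mean(*strong_prior)

print(after_second, batch, shift)
```

With the flat prior the ten observations of the first experiment move the posterior mean from 0.5 to about 0.67, whereas with the strong prior the same data shift it by about 0.002, which is the situation in which the experiment adds little.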
The Likelihood Principle/Function
The principle of including only the actual data in the analysis and excluding consideration of all other sample-space possibilities is known as the "likelihood principle", and is at the basis of Bayesian inference. Hence, the concepts of significance level and test power play no role in Bayesian statistics.
The information brought by an observation $x$ about a parameter $\theta$ is entirely contained in the likelihood function $L(\theta \mid x)$, and thus there are 2 key elements in the Bayesian approach to statistics:
(The battle lines between Bayesians and Frequentists are drawn around these 2 elements: because it involves a sample-space probability, the use by the latter of P-values to draw conclusions violates the likelihood principle.)
Bayesian Decision Theory
Risk and Loss
The Bayesian approach integrates over the parameter space $\Theta$, since $\theta$ is unknown, instead of integrating over the sample space $\mathcal{X}$, since $x$ is known. It relies on the posterior expected loss:
$$\rho(\pi, \delta \mid x) = \mathbb{E}^{\pi}[L(\theta, \delta) \mid x] = \int_{\Theta} L(\theta, \delta)\, \pi(\theta \mid x)\, d\theta$$
that is, the expectation of $L(\theta, \delta)$ under the distribution of $\theta$ conditional on $x$, $\pi(\theta \mid x)$. This quantity averages the error (i.e. the loss) according to the posterior distribution of the parameter $\theta$, conditionally on the observed value $x$.
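It follows that an estimator minimizing the integrated risk $r(\pi, \delta)$ can be obtained by selecting, for every $x \in \mathcal{X}$, the value $\delta(x)$ which minimizes the posterior expected loss $\rho(\pi, \delta \mid x)$, since:
$$r(\pi, \delta) = \int_{\mathcal{X}} \rho(\pi, \delta(x) \mid x)\, m(x)\, dx$$
where $m(x) = \int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, d\theta$ is the marginal distribution of $x$.
A Bayes estimator associated with a prior distribution $\pi$ and a loss function $L$ is an estimator $\delta^{\pi}$ which minimizes $r(\pi, \delta)$. For every $x \in \mathcal{X}$, it is given by $\delta^{\pi}(x) = \arg\min_{\delta} \rho(\pi, \delta \mid x)$. The value $r(\pi) = r(\pi, \delta^{\pi})$ is then called the Bayes risk.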
The incorporation of these two concepts in Bayesian Decision Theory represents the attempt to minimize some risk or loss under the most favorable circumstances imaginable. The use of loss functions forces parties involved in the decision to specifically address the cost of errors. The estimator that has the smallest Bayes risk is then referred to as a Bayes estimator.
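Quadratic loss: The Bayes estimator $\delta^{\pi}$ associated with the prior distribution $\pi$ and with the quadratic loss function is the posterior expectation:
$$\delta^{\pi}(x) = \mathbb{E}^{\pi}[\theta \mid x] = \frac{\int_{\Theta} \theta\, f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, d\theta}$$
The Bayes estimator $\delta^{\pi}$ associated with $\pi$ and with the weighted quadratic loss:
$$L(\theta, \delta) = \omega(\theta)\, (\theta - \delta)^2$$
where $\omega(\theta)$ is a nonnegative function, is
$$\delta^{\pi}(x) = \frac{\mathbb{E}^{\pi}[\omega(\theta)\, \theta \mid x]}{\mathbb{E}^{\pi}[\omega(\theta) \mid x]}$$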
Absolute error loss: The Bayes estimator $\delta^{\pi}$ under absolute error loss:
$$L(\theta, \delta) = |\theta - \delta|$$
is the median of the posterior distribution $\pi(\theta \mid x)$.
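The two results above can be checked numerically by brute force: discretize a posterior, evaluate the posterior expected loss for every candidate decision, and keep the minimizer. The sketch below assumes a hypothetical Beta(3, 5) posterior on a grid over [0, 1]; the function names and the grid size are illustrative choices, not part of the text.

```python
def beta_grid(a, b, n=401):
    """Discretized Beta(a, b) posterior: grid points and normalized weights."""
    thetas = [i / (n - 1) for i in range(n)]
    weights = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_expected_loss(d, thetas, probs, loss):
    """Grid approximation of the integral of L(theta, d) against the posterior."""
    return sum(p * loss(t, d) for t, p in zip(thetas, probs))


def bayes_estimate(thetas, probs, loss):
    """Decision (searched over the grid points) minimizing the posterior expected loss."""
    return min(thetas, key=lambda d: posterior_expected_loss(d, thetas, probs, loss))


thetas, probs = beta_grid(3, 5)
quad = bayes_estimate(thetas, probs, lambda t, d: (t - d) ** 2)
absl = bayes_estimate(thetas, probs, lambda t, d: abs(t - d))

# Reference values computed directly from the discretized posterior.
post_mean = sum(t * p for t, p in zip(thetas, probs))
cumulative = 0.0
for t, p in zip(thetas, probs):
    cumulative += p
    if cumulative >= 0.5:
        post_median = t
        break

print(quad, post_mean, absl, post_median)
```

Up to the grid resolution, the minimizer under quadratic loss coincides with the posterior mean and the minimizer under absolute error loss with the posterior median. Because this posterior is skewed to the right, the median lies below the mean.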
Admissibility
If a prior distribution $\pi$ is strictly positive on $\Theta$, with finite Bayes risk, and the risk function $R(\theta, \delta)$ is a continuous function of $\theta$ for every $\delta$, the Bayes estimator $\delta^{\pi}$ is admissible (likewise, if there exists a unique minimax estimator, that estimator is admissible; see below).
A generalized Bayes estimator $\delta^{\pi}$ is admissible when its Bayes risk is finite:
$$r(\pi) = \int_{\Theta} R(\theta, \delta^{\pi})\, \pi(\theta)\, d\theta < \infty$$
Minimaxity
In the context discussed above, the minimax criterion appears as an "insurance against the worst case", as it aims at minimizing the expected loss in the least favorable case. Literally, it aims at minimizing the maximum risk. It is a very conservative approach*, inherent to frequentist statistics and used only marginally by Bayesians (the minimax rule, which does not depend on a prior probability distribution, is equivalent to the Bayes decision rule that uses the prior probability distribution associated with the highest expected risk).
The notion of minimaxity provides a good illustration of the conservative aspects of the frequentist paradigm. Since this approach refuses to make any assumption on the parameter $\theta$, it has to consider the "worst" cases as equally likely and then needs to focus on the maximal possible risk.
The Bayes risks are never larger than the minimax risk, so that:
$$\underline{R} = \sup_{\pi} r(\pi) = \sup_{\pi} \inf_{\delta} r(\pi, \delta) \le \overline{R} = \inf_{\delta} \sup_{\theta} R(\theta, \delta)$$
* example (Robert, 1994, p.55): "The first oil-drilling platforms in the North Sea were designed according to a minimax principle. In fact, they were supposed to resist the conjugate action of the worst gale and the worst storm ever observed, at the minimal record temperature. This strategy obviously gives a comfortable margin of safety but is quite costly. For more recent platforms, engineers have taken into account the distribution of these weather phenomena in order to reduce the production cost."
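The inequality between the largest Bayes risk and the minimax risk can be illustrated on a toy decision problem. The sketch below assumes two states of nature, three non-randomized actions and a hypothetical risk table; none of these values come from the text.

```python
# risk[action] = (R(theta_0, action), R(theta_1, action)); hypothetical values.
risk = {"a1": (1.0, 4.0), "a2": (3.0, 2.0), "a3": (2.8, 2.8)}


def bayes_risk(p):
    """r(pi) for the prior with P(theta_1) = p: smallest integrated risk over actions."""
    return min((1 - p) * r0 + p * r1 for r0, r1 in risk.values())


# Minimax action: smallest worst-case risk over the two states.
minimax_action = min(risk, key=lambda a: max(risk[a]))
minimax_risk = max(risk[minimax_action])

# Least favorable prior: the prior (on a grid) with the largest Bayes risk.
priors = [i / 100 for i in range(101)]
least_favorable = max(priors, key=bayes_risk)
maximin_risk = bayes_risk(least_favorable)

print(minimax_action, minimax_risk, least_favorable, maximin_risk)
```

In this table the cautious action a3 is minimax with risk 2.8, while even under the least favorable prior (equal weight on both states) a Bayes decision achieves an integrated risk of 2.5. The gap appears here because only non-randomized actions are considered.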
Benefits of the Bayesian choice
The prior distribution points to a unique advantage of Bayesian methods: an explicit framework for incorporating prior knowledge into an analysis. Bayesian theory takes into account differences between observers, thus allowing the investigator to examine the differences in assessments and decisions due to varying amounts of information, and hence to measure the value of additional information. Furthermore, loss functions give greater flexibility than the hypothesis-testing framework because they allow consideration of a range of outcomes rather than only 2 (the null and alternative hypotheses). Human errors in judgment, especially in assessing the importance of the data relative to existing information, can be reduced.
Bayesian Contras
However, if the Bayesian formalization of the scientific process is not done well, it can easily make matters worse. Moreover, in the science of ecology, with the current state of data-analytic technology, it often cannot be done even by the scientists who have access to the best prior information. Communication problems also constitute a threat for this method, which relies heavily on the sharing of information: can knowledge that took an ecologist years or decades of study to acquire be passed totally intact to the statistician? And of course, subjectivity is nothing but a pretext for all kinds of abuses, including the often tempting choice of the most advantageous procedures.
It can actually be argued the opposite way, namely, that the Bayesian approach is essentially more objective than other inferential methods: first, it separates the different subjective inputs of the inferential process (sample distribution, prior, loss function), thus leaving ground for possible modifications, and second, it develops objective tools to assess the influence of the prior distribution (noninformative distributions, sensitivity analysis, etc.).
More technically, although all posterior quantities are automatically defined as integrals with respect to the posterior distribution, it may be quite difficult to provide a numerical value in practice, and, in particular, an explicit form of the posterior distribution cannot always be derived.
Frequentist contras
First, any null hypothesis can be rejected (and similarly any nonsignificant test made significant) by choosing an appropriately large sample size. Moreover, finding a significant difference says nothing about the magnitude of the difference, something that is usually of great importance to ecologists. Second, the frequentist approach does not give us the answer to what we want to ask (what are the relative probabilities of the competing hypotheses?) and its paradigm is not reductive enough to lead to a single optimal estimator.
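The effect of sample size on significance can be sketched with a one-sample z-test. The example assumes a known standard deviation and an observed mean difference that stays fixed at a tiny value while the sample size grows; these settings and the function names are hypothetical.

```python
import math


def z_statistic(effect, sigma, n):
    """z = observed mean difference divided by its standard error sigma / sqrt(n)."""
    return effect / (sigma / math.sqrt(n))


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


effect, sigma = 0.01, 1.0  # a difference of one hundredth of a standard deviation
p_values = {n: two_sided_p(z_statistic(effect, sigma, n))
            for n in (100, 10_000, 1_000_000)}

for n, p in p_values.items():
    print(n, p)
```

The difference is the same in all three cases, yet the P-value drops from about 0.92 to far below 0.05 as the sample size increases, and it carries no information about whether a difference of 0.01 matters in practice.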
95% intervals: with the classical procedure, whether determining a 95% interval or a P-value, it is not the parameter $\theta$ which belongs to an interval with probability 95% conditionally on $x$, but the interval derived from $x$ which contains the fixed value $\theta$ with probability 0.95 over repeated sampling. The nonrepeatability of most practical experiments calls this frequentist point of view into question.
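The repeated-sampling reading of a 95% interval can be made concrete with a small simulation. The sketch assumes normal data with known standard deviation and the usual interval of half-width 1.96 standard errors around the sample mean; the parameter values, number of trials and seed are arbitrary choices for illustration.

```python
import math
import random


def coverage(theta=2.0, sigma=1.0, n=25, trials=4000, seed=0):
    """Fraction of simulated experiments whose 95% interval contains the fixed theta."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One hypothetical repetition of the experiment: n normal observations.
        xbar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The value of theta never changes; what varies from one repetition to the next is the interval. The 95% figure describes the long-run fraction of intervals that cover theta, not the probability that theta lies in the one interval actually computed.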
A Bayesian reading list
Pioneering Bayesian books include:
Box, G. E. P. and G. C. Tiao. 1973. Bayesian Inference in Statistical Analysis. Reading, MA: Addison-Wesley.
de Finetti, B. 1970/1974. Teoria delle Probabilita 1. Turin: Einaudi. English translation as Theory of Probability 1 in 1974, Chichester: Wiley.
de Finetti, B. 1970/1975. Teoria delle Probabilita 2. Turin: Einaudi. English translation as Theory of Probability 2 in 1975, Chichester: Wiley.
de Finetti, B. 1972. Probability, Induction and Statistics. Chichester: Wiley.
DeGroot, M. H. 1970. Optimal Statistical Decisions. New-York: McGraw-Hill.
Dubins, L. E. and L. J. Savage. 1965/1976. How to Gamble if you Must: Inequalities for Stochastic Processes. New-York: McGraw-Hill. Second edition in 1976. New-York: Dover.
Good, I. J. 1965. The Estimation of Probabilities. An Essay on Modern Bayesian Methods. Cambridge, Mass: The MIT Press.
Jeffreys, H. 1939/1961. Theory of Probability. Oxford: University Press. Third edition in 1961, Oxford: University Press.
Keynes, J. M. 1921/29. A Treatise on Probability. London: Macmillan. Second edition in 1929, London: Macmillan. Reprinted in 1962. New-York: Harper and Row.
Laplace, P. S. 1812. Théorie Analytique des Probabilités. Paris: Courcier. Reprinted as Oeuvres Complètes de Laplace 7, 1878-1912. Paris: Gauthier-Villars.
Lindley, D. V. 1965. Introduction to Probability from a Statistical Viewpoint. Cambridge: University Press.
Lindley, D. V. 1972. Bayesian Statistics, a Review. Philadelphia, PA: SIAM.
Mosteller, F. and D. L. Wallace. 1964. Inference and Disputed Authorship: The Federalist.
Pratt, J. W., H. Raiffa, and R. Schlaifer. 1965. Introduction to Statistical Decision Theory. New-York: McGraw-Hill.
Raiffa, H. and Schlaifer, R. 1961. Applied Statistical Decision Theory. Boston: Harvard University.
Savage, L. J. 1954/1972. The Foundations of Statistics. New-York: Wiley. Second edition in 1972, New-York: Dover.
Savage, L. J. 1962. The Foundations of Statistical Inference: a Discussion. London: Methuen.
Schlaifer, R. 1959. Probability and Statistics for Business Decisions. New-York: McGraw-Hill.
Schlaifer, R. 1961. Introduction to Statistics for Business Decisions. New-York: McGraw-Hill.
Tribus, M. 1969. Rational Descriptions, Decisions and Designs. New-York: Pergamon.
Elementary and intermediate Bayesian textbooks include those of:
Bernardo, J. M. 1981. Bioestadistica, una Perspectiva Bayesiana. Barcelona: Vicens-Vives.
Berry, D. A. 1994. Bayesian Biostatistics. Belmont, CA: Duxbury.
Cifarelli, D. M. and P. Muliere. 1989. Statistica Bayesiana. Pavia: G. Iuculano.
Iversen, G. R. 1984. Bayesian Statistical Inference. Beverly Hills, CA: Sage.
Kleiter, G. D. 1980. Bayes-Statistik: Grundlagen und Anwendungen. Berlin: W. de Gruyter.
LaValle, I. H. 1970. An Introduction to Probability, Decision and Inference. Toronto: Holt, Rinehart and Winston.
Lee, P. M. 1997. Bayesian Statistics. An Introduction. John Wiley and Sons, Inc.
Lindley, D. V. 1971/1985. Making Decisions. Second edition in 1985, Chichester: Wiley.
O'Hagan, A. 1988. Modelling with heavy tails. Bayesian Statistics 3. (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.). Oxford: University Press, 345-359 (with discussion).
Press, S. J. 1989. Bayesian Statistics. New-York: Wiley.
Savage, I. R. 1968. Statistics: Uncertainty and Behavior. Boston: Houghton Mifflin.
Schmitt, S. A. 1969. Measuring Uncertainty: an Elementary Introduction to Bayesian Statistics. Reading, MA: Addison-Wesley.
Winkler, R. L. 1972. Introduction to Bayesian Inference and Decision. Toronto: Holt, Rinehart and Winston.
More advanced Bayesian monographs include:
Berger, J. O. 1985. Statistical Decision Theory and Bayesian Analysis. Berlin: Springer.
Florens, J. P., M. Mouchart, and J.-M. Rolin. 1990. Elements of Bayesian Statistics. New-York: Marcel Dekker.
Hartigan, J. A. 1983. Bayes Theory. Berlin: Springer.
O'Hagan, A. and H. Le. 1994. Conflicting Information and a class of bivariate heavy-tailed distributions. In Aspects of Uncertainty: a Tribute to D. V. Lindley. (P. R. Freeman, and A. F. M. Smith, eds.). Chichester: Wiley.
Robert, C. P. 1992. L'Analyse Statistique Bayésienne. Paris: Economica.
Robert, C. P. 1994. The Bayesian Choice. A Decision-Theoretic Motivation. New-York: Springer.
Savchuk, V. P. 1989. Bayesovskiye Metodi Statisticheskogo Ostenivaniys. Moscow: Nauka.
Smith, J. Q. 1988. Decision Analysis: A Bayesian Approach. New-York: Chapman and Hall.
Polson and Tiao (1994) propose a collection of classic papers in: Bayesian Inference. Aldershot: Edward Elgar.
This page was last updated on 09/26/02