Bayesian Decision Theory
Risk and Loss
The Bayesian approach integrates over the parameter space $\Theta$, since $\theta$ is unknown, instead of integrating over the sample space $\mathcal{X}$, since $x$ is known. It relies on the posterior expected loss:
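$$\rho(\pi,\delta\mid x) = \mathbb{E}^{\pi}[L(\theta,\delta)\mid x] = \int_{\Theta} L(\theta,\delta)\,\pi(\theta\mid x)\,d\theta,$$
where $\mathbb{E}^{\pi}[\,\cdot\mid x]$ denotes the expectation of $L(\theta,\delta)$ under the distribution of $\theta$ conditionally on $x$, i.e. the posterior distribution $\pi(\theta\mid x)$.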
The Bayesian decision is the one that minimizes this error (i.e. the loss) averaged according to the posterior distribution of the parameter $\theta$, conditionally on the observed value $x$.
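As a numerical illustration, the posterior expected loss can be approximated on a grid. The sketch below assumes a hypothetical setting that is not taken from the text: $x$ successes in $n$ Bernoulli trials (binomial likelihood), a uniform prior on $\Theta = (0,1)$ and quadratic loss; all helper names are illustrative. It evaluates $\rho(\pi,d\mid x)$ for two candidate decisions $d$.

```python
import math

# Hypothetical setting (illustration only): x ~ Binomial(n, theta) and a
# uniform prior pi(theta) = 1 on Theta = (0, 1).
N_GRID = 2000
THETAS = [(i + 0.5) / N_GRID for i in range(N_GRID)]  # midpoints of a grid on (0, 1)


def likelihood(x, n, theta):
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def posterior(x, n, prior=lambda t: 1.0):
    """Grid approximation of pi(theta | x); the weights sum to 1."""
    w = [likelihood(x, n, t) * prior(t) for t in THETAS]
    total = sum(w)
    return [wi / total for wi in w]


def posterior_expected_loss(d, x, n, loss, prior=lambda t: 1.0):
    """rho(pi, d | x): the loss L(theta, d) averaged over pi(theta | x)."""
    post = posterior(x, n, prior)
    return sum(loss(t, d) * p for t, p in zip(THETAS, post))


def quadratic(theta, d):
    return (theta - d) ** 2


# 7 successes in 10 trials: compare the decisions d = 0.7 and d = 0.5.
print(posterior_expected_loss(0.7, 7, 10, quadratic))
print(posterior_expected_loss(0.5, 7, 10, quadratic))
```

With a uniform prior the posterior here is a Beta(8, 4) distribution, so under quadratic loss $\rho(\pi,d\mid x)$ equals the posterior variance plus $(\mathbb{E}^{\pi}[\theta\mid x] - d)^2$, which the grid values should reproduce closely.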
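It follows that an estimator minimizing the integrated risk $r(\pi,\delta) = \int_{\Theta} R(\theta,\delta)\,\pi(\theta)\,d\theta$, where $R(\theta,\delta) = \mathbb{E}_{\theta}[L(\theta,\delta(x))]$ is the frequentist risk, can be obtained by selecting, for every $x \in \mathcal{X}$, the value $\delta(x)$ that minimizes the posterior expected loss $\rho(\pi,\delta\mid x)$. Indeed, exchanging the order of integration gives
$$r(\pi,\delta) = \int_{\mathcal{X}} \rho(\pi,\delta(x)\mid x)\,m(x)\,dx,$$
where $m(x) = \int_{\Theta} f(x\mid\theta)\,\pi(\theta)\,d\theta$ is the marginal density of $x$; since $m(x) \ge 0$, minimizing the integrand for each $x$ minimizes the integral.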
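The identity above can be checked numerically. The sketch assumes a hypothetical discrete setting (binomial $x$, so the integral over $\mathcal{X}$ becomes a sum), a uniform prior, quadratic loss and the estimator $\delta(x) = x/n$, which is not a Bayes estimator here; all function names are illustrative. It computes $r(\pi,\delta)$ once by averaging the frequentist risk $R(\theta,\delta)$ over the prior, and once by averaging $\rho(\pi,\delta(x)\mid x)$ over $m(x)$.

```python
import math

# Hypothetical setting (illustration only): x ~ Binomial(n, theta), uniform prior
# on (0, 1), quadratic loss, and the estimator delta(x) = x / n.
N_GRID = 2000
H = 1.0 / N_GRID                                 # width of each grid cell
THETAS = [(i + 0.5) * H for i in range(N_GRID)]  # midpoints on (0, 1)


def f(x, n, theta):
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def prior(theta):
    return 1.0


def loss(theta, d):
    return (theta - d) ** 2


def integrated_risk_frequentist(delta, n):
    """int R(theta, delta) pi(theta) dtheta, with R(theta, delta) = E_theta[L(theta, delta(x))]."""
    total = 0.0
    for t in THETAS:
        risk = sum(loss(t, delta(x, n)) * f(x, n, t) for x in range(n + 1))
        total += risk * prior(t) * H
    return total


def integrated_risk_bayesian(delta, n):
    """sum_x rho(pi, delta(x) | x) m(x): X is discrete here, so the integral is a sum."""
    total = 0.0
    for x in range(n + 1):
        joint = [f(x, n, t) * prior(t) * H for t in THETAS]
        m_x = sum(joint)  # marginal m(x)
        rho = sum(loss(t, delta(x, n)) * j for t, j in zip(THETAS, joint)) / m_x
        total += rho * m_x
    return total


def mle(x, n):
    return x / n


print(integrated_risk_frequentist(mle, 10))  # approximately 1/60
print(integrated_risk_bayesian(mle, 10))     # approximately 1/60
```

The common value is $1/(6n)$ here, because $R(\theta, x/n) = \theta(1-\theta)/n$ and $\int_0^1 \theta(1-\theta)\,d\theta = 1/6$.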
A Bayes estimator associated with a prior distribution $\pi$ and a loss function $L$ is an estimator $\delta^{\pi}$ that minimizes $r(\pi,\delta)$. For every $x \in \mathcal{X}$, it is given by $\delta^{\pi}(x) = \arg\min_{d} \rho(\pi,d\mid x)$. The value $r(\pi) = r(\pi,\delta^{\pi})$ is then called the Bayes risk.
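Because $\delta^{\pi}(x)$ is defined pointwise as a minimizer of $\rho(\pi,d\mid x)$, it can be approximated by a one-dimensional search over $d$. The sketch below keeps a hypothetical binomial setting with a uniform prior and quadratic loss, and swaps in golden-section search as the minimizer (an arbitrary choice, valid here because $\rho(\pi,d\mid x)$ is convex in $d$ for a convex loss); the function names are illustrative.

```python
import math

# Hypothetical setting: x ~ Binomial(n, theta), uniform prior on (0, 1), quadratic loss.
N_GRID = 2000
THETAS = [(i + 0.5) / N_GRID for i in range(N_GRID)]


def posterior_weights(x, n):
    """Grid posterior pi(theta | x) under the uniform prior (C(n, x) cancels)."""
    w = [t**x * (1 - t) ** (n - x) for t in THETAS]
    s = sum(w)
    return [wi / s for wi in w]


def rho(d, post, loss):
    """Posterior expected loss of the decision d."""
    return sum(loss(t, d) * p for t, p in zip(THETAS, post))


def golden_min(g, a=0.0, b=1.0, tol=1e-7):
    """Golden-section search for the minimizer of a convex function g on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            b = d  # the minimizer lies in [a, d]
        else:
            a = c  # the minimizer lies in [c, b]
    return (a + b) / 2


def bayes_estimator(x, n, loss):
    """delta^pi(x) = argmin_d rho(pi, d | x), found numerically."""
    post = posterior_weights(x, n)
    return golden_min(lambda d: rho(d, post, loss))


def bayes_risk(n, loss):
    """r(pi) = sum_x rho(pi, delta^pi(x) | x) m(x), with m(x) = 1 / (n + 1) under the uniform prior."""
    total = 0.0
    for x in range(n + 1):
        post = posterior_weights(x, n)
        total += rho(golden_min(lambda d: rho(d, post, loss)), post, loss) / (n + 1)
    return total


def quadratic(theta, d):
    return (theta - d) ** 2


print(bayes_estimator(7, 10, quadratic))  # close to (7 + 1) / (10 + 2)
print(bayes_risk(10, quadratic))          # close to 1 / 72
```

Under quadratic loss and the uniform prior, the numerical minimizer should agree with the posterior mean $(x+1)/(n+2)$, and the Bayes risk with $1/(6(n+2))$.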
By combining these two concepts, Bayesian Decision Theory attempts to minimize risk or loss given all the available information, namely the prior distribution and the observed data. The use of loss functions forces the parties involved in the decision to address the cost of errors explicitly. The estimator attaining the smallest integrated risk, whose value is the Bayes risk, is the Bayes estimator.
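Quadratic loss: The Bayes estimator $\delta^{\pi}$ associated with the prior distribution $\pi$ and with the quadratic loss $L(\theta,\delta) = (\theta-\delta)^2$ is the posterior expectation:
$$\delta^{\pi}(x) = \mathbb{E}^{\pi}[\theta\mid x] = \frac{\int_{\Theta} \theta\, f(x\mid\theta)\,\pi(\theta)\,d\theta}{\int_{\Theta} f(x\mid\theta)\,\pi(\theta)\,d\theta}.$$
More generally, the Bayes estimator $\delta^{\pi}$ associated with $\pi$ and with the weighted quadratic loss: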
$$L(\theta,\delta) = w(\theta)\,(\theta-\delta)^2,$$
where $w(\theta)$ is a nonnegative function, is
$$\delta^{\pi}(x) = \frac{\mathbb{E}^{\pi}[w(\theta)\,\theta\mid x]}{\mathbb{E}^{\pi}[w(\theta)\mid x]}.$$
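Both formulas are ratios of posterior integrals and can be evaluated directly on a grid. The sketch keeps a hypothetical binomial setting with a uniform prior; the weight $w(\theta) = 1/(\theta(1-\theta))$ is an illustrative choice, not taken from the text, and the function names are assumptions. With this weight and a uniform prior, the weighted-quadratic Bayes estimator reduces to the proportion $x/n$ (for $0 < x < n$), while the unweighted posterior mean is $(x+1)/(n+2)$.

```python
import math

# Hypothetical setting: x ~ Binomial(n, theta) with a uniform prior on (0, 1).
N_GRID = 2000
THETAS = [(i + 0.5) / N_GRID for i in range(N_GRID)]


def f(x, n, theta):
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def prior(theta):
    return 1.0


def posterior_mean(x, n):
    """int theta f(x|theta) pi(theta) dtheta / int f(x|theta) pi(theta) dtheta."""
    num = sum(t * f(x, n, t) * prior(t) for t in THETAS)
    den = sum(f(x, n, t) * prior(t) for t in THETAS)
    return num / den  # the common cell width cancels in the ratio


def weighted_bayes(x, n, w):
    """E[w(theta) theta | x] / E[w(theta) | x] for a nonnegative weight w."""
    num = sum(w(t) * t * f(x, n, t) * prior(t) for t in THETAS)
    den = sum(w(t) * f(x, n, t) * prior(t) for t in THETAS)
    return num / den


def w(t):
    """Illustrative weight w(theta) = 1 / (theta (1 - theta))."""
    return 1.0 / (t * (1 - t))


print(posterior_mean(7, 10))     # about (7 + 1) / (10 + 2)
print(weighted_bayes(7, 10, w))  # about 7 / 10, the usual proportion x / n
```

The posterior factor $\theta^x(1-\theta)^{n-x}$ cancels the singularities of $w$ at 0 and 1 whenever $0 < x < n$, so the midpoint sums stay finite.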
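Absolute error loss: The Bayes estimator $\delta^{\pi}$ associated with $\pi$ and with the absolute error loss
$$L(\theta,\delta) = |\theta-\delta|$$
is a median of the posterior distribution $\pi(\theta\mid x)$; when this median is not unique, any posterior median is a Bayes estimator.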
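The median result can be checked by minimizing the posterior expected absolute loss numerically and comparing the minimizer with the posterior median. The sketch reuses a hypothetical binomial setting with a uniform prior and golden-section search (valid because the expected absolute loss is convex in $\delta$); the names are illustrative.

```python
import math

# Hypothetical setting: x ~ Binomial(n, theta), uniform prior on (0, 1), absolute error loss.
N_GRID = 2000
THETAS = [(i + 0.5) / N_GRID for i in range(N_GRID)]


def posterior_weights(x, n):
    """Grid posterior pi(theta | x) under the uniform prior (C(n, x) cancels)."""
    w = [t**x * (1 - t) ** (n - x) for t in THETAS]
    s = sum(w)
    return [wi / s for wi in w]


def posterior_median(post):
    """Smallest grid point at which the posterior CDF reaches 1/2."""
    cdf = 0.0
    for t, p in zip(THETAS, post):
        cdf += p
        if cdf >= 0.5:
            return t
    return THETAS[-1]


def expected_abs_loss(d, post):
    """rho(pi, d | x) for L(theta, d) = |theta - d|."""
    return sum(abs(t - d) * p for t, p in zip(THETAS, post))


def golden_min(g, a=0.0, b=1.0, tol=1e-6):
    """Golden-section search for the minimizer of a convex function g on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return (a + b) / 2


post = posterior_weights(7, 10)
print(golden_min(lambda d: expected_abs_loss(d, post)))  # numerical Bayes estimate
print(posterior_median(post))                            # agrees up to the grid resolution
```

For 7 successes in 10 trials the posterior median (about 0.68) differs from the posterior mean $8/12 \approx 0.667$, so the choice of loss changes the estimate.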
Admissibility
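An estimator $\delta$ is admissible if no other estimator $\delta'$ satisfies $R(\theta,\delta') \le R(\theta,\delta)$ for every $\theta$, with strict inequality for at least one $\theta$. If a prior distribution $\pi$ is strictly positive on $\Theta$, the Bayes risk $r(\pi)$ is finite, and the risk function $R(\theta,\delta)$ is a continuous function of $\theta$ for every $\delta$, then the Bayes estimator $\delta^{\pi}$ is admissible. (A unique minimax estimator is likewise admissible; see below.)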
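A generalized Bayes estimator $\delta^{\pi}$, associated with a possibly improper prior $\pi$ and a strictly convex loss function, is admissible when its Bayes risk is finite:
$$r(\pi) = \int_{\Theta} R(\theta,\delta^{\pi})\,\pi(\theta)\,d\theta < \infty.$$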
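The Bayes risk in this condition is the prior average of the frequentist risk $R(\theta,\delta^{\pi})$, which can be approximated by quadrature. The sketch below is an illustration under hypothetical assumptions: binomial data, quadratic loss and a uniform (proper) prior, whose Bayes estimator is the posterior mean $(x+1)/(n+2)$. For such a bounded problem $r(\pi)$ is automatically finite; the code only shows how the integral is evaluated, and the names are illustrative.

```python
import math


def f(x, n, theta):
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def risk(theta, n, delta):
    """Frequentist risk R(theta, delta) = E_theta[(theta - delta(x))^2] for x ~ Binomial(n, theta)."""
    return sum((theta - delta(x)) ** 2 * f(x, n, theta) for x in range(n + 1))


def integrated_risk(n, delta, prior, lo, hi, n_grid=2000):
    """Midpoint-rule approximation of the integral of R(theta, delta) pi(theta) over (lo, hi)."""
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = lo + (i + 0.5) * h
        total += risk(theta, n, delta) * prior(theta) * h
    return total


n = 10
bayes_uniform = lambda x: (x + 1) / (n + 2)  # posterior mean under the uniform prior
print(integrated_risk(n, bayes_uniform, lambda t: 1.0, 0.0, 1.0))  # close to 1 / (6 (n + 2))
```

The finiteness condition is genuinely restrictive for improper priors. For instance, with $x \sim \mathcal{N}(\theta,1)$ and the flat prior $\pi(\theta) = 1$ on $\mathbb{R}$, the generalized Bayes estimator is $\delta^{\pi}(x) = x$, whose risk equals 1 for every $\theta$, so $r(\pi) = \int_{\mathbb{R}} 1\,d\theta = \infty$ and this criterion cannot be used to establish its admissibility.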
Minimaxity
In the context discussed above, the minimax criterion appears as an "insurance against the worst case", as it aims at minimizing the expected loss in the least favorable case. Literally, an estimator $\delta^{*}$ is minimax if it minimizes the maximum risk: $\sup_{\theta} R(\theta,\delta^{*}) = \inf_{\delta}\sup_{\theta} R(\theta,\delta)$. It is a very conservative approach*, inherent to frequentist statistics and used only marginally by Bayesians. (Under regularity conditions, the minimax rule, although it does not depend on any prior distribution, coincides with the Bayes rule for the least favorable prior, i.e. the prior distribution with the highest Bayes risk.)
The notion of minimaxity provides a good illustration of the conservative aspects of the frequentist paradigm. Since this approach refuses to make any assumption on the parameter $\theta$, it has to consider the "worst" cases as equally likely and then needs to focus on the maximal possible risk.
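A classical textbook case makes the worst-case focus concrete. In a hypothetical binomial setting with quadratic loss, the sketch compares the maximum risk of the proportion $x/n$ with that of $(x + \sqrt{n}/2)/(n + \sqrt{n})$, which is the Bayes estimator for a Beta$(\sqrt{n}/2, \sqrt{n}/2)$ prior, has constant risk, and is known to be minimax for this problem. The supremum over $\theta$ is approximated on a finite grid, and the names are illustrative.

```python
import math


def f(x, n, theta):
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def risk(theta, n, delta):
    """R(theta, delta) under quadratic loss, x ~ Binomial(n, theta)."""
    return sum((theta - delta(x)) ** 2 * f(x, n, theta) for x in range(n + 1))


def max_risk(n, delta, n_grid=1001):
    """Approximate sup_theta R(theta, delta) over an evenly spaced grid on [0, 1]."""
    return max(risk(i / (n_grid - 1), n, delta) for i in range(n_grid))


n = 10
s = math.sqrt(n)
mle = lambda x: x / n                      # the usual proportion
minimax = lambda x: (x + s / 2) / (n + s)  # Bayes for Beta(sqrt(n)/2, sqrt(n)/2), constant risk
print(max_risk(n, mle))      # 1 / (4n) = 0.025, reached at theta = 1/2
print(max_risk(n, minimax))  # 1 / (4 (sqrt(n) + 1)^2), about 0.0144
```

The minimax rule lowers the maximum risk from $1/(4n)$ to $1/(4(\sqrt{n}+1)^2)$, but pays for it away from $\theta = 1/2$: for $n = 10$ its risk exceeds that of $x/n$ when $\theta$ is close to 0 or 1, which is the conservative trade-off described above.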
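The Bayes risks never exceed the minimax risk. Indeed, for any prior $\pi$ and estimator $\delta$, $r(\pi,\delta) = \int_{\Theta} R(\theta,\delta)\,\pi(\theta)\,d\theta \le \sup_{\theta} R(\theta,\delta)$; taking the infimum over $\delta$ and then the supremum over $\pi$ gives
$$\underline{R} = \sup_{\pi} r(\pi) = \sup_{\pi}\inf_{\delta} r(\pi,\delta) \le \overline{R} = \inf_{\delta}\sup_{\theta} R(\theta,\delta).$$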
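For a hypothetical binomial model with quadratic loss, the inequality can be checked over the family of Beta$(a,b)$ priors, whose Bayes risk has the closed form $r(\pi) = ab/[(a+b)(a+b+1)(a+b+n)]$, obtained by averaging the risk of the posterior mean $(x+a)/(n+a+b)$ over the prior. The sketch scans an illustrative grid of $(a,b)$, checks that no Bayes risk exceeds the minimax risk $1/(4(\sqrt{n}+1)^2)$ of this problem, and shows that the least favorable prior Beta$(\sqrt{n}/2,\sqrt{n}/2)$ attains it, so that here $\underline{R} = \overline{R}$. The names are illustrative.

```python
import math


def bayes_risk_beta(a, b, n):
    """Bayes risk r(pi) for x ~ Binomial(n, theta), pi = Beta(a, b) and quadratic loss.

    The Bayes estimator is (x + a) / (n + a + b); averaging its risk over the prior gives
    E[theta (1 - theta)] / (a + b + n), with E[theta (1 - theta)] = ab / ((a + b)(a + b + 1)).
    """
    return a * b / ((a + b) * (a + b + 1) * (a + b + n))


n = 10
minimax_risk = 1 / (4 * (math.sqrt(n) + 1) ** 2)  # inf_delta sup_theta R(theta, delta) for this model
grid = [0.1 * k for k in range(1, 101)]           # a, b in 0.1, 0.2, ..., 10.0
best = max(bayes_risk_beta(a, b, n) for a in grid for b in grid)
print(best <= minimax_risk)                       # no Beta prior exceeds the minimax risk
a_star = math.sqrt(n) / 2                         # least favorable prior Beta(a*, a*)
print(bayes_risk_beta(a_star, a_star, n), minimax_risk)  # equal up to rounding
```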
* example (Robert, 1994, p.55): "The first oil-drilling platforms in the North Sea were designed according to a minimax principle. In fact, they were supposed to resist the conjugate action of the worst gale and the worst storm ever observed, at the minimal record temperature. This strategy obviously gives a comfortable margin of safety but is quite costly. For more recent platforms, engineers have taken into account the distribution of these weather phenomena in order to reduce the production cost."