1.4.4

In formal models, as opposed to heuristic ones, an energy function is formulated based on an established criterion. Because of inevitable uncertainties in vision processes, principles from statistics, probability and information theory are often used as the formal basis.

When we have knowledge about the data distribution but no
appreciable prior information about the quantity being estimated, we may
use the *maximum likelihood* (ML) criterion. When the situation is the
opposite, that is, when we have only prior information, we may use
the *maximum entropy* criterion.
Distributions of higher entropy are more likely because nature can
generate them in more ways, and the maximum entropy criterion simply
takes this fact into account [Jaynes 1982].
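The two criteria can be sketched on toy problems. The following is an illustrative sketch (the data, constants, and function names are ours, not from the text): the ML estimate of a Gaussian mean is the sample average, and the maximum entropy distribution on a finite label set subject to a mean constraint has the exponential form $p_i \propto e^{\lambda i}$, whose multiplier can be found by bisection.

```python
import math

# --- Maximum likelihood (ML): estimate a Gaussian mean from samples. ---
# With no prior on the mean, the ML estimate is the sample average, since
# it maximizes the log-likelihood sum of -(x - mu)^2 / (2 sigma^2).
def ml_gaussian_mean(samples):
    return sum(samples) / len(samples)

# --- Maximum entropy: distribution on {0, ..., k-1} with a given mean. ---
# Among all distributions satisfying the mean constraint, the one of
# highest entropy is exponential-family: p_i proportional to exp(lam * i).
# The resulting mean is monotone in lam, so lam is found by bisection.
def maxent_with_mean(k, target_mean, tol=1e-9):
    def mean_for(lam):
        w = [math.exp(lam * i) for i in range(k)]
        z = sum(w)
        return sum(i * wi for i, wi in enumerate(w)) / z

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * i) for i in range(k)]
    z = sum(w)
    return [wi / z for wi in w]
```

With an unconstraining target mean (e.g. 2.5 on labels 0..5), the maximum entropy solution is the uniform distribution, reflecting that nothing beyond the constraint is assumed.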

Neither of these two methods is adequate for problems where we know both
the prior and the likelihood distributions. With both sources of
information available, the best estimate is the one that maximizes a
Bayes criterion. Two forms of such estimates are often used in practice:
the *maximum a posteriori* (MAP) estimate and the *a posteriori* mean
estimate. The maximizer of the posterior marginals (MPM)
[Marroquin 1985; Marroquin et al. 1987] provides an alternative Bayes
estimator. Although there have been philosophical and scientific
controversies about their appropriateness in inference and decision
making (see [Clark and Yuille 1990] for a short review), Bayes criteria are
among the most popular ones in computer vision; in fact, MAP is the
most popular criterion in optimization-based MRF modeling. The
equivalence between Markov random fields and Gibbs
distributions established in Section 1.2.4 provides a
convenient way of specifying the joint prior probability, solving a
difficult issue in MAP-MRF labeling.
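The three Bayes estimators can be compared on a tiny problem where the posterior is enumerable. The sketch below is entirely illustrative (the two-site model, observations, and smoothness weight are our choices): a Gibbs posterior over two neighboring sites is tabulated, and the MAP estimate, the posterior mean, and the MPM estimate are read off from it.

```python
import itertools
import math

# Toy posterior over two neighboring sites with labels {0, 1, 2}.
# Posterior energy = quadratic data terms for observations d = (0, 1.6)
# plus a smoothness term coupling the two sites; all numbers are
# illustrative, not from any specific model.
LABELS = (0, 1, 2)
d = (0.0, 1.6)
beta = 0.5  # assumed smoothness weight

def energy(f):
    data = sum((fi - di) ** 2 for fi, di in zip(f, d))
    smooth = beta * (f[0] - f[1]) ** 2
    return data + smooth

configs = list(itertools.product(LABELS, repeat=2))
weights = [math.exp(-energy(f)) for f in configs]
Z = sum(weights)
post = {f: w / Z for f, w in zip(configs, weights)}

# MAP: maximizer of the joint posterior.
f_map = max(post, key=post.get)

# Posterior mean: expectation of each site's label under the posterior.
f_mean = tuple(sum(p * f[i] for f, p in post.items()) for i in range(2))

# MPM: maximize each site's posterior marginal separately.
marginals = [{l: sum(p for f, p in post.items() if f[i] == l)
              for l in LABELS} for i in range(2)]
f_mpm = tuple(max(m, key=m.get) for m in marginals)
```

MAP optimizes the joint configuration as a whole, MPM optimizes each site independently under its marginal, and the posterior mean is generally non-integral; the three can disagree on harder problems, which is the substance of the controversy noted above.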

In the principle of *minimum description length* (MDL)
[Rissanen 1978; Rissanen 1983], the
optimal solution to a problem is the one that needs the smallest
vocabulary in a given language for explaining the input data. The MDL
has close relationships to statistical methods such as the ML and
MAP [Rissanen 1983]. For example, if the negative log prior probability
is related to the description length of the solution and the negative
log likelihood to the description error,
then MDL is equivalent to MAP. However, MDL is more natural and
intuitive when prior probabilities are not well defined. The MDL has
been used for vision problems at different levels such as segmentation
[Leclerc 1989; Pentland 1990; Darrell et al. 1990; Dengler 1991; Keeler 1991] and
object recognition [Breuel 1993].
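The two-part code idea behind MDL can be sketched by comparing a constant model and a straight-line model of the same data. The $(k/2)\ln n$ model cost used below is one standard instantiation of the criterion (our choice for this sketch, not prescribed by the text); the model with the smaller total description length is preferred.

```python
import math

# Two-part code: description length = L(model) + L(data | model).  Here we
# assume L(model) = (k/2) ln n for k real parameters and
# L(data | model) = (n/2) ln(RSS / n), a standard BIC-like instantiation.
def description_length(rss, k, n):
    return 0.5 * k * math.log(n) + 0.5 * n * math.log(rss / n)

def fit_constant(xs, ys):
    """RSS of the best constant model (the mean)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_line(xs, ys):
    """RSS of the least-squares straight line a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Nearly linear synthetic data: the line model costs one extra parameter
# but shrinks the residual enough to win on total description length.
xs = [float(i) for i in range(20)]
ys = [0.5 * x + 1.0 + 0.1 * math.sin(7 * x) for x in xs]
dl_const = description_length(fit_constant(xs, ys), 1, len(xs))
dl_line = description_length(fit_line(xs, ys), 2, len(xs))
```

Under the identification noted above, minimizing this total code length plays the role of maximizing the posterior in MAP, with the model cost acting as the negative log prior.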