

Optimization-Based Vision


Optimization plays an essential role in computer vision. Most vision problems are formulated, explicitly or implicitly, as the optimization of a criterion. The extensive use of optimization principles is due to various uncertainties in vision processes, such as noise and occlusion in the sensed image and ambiguities in visual interpretation. Exact or perfect solutions rarely exist, so inexact but optimal (in some sense) solutions are usually sought instead.

In the pioneering vision system of Roberts [Roberts 1965], object identification and pose estimation are performed using the simplest least squares (LS) fitting. Nowadays, optimization is pervasive in all aspects of vision, including image restoration and reconstruction [Grimson 1981; Terzopoulos 1983a; Geman and Geman 1984; Leclerc 1989; Hung et al. 1991], shape from shading [Ikeuchi and Horn 1981], stereo, motion and optical flow [Ullman 1979a; Horn and Schunck 1981; Hildreth 1984; Murray and Buxton 1987; Barnard Jain 1987], texture [Hassner and Sklansky 1980; Kashyap et al. 1982; Cross and Jain 1983], edge detection [Torre and Poggio 1986; Tan et al. 1992], image segmentation [Silverman and Cooper 1988; Li 1990a], perceptual grouping [Lowe 1985; Mohan and Nevatia 1989; Herault and Horaud 1993], interpretation of line drawings [Leclerc and Fischler 1992], object matching and recognition [Fischler and Elschlager 1973; Davis 1979; Shapiro and Haralick 1981; Bhanu and Faugeras 1984; Ben-Arie and Meiri 1987; Modestino and Zhang 1989; Nasrabadi et al. 1990; Wells III 1991; Friedland and Rosenfeld 1992; Li 1992a; Li 1994a], and pose estimation [Haralick et al. 1989].

In all of the examples cited above, the solution is explicitly defined as an optimum of an objective function that measures the goodness, or conversely the cost, of the solution. Optimization may also be implicit: the solution optimizes some objective function, although this fact may or may not be recognized. The Hough transform [Hough 1962; Duda and Hart 1972; Ballard 1981; Illingworth and Kittler 1988] is a well-known technique for detecting lines and curves by looking for peaks of an accumulation function. It was later found to be equivalent to template matching [Stockman and Agrawala 1977] and can be reformulated as a maximizer of certain probabilities such as the likelihood [Haralick and Shapiro 1992]. Edge detection was performed using simple operators such as derivatives of Gaussian [Rosenfeld and Kak 1976]. These operators can be derived from regularization principles in which an energy function is explicitly minimized [Poggio et al. 1985].
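
To make the "peaks of an accumulation function" reading concrete, the following is a minimal sketch of a line-detecting Hough transform in Python. It assumes the normal parameterization rho = x cos(theta) + y sin(theta); the function names (`hough_accumulate`, `strongest_line`) and the discretization parameters (`n_theta`, `rho_step`) are illustrative choices, not taken from the cited works. Selecting the strongest line is written as an explicit maximization over the accumulator, which is the implicit optimization referred to above.

```python
import math
from collections import Counter


def hough_accumulate(points, n_theta=180, rho_step=1.0):
    """Build the accumulation function over discretized (theta, rho) cells.

    Each point (x, y) votes once per theta bin, for the cell of the line
    rho = x*cos(theta) + y*sin(theta) that passes through it.
    n_theta and rho_step are assumed discretization parameters.
    """
    acc = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(k, round(rho / rho_step))] += 1
    return acc


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the accumulator cell with most votes."""
    acc = hough_accumulate(points, n_theta, rho_step)
    # The detected line is the maximizer of the accumulation function.
    (k, r), votes = max(acc.items(), key=lambda item: item[1])
    return math.pi * k / n_theta, r * rho_step, votes


if __name__ == "__main__":
    # 100 collinear points on the horizontal line y = 5, plus two outliers.
    pts = [(x, 5) for x in range(100)] + [(3, 40), (70, 12)]
    theta, rho, votes = strongest_line(pts)
    print(theta, rho, votes)
```

The outliers cast votes too, but they do not fall into the cell shared by the collinear points, so the peak is unaffected. Ties between cells are broken arbitrarily in this sketch.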

The main reason for the extensive use of optimization is the existence of uncertainties in every vision process. Noise and other degradation factors, such as those caused by disturbances and quantization in sensing and signal processing, are sources of uncertainty. Different appearances and poses of objects, their mutual and self-occlusion, and possible shape deformation also cause ambiguities in visual interpretation. Under such circumstances, we can hardly obtain exact or perfect solutions and have to resort to inexact yet optimal solutions.
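
As a small illustration of an "inexact yet optimal" solution, the sketch below fits a straight line to points that no single line passes through, using the closed-form least squares (LS) solution. It is only an illustrative example of LS fitting in general, not a reconstruction of any cited system; the function names `fit_line_ls` and `sum_squared_error` are hypothetical.

```python
def fit_line_ls(points):
    """Return (a, b) minimizing sum((a*x + b - y)**2) over the points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    a = (n * sxy - sx * sy) / denom  # slope from the normal equations
    b = (sy - a * sx) / n            # intercept
    return a, b


def sum_squared_error(points, a, b):
    """Objective function: total squared residual of the line y = a*x + b."""
    return sum((a * x + b - y) ** 2 for x, y in points)


if __name__ == "__main__":
    # Three points that are not collinear, so no line fits them exactly.
    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
    a, b = fit_line_ls(pts)
    print(a, b, sum_squared_error(pts, a, b))
```

The residual error at the LS solution is nonzero, so the fit is inexact, yet any other choice of slope and intercept gives a larger value of the same objective.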

Because of the importance of optimization, it is crucial to study vision problems from the viewpoint of optimization and to develop methodologies for optimization-based vision modeling. The following discusses optimization-based vision.