To measure the similarity between stereo images, their photometric and geometric differences, or relative distortions caused by different projective views, different cameras, and discontinuities due to partial occlusions, have to be taken into account. Photometric distortions are due to non-uniform reflection of observed 3-D points in different directions, non-uniform and noisy transfer factors over the field-of-view (FOV) of each camera, and so forth. Because of these distortions, the corresponding pixels in stereo images may have different signal values. Geometric distortions are due to projecting a 3-D surface onto the two image planes and involve: (i) spatially variant disparities of the corresponding pixels and (ii) partial occlusions of some 3-D points. As a result, the corresponding regions in the images may differ in position, scale, and orientation. Partial occlusions lead to only monocular visibility of certain surfaces, so that some image regions have no stereo correspondence in principle. If a visible surface is continuous, then geometric distortions preserve the natural x- and y-order of the binocularly visible points (abbreviated below as BVP) in the images. Because of occlusions and uniform colouring, two or more surface variants may be in full agreement with the same stereopair even without photometric distortions. Therefore, stereo matching, as an ill-posed problem, must involve some regularisation.
Today's stereo matching approaches differ in (i) which similarities are measured between the images; (ii) to what extent the image distortions are taken into account; (iii) which regularising constraints and heuristics are involved; and (iv) how a stereopair is matched as a whole. All the matching techniques exploit the image signals, that is, grey values (intensities), colours (signal triples in RGB, HSI, or another colour space), or multiband signatures (signals in several spectral bands, not only visible ones). Usually these techniques are divided into feature-based and intensity-based stereo.
The first group relies on specific image features such as edges, corners, isolated small areas of specific shape, or other easily detectable objects that can be found individually in each stereo image by preprocessing. Then, only the features are tested for similarity. Usually, a natural 3-D scene has a relatively small number of such characteristic features, so that their matching is insufficient for producing a dense 3-D model of the visible surfaces. The intensity-based approaches define image similarity directly in terms of the initial signals (grey levels, colours, or multiband signatures) by using mathematical models that relate optical signals from the observed 3-D points to image signals in the corresponding pixels. The model produces a particular measure of similarity between the corresponding pixels or regions in the images to be matched. The similarity measure takes account of the admissible geometric and photometric image distortions.
The simplest model assumes (i) no local geometric distortions and (ii) either no photometric distortions or only spatially uniform contrast and offset deviations between the corresponding image signals. More specifically, it is assumed that a patch of a frontal planar 3-D surface is viewed by (photometrically) ideal cameras and produces two relatively small corresponding rectangular windows, one in each stereo image. Then, the similarity between the two windows is measured by summing squared differences between the signals or by computing the cross-correlation between the windows.
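As a minimal sketch of these two window measures, the following assumes that each window is given as a list of rows of grey values and that both windows have the same size. The function names `ssd` and `ncc` are illustrative only. The cross-correlation is written in its zero-mean normalised form, which is one common way to make it insensitive to uniform contrast and offset deviations; the text does not prescribe this particular variant.

```python
import math

def ssd(w1, w2):
    """Sum of squared differences between two equally sized windows."""
    return sum((a - b) ** 2
               for row1, row2 in zip(w1, w2)
               for a, b in zip(row1, row2))

def ncc(w1, w2):
    """Zero-mean normalised cross-correlation of two equally sized windows.

    The value is unchanged when one window is multiplied by a positive
    contrast factor or shifted by a constant offset.
    """
    x = [a for row in w1 for a in row]
    y = [b for row in w2 for b in row]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant window has zero variance; return 0.0 as an assumed convention.
    return num / den if den > 0 else 0.0

left_window = [[1, 2], [3, 4]]
right_window = [[3, 5], [7, 9]]   # 2 * left_window + 1: contrast and offset change

print(ssd(left_window, right_window))   # large, although the windows correspond
print(ncc(left_window, right_window))   # close to 1.0
```

The example pair differs only by a uniform contrast and offset, so the squared differences penalise it while the normalised correlation does not.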
The above model is easily extended to account for varying x-slopes of a surface patch, namely, by exhaustively trying relative x-expansions and contractions of the two windows and searching for the maximum similarity. An alternative way is to adapt the window size until the simplifying assumption about a horizontal surface patch is justified.
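The exhaustive search over x-expansions and contractions can be sketched for a single image row as follows. This is an illustration under stated assumptions, not a prescribed procedure: the helper names `resample` and `best_x_scale`, the use of linear interpolation, the squared-difference cost, and the discrete list of candidate scales are all choices made here for the example.

```python
def resample(row, start, scale, n):
    """Take n samples from row at positions start + scale * i (linear interpolation)."""
    out = []
    for i in range(n):
        x = start + scale * i
        x = min(max(x, 0.0), len(row) - 1.0)   # clamp to the row (assumed border rule)
        k = min(int(x), len(row) - 2)          # left neighbour index
        t = x - k                              # interpolation weight
        out.append((1 - t) * row[k] + t * row[k + 1])
    return out

def best_x_scale(left_win, right_row, start, scales):
    """Try each x-scale of the right window and keep the one with minimum SSD."""
    best = None
    for s in scales:
        candidate = resample(right_row, start, s, len(left_win))
        cost = sum((a - b) ** 2 for a, b in zip(left_win, candidate))
        if best is None or cost < best[1]:
            best = (s, cost)
    return best

right_row = [x * x for x in range(10)]
# Synthetic left window: the right row seen with a 1.5 times x-expansion.
left_win = resample(right_row, 1, 1.5, 4)

scale, cost = best_x_scale(left_win, right_row, 1, [0.5, 1.0, 1.5, 2.0])
print(scale, cost)
```

In practice the search would run jointly over window position and scale; only the scale is varied here to keep the sketch short.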
Similarity under nonuniform photometric distortions of the images is sometimes computed either by comparing intensity-independent signal characteristics (like Fourier phases) or after the nonuniformity has been partially removed by special filtering. Alternative and computationally less complex signal models take account of both the varying surface geometry and the nonuniform signal distortions, but only along a single terrain profile. These models admit arbitrary changes of the corresponding grey values provided that the ratios of their differences remain in a given range.
3-D reconstruction is performed by searching for the maximum similarity (or minimum dissimilarity) between the corresponding regions or pixels in a stereopair. The similarity measure takes account of admissible image distortions and includes some regularising constraints, e.g. to deal with partial occlusions or multiple equivalent similarity maxima. Generally, there exist two possible scenarios for reconstructing a visible 3-D scene: to exhaust all possible variants of visible surfaces by global optimisation, or to search successively for each next (and relatively small) surface patch by local optimisation in order to add it to the previously found patches. For an assumed single continuous optical surface, both scenarios are guided by visibility and ordering constraints.
Local optimisation needs less computation and easily takes into account both x- and y-disparities of the corresponding pixels. But it has the following drawbacks. If each local decision is taken independently of the others, then the surface patches found may form an invalid 3-D surface that violates visibility and continuity constraints. Conversely, if each next search is guided by the previously found surfaces, then local errors accumulate and, after a few steps, the "guidance" may lead to completely wrong search regions for matching. In both cases, local optimisation needs either intensive interactive on-line intervention or intensive off-line post-editing of the reconstructed 3-D scene in order to fix the errors.
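The first case, independent local decisions, can be illustrated with a simple winner-take-all search along one pair of scan-lines. This is a hedged sketch rather than a specific published method: the function name `wta_disparities`, the squared-difference window cost, the non-negative disparity range, and the assumption that both rows have equal length are all introduced here for illustration.

```python
def wta_disparities(left, right, max_disp, half=1):
    """Pick, independently for each left pixel, the disparity with minimum window SSD.

    left, right: grey values along corresponding scan-lines of equal length.
    A left pixel x is compared with the right pixel x - d for d = 0..max_disp.
    """
    n = len(left)
    result = []
    for x in range(half, n - half):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - d - half < 0:      # window would leave the right row
                break
            cost = sum((left[x + k] - right[x - d + k]) ** 2
                       for k in range(-half, half + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        result.append(best_d)         # no check against neighbouring decisions
    return result

right_row = [5, 1, 9, 3, 7, 2, 8, 4]
# Left row: the right row shifted by two pixels; the first two values are filler.
left_row = [0, 0, 5, 1, 9, 3, 7, 2]

print(wta_disparities(left_row, right_row, max_disp=3))
```

Nothing in this loop ties one pixel's decision to its neighbours, so on noisy or uniformly coloured rows the resulting disparities are free to violate the ordering and continuity constraints discussed above.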
Global optimisation is less sensitive to local errors because it exploits constraints on scan-lines or on the entire images. In general, however, it is an NP-hard problem, so it is feasible only in particular cases where the direct exhaustion of all the variants, which has exponential complexity, can be avoided. For instance, this is possible when a 3-D scene is reconstructed in a profile-by-profile mode and an additive similarity measure allows for the use of dynamic programming to optimise each individual profile.
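The profile-by-profile case can be sketched with a generic dynamic-programming alignment of two scan-lines. The sketch assumes an additive dissimilarity made of squared grey-value differences for matched pixels plus a constant penalty `occ` for each pixel left unmatched (treated as occluded); both the penalty and the function name `match_scanline` are assumptions for this example, not the specific measure of any particular method. The ordering constraint is enforced implicitly, because the alignment path can only move forward in both rows.

```python
def match_scanline(left, right, occ=10.0):
    """Globally optimal ordered matching of two scan-lines by dynamic programming."""
    n, m = len(left), len(right)
    # cost[i][j]: minimum dissimilarity of aligning left[:i] with right[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * occ, "L"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * occ, "R"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2, "M"),  # match
                (cost[i - 1][j] + occ, "L"),   # left pixel visible only in the left image
                (cost[i][j - 1] + occ, "R"),   # right pixel visible only in the right image
            ]
            cost[i][j], move[i][j] = min(options)
    # Backtrack from the end of both rows to recover the matched pixel pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "L":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs

total, pairs = match_scanline([10, 20, 30, 40], [20, 30, 40, 50], occ=10.0)
print(total, pairs)
```

The table has (n + 1)(m + 1) cells and each is filled in constant time, so the search that would otherwise enumerate exponentially many alignments takes time proportional to the product of the row lengths.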