In correlation-based methods, the elements to match are image windows of fixed size, and the similarity criterion is a measure of the correlation between windows in the two images. The corresponding element is given by the window that maximises the similarity criterion (also called the similarity measure, or score) within a search region. Correlation-based stereo matching rests on several simplifying assumptions.
Let $g_1 = (g_{1:x,y}: x = 0,1,\ldots,M_1-1;\ y = 0,1,\ldots,n_1-1)$ and $g_2 = (g_{2:x,y}: x = 0,1,\ldots,M_2-1;\ y = 0,1,\ldots,n_2-1)$ be the images of a stereo pair.
Let the same rectangular window, with sides parallel to the image coordinate axes, be placed onto $g_1$ and $g_2$ so that the window centres coincide with a pixel $(x',y')$ in $g_1$ and a pixel $(x,y)$ in $g_2$. Suppose the window in $g_1$ is fixed, whereas the window in $g_2$ can be placed at different candidate positions $(x,y)$ within a search region.
For every such position, the dissimilarity between the two windows is computed by minimising the squared distance between the corresponding signals in the windows with respect to allowable relative distortions, e.g. uniform contrast and offset distortions of the window in $g_2$ relative to the fixed window in $g_1$. The indices $i$ and $j$ denote pixel coordinates relative to the centre of the window; $m_{1:x',y'}$, $S_{11:x',y'}$ and $m_{2:x,y}$, $S_{22:x,y}$ are the mean signal value and the total squared signal deviation over the window in each image, and $S_{12:x',y',x,y}$ is the cross-product of the two centred windows.
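The formulas that accompany these definitions are not reproduced in this copy. A plausible reconstruction, assuming that the allowed distortion is $g_1 \approx \alpha g_2 + \beta$ and that $W$ (a symbol introduced here for illustration) denotes the set of relative offsets $(i,j)$ in the window, is sketched below. The closed form of the dissimilarity follows from ordinary least squares, with the optimum contrast $\alpha^* = S_{12}/S_{22}$ and offset $\beta^* = m_1 - \alpha^* m_2$.

```latex
\begin{align*}
m_{1:x',y'} &= \frac{1}{|W|}\sum_{(i,j)\in W} g_{1:x'+i,y'+j}, &
m_{2:x,y} &= \frac{1}{|W|}\sum_{(i,j)\in W} g_{2:x+i,y+j},\\
S_{11:x',y'} &= \sum_{(i,j)\in W}\bigl(g_{1:x'+i,y'+j}-m_{1:x',y'}\bigr)^2, &
S_{22:x,y} &= \sum_{(i,j)\in W}\bigl(g_{2:x+i,y+j}-m_{2:x,y}\bigr)^2,\\
S_{12:x',y',x,y} &= \sum_{(i,j)\in W}\bigl(g_{1:x'+i,y'+j}-m_{1:x',y'}\bigr)\bigl(g_{2:x+i,y+j}-m_{2:x,y}\bigr),\\
D_{x',y',x,y} &= \min_{\alpha,\beta}\sum_{(i,j)\in W}\bigl(g_{1:x'+i,y'+j}-\alpha\,g_{2:x+i,y+j}-\beta\bigr)^2
 = S_{11:x',y'} - \frac{S_{12:x',y',x,y}^{2}}{S_{22:x,y}} .
\end{align*}
```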
The cross-correlation is most frequently used as the similarity measure instead of the whole distance-based dissimilarity measure.
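To make the two scores concrete, the following minimal Python sketch computes both for a pair of windows. It assumes the distortion model $g_1 \approx \alpha g_2 + \beta$, for which the dissimilarity is $D = S_{11} - S_{12}^2/S_{22}$ and the cross-correlation is $C = S_{12}/\sqrt{S_{11}S_{22}}$. The function names, the list-of-rows image layout, and the treatment of uniform windows are illustrative assumptions, not part of the original notes.

```python
from math import sqrt

def window_values(img, cx, cy, half_w, half_h):
    """Signals of the (2*half_w+1) x (2*half_h+1) window centred at (cx, cy).

    img is assumed to be a list of rows, so pixel (x, y) is img[y][x].
    """
    return [img[cy + j][cx + i]
            for j in range(-half_h, half_h + 1)
            for i in range(-half_w, half_w + 1)]

def window_scores(a, b):
    """Return (D, C) for two equally sized lists of window signals.

    D = S11 - S12^2 / S22 is the residual after fitting a ~ alpha*b + beta
    (assumed model); C = S12 / sqrt(S11 * S22) is the cross-correlation.
    For a uniform window the undefined ratios are set to 0 by convention.
    """
    n = len(a)
    m1 = sum(a) / n
    m2 = sum(b) / n
    s11 = sum((p - m1) ** 2 for p in a)
    s22 = sum((q - m2) ** 2 for q in b)
    s12 = sum((p - m1) * (q - m2) for p, q in zip(a, b))
    d = s11 - (s12 * s12 / s22 if s22 > 0 else 0.0)
    c = s12 / sqrt(s11 * s22) if s11 > 0 and s22 > 0 else 0.0
    return d, c
```

Two windows that differ only by contrast and offset, such as `[1, 2, 3, 4]` and `[3, 5, 7, 9]`, give a dissimilarity of zero and a correlation of one under this model.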
For a window centred at position $(x',y')$ in an image $g_1$ of a stereo pair, the search for the minimum dissimilarity or the maximum correlation exhausts the positions $(x,y)$ of the window centre within a search region SR in the other image, $g_2$. Let $(x^*,y^*)$ be the optimum position found in $g_2$.
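The exhaustive search can be sketched as follows. This is a minimal illustration that assumes square windows of half-size `h`, images stored as lists of rows, a search region given as an explicit list of candidate centres, and the cross-correlation as the score; all names are hypothetical.

```python
from math import sqrt

def correlation(a, b):
    """Cross-correlation S12 / sqrt(S11 * S22) of two signal lists (0 if uniform)."""
    n = len(a)
    m1, m2 = sum(a) / n, sum(b) / n
    s11 = sum((p - m1) ** 2 for p in a)
    s22 = sum((q - m2) ** 2 for q in b)
    s12 = sum((p - m1) * (q - m2) for p, q in zip(a, b))
    return s12 / sqrt(s11 * s22) if s11 > 0 and s22 > 0 else 0.0

def window(img, cx, cy, h):
    """Signals of the (2h+1) x (2h+1) window centred at (cx, cy); img[y][x]."""
    return [img[cy + j][cx + i]
            for j in range(-h, h + 1)
            for i in range(-h, h + 1)]

def best_match(g1, g2, x1, y1, search_region, h=1):
    """Return ((x*, y*), score) maximising the correlation over search_region.

    The window centred at (x1, y1) in g1 is fixed; every candidate centre
    (x, y) in search_region is tried in g2.
    """
    ref = window(g1, x1, y1, h)
    best_pos, best_score = None, float("-inf")
    for (x, y) in search_region:
        score = correlation(ref, window(g2, x, y, h))
        if score > best_score:
            best_pos, best_score = (x, y), score
    return best_pos, best_score
```

Under canonical epipolar geometry the search region would typically be a segment of the same image row, e.g. `[(x, y1) for x in range(x_lo, x_hi + 1)]`.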
The factor $\alpha^*$ specifying the relative contrast of one window with respect to the other, and the signal variances $S_{11:x',y'}$ and $S_{22:x,y}$, should be restricted in order to exclude inadequate matching results. In particular, if both the variance $S_{11:x',y'}$ and the cross-product $S_{12:x',y',x,y}$ are equal to zero because the window in $g_1$ is uniform, i.e. has constant signals $g_{1:x'+i,y'+j} = \mathrm{const}$, then such a window is similar to any window in the other image provided that zero contrast is allowed. In practice, the contrast between corresponding parts of stereo images varies within a relatively narrow range $\alpha_{\min} \le \alpha \le \alpha_{\max}$, and the matching score has to take this constraint into account.
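One way to impose the contrast constraint is sketched below. It assumes the model $g_1 \approx \alpha g_2 + \beta$, for which the squared distance with the optimum offset is $S_{11} - 2\alpha S_{12} + \alpha^2 S_{22}$. Because this is a convex quadratic in $\alpha$, the constrained minimiser is the unconstrained one, $S_{12}/S_{22}$, clamped to the allowed range. The function name and argument order are hypothetical.

```python
def constrained_dissimilarity(s11, s22, s12, alpha_min, alpha_max):
    """Minimise S11 - 2*alpha*S12 + alpha^2*S22 over alpha_min <= alpha <= alpha_max.

    Returns (dissimilarity, alpha). If S22 == 0 then S12 == 0 as well, the
    distance does not depend on alpha, and alpha_min is returned arbitrarily.
    """
    alpha = s12 / s22 if s22 > 0 else alpha_min
    alpha = min(max(alpha, alpha_min), alpha_max)  # clamp to the allowed range
    d = s11 - 2.0 * alpha * s12 + alpha * alpha * s22
    return d, alpha
```

With a uniform window in $g_1$ ($S_{11} = S_{12} = 0$), a textured window in $g_2$ ($S_{22} = 20$) and $\alpha_{\min} = 0.5$, the constrained dissimilarity is $0.25 \cdot 20 = 5$ rather than zero, so the uniform window no longer matches every candidate equally well.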
Correlation matching is easy to implement, but a straightforward implementation is inefficient because of redundant computations in overlapping windows. For an image of size $MN$, window size $S_0$, and search region size $\Delta$, the complexity of the straightforward implementation is $O(MNS_0\Delta)$, assuming that $M_1 = M_2 = M$ and $n_1 = n_2 = N$. It can be reduced to $O(MN\Delta)$, i.e. made independent of the window size, if running sums of the signals, their products, and their squares are stored. For instance, let an array $\Psi = (\psi_{u,v}: u = 0,1,\ldots,M-1;\ v = 0,1,\ldots,N-1)$ keep the sums of squared signals of $g_1$ accumulated over all pixels $(x,y)$ with $x \le u$ and $y \le v$; the sum over any rectangular window is then obtained from four entries of $\Psi$.
Left: the straightforward sum in the 5×4 window: 0+0+1+2+0+0+1+1+0+0+1+2+0+0+2+1+0+0+1+2 = 14
Right: the fast sum in the same window using the accumulator: 62−40−23+15 = 14
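The accumulator idea can be sketched in a few lines of Python. The sketch assumes images stored as lists of rows and inclusive window corners; `accumulate` and `box_sum` are hypothetical names. Passing `f=lambda v: v * v` accumulates squared signals, and a separate accumulator would be needed for each quantity (signals, squares, and products for each disparity).

```python
def accumulate(img, f=lambda v: v):
    """Build psi with psi[v][u] = sum of f(img[y][x]) over all x <= u, y <= v."""
    rows, cols = len(img), len(img[0])
    psi = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(cols):
            row_sum += f(img[y][x])
            psi[y][x] = row_sum + (psi[y - 1][x] if y > 0 else 0)
    return psi

def box_sum(psi, x0, y0, x1, y1):
    """Sum over the inclusive rectangle [x0, x1] x [y0, y1] using four lookups."""
    total = psi[y1][x1]
    if x0 > 0:
        total -= psi[y1][x0 - 1]      # strip to the left of the window
    if y0 > 0:
        total -= psi[y0 - 1][x1]      # strip above the window
    if x0 > 0 and y0 > 0:
        total += psi[y0 - 1][x0 - 1]  # corner subtracted twice, add it back
    return total
```

The four lookups in `box_sum` mirror the pattern of the fast sum in the figure (one entry, minus two entries, plus one entry), and their cost does not depend on the window size.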
The simplest symmetric correlation matching assumes canonical epipolar geometry with only frontal planar surfaces, parallel to the image plane, in the object space; independent contrast ($\alpha$) and offset ($\beta$) signal deviations in the images with respect to an unknown noiseless cyclopean image $g$ of the surface; and additive independent normal noise in each image with zero mean and the same variance. Provided that the matching score minimises the maximum of the two squared distances between the image signals and the cyclopean-image signals most closely adapted to them, the dissimilarity score is derived as follows (here, $d$ is the disparity between the positions of the windows in the two images such that the window centres are projected to the position $(x,y)$ in the object space):
The canonical (epipolar) geometry of the stereo images and the symmetric matching allow us to perform the search in the cyclopean object space, exhausting the positions of a rectangular frontal planar patch centred at the cyclopean location $(x,y)$. For each disparity value $d$, the patch is represented by two windows in the left and right images centred at locations $(x+d/2,y)$ and $(x-d/2,y)$, respectively. The so-called least-squares correlation extends such matching to the more general case of slanted planar patches. Projective transformations of the patches are approximately represented by affine parameters, e.g. $x'' = \alpha_1 x' + \alpha_2 y' + \alpha_3$ and $y'' = \alpha_4 x' + \alpha_5 y' + \alpha_6$, that establish correspondence between the left window and the deformed right window. If approximate values of these parameters are known, the linearised signal model (i.e. the Taylor expansion restricted to the linear term) allows them to be refined analytically. The refinement uses approximate partial derivatives of the signals in the windows, which are smoothed to suppress image noise.
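The symmetric search over disparities described at the start of this paragraph can be sketched as follows. The sketch makes several assumptions that are not in the notes: the plain cross-correlation stands in for the symmetric score, the left and right windows are centred at $x + d/2$ and $x - d/2$ respectively, only even integer disparities are tried so that both centres fall on pixels, and all function names are hypothetical.

```python
from math import sqrt

def correlation(a, b):
    """Cross-correlation S12 / sqrt(S11 * S22) of two signal lists (0 if uniform)."""
    n = len(a)
    m1, m2 = sum(a) / n, sum(b) / n
    s11 = sum((p - m1) ** 2 for p in a)
    s22 = sum((q - m2) ** 2 for q in b)
    s12 = sum((p - m1) * (q - m2) for p, q in zip(a, b))
    return s12 / sqrt(s11 * s22) if s11 > 0 and s22 > 0 else 0.0

def window(img, cx, cy, h):
    """Signals of the (2h+1) x (2h+1) window centred at (cx, cy); img[y][x]."""
    return [img[cy + j][cx + i]
            for j in range(-h, h + 1)
            for i in range(-h, h + 1)]

def cyclopean_disparity(left, right, x, y, disparities, h=1):
    """Return (d, score) with the best even disparity d at cyclopean point (x, y).

    For each d the left window is centred at (x + d/2, y) and the right
    window at (x - d/2, y); both must lie inside the images.
    """
    best_d, best_score = None, float("-inf")
    for d in disparities:
        if d % 2:
            continue  # skip odd disparities: centres would fall between pixels
        xl, xr = x + d // 2, x - d // 2
        score = correlation(window(left, xl, y, h), window(right, xr, y, h))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

Because the patch is assumed frontal, both windows have the same size and shape; handling slanted patches would require warping one window with the affine parameters above, which this sketch does not attempt.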
Although the least-squares correlation and various heuristic attempts to adapt the window size to the stereo images may improve the matching results in some cases, correlation-based stereo generally has serious shortcomings because it cannot appropriately account for realistic geometric differences between the images. The conjugate images of a stereo pair are perspective projections of the 3-D scene, and surface slope, discontinuities, and the different positions and orientations of the cameras result in geometric distortions that influence image correspondence. The assumed frontal planes approximate natural surfaces too imprecisely. Also, correlation does not work well if the windows to be matched are either almost uniform (dominant low-frequency content) or too textured (dominant high-frequency content). Moreover, it is not obvious how to choose an adequate window size and shape at each position because they depend on the unknown observed optical surface.