Theoretical and experimental analysis of
some of the best-performing
current computer stereo matching techniques led to the development
of a new alternative approach to 3D stereo reconstruction, called
Noise-driven Concurrent Stereo Matching (NCSM). This framework
reduces the drawbacks of more conventional approaches through more
general image noise models and less restrictive matching goals. It
also gives promising results because it separates the initial
ill-posed problem into two well-posed problems to be solved
sequentially. The NCSM separates 3D reconstruction into two
independent stages:
- image noise estimation, in order to outline spatial candidate
  volumes that are equivalent from the standpoint of image matching
  under the noise, and
- selection of one or more surfaces that closely fit these volumes,
  with due account of partial occlusions of background objects by
  foreground ones.
This framework circumvents the ``best
match" or ``closest similarity" criteria exploited in almost all
existing matching strategies in favour of a likely match criterion
based on a local model of signal noise.
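The difference between a ``best match" and a likely match criterion can be illustrated on a single pair of scanlines. The sketch below is only a toy example under stated assumptions: the function names, the absolute-difference test and the fixed `noise_threshold` are hypothetical stand-ins for a local noise model and are not the estimation schemes used by the NCSM algorithms.

```python
def candidate_volume(left, right, max_disparity, noise_threshold):
    """For each pixel x of the left scanline, return the set of disparities d
    whose match left[x] ~ right[x - d] is admissible under the noise bound.
    The fixed threshold is an assumed stand-in for a local noise model."""
    candidates = []
    for x, value in enumerate(left):
        admissible = set()
        for d in range(max_disparity + 1):
            if x - d < 0:
                break  # no counterpart inside the right scanline
            if abs(value - right[x - d]) <= noise_threshold:
                admissible.add(d)
        candidates.append(admissible)
    return candidates


def best_match(left, right, max_disparity):
    """Conventional winner-takes-all choice: exactly one disparity per pixel
    (ties are broken in favour of the smaller disparity)."""
    result = []
    for x, value in enumerate(left):
        costs = [(abs(value - right[x - d]), d)
                 for d in range(max_disparity + 1) if x - d >= 0]
        result.append(min(costs)[1])
    return result


# A bright object (values 50, 52) shifted by one pixel over a flat background.
left = [10, 10, 10, 50, 52, 10, 10]
right = [10, 10, 50, 51, 10, 10, 10]

volume = candidate_volume(left, right, max_disparity=2, noise_threshold=3)
winners = best_match(left, right, max_disparity=2)
```

On this toy input the winner-takes-all rule commits to a single disparity at every pixel, whereas the candidate sets keep every admissible disparity; on the textureless background several disparities are equally admissible, and the final choice is deferred to the surface selection stage.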
A family of NCSM-based algorithms
developed and presented here
demonstrates high quality 3D reconstruction from various stereo
pairs. Detailed analyses and comparisons show that the NCSM
framework yields results competitive with those from the
best-performing conventional algorithms on test stereo pairs with no
contrast deviations but notably outperforms these algorithms in the
presence of large contrast deviations. At the same time, the linear
computational complexity of the NCSM-based techniques (NCSM-SDPS and
NCSM-ITER) is notably lower than that of the best-performing
conventional algorithms (for example, the complexity of the
minimum-cut algorithms grows faster with the image size, so they are
very slow in practice even on a moderate-size stereo pair).
The NCSM framework rules out two unrealistic
assumptions which
appear explicitly or implicitly in most of the known approaches to
the computational binocular stereo problem, namely, (i) an
assumed single continuous opaque surface to be reconstructed and
(ii) an assumed solution framework based on the ``best
match" or ``closest similarity" between corresponding areas in images
that represent the same binocularly visible parts of the surface.
Almost all conventional binocular stereo algorithms search for a
single optical surface that yields the best correspondence between
the images of a stereo pair under the constrained surface
continuity, smoothness and visibility. However, as underscored in
Chapter
,
almost all real 3D scenes contain
multiple disjoint optical surfaces. Thus the assumption of a single
surface is too restrictive for stereo matching.
Under this assumption, conventional
algorithms cannot account for
violations of the ordering constraint and must involve heuristic
penalties to handle large jumps of disparity resulting in surface
discontinuities and partial occlusions. Empirically chosen weights
of each penalising term strongly influence the reconstruction
accuracy [11].
The NCSM framework is based on a more
realistic multilayered model of an observed 3D scene and need not
penalise discontinuities due to transitions from one candidate
volume to the next. The surface fitting process is
restricted to each continuous volume and proceeds from foreground to
background with due account of possible occlusions. After each
foreground surface is found, the corresponding background volumes
are enlarged at the expense of their occluded portions, so that
mutually consistent optical surfaces yielding high point-wise signal
similarity are selected.
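The foreground-to-background order of the surface fitting can be sketched on one scanline as follows. This is a simplified illustration under stated assumptions: `fit_layers`, the `min_run` continuity test and the hand-made candidate sets are hypothetical, and the enlargement of background volumes behind an already found foreground surface is deliberately left out.

```python
def fit_layers(candidates, max_disparity, min_run=2):
    """Assign one disparity per pixel, visiting levels from the foreground
    (largest disparity) to the background (smallest disparity).

    candidates[x] is the set of disparities admissible at pixel x.
    A level is accepted only on continuous runs of at least min_run
    still-unassigned pixels (an assumed, simplistic continuity test)."""
    n = len(candidates)
    assigned = [None] * n
    for d in range(max_disparity, -1, -1):
        x = 0
        while x < n:
            start = x
            # Grow a continuous run of free pixels that admit disparity d.
            while x < n and assigned[x] is None and d in candidates[x]:
                x += 1
            if x - start >= min_run:
                for i in range(start, x):
                    assigned[i] = d
            if x == start:
                x += 1  # pixel does not admit d or is already taken
    return assigned


# Hypothetical candidate sets: a foreground object at disparity 2
# in front of a background at disparity 0; pixels 2 and 5 are ambiguous.
candidates = [{0}, {0}, {0, 2}, {2}, {2}, {0, 2}, {0}, {0}]
layers = fit_layers(candidates, max_disparity=2)
```

Because the foreground level is processed first, the ambiguous pixels next to the object are claimed by the foreground surface, and the background level is fitted only to what remains. No discontinuity penalty is needed at the transitions between the two levels.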
Experiments with real
stereo pairs presented in Chapter
have shown that
``best" matching does not always correspond to the ground truth.
Conventional stereo algorithms rely on the ``best" matching because
their noise models are too simplistic. Optimal statistical decision rules
based on these models lead to various energy minimisation schemes
with different energy functions that quantitatively specify signal
dissimilarity and surface imperfection. In particular, dynamic
programming algorithms produce the ``best matching" epipolar profile
(1D surface cross-section) modelled as a Markov chain of successive
heights along the profile. The graph minimum-cut and belief
propagation algorithms provide a close approximation of a ``best
matching" 2D surface under shape constraints, the surface being
modelled as a 2D Markov random field of heights. Although all of
them assume that the ``best" matching of stereo images is the
ultimate goal of computational binocular stereo, each real stereo
pair contains a great many equally admissible matches. Thus the
selection of only the ``best" matches may lead to many incorrect
decisions.
The computationally ``best" match is therefore
not always the best
selection, especially for occluded regions where no ``best" matching
exists at all. The NCSM framework circumvents the ``best matching"
criteria in favour of the more realistic selection of all the likely
matches that follow from a detailed image noise model.
The umbrella term ``noise"
relates to all deviations between the corresponding signals in
stereo images. The noise arises from multiple sources including
random variations of sensitivity of optical sensors, non-Lambertian
surface reflection, specific impacts of geometry of stereo
observation (e.g. occlusions), etc. Although stereo matching
criteria and strategies obviously depend on all the noise
components, most of the conventional stereo algorithms account only
for very simple and thus unrealistic models of random pixel noise
like statistically independent normal or uniform deviations. This is
why these algorithms totally fail under more realistic noise models
including spatially constant or variant contrast and offset
deviations.
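The effect of contrast and offset deviations on a purely point-wise dissimilarity measure can be seen in a small worked example. The least-squares fit below is an assumption made for illustration only; it is not the noise estimation scheme of NCSM-SDPS or NCSM-ITER, and the names `fit_gain_offset` and `residual_noise` are hypothetical.

```python
def fit_gain_offset(reference, observed):
    """Least-squares estimate of (gain, offset) such that
    observed ~ gain * reference + offset (illustrative assumption)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_o = sum(observed) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:
        # Flat signal: the gain is not observable, only the offset is.
        return 1.0, mean_o - mean_r
    cov = sum((r - mean_r) * (o - mean_o)
              for r, o in zip(reference, observed))
    gain = cov / var_r
    return gain, mean_o - gain * mean_r


def residual_noise(reference, observed):
    """Deviations left after removing the fitted contrast and offset."""
    gain, offset = fit_gain_offset(reference, observed)
    return [o - (gain * r + offset) for r, o in zip(reference, observed)]


# The same surface seen with contrast 1.5 and offset 10 in the second image.
left = [20.0, 40.0, 60.0, 80.0]
right = [1.5 * v + 10.0 for v in left]

raw = [abs(r - l) for l, r in zip(left, right)]  # plain absolute differences
gain, offset = fit_gain_offset(left, right)
residual = residual_noise(left, right)
```

Here the plain absolute differences are at least 20 intensity levels for every pixel, although the signals correspond exactly; once the contrast and offset are accounted for, the residual deviation vanishes. A model of independent pixel noise alone would reject this correct match.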
The NCSM algorithms use two schemes for noise estimation at the
first stage. The NCSM-SDPS algorithm takes account of
possible contrast and offset distortions combined with independent
intensity random deviations and occlusions along each epipolar 2D
profile represented by the conjugate epipolar lines. However, it
does not exploit the inter-dependence of these distortions across
the set of profiles forming a 3D surface, i.e. the inter-dependence
across the scan-lines in the images. The second algorithm,
NCSM-ITER, uses a more realistic spatial noise model with uniform
contrast and offset distortions for all the scene points at the same
depth level, the distortions being independent across different
levels. Experiments
in Chapter
confirm that the latter
algorithm outperforms more conventional ones if stereo pairs have
contrast and offset distortions.
The proposed NCSM framework could be refined
by developing more
versatile models of image noise (e.g., spatially variant Markov
random field models of corresponding image signals to account for
spatial interdependence of the noise components), better detection
of the likely occluded areas in stereo images, and more powerful
surface fitting techniques suitable for slanted and curvilinear
surfaces.
The current NCSM algorithms either build
empirical probability
models of signals in occluded areas using the likely occlusions
detected by the symmetric DP stereo (NCSM-SDPS), or
assume such signals are uniformly distributed at each disparity
level (NCSM-ITER). More accurate modelling of geometric
noise caused by occlusions and more theoretically justified
comparisons of these models to other noise components will reduce
errors in selection of the candidate volumes.
The current surface fitting selects only
horizontal surfaces at
fixed depth levels by comparing planar cross-sections of each volume
in a level-by-level mode. This oversimplified process works well for
a number of stereo pairs with mainly fronto-parallel
surfaces (e.g.
the `Tsukuba' pair) but fails when inter-relations between the
disparity levels in the candidate volumes cannot be ignored.
Obviously, the surfaces to be reconstructed are not always parallel
to the image plane and also are not always planar. Increasing the
disparity resolution may lead to a better approximation of a
curvilinear surface by horizontal planes, but the problem is how to
increase it for a given stereo pair. A more general surface
fitting procedure has to be developed in order to properly handle
slanted planar or curvilinear surfaces.
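The limitation of fixed depth levels can be quantified on a toy slanted profile. The sketch below is a worked arithmetic example under assumed values: the ramp, the step sizes and the helper names are hypothetical, and rounding to the nearest level stands in for the level-by-level selection.

```python
def quantise(disparities, step):
    """Snap each disparity to the nearest level of a grid with spacing step."""
    return [round(d / step) * step for d in disparities]


def max_error(disparities, step):
    """Largest deviation between the true profile and its level-wise fit."""
    fitted = quantise(disparities, step)
    return max(abs(d - q) for d, q in zip(disparities, fitted))


# A slanted surface whose disparity grows by 0.25 per pixel, from 0.0 to 2.0.
ramp = [0.25 * x for x in range(9)]

coarse = max_error(ramp, 1.0)   # integer disparity levels
fine = max_error(ramp, 0.25)    # four times finer disparity resolution
```

With integer levels the fitted staircase deviates from the ramp by up to half a disparity level, while the finer grid reproduces this particular ramp exactly. The finer grid, however, also multiplies the number of levels to be compared, and a surface whose slope does not align with any grid step would still be approximated by a staircase.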
The NCSM framework has a high degree of
inherent parallelism with
the potential for high resolution, accurate and real-time 3D scene
reconstruction. Indeed, this framework meets requirements for
efficient hardware implementations because no complex optimisation
is involved. This would allow the NCSM framework to be extended to
stereo videos of moving 3D scenes both for 3D scene reconstruction
and motion tracking.