Reconstruction of 3D scenes from stereo pairs is based on matching of corresponding
points in the left and right images. Generally, reconstruction is an
ill-posed inverse optical problem because many optical surfaces may
produce the same stereo image pair due to homogeneous texture, partial
occlusions and optical distortions. To regularise the problem in order
to obtain a unique solution close to human visual perception, specific
constraints on surfaces need to be imposed. Almost all existing stereo
reconstruction algorithms search for a single optical surface yielding
the best correspondence between the images under constrained surface
continuity, smoothness, and visibility conditions. Typically, most of
the constraints are ‘soft’, i.e. they allow for deviations, and the
matching score is an ad hoc linear combination of
individual criteria of signal similarity, surface smoothness, and
surface visibility (or occlusions) with empirically chosen weights for
each criterion. The resulting complex optimisation problem is solved
using different exact or approximate techniques, e.g. dynamic
programming, belief propagation or graph min-cut algorithms. However,
the heuristic choice of the weights in the matching score strongly
influences the reconstruction accuracy. In addition, natural stereo
pairs contain many admissible matches, so that the ‘best’ matching that
optimises the score may not
lead to correct decisions. Moreover, real scenes very rarely consist of
a single surface, so this assumption is also too restrictive.
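The sensitivity to the weights can be illustrated with a toy sketch. This is not any algorithm from the thesis: the function name `matching_score`, the single weight `w_smooth`, the absolute-difference similarity term and the scanline data are all assumptions made purely for illustration. The score adds a signal dissimilarity term and a weighted smoothness penalty for one scanline, and the preferred disparity profile changes when the weight changes.

```python
import numpy as np

def matching_score(left, right, disparity, w_smooth=1.0):
    """Toy 'soft constraint' score for one scanline (lower is better).

    Hypothetical illustration: absolute signal difference plus a
    weighted penalty on disparity changes between neighbouring pixels.
    """
    x = np.arange(left.size)
    # Position in the right scanline matched to each left pixel.
    xr = np.clip(x - disparity, 0, right.size - 1)
    dissimilarity = np.abs(left - right[xr]).sum()
    roughness = np.abs(np.diff(disparity)).sum()
    return float(dissimilarity + w_smooth * roughness)

# Assumed toy data: the last four left pixels are shifted by 2.
right = np.array([0, 10, 20, 30, 40, 50, 60, 70], dtype=float)
left = np.array([0, 10, 20, 30, 20, 30, 40, 50], dtype=float)
stepped = np.array([0, 0, 0, 0, 2, 2, 2, 2])  # exact match, one jump
flat = np.zeros(8, dtype=int)                 # smooth, poor match

for w in (1.0, 50.0):
    s_step = matching_score(left, right, stepped, w)
    s_flat = matching_score(left, right, flat, w)
    winner = "stepped" if s_step < s_flat else "flat"
    print(f"w_smooth={w}: stepped={s_step}, flat={s_flat}, best={winner}")
```

With a small weight the exactly matching stepped profile wins; with a large weight the flat profile wins although it matches the signals poorly, which is the kind of dependence on heuristic weights described above.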
The thesis develops an alternative approach to 3D stereo reconstruction
called Noise-driven Concurrent Stereo Matching (NCSM). The family of
algorithms that implement the NCSM paradigm clearly separates image
matching from a subsequent search for optical surfaces. First, a hidden
noise model which allows for mutual photometric distortions of images
and matching outliers is estimated and then used to search for the
candidate volumes by detecting all likely image matches. Selecting the
3D candidate volumes by image-to-image matching at a set of fixed depth,
or disparity, values abandons the conventional assumption that a single
best match has to be found. Then, the
reconstruction proceeds from most likely foreground surfaces to the
background ones (accounting for occlusions in the process), enlarging
corresponding background volumes at the expense of occluded portions
and selecting consistent optical surfaces that exhibit high point-wise
signal similarity. A family of NCSM-based algorithms demonstrates
high-quality 3D reconstruction from various stereo pairs. Detailed
analyses and comparisons show that the NCSM framework yields results
competitive with those from the best-performing conventional algorithms
on test stereo pairs with no contrast deviations but notably
outperforms these algorithms in the presence of large contrast
deviations.
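The candidate-volume step can be sketched roughly as follows. This is a minimal illustration under strong assumptions, not the NCSM implementation: the function name `candidate_volumes` is hypothetical, the noise model is replaced by a fixed tolerance `noise_tol` that is simply given rather than estimated, photometric distortions are ignored, and disparities are non-negative integer shifts along image rows.

```python
import numpy as np

def candidate_volumes(left, right, disparities, noise_tol):
    """Mark every likely match at each fixed disparity.

    Returns a boolean array cand[k, y, x] that is True where
    left[y, x] and right[y, x - disparities[k]] differ by at most
    noise_tol. A pixel may be a candidate at several disparities;
    no single best match is chosen here.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    cand = np.zeros((len(disparities), h, w), dtype=bool)
    for k, d in enumerate(disparities):
        if d >= w:
            continue  # the shift leaves no overlapping columns
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cand[k, :, d:] = diff <= noise_tol
    return cand

# Assumed toy data: one image row whose right half is shifted by 2.
right_img = np.array([[0, 10, 20, 30, 40, 50, 60, 70]], dtype=float)
left_img = np.array([[0, 10, 20, 30, 20, 30, 40, 50]], dtype=float)
cand = candidate_volumes(left_img, right_img, [0, 1, 2], noise_tol=5.0)

# Homogeneous texture: every overlapping disparity remains a candidate.
flat_img = np.full((1, 8), 7.0)
ambiguous = candidate_volumes(flat_img, flat_img, [0, 1, 2], noise_tol=5.0)
print(cand.astype(int))
print(ambiguous.sum(axis=0))
```

In the textured row each pixel is a candidate at exactly one disparity, whereas in the homogeneous row most pixels remain candidates at all three. A later surface-selection stage, not shown here, would have to resolve that ambiguity.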
Acknowledgments
First of all, I would like to express my gratitude to my supervisor,
Associate Professor Georgy Gimel'farb, for all the invaluable help,
support, commitment and enthusiasm he has given me throughout my PhD
study. I have learned much about computer vision from his vast
knowledge of the field. Without his support the completion of this
project would not have been possible and it would certainly have
been less enjoyable.
I would like to thank Associate Professor John Morris and Dr.
Patrice Jean Delmas for sharing their expertise, always being
available to discuss our results, and providing interesting
suggestions.
Thanks must also go to my family and my wife, Jingyi Li. They have
provided unconditional support and encouragement during the last few
years.
I would like to thank all my friends and fellow students for their
help, input and motivation.