The search for a whole continuous 3-D surface that minimises the dissimilarity (or maximises the similarity) between corresponding points of a stereo pair can be treated as combinatorial optimisation on graphs. The graphs describe binary relationships between neighbouring disparities and signals that constrain the solution. Typically, the graph specifies neighbourhoods, i.e. subsets of mutually dependent pixels or points, and the weights of its nodes and edges characterise possible image signals and/or disparities. Candidate surfaces are evaluated in terms of local and global energies accumulating the weights of nodes and edges. Generally, under dense 2-D neighbourhoods, finding a minimum-energy solution is NP-hard; but it can at least be solved approximately, and in some particular cases it reduces to an exactly solvable problem.
Let G = [N;E] denote a directed (linear) graph with a collection of nodes (vertices, points) N = {xa, xb, xc, …} and a subset E ⊆ N × N of ordered pairs (edges, arcs) (xα,xβ) of elements from N. A chain is a sequence of nodes x1, x2, … such that (xi, xi+1) ∈ E. A path is a sequence x1, x2, … such that (xi, xi+1) ∈ E or (xi+1, xi) ∈ E. Let A(x) = {y | y ∈ N; (x,y) ∈ E} and B(x) = {y | y ∈ N; (y,x) ∈ E} be subsets of the nodes after x and before x, respectively.
A network is a graph with two special nodes s and t called source and sink, respectively, and with a non-negative capacity c(x,y) ≥ 0 assigned to every edge (x,y) ∈ E. The function c: E → R≥0, where R≥0 is the set of non-negative real numbers, is called the capacity function.
A static flow of value v from s to t in a network G = [N;E] with a capacity function c is a function f: E → R≥0 that satisfies the linear constraints:
Σy∈A(x) f(x,y) − Σy∈B(x) f(y,x) = v if x = s; = 0 if x ∈ N\{s,t}; = −v if x = t, and 0 ≤ f(x,y) ≤ c(x,y) for every edge (x,y) ∈ E.
The static maximum flow problem is to maximize the variable v subject to the above flow constraints. To simplify the notation, let (X,Y); X ⊂ N, Y ⊂ N, denote the set of all edges from nodes x ∈ X to nodes y ∈ Y. For any function g: E → R, let g(X,Y) denote the sum of the values of the function over this set of edges: g(X,Y) = Σ(x,y)∈(X,Y) g(x,y).
A cut C in G = [N;E] separating the source s and the sink t is a set of edges (X,N\X) such that s ∈ X and t ∈ N\X. The capacity of the cut (X,N\X) is c(X,N\X). A simple example is given below.
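As a tiny worked example (an illustrative sketch; the node names and capacities below are assumptions, not taken from the original figure), the following Python fragment stores such a network as a dictionary of edge capacities and evaluates the capacity of two particular cuts:

```python
# A minimal sketch: a small network as a dictionary of directed edge capacities,
# and the capacity of a cut (X, N\X) as the sum of capacities of edges leading
# from X to N\X.  All names and numbers are illustrative only.

capacity = {
    ('s', 'a'): 3, ('s', 'b'): 2,
    ('a', 'b'): 1, ('a', 't'): 2,
    ('b', 't'): 3,
}

def cut_capacity(X, capacity):
    """Capacity c(X, N\\X): sum of c(x, y) over edges with x in X and y not in X."""
    return sum(c for (x, y), c in capacity.items() if x in X and y not in X)

# The cut ({s, a}, {b, t}) severs the edges (s,b), (a,b) and (a,t):
print(cut_capacity({'s', 'a'}, capacity))   # 2 + 1 + 2 = 5
# The cut ({s}, {a, b, t}) severs (s,a) and (s,b):
print(cut_capacity({'s'}, capacity))        # 3 + 2 = 5
```

Both cuts have capacity 5; since a flow of value 5 exists in this toy network (e.g. 2 units along s→a→t, 1 along s→a→b→t, and 2 along s→b→t), Lemma 1 below implies that these cuts are minimal and 5 is the maximum flow value from s to t.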
Lemma 1 [Ford, Fulkerson; 1956]: Let f be a flow from s to t in a network [N; E], and let f have value v. If (X, N\X) is a cut separating s and t, then
v = f(X, N\X) − f(N\X, X) ≤ c(X, N\X).
Proof: Since f is a flow, f(s,N) − f(N,s) = v; f(x,N) − f(N,x) = 0 for all x ∈ N\{s,t}, and f(t,N) − f(N,t) = −v. Let us sum these equations over x ∈ X. Since s ∈ X and t ∈ N\X, the result is f(X,N) − f(N,X) = v. The edges within X contribute equally to f(X,N) and f(N,X), so v = f(X, N\X) − f(N\X, X), and the inequality follows from f(X, N\X) ≤ c(X, N\X) and f(N\X, X) ≥ 0.
The equality in Lemma 1 states that the value of a flow from s to t is equal to the net flow across any cut separating s and t.
The fundamental result concerning the maximal network flow is given by the MAX-FLOW MIN-CUT Theorem [Ford, Fulkerson; 1956]: For any network, the maximum flow value from s to t is equal to the minimum cut capacity over all cuts separating s and t.
Let a flow augmenting path with respect to a flow f be defined as a path from s to t such that f < c on forward edges of the path and f > 0 on reverse edges of the path. The following corollary is of fundamental importance in searching for the maximal network flows:
Corollary 1 [Ford, Fulkerson; 1956]: A flow f is maximal if and only if there is no flow augmenting path with respect to f.
This corollary states that in order to increase the value of a flow, it is sufficient to search for improvements of a very restricted kind.
Let an edge (x,y) be called saturated with respect to a flow f if f(x,y) = c(x,y), and flowless with respect to f if f(x,y) = 0. An edge that is both saturated and flowless has zero capacity. A minimal cut is characterized in terms of these notions by
Corollary 2 [Ford, Fulkerson; 1956]: A cut (X, N\X) is minimal if and only if every maximal flow f saturates all edges of (X, N\X) whereas all edges of (N\X, X) are flowless with respect to f.
Ford-Fulkerson labelling algorithm: Under mild restrictions on the capacity function, the proof of the max-flow min-cut theorem provides an algorithm for constructing a maximal flow and minimal cut in a network. To ensure termination, all the capacities should have integer values. The algorithm is initialised with the zero flow. Then a sequence of "labellings" is performed (Routine A below). Each labelling either results in a flow of higher value (Routine B below) or terminates with the conclusion that the present flow is maximal. Given an integral flow f, labels are assigned to nodes of the network according to Routine A. A node can be in one of three states: unlabelled (UL), labelled and scanned (LS), or labelled and unscanned (LUS). Each label has one of the forms (x+,ε) or (x−,ε), where x ∈ N and ε is a positive integer or ∞. Initially all nodes are UL.
Routine A: labelling
The labeling process searches systematically for a flow augmenting path from s to t (see Corollary 1). Information about the paths is carried along in the labels, so that if the sink is labelled, the resulting flow change along the path is readily made. On the other hand, if Routine A ends and the sink is not labelled, the flow is maximal and the set of edges leading from labelled (LUS,LS) nodes to unlabelled (UL) nodes is a minimal cut.
The labelling process is computationally efficient because once a node is labelled and scanned, it is ignored for the remainder of the process. Labelling a node x locates a path from s to x that can be the initial segment of a flow augmenting path. There may be many such paths from s to x, but it is sufficient to find one.
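Since Routines A and B themselves are not reproduced above, the following Python sketch shows one possible implementation of the labelling scheme just described; the function name, the data structures, and the breadth-first scanning order are assumptions made for this illustration, not part of the original routines.

```python
from collections import deque

def ford_fulkerson_labelling(capacity, s, t):
    """A sketch of the Ford-Fulkerson labelling algorithm.

    capacity: dict mapping directed edges (x, y) to integer capacities.
    Returns (flow_value, flow_dict, source_side_of_a_minimal_cut).
    """
    flow = {e: 0 for e in capacity}
    value = 0
    while True:
        # Routine A: label nodes, searching for a flow augmenting path.
        # A label (prev, sign, eps) means: reached from prev along a forward edge
        # (sign '+', f < c) or a reverse edge (sign '-', f > 0); the flow can be
        # changed by up to eps along the path located so far.
        labels = {s: (None, '+', float('inf'))}
        unscanned = deque([s])                       # labelled and unscanned (LUS)
        while unscanned and t not in labels:
            x = unscanned.popleft()                  # x becomes labelled and scanned (LS)
            eps_x = labels[x][2]
            for (u, v), c in capacity.items():
                if u == x and v not in labels and flow[(u, v)] < c:      # forward edge
                    labels[v] = (x, '+', min(eps_x, c - flow[(u, v)]))
                    unscanned.append(v)
                elif v == x and u not in labels and flow[(u, v)] > 0:    # reverse edge
                    labels[u] = (x, '-', min(eps_x, flow[(u, v)]))
                    unscanned.append(u)
        if t not in labels:
            # No augmenting path: the flow is maximal, and the labelled nodes form
            # the source side X of a minimal cut (X, N\X).
            return value, flow, set(labels)
        # Routine B: back-track from t to s, changing the flow by eps(t).
        eps = labels[t][2]
        y = t
        while y != s:
            x, sign, _ = labels[y]
            if sign == '+':
                flow[(x, y)] += eps
            else:
                flow[(y, x)] -= eps
            y = x
        value += eps
```

Run on the toy network from the earlier example, this routine returns the maximum flow value 5 together with the source side of a minimal cut.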
Computational complexity: Both the initial Ford-Fulkerson max-flow min-cut algorithm and the subsequent solutions of the same problem have polynomial time complexity (below, n = |N| is the size, or cardinality of the set of nodes N and m = |E| is the size of the set of edges E):
| Algorithm | Principle | Complexity |
|---|---|---|
| Ford-Fulkerson, 1956 | Finding flow augmenting paths | O(nm²) |
| Dinic, 1970 | Shortest augmenting paths in one step | O(n²m); O(n³) in a dense graph; O(nm log n) in a sparse graph |
| Goldberg-Tarjan, 1985 | Pushing a pre-flow | O(nm log(n²/m)) |
Time complexity of the Ford-Fulkerson algorithm depends on how the flow augmenting paths are determined. The O(nm²) Edmonds-Karp algorithm implements the Ford-Fulkerson scheme by using breadth-first search, so that each augmenting path is a shortest path from s to t in the residual graph (network) where each edge has unit length, or weight. For a given network G = [N; E] and a flow f, the residual network Gf induced by f consists of the edges Ef that allow for increasing the flow. The additional flow permitted by an edge beyond its current flow is called the residual capacity of the edge: cf(x,y) = c(x,y) − f(x,y); in addition, every edge (x,y) carrying a positive flow gives rise to a reverse residual edge (y,x) with cf(y,x) = f(x,y).
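A short sketch of assembling the residual network from a current flow (assuming the same dictionary-of-capacities representation as above; names are illustrative):

```python
def residual_network(capacity, flow):
    """Residual capacities induced by a flow: cf(x,y) = c(x,y) - f(x,y) for each
    original edge, plus a reverse edge (y,x) with cf(y,x) = f(x,y) for every edge
    carrying positive flow.  Only edges with positive residual capacity are kept."""
    residual = {}
    for (x, y), c in capacity.items():
        f = flow.get((x, y), 0)
        if c - f > 0:
            residual[(x, y)] = residual.get((x, y), 0) + (c - f)
        if f > 0:
            residual[(y, x)] = residual.get((y, x), 0) + f
    return residual
```

An augmenting path with respect to f is then simply a directed path from s to t in this residual network, and the Edmonds-Karp variant locates it by breadth-first search.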
Search strategies based on flow augmenting paths keep a feasible flow at each step, although the maximal flow is needed only at the very end. This is why these strategies turned out to be less efficient than the subsequent ones based on the pre-flow notion coined by A. V. Karzanov in 1974. Karzanov's pre-flow violates the flow conservation conditions in that the in-flow and out-flow at a node need not be equal, the difference at a node x being called the excess at x. If f is a pre-flow in a graph G = [N; E], then the excess e(x) at x is e(x) = f(N,x) − f(x,N), required to be non-negative for every x ∈ N\{s}.
Pushing an amount Δ of excess from a node x to a node y updates the pre-flow and the excesses as follows: on a forward edge, f(x,y) ← f(x,y) + Δ; on a reverse edge, f(y,x) ← f(y,x) − Δ; in both cases e(x) ← e(x) − Δ and e(y) ← e(y) + Δ.
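A minimal sketch of this push operation, as used in pre-flow (push-relabel) algorithms; the dictionaries and the function name are assumptions, and the height (distance) labels that decide when a push is admissible in the Goldberg-Tarjan algorithm are omitted here:

```python
def push(x, y, capacity, flow, excess):
    """Push as much excess as possible from x to y along the residual edge (x, y).

    capacity/flow: dicts over directed edges of the original network;
    excess: dict of current node excesses.  Returns the pushed amount delta.
    """
    if (x, y) in capacity:                           # forward edge: room left is c - f
        residual = capacity[(x, y)] - flow.get((x, y), 0)
    else:                                            # reverse edge: can cancel flow f(y, x)
        residual = flow.get((y, x), 0)
    delta = min(excess[x], residual)
    if delta <= 0:
        return 0
    if (x, y) in capacity:
        flow[(x, y)] = flow.get((x, y), 0) + delta   # f(x,y) <- f(x,y) + delta
    else:
        flow[(y, x)] = flow.get((y, x), 0) - delta   # f(y,x) <- f(y,x) - delta
    excess[x] -= delta                               # e(x) <- e(x) - delta
    excess[y] += delta                               # e(y) <- e(y) + delta
    return delta
```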
Greig et al. (see References) were the first to show how the Bayesian denoising (restoration) of binary images can be reformulated as a minimum cut problem in a certain network and solved exactly with a max-flow / min-cut technique such as the aforementioned Ford-Fulkerson algorithm. Let x = (x1, …, xn) and y = (y1, …, yn) be a hidden noiseless image and a measured noisy image, respectively, such that the measurements yi; i = 1,…,n, are conditionally independent given the noiseless image x. Each measured signal yi has a known conditional probability density function p(yi|xi), depending on x only through xi. Therefore, p(y|x) = ∏i=1,…,n p(yi|xi). Below, λi = log[p(yi|1)/p(yi|0)] denotes the log-likelihood ratio for pixel i, and βij ≥ 0 are the interaction (smoothness) weights of the Markov-Gibbs prior.
A capacitated network model for this optimisation problem contains n + 2 nodes: a source s, a sink t, and the n pixel nodes. If the log-likelihood ratio is positive, λi > 0, there is a directed edge (s,i) from s to pixel i with capacity c(s,i) = λi; otherwise, there is a directed edge (i,t) from i to t with capacity c(i,t) = −λi. There is an undirected edge (i,j) between two pixels i and j with capacity c(i,j) = βij if these pixels are neighbours. The figure below exemplifies such a network model for restoring a binary image under a Markov-Gibbs prior model with the 4-neighbourhood interactions between pixels (here, signals y1,…,yn of an observed noisy image that result in positive, λi > 0, and non-positive, λi ≤ 0, log-likelihood ratios are shown by the black and white pixel nodes, respectively):
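A sketch of how this network can be assembled for a given image of log-likelihood ratios; the function name, the dictionary representation, and the constant interaction weight β are illustrative assumptions, while the edge capacities follow the construction just described:

```python
def greig_restoration_network(llr, beta):
    """Build the capacitated network of Greig et al. for exact binary MAP restoration.

    llr: 2-D list (H x W) of log-likelihood ratios lambda_i, one per pixel.
    beta: interaction weight beta_ij, assumed equal for all 4-neighbour pairs.
    Returns a dict of directed edge capacities over the nodes 's', 't' and pixel
    coordinates (r, c); each undirected pixel-pixel edge is stored in both directions.
    """
    H, W = len(llr), len(llr[0])
    cap = {}
    for r in range(H):
        for c in range(W):
            lam = llr[r][c]
            if lam > 0:
                cap[('s', (r, c))] = lam        # c(s,i) = lambda_i   when lambda_i > 0
            else:
                cap[((r, c), 't')] = -lam       # c(i,t) = -lambda_i  when lambda_i <= 0
            for r2, c2 in ((r, c + 1), (r + 1, c)):   # right and lower 4-neighbours
                if r2 < H and c2 < W:
                    cap[((r, c), (r2, c2))] = beta
                    cap[((r2, c2), (r, c))] = beta
    return cap
```

Feeding the resulting capacities to any max-flow / min-cut routine (e.g. the labelling sketch above) yields the exact MAP restoration: pixels remaining on the source side of the minimum cut receive one binary label and those on the sink side the other.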
Accelerated solution: A ten-fold acceleration of the basic Ford-Fulkerson algorithm is obtained for this network model by partitioning the input image into 2^K × 2^K connected sub-images. The MAP estimate is calculated for each sub-image separately; then the sub-images are amalgamated to form a set of 2^(K−1) × 2^(K−1) larger sub-images, the MAP estimate is formed for each of them, and the process continues until the MAP estimate of the complete image is obtained.
Theorem FD [Friedman, Drineas; 2005] (generally regarded as part of combinatorial-optimisation folklore): Let xi ∈ {0,1}; i = 1,…,n, and let
Proof: To prove the "if" direction, the energy can be rewritten as
For the altered linear terms Λi = γixi + σi there is an edge (s,i) with the weight wsi = γi if γi ≥ 0, and an edge (i,t) with the weight wit = |γi| if γi < 0, so that all weights are non-negative.
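A small illustration of this sign splitting of the linear terms (a sketch using the same capacity-dictionary representation as above; the function name is an assumption):

```python
def add_linear_terms(gamma, cap):
    """For each altered linear coefficient gamma_i, add a non-negative t-link:
    an edge (s, i) of weight gamma_i if gamma_i >= 0, else an edge (i, t) of
    weight |gamma_i|.  The constant offsets do not change the minimising labelling."""
    for i, g in enumerate(gamma):
        if g >= 0:
            cap[('s', i)] = cap.get(('s', i), 0) + g
        else:
            cap[(i, 't')] = cap.get((i, 't'), 0) - g
    return cap
```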
The proof of "only if" is only slightly more complicated.
The MAP estimates based on the Markov-Gibbs posteriors with pairwise interactions result in the class of energy functions:
More sophisticated sufficient conditions for energy minimization using the min-cut techniques exist also for energy functions with k-wise pixel interaction, k > 2. But natural generalisations of the minimum cut problem to the case of more than two terminals, such as a multiway cut and minimum k-cut, are NP-hard apart from a few special cases with very restrictive conditions on energy functions:
The problem of finding a minimum weight multiway cut is NP-hard for any fixed k ≥ 3 (the case k = 2 is the minimum s-t cut problem). The minimum k-cut problem has polynomial time complexity for fixed k and is NP-hard if k is one of the input variables. Both problems have approximation algorithms that guarantee a solution within the factor 2 − 2/k of the exact optimum.
When the energy functions involve multiple labels per pixel, the pixel-wise stochastic global minimization with simulated annealing (SA) or the deterministic pixel-wise local minimization with the "greedy" iterated conditional modes (ICM) algorithm typically produce very poor results. Even in the simplest case of binary labelling considered above, the SA and ICM algorithms converge to stable points that are too far from the global minimum. Although simulated annealing provably converges to the global energy minimum, this is guaranteed only in exponential time, so no practical implementation can closely approach that goal.
The main drawback of the SA and ICM algorithms is that they are pixel-wise, i.e. each iteration changes only one label at a single pixel in accord with the neighbouring labels. Therefore each iteration results in an extremely small move in the space of possible labellings. Obviously, convergence becomes faster under larger moves that simultaneously change the labels of a large number of pixels.
As was already mentioned, the minimum multiway cut problem of finding the minimum capacity set of edges whose removal separates a given set of k terminals is NP-hard. But it has a provably good approximate solution that can be obtained with the exact min-cut / max-flow techniques. Let an isolating cut for a terminal si in a given set S = {s1,…,sk} be defined as a subset of edges whose removal separates si from the rest of the terminals. Then the following algorithm (AMC) gives an approximate solution to the multiway cut problem:
Step 1: For each i = 1,…,k, compute a minimum-capacity isolating cut Ci for si, e.g. by merging the remaining terminals into a single sink and finding a minimum s-t cut.
Step 2: Discard the heaviest of the cuts Ci and output the union C of the remaining k − 1 cuts.
Step 1 exploits k separate max-flow / min-cut computations. Because the removal of C from the graph disconnects every pair of terminals, it is a multiway cut.
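A sketch of Algorithm AMC following these steps; the terminal merging by node renaming and the reuse of the ford_fulkerson_labelling sketch above are implementation assumptions (the graph is assumed undirected, stored with both directed copies of each edge):

```python
def multiway_cut_amc(capacity, terminals):
    """Approximate multiway cut (factor 2 - 2/k): compute a minimum isolating cut
    for every terminal by merging all other terminals into one super-sink, then
    discard the heaviest isolating cut and return the union of the rest."""
    isolating_cuts = []
    for s_i in terminals:
        others = set(terminals) - {s_i}

        def rename(v):
            # Merge all the other terminals into a single super-sink node 'T*'.
            return 'T*' if v in others else v

        merged = {}
        for (x, y), c in capacity.items():
            x2, y2 = rename(x), rename(y)
            if x2 != y2:
                merged[(x2, y2)] = merged.get((x2, y2), 0) + c
        value, _, source_side = ford_fulkerson_labelling(merged, s_i, 'T*')
        # The isolating cut consists of the original edges crossing the source side.
        cut_edges = {(x, y) for (x, y) in capacity
                     if (x in source_side) != (y in source_side)}
        isolating_cuts.append((value, cut_edges))
    # Discard the heaviest isolating cut; the union of the others is a multiway cut.
    isolating_cuts.sort(key=lambda vc: vc[0])
    union = set()
    for _, cut_edges in isolating_cuts[:-1]:
        union |= cut_edges
    return union
```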
Theorem [Vazirani, 2003]: Algorithm AMC guarantees a solution within a factor 2 − 2/k of the optimal solution.
Proof: Let Copt be an optimal multiway cut in a graph G = [N; E]. Then Copt can be considered as the union of k cuts as follows:
The minimum k-cut problem of finding a minimum-capacity set of edges whose removal leaves k connected components in a graph G has a natural factor 2 − 2/k approximate solution: compute a Gomory-Hu tree of G and output the union of the k − 1 lightest of the n − 1 cuts associated with the tree edges.
Two fast approximate algorithms for energy minimization developed by Boykov, Veksler, and Zabih (see References) improve the poor convergence of simulated annealing by replacing pixel-wise changes with specific large moves. The resulting process converges to a solution that is provably within a known factor of the global energy minimum.
The energy function to be minimized is
E(x) = Σi∈R Di(xi) + Σ(i,j)∈N Vi,j(xi, xj)    (E1)
where R is the set of pixels, N is the set of neighbouring pixel pairs, Di(xi) is the data term measuring how well the label xi fits pixel i, and Vi,j is the interaction potential between the labels of neighbouring pixels.
The approximate Boykov-Veksler-Zabih minimization algorithms work with any semimetric or metric Vij by using large α-β-swap or α-expansion moves respectively. The conditionally optimal moves are found with a min-cut / max-flow technique.
The α-β-swap for an arbitrary pair of labels α,β ∈ L is a move from a partition P for a current labelling x to a new partition P′ for a new labelling x′ such that Rλ = R′λ for any label λ ≠ α, β. In other words, this move changes only the labels α and β within their current region Rαβ = Rα ∪ Rβ, whereas the labels of all pixels outside Rαβ remain fixed. In the general case, after an α-β-swap some pixels change their labels from α to β and others from β to α. A special case is when the label α is assigned to some pixels previously labelled β.
The α-expansion of an arbitrary label α is a move from a partition P for a current labelling x to a new partition P′ for a new labelling x′ such that Rα ⊂ R′α and R\R′α = ∪λ∈L; λ ≠ αR′λ ⊂ R\Rα = ∪λ∈L; λ ≠ αRλ. In other words, after this move any subset of pixels can change their labels to α.
Energy minimization algorithms: The SA and ICM algorithms use standard pixel-wise relaxation moves changing one label each time. Such a move is both an α-β-swap and an α-expansion, so that these larger moves generalize the standard relaxation scheme. The algorithms based on these generalizations are sketched below.
Swap algorithm for semimetric interaction potentials:
Step 1: Start with an arbitrary labelling x.
Step 2: Set success := 0 and, for each pair of labels {α,β} ⊆ L:
Step 2.1: Find the lowest-energy labelling x′ among all labellings within one α-β-swap of x (see below).
Step 2.2: If E(x′) < E(x), accept the move: x := x′ and success := 1.
Step 3: If success = 1, return to Step 2; otherwise stop and return x.
The expansion algorithm for metric potentials has the same structure, except that Step 2 iterates over single labels α ∈ L and Step 2.1 finds the lowest-energy labelling within one α-expansion of x.
An iteration at Step 2 performs L individual α-expansion moves in the expansion algorithm and L² individual α-β-swap moves in the swap algorithm. It can be proved that the minimisation terminates in a finite number of iterations that is of the order of the image size n. In practice, image segmentation and stereo reconstruction experiments conducted by Boykov, Kolmogorov, Veksler, and Zabih (see References) have shown that these algorithms converge to a local energy minimum within just a few iterations.
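A compact sketch of these outer loops; the subroutines best_swap_move and best_expansion_move, which solve the min-cut problems described in the following paragraphs, and the energy callable are assumed to be supplied by the caller:

```python
from itertools import combinations

def swap_algorithm(labelling, labels, energy, best_swap_move):
    """BVZ swap algorithm: cycle over all label pairs, accepting each optimal
    alpha-beta-swap move that lowers the energy, until no pair improves it."""
    success = True
    while success:
        success = False
        for alpha, beta in combinations(labels, 2):             # Step 2
            candidate = best_swap_move(labelling, alpha, beta)   # Step 2.1 (min-cut)
            if energy(candidate) < energy(labelling):            # Step 2.2
                labelling, success = candidate, True
    return labelling

def expansion_algorithm(labelling, labels, energy, best_expansion_move):
    """BVZ expansion algorithm: same structure, but iterates over single labels
    and uses optimal alpha-expansion moves."""
    success = True
    while success:
        success = False
        for alpha in labels:
            candidate = best_expansion_move(labelling, alpha)
            if energy(candidate) < energy(labelling):
                labelling, success = candidate, True
    return labelling
```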
Given a current labelling x (partition P) and a pair of labels (α,β) or a single label α, the swap or expansion moves, respectively, at Step 2.1 in the above algorithms use the min-cut / max-flow optimisation technique to find a better labelling x′. This labelling minimises the energy over all labellings within one α-β-swap (the swap algorithm) or one α-expansion (the expansion algorithm) of x and corresponds to a minimum cut of a specific graph having O(n) nodes associated with pixels. The swap and expansion graphs are different, and the exact number of their nodes, their topology, and their edge weights vary from step to step in accord with the current partition.
Swap algorithm: finding the optimal move: The figure below exemplifies a graph Gαβ = [Nαβ; Eαβ] for finding the optimal swap move for a set of pixels Rαβ = Rα ∪ Rβ with the labels α and β:
If the edges have the following weights (here Ni denotes the set of neighbours of pixel i):

| Edge | Weight | For |
|---|---|---|
| t-link tα,i | Di(α) + Σj∈Ni; j∉Rαβ Vi,j(α, xj) | i ∈ Rαβ |
| t-link tβ,i | Di(β) + Σj∈Ni; j∉Rαβ Vi,j(β, xj) | i ∈ Rαβ |
| n-link ei,j | Vi,j(α, β) | (i,j) ∈ N; i, j ∈ Rαβ |
Each labelling xC corresponding to a cut C on the graph Gαβ is one α-β-swap away from the initial labelling x.
Because a cut separates a subset of pixels in Rαβ associated with one terminal from a complementary subset of pixels associated with another terminal, it includes (i.e. severs in the graph) an n-link ei,j between the neighbouring pixels in Rαβ if and only if the pixels i and j are connected to different terminals under this cut:
Theorem BVZ-T1 [Boykov, Veksler, Zabih, 2001]: There is a one-to-one correspondence between cuts C on Gαβ and labellings xC that are one α-β-swap from x. The capacity of a cut C on Gαβ is c(C) = E(xC) plus a constant, where E(…) is the energy function in Eq.(E1).
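A sketch of assembling Gαβ with the weights listed above (the capacity-dictionary representation, the function name, and the callables D, V and neighbours are assumptions; the terminal nodes are named by the labels α and β, assumed not to clash with pixel identifiers):

```python
def build_swap_graph(labelling, alpha, beta, D, V, neighbours):
    """Build the graph G_alpha_beta for the optimal alpha-beta-swap move.

    labelling: dict pixel -> current label;  D(i, label): data term;
    V(i, j, li, lj): interaction potential;  neighbours(i): neighbouring pixels of i.
    The terminals are the label values alpha (source) and beta (sink).
    """
    R_ab = [i for i, l in labelling.items() if l in (alpha, beta)]
    cap = {}
    for i in R_ab:
        # t-links: data term plus interactions with fixed neighbours outside R_ab.
        t_alpha = D(i, alpha) + sum(V(i, j, alpha, labelling[j])
                                    for j in neighbours(i)
                                    if labelling[j] not in (alpha, beta))
        t_beta = D(i, beta) + sum(V(i, j, beta, labelling[j])
                                  for j in neighbours(i)
                                  if labelling[j] not in (alpha, beta))
        cap[(alpha, i)] = t_alpha
        cap[(i, beta)] = t_beta
        # n-links between neighbouring pixels that are both inside R_ab.
        for j in neighbours(i):
            if labelling.get(j) in (alpha, beta) and (j, i) not in cap:
                cap[(i, j)] = V(i, j, alpha, beta)
                cap[(j, i)] = V(i, j, alpha, beta)
    return cap
```

A minimum cut of this graph, found with any max-flow routine, severs exactly one t-link per pixel of Rαβ; in the Boykov-Veksler-Zabih convention the pixel receives the label of the severed t-link, so that the cut capacity equals the energy of the new labelling up to a constant, as stated in Theorem BVZ-T1.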
Corollary BVZ-C1 [Boykov, Veksler, Zabih, 2001]: The lowest energy labelling within a single α-β-swap move from a current labelling x is the labelling x′ = xC° corresponding to the minimum cut C° on Gαβ.
Expansion algorithm: finding the optimal move: The set of nodes Nα of the graph Gα = [Nα; Eα] for finding an optimal expansion move includes the two terminals, denoted α and ᾱ, all the pixels i ∈ R, and an auxiliary node ai,j for each pair of neighbouring pixels (i,j) ∈ N that have different labels xi ≠ xj in the current partition P. The auxiliary nodes lie on the boundaries between the partition sets Rλ; λ ∈ L. Thus the set of nodes is Nα = {α, ᾱ} ∪ R ∪ {ai,j | (i,j) ∈ N; xi ≠ xj}.
Each pixel i ∈ R is connected to the terminals α and ᾱ by t-links tα,i and tᾱ,i, respectively. Each pair of neighbouring pixels (i,j) ∈ N that is not separated in the current partition, i.e. has the same labels xi = xj, is connected by an n-link ei,j. For each pair of separated neighbouring pixels (i,j) ∈ N with xi ≠ xj, the introduced auxiliary node ai,j gives rise to a triplet of edges Ei,j = {ei,ai,j, eai,j,j, tᾱ,ai,j}, where the first two n-links connect the pixels i and j to the auxiliary node ai,j and the t-link connects the auxiliary node ai,j to the terminal ᾱ. Therefore, the set of all edges is Eα = {tα,i, tᾱ,i | i ∈ R} ∪ {Ei,j | (i,j) ∈ N; xi ≠ xj} ∪ {ei,j | (i,j) ∈ N; xi = xj}.
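A sketch of assembling Gα; the edge-weight assignments follow the Boykov-Veksler-Zabih construction as recalled here, but the names, the capacity-dictionary representation, and the choice of which terminal plays the role of the source are assumptions that should be checked against the original table:

```python
INF = float('inf')

def build_expansion_graph(labelling, alpha, D, V, neighbour_pairs):
    """Build the graph G_alpha for the optimal alpha-expansion move.

    labelling: dict pixel -> current label;  D(i, label): data term;
    V(i, j, li, lj): interaction potential;  neighbour_pairs: pixel pairs (i, j), each once.
    Terminals are 'alpha' and 'not_alpha'; auxiliary nodes are tuples ('aux', i, j).
    """
    cap = {}
    for i, x_i in labelling.items():
        # t-links; the infinite link keeps pixels already labelled alpha on the alpha side.
        cap[('alpha', i)] = D(i, alpha)
        cap[(i, 'not_alpha')] = INF if x_i == alpha else D(i, x_i)
    for (i, j) in neighbour_pairs:
        x_i, x_j = labelling[i], labelling[j]
        if x_i == x_j:
            # n-link between neighbours not separated by the current partition.
            w = V(i, j, x_i, alpha)
            cap[(i, j)] = w
            cap[(j, i)] = w
        else:
            # Auxiliary node on the boundary between differently labelled neighbours,
            # with the triplet of edges E_ij described above.
            a = ('aux', i, j)
            cap[(i, a)] = cap[(a, i)] = V(i, j, x_i, alpha)
            cap[(a, j)] = cap[(j, a)] = V(i, j, alpha, x_j)
            cap[(a, 'not_alpha')] = V(i, j, x_i, x_j)
    return cap
```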
Each labelling xC corresponding to a cut C on the graph Gα is one α-expansion away from the initial labelling x.
Because a cut separates a subset of pixels in R associated with one terminal from a complementary subset of pixels associated with another terminal, it severs an n-link ei,j between the neighboring pixels (i,j) ∈ N if and only if the pixels i and j are connected to different terminals under this cut, or in formal terms:
(P1)
The triplet of edges Ei,j corresponding to a pair of neighbouring pixels (i,j) ∈ N such that xi ≠ xj may be cut in different ways even when the pair of severed t-links at i and j is fixed. However, a minimum cut uniquely defines the edges to sever in Ei,j in these cases, due to the minimality of the cut and the metric properties of the potentials associated with the edges of Ei,j = {ei,ai,j, eai,j,j, tᾱ,ai,j}. The triangle inequality implies that it is always better to cut any one of them rather than the other two together. This property of a minimum cut C, illustrated in Figure MC above, has the following formal representation: if (i,j) ∈ C and xi ≠ xj, then C satisfies the conditions
(P2)
These properties may hold for non-minimal cuts, too. If an elementary cut is defined as a cut satisfying the above conditions P1 and P2, then it is possible to prove
Theorem BVZ-T2 [Boykov, Veksler, Zabih, 2001]: Let a graph Gα be constructed as above given a labelling x and a label α. Then there is a one-to-one correspondence between elementary cuts on Gα and labellings within one α-expansion from x. The capacity of any elementary cut C is c(C) = E(xC), where E(…) is the energy of Eq.(E1).
Corollary BVZ-C2 [Boykov, Veksler, Zabih, 2001]: The lowest energy labelling within a single α-expansion move from x is the labelling x′ = xC° corresponding to the minimum cut C° on Gα.
Although the swap move algorithm has a wider application area, since it requires the potentials Vi,j(…) to be only semimetrics, it generally possesses no proven optimality properties. But a local minimum obtained with the expansion move algorithm is within a fixed factor of the global minimum, according to
Theorem BVZ-T3 [Boykov, Veksler, Zabih, 2001]: Let x̂ be a labelling corresponding to a local energy minimum when the expansion moves are allowed, and let x* be the globally optimal solution. Then E(x̂) ≤ 2c·E(x*), where c = max(i,j)∈N ( maxα≠β∈L Vi,j(α,β) / minα≠β∈L Vi,j(α,β) ).
Proof: Fix some α ∈ L and let R*α = {i ∈ R | x*i = α}. Let xα be a labelling within one α-expansion move from x̂ such that xαi = α for all i ∈ R*α and xαi = x̂i otherwise. Since x̂ is a local minimum with respect to the expansion moves, E(x̂) ≤ E(xα).    (E2)
Let S = Spix ∪ Spair be a union of an arbitrary subset Spix of pixels in R; Spix ⊆ R, and of an arbitrary subset Spair of neighbouring pixel pairs in N; Spair ⊆ N. The restriction of the energy of a labelling x to S is defined as ES(x) = Σi∈Spix Di(xi) + Σ(i,j)∈Spair Vi,j(xi, xj).
For each label α, let I*α denote the pixels of R*α together with the neighbouring pairs lying entirely inside R*α, B*α the neighbouring pairs having exactly one pixel in R*α, and Oα the pixels and neighbouring pairs lying entirely outside R*α. The union I*α ∪ B*α ∪ Oα includes all the pixels in R and all the neighbouring pairs of pixels in N. Therefore, Eq.(E2), with both sides split into these restrictions and summed over all α ∈ L, can be rewritten as
(E3)
For every (i,j) ∈ B = ∪α∈L B*α, the term Vi,j(x̂i, x̂j) appears twice on the left side of Eq.(E3): once in EB*α(x̂) for α = x*i, and once in EB*α(x̂) for α = x*j. Similarly, every term Vi,j(x*i, x*j) appears 2c times on the right side of Eq.(E3). Therefore, Eq.(E3) can be rewritten so as to yield the required bound E(x̂) ≤ 2c·E(x*).
The pictures below, taken from the Middlebury Stereo Vision Page http://www.middlebury.edu/stereo (see: D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms", Int. J. Computer Vision, vol. 47(1/2/3), pp. 7-42, April-June 2002), show that the graph-cut stereo algorithm notably outperforms both the dynamic programming stereo and the SSD-based stereo algorithms (this graph-cut algorithm of V. Kolmogorov and R. Zabih takes account of both matches and occlusions; see "Computing Visual Correspondence with Occlusions using Graph Cuts", Proc. 8th IEEE Int. Conf. on Computer Vision, Vancouver, Canada, July 9-12, 2001, vol. 2, pp. 508-515):
[Figures: the stereo pair "Tsukuba" and its true disparity map; grey-coded disparity maps reconstructed by SSD stereo (window 21×21), dynamic programming stereo, and graph-cut stereo; and the grey-coded signed disparity errors of each reconstruction w.r.t. the true disparity map.]