A wide range of MGRF models have been proposed over the last several decades [6,45,20,34,13,7,75,37,113]. Essentially, an MGRF model considers an image as a realisation of a Markov random field (MRF). A major advantage of MGRF models is that they make it possible to derive a global texture description by specifying local properties of textures. In this section, the fundamental theory of MRFs, Gibbs distributions, and MGRF-based texture modelling is presented.
Consider a two-dimensional lattice $\mathbf{R}$. If each site $r \in \mathbf{R}$ is associated with a random variable $X_r$, one obtains a random field $\mathbf{X} = \{X_r : r \in \mathbf{R}\}$ [34]. A digital image $g$ could be considered as a realisation or a random sample of the random field $\mathbf{X}$, i.e. a particular assignment of a set of signal values to all sites of the lattice [41].
Since the random variables in a random field are considered statistically dependent, a joint probability distribution of the random variables for an image defines a probability model of the image from the statistical viewpoint. The joint distribution is denoted by $P(g \mid \mathbf{G})$, or simply $P(g)$, representing the chance of observing the image $g$ within the configuration space $\mathbf{G}$ of all images associated with the random field.
A desired joint probability distribution for an image model should yield zero or almost zero probability for all images except a subset of interest around a given training image. Formally, the distribution function is expected to be similar to a $\delta$-function having its maximum mode on, or in a close vicinity of, the training image.
If no prior knowledge is available, a probability model must enumerate all possible labellings of the lattice in order to derive a joint distribution. This is an intractable task, because the configuration space $\mathbf{G}$ has combinatorial cardinality, $|\mathbf{G}| = |\mathbf{Q}|^{|\mathbf{R}|}$, where $\mathbf{Q}$ is the set of possible signal values. If each site is assumed to be independent of all others, i.e. an independent random field (IRF), the probability model is specified by the factored joint probability distribution, $P(g) = \prod_{r \in \mathbf{R}} P(g_r)$. This probability is uninformative, and the resulting image model is not particularly useful.
Most real-life probability models assume certain local dependencies among random variables (image pixels). A conditional probability of a pixel given all other pixels, denoted by $P(g_r \mid g_{\mathbf{R} \setminus r})$, is commonly exploited to represent local dependency. In this case, a random field is represented by an undirected graph in which each vertex represents a random variable, while an edge represents statistical dependency between the two connected vertices. The absence of an edge between two vertices in the graph implies that the associated random variables are conditionally independent given all other random variables.
Within such a graphical structure, a probability model for a random field could be factorised into a normalised product of strictly positive, real-valued potential functions defined on subsets of connected random variables in the graph. Each subset forms a local neighbourhood system with a particular spatial configuration, and the image corresponds to a global configuration of all random variables. A potential function has no probabilistic meaning other than representing constraints on the local configuration on which it is defined [60]. This graphical representation indicates that the probability distribution of an image can be specified in terms of potential functions over local neighbourhoods, which reduces the complexity of the global image modelling problem. This is the main idea behind applying MRF models to texture modelling.
MGRF-based texture models have been successfully applied to texture analysis [6,45,20,34,13,7,75,37,113], texture classification and segmentation [23,24,73], and texture synthesis [29,104,65,59]. In this section, definitions and general theories related to an MGRF are introduced.
A random field $\mathbf{X}$ is an MRF with respect to a neighbourhood system $\mathbf{N} = \{\mathbf{N}_r : r \in \mathbf{R}\}$ if and only if

$$P(g) > 0, \quad \forall g \in \mathbf{G}; \qquad P(g_r \mid g_{\mathbf{R} \setminus r}) = P(g_r \mid g_{\mathbf{N}_r}), \quad \forall r \in \mathbf{R}.$$

Here, the first condition is known as the positivity of an MRF, which is trivial. The second condition is known as the Markov property, or Markovianity. The equivalence between the conditional probability $P(g_r \mid g_{\mathbf{R} \setminus r})$ of the signal value $g_r$ given all the other signal values and the conditional probability $P(g_r \mid g_{\mathbf{N}_r})$ of the signal value given only the signal values of the neighbouring sites $\mathbf{N}_r$ suggests that the pixel value at a site in the image depends only on those of its neighbouring sites.
The Markov property describes the conditional dependence of image signals within a local neighbourhood system, which proves useful in capturing and modelling the usually highly localised and coherent features of textures. Another appealing property of an MRF is that, by the Hammersley-Clifford theorem [42], an MRF can be characterised by a global Gibbs distribution.
In general, a Gibbs distribution takes the following form,

$$P(g) = \frac{1}{Z} \exp\left(-E(g)\right); \qquad Z = \sum_{g \in \mathbf{G}} \exp\left(-E(g)\right), \qquad (5.2.2)$$

where $E(g)$ is an energy function and $Z$ is the partition function, also known as the normalising constant. Because it is defined over the entire configuration space, the partition function is intractable. This gives rise to difficulties in estimating the parameters of the density function of a Gibbs distribution. A random field having a Gibbs distribution is called a Gibbs random field (GRF).
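To make the intractability of $Z$ concrete, the following minimal sketch (assuming an Ising-like pairwise energy chosen only for illustration, not a model from this section) evaluates the partition function by brute force on a tiny binary lattice. The number of summands grows as $|\mathbf{Q}|^{|\mathbf{R}|}$, so the same computation is hopeless for realistic image sizes.

```python
import itertools
import numpy as np

def energy(g, beta=0.5):
    """Sum of pairwise potentials over horizontal/vertical neighbour pairs
    (an assumed Ising-like energy, used here purely for illustration)."""
    e = 0.0
    h, w = g.shape
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                e += -beta if g[y, x] == g[y, x + 1] else beta
            if y + 1 < h:
                e += -beta if g[y, x] == g[y + 1, x] else beta
    return e

h, w, q = 3, 3, 2   # 3x3 binary lattice: |G| = 2**9 = 512 configurations
configs = itertools.product(range(q), repeat=h * w)
Z = sum(np.exp(-energy(np.array(c).reshape(h, w))) for c in configs)
print(f"Z = {Z:.2f}")  # 512 terms here; a 256x256 binary image would need 2**65536
```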
A Gibbs distribution is usually defined with respect to cliques, i.e. proper subgraphs of a neighbourhood graph on the lattice. A clique is a particular spatial configuration of pixels in which all members are statistically dependent on each other. All the statistical links between pixels in an image constitute a neighbourhood graph, or system, denoted by $\mathbf{N}$. Cliques and the neighbourhood system model the spatial geometry of pixel interactions, which is particularly useful in texture modelling. All spatial configurations of cliques related to the 8-nearest neighbourhood system are shown in Fig. 5.2. Formally, a clique $c$ is defined as a subset of sites in which every pair of distinct sites are neighbours:

$$c \subseteq \mathbf{R}: \quad \forall r, r' \in c,\ r \neq r' \implies r' \in \mathbf{N}_r.$$
The number of pixels in a clique specifies the statistical order of a Gibbs distribution and of the related texture model. For texture modelling, first-order and second-order cliques are most frequently used, because higher-order cliques cause too much computational complexity. First-order and second-order cliques over the image lattice are denoted by $\mathbf{C}_1$ and $\mathbf{C}_2$, respectively:

$$\mathbf{C}_1 = \left\{ \{r\} : r \in \mathbf{R} \right\}; \qquad \mathbf{C}_2 = \left\{ \{r, r'\} : r \in \mathbf{R},\ r' \in \mathbf{N}_r \right\}.$$
Statistical dependence between the pixels within a clique is also referred to as pixel interaction. Let a non-constant function $V_c(g)$ denote the strength of the pixel interaction in a clique $c$; the function $V_c$ is also known as the clique potential. The energy function $E(g)$ in Eq. (5.2.2) could be factorised into the sum of clique potentials over the set $\mathbf{C}$ of all cliques, so that

$$E(g) = \sum_{c \in \mathbf{C}} V_c(g).$$
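The factorisation above is straightforward to realise in code. The following sketch (assuming a 4-nearest neighbourhood and a placeholder potential, neither prescribed by the text) enumerates the second-order cliques $\mathbf{C}_2$ and evaluates $E(g)$ as the sum of clique potentials.

```python
import numpy as np

def second_order_cliques(h, w):
    """All unordered pairs {r, r'} with r' in the 4-neighbourhood of r."""
    cliques = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                cliques.append(((y, x), (y, x + 1)))  # horizontal pair
            if y + 1 < h:
                cliques.append(((y, x), (y + 1, x)))  # vertical pair
    return cliques

def gibbs_energy(g, V):
    """E(g) = sum over cliques c of V_c(g); here one pairwise V is shared."""
    return sum(V(g[r], g[s]) for r, s in second_order_cliques(*g.shape))

g = np.random.randint(0, 4, size=(8, 8))  # toy 8x8 image with 4 grey levels
E = gibbs_energy(g, V=lambda a, b: 0.0 if a == b else 1.0)  # placeholder potential
print(f"E(g) = {E}")
```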
The Hammersley-Clifford theorem asserts that a random field is a Markov random field if and only if the corresponding joint probability distribution is a Gibbs distribution [42]. The theorem, also known as the MRF-Gibbs distribution equivalence, has been proved by Grimmett [41], Besag [6], and Geman and Geman [34], respectively.
The theorem establishes the equivalence between an MRF and a GRF, which provides a factorisation of a global probability distribution in terms of local interaction functions with respect to cliques. In particular, the joint probability $P(g)$ and the conditional probability $P(g_r \mid g_{\mathbf{N}_r})$ can specify each other. By the Hammersley-Clifford theorem, the conditional probability of an MRF can be specified in the following form [34], in terms of the clique potentials $V_c$:

$$P(g_r \mid g_{\mathbf{N}_r}) = \frac{\exp\left( -\sum_{c \ni r} V_c(g) \right)}{\sum_{q \in \mathbf{Q}} \exp\left( -\sum_{c \ni r} V_c(g : g_r = q) \right)}, \qquad (5.2.6)$$

where the sums run over all cliques containing the site $r$, and $g : g_r = q$ denotes the image $g$ with its value at $r$ replaced by $q$.
Similarly, the clique potentials of the Gibbs distribution can also be derived from the conditional probabilities of an MRF, but in a more complicated form. An MRF is uniquely related to a normalised potential, called the canonical potential [64].
In summary, Markov-Gibbs equivalence provides a theoretical basis for texture modelling based on MGRFs, making it possible to employ global characteristics in modelling a field while working only with local characteristics when processing it [34].
The parametric form of the Gibbs distribution in Eq. (5.2.2), with a parameter vector $\boldsymbol{\theta}$, is given as follows:

$$P(g \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})} \exp\left( \boldsymbol{\theta}^{\mathsf{T}} \mathbf{S}(g) \right); \qquad Z(\boldsymbol{\theta}) = \sum_{g \in \mathbf{G}} \exp\left( \boldsymbol{\theta}^{\mathsf{T}} \mathbf{S}(g) \right), \qquad (5.2.7)$$

where $\mathbf{S}(g)$ is a vector of sufficient statistics of the image $g$.
Model identification is usually formulated as a global optimisation problem that maximises either the entropy or the likelihood of probability distributions.
If domain knowledge and observed statistics are represented as a set of constraints, the maximum entropy (MaxEnt) principle can be applied to estimate the parameters of model distributions. The MaxEnt principle states that the desired probability distribution should be the one that maximises the entropy subject to the given set of constraints [52,60]. The entropy of a probability distribution measures its uncertainty [92] and is defined by:

$$H(P) = -\sum_{g \in \mathbf{G}} P(g) \log P(g). \qquad (5.2.8)$$
The MaxEnt principle is formulated as an optimisation problem that maximises $H(P)$,

$$P^{*} = \arg\max_{P} H(P) \quad \text{subject to} \quad \sum_{g \in \mathbf{G}} P(g)\, S_i(g) = \hat{S}_i,\ i = 1, \ldots, n; \qquad \sum_{g \in \mathbf{G}} P(g) = 1,$$

where the $S_i$ are the constraint statistics and the $\hat{S}_i$ are their observed (training) values. This constrained optimisation problem can be solved by maximising the following Lagrangian,

$$L(P; \boldsymbol{\lambda}, \mu) = H(P) + \sum_{i=1}^{n} \lambda_i \left( \sum_{g \in \mathbf{G}} P(g)\, S_i(g) - \hat{S}_i \right) + \mu \left( \sum_{g \in \mathbf{G}} P(g) - 1 \right).$$

The resulting MaxEnt distribution is as follows,

$$P(g) = \frac{1}{Z(\boldsymbol{\lambda})} \exp\left( \sum_{i=1}^{n} \lambda_i\, S_i(g) \right); \qquad Z(\boldsymbol{\lambda}) = \sum_{g \in \mathbf{G}} \exp\left( \sum_{i=1}^{n} \lambda_i\, S_i(g) \right). \qquad (5.2.11)$$
The similar forms of the probability distributions in Eqs. (5.2.7) and (5.2.11) suggest that the Gibbs distribution of an MGRF belongs to the exponential families and is a MaxEnt distribution. In fact, all distributions belonging to the so-called exponential families [2] possess the MaxEnt property. In particular, a generic MGRF model is specified by a MaxEnt distribution constrained by the training sufficient statistics, i.e. co-occurrence frequency distributions.
The MaxEnt principle suggests an alternative approach to texture modelling. One can first select sufficient statistics to characterise textures and then infer the MaxEnt distribution subject to the constraints that the marginal probabilities of the sufficient statistics with respect to the model should match the empirical probabilities of the same statistics in the training data. As introduced later, both the generic MGRF [37,36] and FRAME [113] models use this strategy to derive their MaxEnt distributions, but the former derivation appeals to the exponential families, while the latter does not mention the exponential families at all and derives its sufficient features from scratch.
Maximum likelihood estimation (MLE) is one of the most commonly used procedures for estimating the unknown parameters of a distribution. MLE is based on the likelihood principle, which suggests that model parameters should maximise the probability of occurrence of the observation, e.g. a training image, given the configuration space.
Given the parametric form of a Gibbs distribution in Eq. (5.2.7), the likelihood function for $\boldsymbol{\theta}$, given a training image $g^{\circ}$, is,

$$L(\boldsymbol{\theta} \mid g^{\circ}) = P(g^{\circ} \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})} \exp\left( \boldsymbol{\theta}^{\mathsf{T}} \mathbf{S}(g^{\circ}) \right). \qquad (5.2.12)$$

If $L$ is differentiable, the parameters for the MLE are computed by solving

$$\nabla_{\boldsymbol{\theta}} \log L(\boldsymbol{\theta} \mid g^{\circ}) = \mathbf{S}(g^{\circ}) - \nabla_{\boldsymbol{\theta}} \log Z(\boldsymbol{\theta}) = \mathbf{0}, \qquad (5.2.13)$$

where $\nabla_{\boldsymbol{\theta}} \log Z(\boldsymbol{\theta}) = \mathbb{E}_{P(g \mid \boldsymbol{\theta})}\!\left[ \mathbf{S}(g) \right]$ is the expectation of the sufficient statistics under the model. However, due to the intractable normalisation constant $Z(\boldsymbol{\theta})$ in Eq. (5.2.13), numerical techniques such as Markov chain Monte Carlo (MCMC) algorithms have to be used for estimating $\boldsymbol{\theta}$. As shown in [4], the MaxEnt method is equivalent to the MLE in estimating distributions in exponential families and therefore Gibbs distributions.
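As a concrete illustration of Eq. (5.2.13), the following sketch performs stochastic gradient ascent on the log-likelihood, replacing the intractable model expectation with a sample average. The helpers `stats` and `sample_model` are hypothetical stand-ins for the sufficient statistics $\mathbf{S}(\cdot)$ and for any MCMC sampler (e.g. the Metropolis or Gibbs samplers introduced below).

```python
import numpy as np

def mle_step(theta, g_train, stats, sample_model, lr=0.01, n_samples=10):
    """One stochastic gradient ascent step on the log-likelihood, Eq. (5.2.13):
    theta moves along S(g_train) - E_model[S(g)], the expectation being
    approximated by averaging over MCMC samples from the current model."""
    s_data = stats(g_train)                            # S(g_train), fixed
    s_model = np.mean(
        [stats(sample_model(theta)) for _ in range(n_samples)], axis=0
    )                                                  # Monte-Carlo estimate of E[S]
    return theta + lr * (s_data - s_model)             # gradient ascent update
```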
An MCMC algorithm makes it possible to simulate a probability distribution by constructing a Markov chain with the desired distribution as its stationary distribution. A Markov chain, defined over a set of sequential states, is a one-dimensional case of an MRF. Each state of the chain relates to a particular assignment of all the involved random variables on which the stationary distribution is defined. The transition from a state $s_t$ to the next state $s_{t+1}$ is controlled by the transition probability $P(s_{t+1} \mid s_t)$, indicating the Markov property of the chain,

$$P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0) = P(s_{t+1} \mid s_t).$$
The MCMC algorithm iteratively updates the Markov chain according to the transition probability from one state to another. Eventually, the chain attains equilibrium, when the joint probability distribution of the current state approaches the stationary distribution. The parameters that lead to the stationary distribution are considered the model parameters learnt for the particular training image.
In the case of texture modelling, each state of a Markov chain relates to an image, as a random sample drawn from the joint distribution. The image obtained at the state of equilibrium is expected to resemble the training image. In other words, parameter estimation via MCMC algorithms simulates the generative process of a texture and therefore synthesises textures.
Each state of a Markov chain is obtained by sampling a probability distribution. Among various sampling techniques, the Metropolis algorithm [77] and the Gibbs sampler [34] are the two best-known ones. The Metropolis algorithm provides a mechanism for exploring the entire configuration space by random walk. At each step, the algorithm performs a random modification to the current state to obtain a new state. The new state is either accepted or rejected with a probability computed from the energy change caused by the transition. The states generated by the algorithm form a Markov chain. Algorithm 1 outlines a basic Metropolis algorithm.
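A minimal sketch of the Metropolis step described above, for a Gibbs distribution $P(g) \propto \exp(-E(g))$ as in Eq. (5.2.2); the `energy` callable and the single-pixel proposal are illustrative assumptions, not the thesis's exact Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(g, energy, n_levels, n_steps=1000):
    """Basic Metropolis updates: propose a random single-pixel change, then
    accept it with probability min(1, exp(-dE)), dE being the energy change."""
    E = energy(g)
    for _ in range(n_steps):
        y, x = rng.integers(g.shape[0]), rng.integers(g.shape[1])
        old = g[y, x]
        g[y, x] = rng.integers(n_levels)        # random modification (proposal)
        dE = energy(g) - E                      # energy change of the transition
        if dE <= 0 or rng.random() < np.exp(-dE):
            E += dE                             # accept the new state
        else:
            g[y, x] = old                       # reject: restore previous state
    return g
```

For clarity this sketch re-evaluates the full energy at every step; a practical implementation would compute $\Delta E$ locally from only the cliques containing the modified pixel.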
The Gibbs sampler is a special case of the Metropolis algorithm which generates new states by using univariate conditional probabilities [64]. Because direct sampling from the complex joint distribution of all random variables is difficult, the Gibbs sampler instead simulates the random variables one by one from their univariate conditional distributions. A univariate conditional distribution involves only one random variable, conditioned on the remaining variables having fixed values, and is usually of a simple mathematical form. For instance, the conditional probability $P(g_r \mid g_{\mathbf{N}_r})$ for an MRF is a univariate conditional distribution. A typical Gibbs sampling algorithm [34] is outlined in Algorithm 2.
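A minimal sketch in the spirit of Algorithm 2, sweeping the lattice and resampling each pixel from its univariate conditional, cf. Eq. (5.2.6); `local_energy` is a hypothetical helper returning $\sum_{c \ni r} V_c$ for a candidate grey level at site $r$.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(g, local_energy, n_levels):
    """One full sweep of a Gibbs sampler: each pixel is redrawn from
    P(g_r | g_N_r), computed from the local energies of all candidate levels."""
    for y in range(g.shape[0]):
        for x in range(g.shape[1]):
            # Unnormalised conditional probabilities for every candidate level q
            e = np.array([local_energy(g, (y, x), q) for q in range(n_levels)])
            p = np.exp(-(e - e.min()))           # shift for numerical stability
            p /= p.sum()                         # normalise, cf. Eq. (5.2.6)
            g[y, x] = rng.choice(n_levels, p=p)  # draw the new value
    return g
```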
In summary, an MCMC method learns a probability distribution by simulating the distribution via a Markov chain. The Markov chain is specified by three probabilities: an initial probability $P(s_0)$, the transition probability $P(s_{t+1} \mid s_t)$, and the stationary probability $P^{*}(s)$. The MCMC process is specified by the following equation,

$$P(s_{t+1}) = \sum_{s_t} P(s_{t+1} \mid s_t)\, P(s_t); \qquad P(s_t) \to P^{*}(s) \ \text{as} \ t \to \infty.$$
Auto-models [6] are the simplest MGRF models in texture analysis; they are defined over rather simple neighbourhood systems with ``contextual constraints on two labels'' [64]. The Gibbs energy function of an auto-model involves only first- and second-order cliques and has the following general form [25],

$$E(g) = -\sum_{r \in \mathbf{R}} \alpha_r\, g_r - \sum_{\{r, r'\} \in \mathbf{C}_2} \beta_{r, r'}\, g_r\, g_{r'},$$

where the parameters $\{\alpha_r\}$ weight the individual signals and $\{\beta_{r, r'}\}$ specify the strengths of the pair-wise pixel interactions.
There are several variations of auto-models, such as the auto-binomial model [20] and the auto-normal model [13,18], each of which assumes a slightly different model of pair-wise pixel interaction.

In an auto-binomial model, each pixel value $g_r \in \{0, 1, \ldots, Q-1\}$ is assumed to follow a binomial distribution with $Q-1$ trials and a success probability $\theta_r$ determined by the neighbouring signals. Hence, the conditional distribution is given by,

$$P(g_r \mid g_{\mathbf{N}_r}) = \binom{Q-1}{g_r}\, \theta_r^{\,g_r} (1 - \theta_r)^{\,Q-1-g_r}; \qquad \theta_r = \frac{\exp\left( \alpha_r + \sum_{r' \in \mathbf{N}_r} \beta_{r, r'}\, g_{r'} \right)}{1 + \exp\left( \alpha_r + \sum_{r' \in \mathbf{N}_r} \beta_{r, r'}\, g_{r'} \right)}. \qquad (5.2.17)$$

By Markov-Gibbs equivalence, the Gibbs distribution is specified by the energy function,

$$E(g) = -\sum_{r \in \mathbf{R}} \left( \alpha_r\, g_r + \log \binom{Q-1}{g_r} \right) - \sum_{\{r, r'\} \in \mathbf{C}_2} \beta_{r, r'}\, g_r\, g_{r'}. \qquad (5.2.18)$$
In an auto-binomial model, the model parameters are the sets $\{\alpha_r\}$ and $\{\beta_{r, r'}\}$. The set $\{\alpha_r\}$ has a fixed size, but the size of the set $\{\beta_{r, r'}\}$ is related to the order of the selected neighbourhood system. To simplify parameter estimation, one could reduce the number of parameters, for instance, by assuming a uniform value $\alpha_r = \alpha$ for all the sites and using only the simplest neighbourhood systems, i.e. the 4- or 8-nearest neighbours.
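A small sketch of the auto-binomial conditional of Eq. (5.2.17), under the simplification just described: a uniform $\alpha$ and a single coupling $\beta$ shared over the 4-nearest neighbours (both assumptions, made only to keep the example short).

```python
import math
import numpy as np

def autobinomial_conditional(g, r, alpha, beta, Q):
    """P(g_r = q | g_N_r) for q = 0..Q-1, Eq. (5.2.17), assuming one uniform
    alpha and one beta shared by the 4-nearest neighbours."""
    y, x = r
    h, w = g.shape
    nbh = [g[y + dy, x + dx] for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
           if 0 <= y + dy < h and 0 <= x + dx < w]
    theta = 1.0 / (1.0 + math.exp(-(alpha + beta * sum(nbh))))  # logistic link
    return np.array([math.comb(Q - 1, q) * theta**q * (1 - theta)**(Q - 1 - q)
                     for q in range(Q)])                        # binomial pmf

# Example: conditional distribution of the centre pixel of a toy image
g = np.array([[0, 3, 1], [2, 0, 3], [1, 2, 0]])
print(autobinomial_conditional(g, (1, 1), alpha=-1.0, beta=0.3, Q=4))
```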
In an auto-normal model, each pixel value is conditionally Gaussian given its neighbours:

$$p(g_r \mid g_{\mathbf{N}_r}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(g_r - \mu_r)^2}{2\sigma^2} \right); \qquad \mu_r = \mu + \sum_{r' \in \mathbf{N}_r} \beta_{r, r'}\, (g_{r'} - \mu). \qquad (5.2.19)$$

Here, $\mu_r$ is the conditional mean and $\sigma^2$ is the conditional variance. By Markov-Gibbs equivalence, an auto-normal model is specified by the following joint probability density:

$$p(g) = \frac{\sqrt{\det \mathbf{B}}}{(2\pi\sigma^2)^{|\mathbf{R}|/2}} \exp\left( -\frac{1}{2\sigma^2}\, (g - \boldsymbol{\mu})^{\mathsf{T}} \mathbf{B}\, (g - \boldsymbol{\mu}) \right),$$

where $\boldsymbol{\mu}$ is the mean vector and $\mathbf{B}$ is the interaction matrix with unit diagonal entries and off-diagonal entries $-\beta_{r, r'}$.
An auto-normal model defines a continuous Gaussian MRF. The model parameters, $\boldsymbol{\mu}$, $\sigma^2$, and $\{\beta_{r, r'}\}$, can be estimated using either MLE or pseudo-MLE methods [7]. In addition, a technique representing the model in the spatial frequency domain [13] has also been proposed for parameter estimation.
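To illustrate Eq. (5.2.19), a brief sketch drawing one pixel from its conditional Gaussian, again assuming a single $\beta$ shared over the 4-nearest neighbours (a simplification, not the general parameter set).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_autonormal_pixel(g, r, mu, beta, sigma):
    """Draw a new g_r from the conditional Gaussian of Eq. (5.2.19), with one
    shared beta over the 4-nearest neighbours."""
    y, x = r
    h, w = g.shape
    nbh = [g[y + dy, x + dx] for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
           if 0 <= y + dy < h and 0 <= x + dx < w]
    mu_r = mu + beta * sum(v - mu for v in nbh)  # conditional mean, Eq. (5.2.19)
    return rng.normal(mu_r, sigma)               # sigma: conditional std deviation
```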
FRAME (short for Filters, Random Fields, and Maximum Entropy) [113] is an MGRF model constructed from the empirical marginal distributions of filter responses based on the MaxEnt principle. The FRAME model is derived from Theorem 5.2.1, which asserts that the joint probability distribution of an image is decomposable into the empirical marginal distributions of the related filter responses. The proof of the theorem is given in [113].
The theorem suggests that the joint probability distribution $f(g)$ could be inferred by constructing a probability $P(g)$ whose marginal distributions match the corresponding marginal distributions of $f(g)$. Although, theoretically, an infinite number of empirical distributions (filters) are involved in the decomposition of a joint distribution, the FRAME model assumes that only a relatively small number of important filters (a filter bank), $\{F^{(1)}, F^{(2)}, \ldots, F^{(K)}\}$, are sufficient for the model distribution $P(g)$. Within the MaxEnt framework, by Eq. (5.2.8), the modelling is formulated as the following optimisation problem,

$$P^{*} = \arg\max_{P} \left\{ -\sum_{g \in \mathbf{G}} P(g) \log P(g) \right\}, \qquad (5.2.21)$$

subject to the constraints:

$$P^{(k)}(z; r) = H^{(k)}(z), \qquad \forall k = 1, \ldots, K;\ \forall z;\ \forall r \in \mathbf{R}. \qquad (5.2.22)$$
In Eq. (5.2.22), $P^{(k)}(z; r)$ denotes the marginal distribution of $P(g)$ with respect to the filter $F^{(k)}$ at location $r$, and by definition,

$$P^{(k)}(z; r) = \sum_{g \in \mathbf{G}} P(g)\, \delta\!\left( z - g^{(k)}(r) \right); \qquad \sum_{g \in \mathbf{G}} P(g) = 1, \qquad (5.2.23)$$

where $g^{(k)} = F^{(k)} \ast g$ is the response of the filter $F^{(k)}$ and $H^{(k)}$ is the empirical marginal distribution (histogram) of the training image's filter responses. The second constraint in Eq. (5.2.23) is the normalising condition of the joint probability distribution $P(g)$.
The MaxEnt distribution is found by maximising the entropy using Lagrange multipliers,

$$L(P; \boldsymbol{\lambda}, \mu) = -\sum_{g \in \mathbf{G}} P(g) \log P(g) + \sum_{k=1}^{K} \sum_{r \in \mathbf{R}} \sum_{z} \lambda^{(k)}(z) \left( P^{(k)}(z; r) - H^{(k)}(z) \right) + \mu \left( \sum_{g \in \mathbf{G}} P(g) - 1 \right).$$

Setting $\partial L / \partial P(g) = 0$ for every $g$ yields the FRAME distribution,

$$P(g) = \frac{1}{Z(\boldsymbol{\lambda})} \exp\left( -\sum_{k=1}^{K} \sum_{r \in \mathbf{R}} \lambda^{(k)}\!\left( g^{(k)}(r) \right) \right); \qquad Z(\boldsymbol{\lambda}) = \sum_{g \in \mathbf{G}} \exp\left( -\sum_{k=1}^{K} \sum_{r \in \mathbf{R}} \lambda^{(k)}\!\left( g^{(k)}(r) \right) \right).$$
The discrete form of $P(g)$ is derived by binning the filter responses, i.e. by the following transformation,

$$P(g) = \frac{1}{Z(\boldsymbol{\lambda})} \exp\left( -\sum_{k=1}^{K} \left\langle \boldsymbol{\lambda}^{(k)}, \mathbf{H}^{(k)}(g) \right\rangle \right). \qquad (5.2.25)$$

Here, the vectors of piecewise functions, $\mathbf{H}^{(k)}(g)$ and $\boldsymbol{\lambda}^{(k)}$, represent the histogram of the filtered image $g^{(k)}$ and the Lagrange parameters, respectively.
As shown in Eq. (5.2.25), the FRAME model is specified by a Gibbs distribution with the marginal empirical distributions (histograms) of filter responses as its sufficient statistics. The Lagrange multipliers $\boldsymbol{\lambda}^{(k)}$ are the model parameters to be estimated for each particular texture. Typically, the model parameters are learnt via stochastic approximation, which updates the parameter estimates iteratively based on the following equation,

$$\boldsymbol{\lambda}^{(k)}_{t+1} = \boldsymbol{\lambda}^{(k)}_{t} + \eta \left( \mathbf{H}^{(k)}(g_t) - \mathbf{H}^{(k)}(g^{\circ}) \right), \qquad k = 1, \ldots, K,$$

where $g^{\circ}$ is the training image, $g_t$ is an image sampled from the current model, and $\eta$ is a step size.
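A brief sketch of the sufficient statistics and the update above, under stated assumptions: the filter bank, the binning, and `sample_model` (standing for a Metropolis or Gibbs sampler of the current FRAME distribution) are all placeholders.

```python
import numpy as np
from scipy.signal import convolve2d

def filter_histogram(g, f, bins):
    """H^(k)(g): normalised histogram of the responses of filter f on image g."""
    resp = convolve2d(g, f, mode="valid")
    h, _ = np.histogram(resp, bins=bins)
    return h / h.sum()

def frame_update(lams, filters, bins, g_train, sample_model, eta=0.01):
    """One stochastic-approximation step: move each lambda^(k) along the
    difference between model and training filter-response histograms."""
    g_synth = sample_model(lams)                  # sample from the current model
    return [lam + eta * (filter_histogram(g_synth, f, bins)
                         - filter_histogram(g_train, f, bins))
            for lam, f in zip(lams, filters)]
```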