A generic MGRF model of spatially homogeneous textures is specified by the explicit spatial geometry (the characteristic neighbourhood) and quantitative strengths of statistical dependencies (clique potentials). Conventional model identification involves estimating both the characteristic neighbourhood and the Gibbs potential of each selected family, from a given texture. The process, in particular potential refinement via stochastic approximation, involves exponential time complexity [38].
In order to simplify the identification, a structural approach is proposed which focuses on identifying the geometric structure of texels (short for TEXture ELement [43]) and placement rules of their spatial arrangement. In the resulting texture description, a texture is constructed by a group of texels that repeats many times over the image plane by certain regular or stochastic spatial arrangement. This description is in line with structural approach to texture analysis [43], which suggests texels and their spatial organisation constitute a ``two-layered structure'' in a texture, specifying local and global properties, respectively,
In the proposed method, each texel is a micro geometric element, consisting of a group of image pixels (not necessarily continuous) with a certain spatial configuration. Each individual texel is distinguished by both the geometric structure and the combination of image signals over the structure. Usually, a texel involves a rather simple spatial configuration of only a small number of pixels. For example, Figure 6.4 shows a simple texel with hexagonal structure.
The structural identification of a generic MGRF model in Eq. (6.2.5) for a given training image involves the following steps:
|
|
|
An MBIM jointly depicts the partial energies of all clique families in a texture, from which a characteristic pixel neighbourhood can be estimated.
A generic MGRF model allows an arbitrary structure of pairwise pixel interactions, but the longest interaction to recover depends on the size of each individual training image in practice. Without loss of generality, Markov property is assumed such that all the neighbours of a pixel are limited into a large search window centred on the pixel. So, . In order to capture the periodicity structure of a texture, the search window must be large enough to cover at least a few repetitions. Empirically, the window size is chosen between and of the size of the training texture. The empirical rule accounts for the need of enough samples for each clique family to reliably collect GLCHs and estimate energies.
Let the width and the height of a search window be denoted and respectively. A clique family in a texture has the maximum displacement of and along and axes respectively, so that,
Formally, an MBIM is a 2D function, , which maps a search window onto real-valued partial energies . A two-step procedure is involved in computing the partial energy for each clique family. First, a GLCH matrix is collected for the family, by convolving the displacement vector with the image. Second, the partial energy is computed by Eq. (6.3.4) from the collected statistics. A generic MGRF model only counts on the relative frequency, , of co-occurrence of every two grey levels and , , so the resulting matrix has a dimensionality of . To have statistically meaningful estimates for small training images (e.g., ) and also due to computational restrictions, a texture is typically converted into a 16-level gray-scale image before further processing [38].
Since the relative energy is sufficient for ranking clique families, the first approximation of relative partial energy in Eq. (6.3.4) is used to approximate the `real' partial energy in forming an MBIM. The procedure of computing the MBIM is outlined by Algorithm 5.
An MBIM is symmetric about the origin , because a clique family has the same pixel interaction as the clique family . It also should be noted that there is no such a clique family with a zero displacement vector .
The major computation in constructing an MBIM comes from convolution operation in collecting the GLCH statistics for each clique family of the image. The process involves quadratic time complexity of , which depends on the sizes of both the searching window and the training image.
An MBIM can be represented graphically either by a 2D bitmap or on a 3D surface, in which each coordinate represents a clique family , and the corresponding scalar value represents the partial energy of that family. Figure 6.2 shows a few examples of 2D representation of MBIMs by grey-level images, with the partial energy encoded by intensity, i.e. dark intensities correspond to higher-energy clique families, light intensities correspond to lower-energy clique families. In Figs 6.3 and 6.4, the MBIMs are rendered on a 3D surface, where the values represent real-valued partial energies.
An MBIM reveals several important properties of the clique families in a texture, which helps to recover a characteristic neighbourhood and identify texels from a texture. Experimental comparisons have shown that the more accurate potential estimates in Appendix A produce quite similar MBIMs to those with the potentials of Eq. (6.3.3). For simplicity, below, only the latter estimates are used.
|
The characteristic neighbourhood for a generic MGRF model consists of a group of the most energetic clique families and represents most probably pixel interactions in a texture.
The simplest approach to estimating the structure is to select most energetic clique families by using a threshold of partial energy [37,36]. But the selection of threshold is largely heuristic and the statistical interplay between clique families have not been taken into account.
An alternative sequential approach in [107] selects each next characteristic clique family by comparing the training GLCHs to those for an image sampled from the MGRF with the currently estimated neighbourhood. Since each step involves image sampling (generation) and re-collection of the GLCHs, the process is very computationally complex, i.e. to find a neighbourhood structure , the time complexity is where is the expected number of steps, which, at least, theoretically grows exponentially with although it is limited to a few hundred steps empirically to generate a single MGRF sample using the MCMC process of pixel-wise stochastic relaxation.
The structural identification simplifies selection of the characteristic neighbourhood due to the observation that most energetic clique families form isolated clusters (or blobs) in the MBIMs shown in Figs 6.3 and 6.4. From the statistical point of view, the clique family having maximum energy in each cluster is related to locally the most probable pixel interaction. Pixel interactions represented by other clique families in the vicinity are also likely but with lower probability. This indicates variations of neighbourhood structure at different locations of an image. The size and shape of the clusters reflect the degree of local variations and could be used to measure texture homogeneity, i.e. a strict homogeneous texture should have a smaller cluster.
It is reasonable to use the local maximum of each cluster as a representative clique family for a local region. A two-step approach is proposed to find these local maxima. First, all the significant clusters are identified from the MBIM, and second, the clique family having the maximal partial energy in each cluster is added into the characteristic neighbourhood.
Most MBIMs have their corresponding histograms of partial energies as a unimodal distribution, i.e. each histogram is positively skewed and has a single peak at the lower energy end, as in Fig 6.6. This allows to apply a unimodal thresholding algorithm [87] to determine a threshold that separates high-energy clique families from the majority of low-energy ones. Consequently, clusters formed by high-energy families are isolated and can be easily identified by grouping connected clique families using the classic connected-component labelling algorithm [86]. A local maximum is then identified by as the clique family with maximal energy from each cluster.
|
The thresholding algorithm draws a straight line from the peak of the histogram to the last non-empty bin. The threshold is at the point on the histogram curve having the maximal distance to the straight line. The algorithm is illustrated in Fig 6.7.
For simplicity, the structural identification assumes a uniform geometric structure for all texels in a texture, i.e. the same pixel interactions are applicable at every location in the image. The geometric structure is statistically the most probable one, i.e. the central tendency if consider a distribution of all possible texel's structures. This simplification is implied by the assumption of translational-invariant pixel interaction in a generic MGRF model.
The geometric structure of texels in a texture is derived from the characteristic neighbourhood consisting of local maxima of clusters. Structures for regular and stochastic textures are specified in separate ways, due to the different nature of two texture classes.
Figures 6.3 and 6.4 schematically illustrate several texels for stochastic and regular textures, respectively, where each texture has a distinguishing geometric structure of texels corresponding to the spatial distribution of clusters in the MBIM.
The geometric structure can be used as a mask for sampling (retrieving) texels from a texture. For example, one can superimpose a mask at an arbitrary location on an image, the group of signals covered by the mask are related to a particular texel for the texture. In a same way, texels of the same structure but different signals are retrieved at different locations from the image. Since in our experiments most texels look like a bunch of image signals, this sampling method is also called in [40] bunch sampling.
|
Repetitions of various texels create periodicity, structure and symmetry in a texture. A special placement rule is derived to model spatial relationship between texels, i.e. how the image locations of multiple occurrences of a single texel or different texels are related in forming a texture pattern.
In a stochastic texture, texels appear with no explicit placement rules and have mostly weak spatial interrelations. As shown later, a synthetic texture of this class can be formed by sampling, repeating and randomly placing of the distinct training texels.
In contrast, a regular texture involves strict placement rules reflecting the underlying strong periodicity. Due to the assumption of translation-invariant pixel interaction, only translation symmetry is taken into account in estimating the placement rule for texels for textures in this class.
In a regular texture, there exists a underlying placement grid guiding the repetitions of texels. Each grid cell is a compact bounding parallelogram around a texel mask, specified by parameters , where and are guiding angles that give orientations of cell sides with respect to the image coordinate axes, and and are the side lengths, i.e. the mask's spans along the orientational directions. The bounding parallelogram could be decided using an invariant fitting algorithm, which is related to a canonical frame of the texel mask [103]. Figure 6.9 shows a hexagonal texel mask of texture D34, its bounding parallelogram and related parameters . In general, the grid with parallelogram cells represents any five possible types of translation lattice known in the theory of wallpaper groups for repeated patterns [89].
The placement grid is related to a coordinate system with non-orthogonal basis vectors, which leads to a hexagonal tessellation of the image plane [94]. Figure 6.10 shows a tessellation of texture D34 in line with a placement grid defined by the cell in Fig 6.9.
Figure 6.10 also shows that, with the tessellation, each texel is associated with a relative shift, , of its centre , with respect to the origin made coincident with the centre of the closest grid cell:
The placement rule is to repeat each training texel at arbitrary locations having the same relative shift with respect to the placement grid. Due to an infinite number of absolute image locations related to a same relative shift, the rule reflects translation symmetry of regular textures. It also suggests that an arbitrarily large texture can be synthesised by expanding the image lattice in line with the infinite placement grid.
The structural identification yields a texel-based description that characterises a texture by texels and placement rules of their arrangement. Recently, several similar works describing a texture by its primitive elements have been proposed. However, instead of `texel', the majority of the known works use an alternative term `texton', suggested by Julesz, to refer to small objects or characteristic regions that comprise a texture. Research shows that only the difference in textons or in their density can be detected pre-attentively by human early visual system [54].
Motivated by Julesz's texton theory, a few recent works [63,71,100,101,102] employ a filter-based spectral analysis technique to relate textons to the centres of clusters in filter responses of a stack of training images. Conceptually, each texton, as a feature descriptor in the spectral domain, represents a particular statistical spectral feature describing repetitive patterns of a texture in the spatial domain. All the textons form a global texton dictionary, or a feature space, allowing to characterise a texture by an empirical probability distribution of the textons, i.e. the frequency with which each texton in the dictionary occurs in the texture. A nearest-neighbour classifier with similarity metrics based on chi-square distance between texton distributions, can classify textures into different categories. But since only the occurrences of a texton are taken into account, spatial information about relationships between the textons is completely lost in the description.
A texton-based generative model of image in [112] contains local constructs at three levels: pixels, image bases, and textons. An image base is a group of pixels forming a micro geometric element like a circle or a line. In this case, a texton is actually a mini-template consisting of a number of image bases of some geometric and photometric configurations. In an image, these textons are meaningful objects like stars or birds that could be observed. The probability model with parameters is specified as follows:
The MLE of the model parameters or the estimates minimising the Kullback-Leibler divergence are learned using a data-driven MCMC algorithm. Due to the complex likelihood function, the experiments were limited in several aspects in order to keep the problem tractable: (i) only a small number of image bases in the global base map, e.g., a few Laplacian-of-Gaussian and Gabor base functions; (ii) the spatially independent textons for simplicity, and (iii) only very simple textures with a priori obvious image bases and textons.
In both above mentioned and most of other known texton-based approaches, spatial relationship among the textons is either neglected or otherwise too difficult to represent. In contrast, the proposed structure identification provides a much simpler but yet more complete texture description.