and can be interpreted as the assignment probabilities of a given feature vector x to each of the CVs j ∈ {1, ..., N} [21].

The CVs wj mark local centers of the multidimensional probability distribution f(x). Thus, for the application to multispectral image segmentation, the CV wj is the weighted average of all the gray-level feature vectors x belonging to cluster j with respect to a fuzzy tessellation of the feature space according to (16).
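This fuzzy assignment and weighted-average CV update can be sketched in Python. The Gaussian/softmax form of aj(x) used below is assumed from the standard minimal free energy VQ literature and stands in for the expressions (16) and (17) referenced in the text; the function names are illustrative, not part of the original.

```python
import numpy as np

def soft_assignments(X, W, rho):
    """Fuzzy assignment probabilities a_j(x) of each feature vector to each CV.

    Assumes the common Gaussian/softmax form
    a_j(x) ∝ exp(-||x - w_j||^2 / (2 rho^2)), where rho is the "fuzzy range".
    """
    # Squared distances, shape (n_samples, n_codevectors).
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * rho ** 2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)       # each row sums to 1

def update_codebook(X, A):
    """Each CV w_j becomes the a_j-weighted average of all feature vectors."""
    return (A.T @ X) / A.sum(axis=0)[:, None]
```

For very large rho the assignments approach the uniform value 1/N, so every CV is pulled toward the center of mass of the whole data set.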

In contrast to SOMs, minimal free energy VQ

(i) can be described as a stochastic gradient descent on an explicitly given energy function (see (18)) [21],

(ii) preserves the probability density without distortion as the discretization density, i.e., the number of CVs, grows toward infinity [8, 9], and, most importantly for practical applications,

(iii) allows for hierarchical data analysis on different scales of resolution [7].

Furthermore, the procedure can be monitored by various control parameters, such as the free energy, entropy, and reconstruction error, which enable easy detection of cluster splitting. Properties of the resulting optimal codebooks have been thoroughly investigated [8] and allow for a self-control process of the VQ procedure with respect to theoretically proven conservation laws [7].
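These control parameters are cheap to compute from the current codebook. The sketch below is an assumption, not the authors' exact formulas: the free energy is taken in the usual deterministic-annealing form F = -2 rho^2 Σx ln Σj exp(-||x - wj||^2 / (2 rho^2)), standing in for (18); the entropy is the mean assignment entropy, and the reconstruction error is the expected squared distance to the CVs.

```python
import numpy as np

def vq_monitors(X, W, rho):
    """Control parameters of the VQ process: free energy, mean assignment
    entropy, and expected reconstruction error (illustrative forms)."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / (2.0 * rho ** 2)
    m = logits.max(axis=1, keepdims=True)
    log_z = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    A = np.exp(logits - log_z)                     # assignment probabilities
    free_energy = float(-2.0 * rho ** 2 * log_z.sum())
    entropy = float(-(A * np.log(np.clip(A, 1e-300, None))).sum(axis=1).mean())
    recon_error = float((A * d2).sum(axis=1).mean())
    return free_energy, entropy, recon_error
```

A sudden drop in the assignment entropy, or a kink in the free energy as rho decreases, signals a cluster split.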

Figure 2 illustrates the application of hierarchical minimal free energy VQ clustering to a simple two-dimensional toy example.

As the "fuzzy range" ρ decreases in the course of the annealing scenario, the VQ procedure passes through several stages:

FIGURE 2 Two-dimensional toy example for minimal free energy VQ (from [13]). Decreasing ρ leads to repetitive cluster splitting, thus enabling data analysis on different scales of resolution.

(i) In the beginning of the VQ process (ρ → ∞), all the assignment probabilities for any given feature vector x are equal, i.e., aj(x) = 1/N. This state is characterized by a single minimum of the free energy (18). All the CVs are located at the same position in the feature space, i.e., there is maximal "degeneracy" of the codebook, with only one cluster representing the center of mass of the whole data set.

(ii) As the deterministic annealing procedure continues with decreasing ρ > 0, phase transitions occur and large clusters split up into smaller ones representing increasingly smaller regions of the feature space. Correspondingly, the number m(ρ) of clusters increases (1 ≤ m(ρ) ≤ N) until the cluster degeneracy is removed completely.

(iii) For ρ → 0, each feature vector x is attributed solely to the closest CV, and aj is given by the hard clustering cooperativity (9).
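The three stages above can be reproduced with a short annealing loop. This is an illustrative sketch only: the annealing schedule, the iteration counts, and the small symmetry-breaking jitter added at each stage are assumptions, not the authors' protocol.

```python
import numpy as np

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3.0, 0.3, (100, 2)),
                    rng.normal(+3.0, 0.3, (100, 2))])
N = 4
# Start with a (near-)degenerate codebook at the data's center of mass.
W = X.mean(axis=0) + 1e-3 * rng.normal(size=(N, 2))

for rho in [10.0, 3.0, 1.0, 0.3, 0.1]:        # assumed annealing schedule
    # Tiny jitter so the codebook degeneracy can break at phase transitions.
    W = W + 1e-3 * rng.normal(size=W.shape)
    for _ in range(20):                        # fixed-point iterations at this rho
        d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / (2.0 * rho ** 2)
        logits -= logits.max(axis=1, keepdims=True)
        A = np.exp(logits)
        A /= A.sum(axis=1, keepdims=True)      # fuzzy assignments a_j(x)
        W = (A.T @ X) / (A.sum(axis=0)[:, None] + 1e-12)  # weighted averages

# At large rho all CVs coincide near the center of mass; as rho decreases,
# the codebook splits and the CVs spread out to cover both blobs.
```

Running this, the CVs remain collapsed at the first (large) values of ρ and then split repeatedly as ρ decreases, ending with code vectors inside each of the two blobs, mirroring stages (i) through (iii).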
