Minimal free energy VQ forms the basis for the unsupervised segmentation approach presented in this chapter. However, in contrast to learning rule (8), the CVs are computed employing a "batch algorithm" that is explained in the following:

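Let $\langle f \rangle_X$ denote the expectation value of a random variable $f$, i.e.,

$$\langle f \rangle_X = \int_X f(x)\, p(x)\, dx$$

with probability density $p(x)$, and let

$$\langle f \rangle_j = \int_X f(x)\, p(x \mid j)\, dx$$

denote the class-specific expectation values. Then $\langle f \rangle_j$ can be computed according to Bayes' rule

$$\langle f \rangle_j = \frac{\langle a_j(x)\, f \rangle_X}{\langle a_j(x) \rangle_X} \qquad (23)$$

if we interpret the activations $a_j(x)$ as the a posteriori probabilities $p(j \mid x)$ (see (19)) for the assignment of feature vector $x$ to the hidden layer neuron $j$, thus leading to

$$p(j) = \int_X p(j \cap x)\, dx = \int_X p(x)\, p(j \mid x)\, dx = \langle a_j(x) \rangle_X \qquad (24)$$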

as the average activation (so-called "load") of neuron $j$. The stationarity condition

$$\langle a_j(x)\,(x - w_j) \rangle_X = 0 \qquad (25)$$

of the learning rule (8) yields

$$w_j = \frac{\langle a_j(x)\, x \rangle_X}{\langle a_j(x) \rangle_X} \qquad (26)$$

i.e., the iteration of (8) until stationarity results in a fuzzy tessellation of the feature space $X$, where the CVs $w_j$ represent the class-specific averages $\langle x \rangle_j$ of the data distribution according to (23).

Equation (26) represents a batch version of minimal free energy VQ. Because the right-hand side of Eq. (26) itself depends on the CV positions $w_j$ through the activations $a_j(x)$, the equation is applied as a fixed-point iteration: the activations are evaluated for the current $w_j$, the $w_j$ are recomputed, and the procedure is repeated until the $w_j$ converge. The batch version is well suited for implementation on parallel computers (see [2]).
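The batch iteration of Eq. (26) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this chapter: the activations are assumed to be normalized Gaussians with a single global width `rho` (the exact form of (19) is not reproduced here), expectation values are replaced by sums over a finite data set, and the names `activations`, `batch_update`, and `batch_vq` are hypothetical.

```python
import math

def activations(x, cvs, rho):
    """Assumed form of a_j(x): normalized Gaussians that sum to 1 over j."""
    # Squared Euclidean distance from x to every codebook vector (CV).
    d2 = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in cvs]
    # Shift by the smallest distance before exponentiating (numerical stability).
    d2_min = min(d2)
    e = [math.exp(-(d - d2_min) / (2.0 * rho ** 2)) for d in d2]
    total = sum(e)
    return [v / total for v in e]

def batch_update(data, cvs, rho):
    """One batch step: w_j <- sum_x a_j(x) x / sum_x a_j(x)."""
    dim = len(data[0])
    num = [[0.0] * dim for _ in cvs]   # accumulates sum_x a_j(x) * x
    load = [0.0] * len(cvs)            # accumulates sum_x a_j(x)
    for x in data:
        a = activations(x, cvs, rho)
        for j, aj in enumerate(a):
            load[j] += aj
            for k in range(dim):
                num[j][k] += aj * x[k]
    # A CV whose load underflows to zero is left where it is.
    return [
        [num[j][k] / load[j] for k in range(dim)] if load[j] > 0.0 else list(cvs[j])
        for j in range(len(cvs))
    ]

def batch_vq(data, cvs, rho, tol=1e-9, max_iter=1000):
    """Repeat batch_update until no CV coordinate moves by more than tol."""
    for _ in range(max_iter):
        new_cvs = batch_update(data, cvs, rho)
        shift = max(
            abs(n - o) for nw, ow in zip(new_cvs, cvs) for n, o in zip(nw, ow)
        )
        cvs = new_cvs
        if shift < tol:
            break
    return cvs
```

For example, `batch_vq([[0.0], [0.2], [1.0], [1.2]], [[0.1], [0.9]], 0.1)` moves the two CVs close to the two cluster means. Each data point contributes independently to the accumulated sums, which is why the batch form lends itself to parallel evaluation.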

3.2 Optimization of the $\rho_j$

Once the virtual positions $w_j$ have been computed, one has to determine the widths $\rho_j$ of the receptive fields. With regard to various heuristic approaches to compute and optimize the $\rho_j$, we refer to the literature, e.g., [1]. One can either define an individual $\rho_j$ for every CV $w_j$ or a global $\rho$ for all CVs. A simple global method proposed by [17] is to choose the average nearest-neighbor distance of the CVs. In this chapter, we choose a global width $\rho = \rho_j$, $j \in \{1, \ldots, N\}$, for all CVs that is determined by optimizing the classification results of the GRBF network.
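The simple global heuristic attributed to [17], the average nearest-neighbor distance of the CVs, can be sketched as follows. The function name `average_nearest_neighbor_distance` is hypothetical, and the Euclidean metric is an assumption; this is not the width selection used in this chapter, which optimizes the classification results instead.

```python
import math

def average_nearest_neighbor_distance(cvs):
    """Mean over all CVs of the Euclidean distance to the closest other CV."""
    if len(cvs) < 2:
        raise ValueError("need at least two codebook vectors")
    total = 0.0
    for j, wj in enumerate(cvs):
        # Distance from CV j to its nearest neighbor among the remaining CVs.
        total += min(math.dist(wj, wk) for k, wk in enumerate(cvs) if k != j)
    return total / len(cvs)
```

The returned value could serve as a starting point for a global width, which may then be refined against classification performance.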

3.3 Optimization of the $s_j$: Supervised Learning

The output weights $s_j$ (see Fig. 1) are collected in the matrix $S := (s_{ij}) \in \mathbb{R}^{m \times N}$. They can be determined in two different ways:

• A first approach is a global training procedure. Given the training data set $T = \{(x^\nu, y^\nu) \mid \nu \in \{1, \ldots, p\}\}$, the error function

can be minimized by a gradient descent method. Thus, the weight change $\Delta s_{ij}$ in a training step with a learning rate $\varepsilon$ results in

Assuming convergence of the procedure, i.e., $\Delta s_{ij} = 0$, this yields
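The global training procedure described here can be illustrated with a short Python sketch. It rests on an assumption that should be read as such: the error function is taken to be the mean squared difference between the targets $y^\nu$ and the network outputs $S\,a(x^\nu)$, which may differ in form or normalization from the error function used in this chapter. The function name `train_output_weights` and its parameters are hypothetical.

```python
def train_output_weights(acts, targets, lr=0.5, epochs=2000):
    """Batch gradient descent on an ASSUMED mean squared error.

    acts[v][j]    = a_j(x^v), hidden-layer activation for training sample v
    targets[v][i] = y_i^v, desired output i for training sample v
    Returns S as an m x N nested list.
    """
    p, n, m = len(acts), len(acts[0]), len(targets[0])
    S = [[0.0] * n for _ in range(m)]
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(m)]
        for a, y in zip(acts, targets):
            # Network output for this sample: out_i = sum_k s_ik a_k(x).
            out = [sum(S[i][k] * a[k] for k in range(n)) for i in range(m)]
            for i in range(m):
                err = y[i] - out[i]
                for j in range(n):
                    grad[i][j] += err * a[j]
        # Weight change: learning rate times the averaged error-activation product.
        for i in range(m):
            for j in range(n):
                S[i][j] += lr * grad[i][j] / p
    return S
```

Under this assumed error, the weight change vanishes once the accumulated products of output error and activation are zero for every pair $(i, j)$, which is the convergence condition $\Delta s_{ij} = 0$ named in the text.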
