## Info

shadow symptoms—e.g., Lyme's disease shares symptoms with many other very different afflictions. This is the great advantage of fuzzy partitioning algorithms—it may be (notice these are algorithmic outputs, and algorithms are often wrong) precisely the noncrisply labeled patients that need the most attention (most extensive testing). Hardening each column of U2 and U3 with (7) in this example makes them identical to U1, and everything you worked so hard to learn by passing to a soft model is lost! Crisp partitions of data do not possess the information content needed to suggest fine details of infrastructure such as hybridization or mixing that are available in U2 and U3. Consequently, extract information of this kind before you harden U! In medical image segmentation, as soon as the partition of the image is hardened, some of the advantage that accrues by using a soft model is lost. The hope is that carrying soft uncertainty throughout the calculations up to just this point will realize a gain in overall accuracy.

So, how do we get soft partitions of data? Perhaps 80-90% of all soft partitioning algorithms used in medical image segmentation are based on, or derived from, the fuzzy c-means (FCM) model and algorithmic family . The c-means families are the best-known and most well-developed families of batch clustering models because they are least squares models. The optimization problem that defines the hard (H), fuzzy (F), and possibilistic (P) c-means (HCM, FCM and PCM, respectively) models is:

U6 Mhcn, Mfcn or Mpcn for HCM, FCM or PCM respectively;

V = (V1, V2, ..., Vc) 6 Wp; V 6 W is the ith point prototype; w = (W1, W2,..., wc )T; Wi 6 W+ is the ith penalty term (PCM);

m > 1 is the degree of fuzzification; (8e)

is the A- induced inner product distance between xk and v, . Weight vector w in (8d) is a fixed, user-specified vector of positive weights; it is not part of the variable set in minimization problem (8a). The HCM, FCM, and PCM clustering models are summarized in Table 2. Problem (8a) is well-defined for any distance function on W. The method chosen to approximate solutions of (8a) depends primarily on Dft. If Dft is differentiable in V (e.g., whenever Dft is an inner product norm), the most popular technique for solving (8a) is grouped coordinate descent (or alternating optimization (AO)).

Column 3 of Table 2 shows the first-order necessary conditions for U and V at local extrema of Jm required by each model. The second form, V, for v, in (9b), emphasizes that optimal HCM-AO prototypes are simply the mean vectors or centroids of the points in crisp cluster i, ni = | U(,) |, where U(,) is the ith row of U. Table 3 is a pseudocode listing that lets you implement all three of the c-means families in a single framework. There are a lot of details left out here about singularity, initialization, rules of thumb for the parameters, and especially, about how to pick c, the number of clusters to look for. This is crucial when FCM is used for image segmentation. If we had space, we would put a little numerical example here so you could see how these three algorithms work. For brevity, we skip this here, but point out that the three partitions U1, U2, and U3 in Table 1 are exactly the kinds of Us produced by HCM, FCM, and PCM as listed in Table 3. We will see images segmented with these algorithms later.

So, clustering algorithms deliver c-partitions U of X and can often, as with the c-means families, estimate other parameters such as the vectors we denoted by V = {v1 , v2, ..., vc }cSp in (8c). The vector vi is interpreted as a point prototype (centroid, cluster center, signature, exemplar, template, codevector, etc.) for the points associated with cluster i. At the end of clustering, the vectors comprising V will have algorithmically assigned labels, and these may or may not correspond to meaningful physical classes. For example, an MR image segmented by clustering ends up with regions and perhaps prototypes that are artificially colored as, say, "red," "green," etc., whereas the clinician needs for them to be called "bone," "gray matter,'' etc. This seemingly unimportant fact provides a convenient and important way to classify segmentation methods.

TABLE 2 Optimizing Jm (U, V, w) when Dik = \xk — v,\

Minimize

First-order necessary conditions for (U, V) when Dik = \xk — vi\A>0 V i, k

0 0