## Fuzzy Clustering

A cluster or a class may be defined as a set of objects that share similar characteristics (features) and differ from objects outside the group. When data are clustered, a criterion is chosen for measuring the similarity of an object to a cluster. There are three types of clustering techniques: crisp, also known as hard or classical clustering, fuzzy clustering, and probabilistic clustering [43, 48].

Crisp or classical clustering algorithms classify each object as either part of a cluster or not part of it. The object takes a hard value of either 0 or 1 with respect to a cluster or class. Cluster separation is very clear and no overlap is allowed. In real applications, however, and in medical imaging in particular, there is overlap between the various data types and there are no crisp boundaries between classes.

Fuzzy clustering algorithms allow for data overlap because they use class membership values instead of binary assignments. These algorithms treat an object, e.g., an image pixel, as part of all clusters. The object is given a weight with respect to each cluster; the weight vector is unconstrained, with each weight taking a value between 0 and 1. A weight greater than 0.5 indicates that the object is more likely to be similar to that cluster.

Probabilistic clustering is similar to fuzzy clustering except that the weight vector is constrained: the sum of all weights assigned to an object equals 1.
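The three membership schemes can be contrasted with a tiny sketch; the numbers below are hypothetical, chosen only to illustrate the constraints described above:

```python
# Hypothetical membership values for one image pixel and c = 3 clusters,
# contrasting the three clustering styles (numbers are illustrative only).

crisp = [0, 1, 0]                 # hard clustering: full membership in one cluster
fuzzy = [0.2, 0.7, 0.4]           # fuzzy: each weight in [0, 1], sum unconstrained
probabilistic = [0.1, 0.6, 0.3]   # probabilistic: weights constrained to sum to 1

assert all(w in (0, 1) for w in crisp)         # binary assignment
assert all(0.0 <= w <= 1.0 for w in fuzzy)     # bounded but no sum constraint
assert abs(sum(probabilistic) - 1.0) < 1e-9    # sums to exactly 1
```

A weight of 0.7 in the fuzzy vector marks the second cluster as the most likely assignment, consistent with the 0.5 rule of thumb mentioned above.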

Various fuzzy clustering models have been proposed in the literature. One widely used model is the fuzzy c-means (FCM) algorithm developed by Bezdek [48]. We have implemented FCM in several variations, mainly for MRI brain tumor classification. Variations include unsupervised FCM [67, 68], FCM combined with knowledge-based clustering [51], FCM combined with validity-guided (re)clustering [50], semisupervised FCM (ssFCM) algorithms [49], and supervised FCM algorithms [52, 53]. In supervised learning, FCM is trained under complete supervision; namely, the algorithm is forced to learn every class correctly. In unsupervised learning, data are clustered according to their similarity, there is no forced training, and the algorithm is allowed to make its own classification decisions. Semisupervised learning offers a middle path between the previous two: the user defines a degree of supervision that leaves some room for the algorithm to make its own decisions without being entirely unrestricted. Semisupervised learning may offer advantages for the clustering of data with significant overlap. In the following paragraphs we discuss a few key theoretical aspects of the three learning approaches.

The FCM family of objective functions is defined as [47]

$$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \|x_k - v_i\|_A^2 \tag{4.1}$$

where $m \in [1, \infty)$ is a weighting exponent on each fuzzy membership, $U \in M_{fcn}$ is a constrained fuzzy c-partition of the dataset $X$ of $n$ feature vectors $x_k$ in $c$ clusters, $V = (v_1, v_2, \ldots, v_c)$ are $c$ vector prototypes in the $p$-dimensional feature space $\mathbb{R}^p$, and $A$ is any positive definite $(p \times p)$ matrix. $U$ and $V$ may minimize $J_m$ only if

$$u_{ik} = \left[\sum_{j=1}^{c} \left( \frac{\|x_k - v_i\|_A}{\|x_k - v_j\|_A} \right)^{2/(m-1)} \right]^{-1}, \qquad v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}, \qquad 1 \le i \le c, \; 1 \le k \le n \tag{4.2}$$

Note that $U$ is a $(c \times n)$ matrix of the $u_{ik}$ values and $\|x_k - v_i\|_A^2 = (x_k - v_i)^T A (x_k - v_i)$. The steps followed in the implementation of the FCM algorithm are as follows [47]:

1. Given unlabeled data $X = \{x_1, x_2, \ldots, x_n\}$

2. Choose parameters $c$ (number of clusters), $T$ (maximum number of iterations), $A$, $m > 1$, and $\varepsilon > 0$

3. Initialize $U_0 \in M_{fcn}$ randomly

4. Compute initial cluster centers $\{v_{i,0}\}$ from $U_0$ using Eq. (4.2)

5. For $t = 1, 2, \ldots, T$:

   (i) update $U_t$ using Eq. (4.2)

   (ii) compute the error $\|U_t - U_{t-1}\|$

   (iii) if error $< \varepsilon$ stop; else compute $\{v_{i,t}\}$ from Eq. (4.2)

   (iv) continue to the next $t$
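The steps above can be sketched in Python with NumPy. This is a minimal illustration, not the implementation cited in the text; it assumes $A$ is the identity matrix (i.e., Euclidean distance) and $m = 2$ by default:

```python
import numpy as np

def fcm(X, c, m=2.0, T=100, eps=1e-5, seed=0):
    """Minimal unsupervised FCM sketch following the steps above.
    Assumes A = identity (Euclidean distance); X is an (n, p) array."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # step 3: random constrained partition
    for _ in range(T):
        # cluster centers from the current memberships, Eq. (4.2)
        V = (U**m @ X) / (U**m).sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n) distances
        d = np.fmax(d, 1e-12)                    # guard against zero distances
        # membership update, Eq. (4.2)
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        if np.linalg.norm(U_new - U) < eps:      # stopping criterion
            U = U_new
            break
        U = U_new
    return U, V

# Usage on two well-separated groups of 2-D feature vectors:
X = np.vstack([np.zeros((20, 2)), 10.0 * np.ones((20, 2))])
U, V = fcm(X, c=2)
# Each column of U sums to 1 and each point belongs mostly to one cluster.
```

With well-separated data the memberships converge to nearly crisp values; for overlapping tissue classes, as in the CT images discussed later, intermediate membership values are exactly what makes the fuzzy approach attractive.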

In an unsupervised method, all n label vectors are unknown during initialization. Once the parameter c is selected and the stopping condition is satisfied, the algorithm terminates and each row in the matrix U corresponds to a cluster that may or may not have a label [47]. To avoid problems associated with nonlabeled clusters, a supervised FCM may be used. To change an unsupervised method to a supervised one, expert human intervention is required and two things may be done:

1. The operator selects the number of clusters c during initialization.

2. The operator assigns labels to the classes at termination.

The problematic aspects of a supervised approach are the need for high-quality labeled data and the variability introduced by human intervention. An alternative to the supervised and unsupervised versions of FCM is the semisupervised method where a small set of the data is labeled but the majority of the data is unlabeled. In ssFCM, we let

$$X = \{\underbrace{x_1^1, x_2^1, \ldots, x_{n_1}^1, \; x_1^2, x_2^2, \ldots, x_{n_2}^2, \; \ldots, \; x_1^c, x_2^c, \ldots, x_{n_c}^c}_{\text{labeled data}}, \; \underbrace{x_1, x_2, \ldots, x_{n_u}}_{\text{unlabeled data}}\}$$

denote the partially labeled data; superscripts denote the class label and $n_i$ is the number of samples having the same label vector in the partition matrix $U$. Using the labeled set of data, we find the centers of the clusters iteratively until the terminating condition is satisfied. The unlabeled data are then introduced for finding the cluster centers. This method is more stable because the centers are well defined by the labeled data used for training. The clusters are also well defined, with the correct physical data labels.

The cluster centers for the labeled data are calculated as

$$v_{i,0} = \frac{\sum_{k=1}^{n_d} (u_{ik}^d)^m x_k^d}{\sum_{k=1}^{n_d} (u_{ik}^d)^m}, \quad 1 \le i \le c \tag{4.3}$$

where the superscript $d$ denotes design or training data. Having the labels for the $n_d$ points, we need to update only the $n_u$ unlabeled columns in $U$ by calculating

$$u_{ik,t}^u = \left[\sum_{j=1}^{c} \left( \frac{\|x_k^u - v_{i,t-1}\|_A}{\|x_k^u - v_{j,t-1}\|_A} \right)^{2/(m-1)} \right]^{-1}, \quad 1 \le i \le c, \; 1 \le k \le n_u \tag{4.4}$$

Once the initial cluster centers $\{v_{i,0}\}$ are calculated, the cluster centers are recomputed using the unlabeled data as

$$v_{i,t} = \frac{\sum_{k=1}^{n_d} (u_{ik,t}^d)^m x_k^d + \sum_{k=1}^{n_u} (u_{ik,t}^u)^m x_k^u}{\sum_{k=1}^{n_d} (u_{ik,t}^d)^m + \sum_{k=1}^{n_u} (u_{ik,t}^u)^m}, \quad 1 \le i \le c, \; t = 1, 2, \ldots, T \tag{4.5}$$

where the superscript $u$ is now used to denote the unlabeled data contribution.

In practice, $n_d$ is much smaller than $n_u$. For example, an abdominal CT image is usually $512 \times 512$, or 262,144 pixels. A pancreatic mass of about 4 cm in maximum diameter may cover approximately 1,200 pixels in an image with a resolution of 1 mm/pixel, while the pancreas itself may be up to 5,000 pixels. Neighboring major organs and structures may be up to 20,000 pixels, depending on the slice. A very small percentage of these pixels will be labeled in the ssFCM approach. For example, for $c = 4$, if a quarter of the pixels in each class per slice are labeled, then approximately $n_d = 8{,}000$ and $n_u = 254{,}144$.
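The pixel counts quoted above can be checked with a couple of lines (all figures are taken from the text):

```python
# Sanity check of the pixel counts for a 512 x 512 abdominal CT slice
# with roughly 8,000 labeled pixels, as quoted in the text.
total = 512 * 512          # pixels per slice
n_d = 8_000                # approximate number of labeled pixels
n_u = total - n_d          # everything else is unlabeled

assert total == 262_144
assert n_u == 254_144
```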

To reduce potential bias that may be introduced by large differences between $n_d$ and $n_u$, as well as between the different tissue classes, one more modification of c-means is introduced that allows us to weigh the fewer labeled samples more heavily than their unlabeled counterparts. Furthermore, such weighting allows us to assign much larger weights to small clusters to better separate them from larger ones. This is done by introducing the weights $W = (w_1, w_2, \ldots, w_{n_d})$ into Eq. (4.5) as

$$v_{i,t} = \frac{\sum_{k=1}^{n_d} w_k (u_{ik,t}^d)^m x_k^d + \sum_{k=1}^{n_u} (u_{ik,t}^u)^m x_k^u}{\sum_{k=1}^{n_d} w_k (u_{ik,t}^d)^m + \sum_{k=1}^{n_u} (u_{ik,t}^u)^m}, \quad 1 \le i \le c, \; t = 1, 2, \ldots, T \tag{4.6}$$

In general, $W$ is a vector of positive real numbers, and when $w_k = 1$ for all $k$, ssFCM reduces to standard unsupervised FCM [47].
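The effect of the weights in Eq. (4.6) can be illustrated with a small NumPy sketch; the function name, array layout, and toy numbers are ours, and $A$ is again taken as the identity:

```python
import numpy as np

def weighted_centers(Ud, Xd, Uu, Xu, w, m=2.0):
    """Weighted ssFCM center update of Eq. (4.6). Ud is (c, n_d) with the
    labeled (design) memberships, Uu is (c, n_u) with the unlabeled ones,
    and w holds one positive weight per labeled sample."""
    num = (w * Ud**m) @ Xd + Uu**m @ Xu
    den = (w * Ud**m).sum(axis=1, keepdims=True) + (Uu**m).sum(axis=1, keepdims=True)
    return num / den

# Toy example: two 1-D clusters labeled at 0 and 10, plus one ambiguous
# unlabeled point at 4 with equal membership in both clusters.
Ud = np.eye(2)                    # crisp memberships for the two training pixels
Xd = np.array([[0.0], [10.0]])
Uu = np.array([[0.5], [0.5]])
Xu = np.array([[4.0]])

v_plain = weighted_centers(Ud, Xd, Uu, Xu, w=np.ones(2))
v_heavy = weighted_centers(Ud, Xd, Uu, Xu, w=np.array([5.0, 5.0]))
# Heavier weights pull both centers back toward their labeled samples.
```

With unit weights the ambiguous point drags both centers toward itself; increasing the labeled-sample weights anchors the centers near 0 and 10, which is precisely the mechanism used below to keep small clusters from being absorbed by large ones.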

The implementation of an ssFCM algorithm involves the following steps:

1. Given partially labeled data $X = X_d \cup X_u$, set the parameters $n_d = |X_d|$ and $n_u = |X_u|$; the parameter $c$ is known and fixed by the training data.

2. Choose parameters $T$, $A$, $m > 1$, and $\varepsilon > 0$.

3. Initialize the unlabeled columns of $U_0$ randomly; the labeled columns are fixed by the training data.

4. Compute initial cluster centers using Eq. (4.3).

5. For $t = 1, 2, \ldots, T$:

   (i) update the memberships of the unlabeled data using Eq. (4.4);

   (ii) compute the error $\|U_t - U_{t-1}\|$;

   (iii) if error $< \varepsilon$ stop; else compute $v_{i,t}$ from Eq. (4.6);

   (iv) continue to the next $t$.
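Putting the steps together, a compact sketch of the ssFCM loop might look as follows. This is an illustration under the same assumptions as before ($A$ = identity, crisp and fixed memberships for the labeled pixels, at least one labeled sample per class); names are ours, not those of the cited implementation:

```python
import numpy as np

def ssfcm(Xd, yd, Xu, c, w=None, m=2.0, T=100, eps=1e-5):
    """ssFCM sketch following the steps above (A = identity). Xd/yd are
    the labeled feature vectors and their integer labels in 0..c-1,
    Xu the unlabeled vectors, w the optional labeled-sample weights."""
    n_d, n_u = len(Xd), len(Xu)
    w = np.ones(n_d) if w is None else np.asarray(w, dtype=float)
    Ud = np.zeros((c, n_d))
    Ud[yd, np.arange(n_d)] = 1.0                 # crisp memberships for training data
    # step 4: initial centers from the labeled data only, Eq. (4.3)
    V = (Ud**m @ Xd) / (Ud**m).sum(axis=1, keepdims=True)
    Uu = np.zeros((c, n_u))
    for _ in range(T):
        d = np.fmax(np.linalg.norm(Xu[None, :, :] - V[:, None, :], axis=2), 1e-12)
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        Uu_new = 1.0 / ratio.sum(axis=1)         # step 5(i): Eq. (4.4), unlabeled columns
        err = np.linalg.norm(Uu_new - Uu)        # step 5(ii)
        Uu = Uu_new
        if err < eps:                            # step 5(iii): stopping criterion
            break
        # weighted center update, Eq. (4.6)
        num = (w * Ud**m) @ Xd + Uu**m @ Xu
        den = (w * Ud**m).sum(axis=1, keepdims=True) + (Uu**m).sum(axis=1, keepdims=True)
        V = num / den
    return Uu, V

# Usage: two labeled 1-D pixels (one per class) and four unlabeled ones.
Xd = np.array([[0.0], [10.0]])
yd = np.array([0, 1])
Xu = np.array([[0.5], [-0.5], [9.5], [10.5]])
Uu, V = ssfcm(Xd, yd, Xu, c=2)
```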

In ssFCM, centers of smaller clusters may be prevented from migrating toward larger clusters by giving high weights to the training data that correspond to the smaller clusters.

All FCM versions described previously have been implemented in our laboratory. Preliminary results for pancreatic cancer imaging have been obtained with the unsupervised FCM algorithm, and these will be discussed in the following sections. For our pilot work, the number of clusters was set to 4 and FCM was applied in two stages. Specifically, in the first run, FCM was applied to the CT image after the external signals were removed, to cluster all major organs in the same class, as shown in Figs. 4.13 and 4.14 for the original slices of Figs. 4.6(a) and 4.7(a). In these figures, all major organs, including the pancreas, are grouped in the same class, labeled as 1. Class 1 pixels were then remapped to the pixel values in the original CT image, and FCM was applied to this selected area a second time using the same parameters. The output of the second run is shown in Figs. 4.15 and 4.16. Although not immediately evident in a grayscale representation, four pixel clusters were identified in these images. One of these clusters (encompassed by a white outline) corresponds to the pancreatic tumors indicated by arrows in Figs. 4.6(a) and 4.7(a) and to the pancreas.

Several issues remain to be addressed in this FCM application, including (a) the appropriate number of clusters to differentiate between tumor and nontumor pancreatic areas, (b) how many labeled data are needed for training to reduce misclassification, (c) in what proportion the weights should be assigned for each cluster, (d) the optimal stopping criterion $\varepsilon$, (e) whether the two-stage application can be avoided by removing unwanted signals further or by reducing the region of interest on which FCM is applied, and (f) validation of the clustering results relative to ground truth files.

Finally, a semisupervised algorithm is currently being applied to pancreatic tumor clustering. In this ssFCM, a small subset of feature vectors (or pixels) is labeled by a radiologist expert in CT interpretation and used as training information to define the clusters. The disadvantage of ssFCM is that it requires the expert's input through the selection of training pixels for the tissue classes in an image. On the other hand, ssFCM is less likely to be adversely affected by imperfect training data than its fully supervised counterpart, and it may be more efficient, i.e., it could achieve better results than FCM for the same number of classes (4). From a practical perspective, and assuming that the results from this algorithm turn out to be more robust and consistent, it would not be clinically impractical, and may even be advantageous, to have the expert initiate the ssFCM process during CT image review. Alternative methods under consideration are the optimization of FCM by combining it with a validity-guided clustering technique [50], a knowledge-based system [54], or a genetic algorithm technique [55], all of which have shown potential for improvement. Finally, the imaging characteristics of the pancreatic tumors and nonimage, demographic information could be merged to guide cluster initialization and tissue classification. All these options are topics of future investigation.
