Cluster Validation

Figure 3.8: Simulated noisy [18F]fluorodeoxyglucose (FDG) kinetics in different regions (white matter, tumor, thalamus, hypometabolic regions), plotted against time (min).

As mentioned earlier, the optimum number of clusters for a given dataset is usually not known a priori. It is advantageous if this number can be determined from the dataset itself. In this study, a model-based approach to cluster validation was adopted, based on two information-theoretic criteria, namely the Akaike information criterion (AIC) [101] and the Schwarz criterion (SC) [102], assuming that the data can be modeled by an appropriate probability distribution function (e.g. Gaussian). Both criteria determine the optimal model order by trading goodness-of-fit against a penalty that grows with the number of clusters. Thus, the number of clusters that yields the lowest value of AIC and/or SC is selected as the optimum. The use of AIC and SC has some advantages over heuristic approaches such as the "bootstrap" resampling technique, which requires a large amount of stochastic computation. The model-based approach is also relatively flexible in evaluating the goodness-of-fit: a change in the probability model of the data requires no change in the formulation other than the modeling assumptions. It is noted, however, that the two criteria may not indicate the same model as the optimum [102].
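The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the formulation used in this study. It uses the generic forms AIC = -2 ln L + 2p and SC = -2 ln L + p ln N. It further assumes i.i.d. Gaussian residuals about the cluster centroids with a single shared variance, a parameter count p of one value per centroid coordinate plus the variance, and N taken as the number of data vectors. The function names and the toy data are hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(data, labels, centroids):
    """Maximized log-likelihood, ASSUMING i.i.d. Gaussian residuals
    about the assigned centroids with one shared variance."""
    residuals = data - centroids[labels]
    n_values = residuals.size
    variance = np.sum(residuals ** 2) / n_values  # maximum-likelihood estimate
    return -0.5 * n_values * (np.log(2.0 * np.pi * variance) + 1.0)


def information_criteria(data, labels, centroids):
    """Return (AIC, SC) for one clustering of the data."""
    n_vectors, n_dims = data.shape
    # Assumed parameter count: every centroid coordinate plus the variance.
    n_params = centroids.shape[0] * n_dims + 1
    log_lik = gaussian_log_likelihood(data, labels, centroids)
    aic = -2.0 * log_lik + 2.0 * n_params
    sc = -2.0 * log_lik + n_params * np.log(n_vectors)
    return aic, sc


# Toy data (hypothetical): four 1-D points forming two tight groups.
data = np.array([[0.0], [0.2], [10.0], [10.2]])

# Candidate model with one cluster: every point assigned to the global mean.
one_cluster = information_criteria(
    data, np.zeros(4, dtype=int), np.array([[5.1]]))

# Candidate model with two clusters: one centroid per group.
two_clusters = information_criteria(
    data, np.array([0, 0, 1, 1]), np.array([[0.1], [10.1]]))

print("k=1 (AIC, SC):", one_cluster)
print("k=2 (AIC, SC):", two_clusters)
```

On this toy input both criteria are lower for the two-cluster model, so it would be selected. In practice the labels and centroids for each candidate number of clusters would come from the clustering algorithm itself.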

The validity of clusters is also assessed visually and by thresholding the average mean squared error (MSE) across clusters, which is defined as

Both approaches are subjective, but they can provide insight into the "correct" number of clusters.
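The thresholding idea can be illustrated with a short Python sketch. The definition used here is an assumption made purely for illustration and is not taken from this study: the MSE of each cluster is the mean squared deviation of its members from the cluster centroid, and the average is taken over non-empty clusters. The function names, the threshold value, and the example numbers are all hypothetical.

```python
import numpy as np


def average_cluster_mse(data, labels, centroids):
    """Average over clusters of the within-cluster mean squared error
    (ASSUMED definition, for illustration only)."""
    per_cluster = []
    for j in range(len(centroids)):
        members = data[labels == j]
        if len(members) == 0:
            continue  # ignore empty clusters
        per_cluster.append(np.mean((members - centroids[j]) ** 2))
    return float(np.mean(per_cluster))


def smallest_k_below(mse_by_k, threshold):
    """Smallest cluster count whose average MSE is at or below the
    threshold, or None if no candidate qualifies."""
    for k in sorted(mse_by_k):
        if mse_by_k[k] <= threshold:
            return k
    return None


# Toy data (hypothetical): four 1-D points forming two tight groups.
data = np.array([[0.0], [0.2], [10.0], [10.2]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.1], [10.1]])
mse_two = average_cluster_mse(data, labels, centroids)

# Hypothetical average MSE values for k = 1, 2, 3 and a chosen threshold.
chosen_k = smallest_k_below({1: 25.0, 2: mse_two, 3: 0.005}, threshold=0.05)
```

The choice of threshold is what makes this approach subjective: a looser threshold accepts fewer clusters, a tighter one demands more.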
