## Accuracy and Robustness

Because all affinities (and consequently the segmentations) shown in the last section are based on seeds selected manually by a user, the practical usefulness and performance of the multiseeded fuzzy segmentation algorithm need to be experimentally evaluated, both for accuracy and for robustness.

The experiments used the top-left images of Figs. 12.3-12.7. We chose these images because they were based on mathematically defined objects to which we assigned gray values that were then corrupted by random noise and shading, and so the "correct" segmentations were known to us.

We then asked five users who were not familiar with the images to perform five series of segmentations, each series consisting of one segmentation of each of the five images, presented in random order. This gave us 5 × 5 × 5 = 125 segmentations, which we analyzed in a number of different ways.

First, we analyzed the accuracy of the segmentations. We used two reasonable ways of measuring accuracy: in one we simply consider whether each spel is assigned to the correct object; in the other we take the grade of membership into consideration as well. The point accuracy of a segmentation is defined as the number of correctly identified spels, divided by the total number of spels, and multiplied by 100. The membership accuracy of a segmentation is defined as the sum of the grades of membership of all correctly identified spels, divided by the sum of the grades of membership of all spels in the segmentation, and multiplied by 100.
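As an illustration, the two definitions can be sketched in Python as follows. The function names, the flat-list representation of a segmentation (one object label and one grade of membership per spel), and the toy values are assumptions made for this sketch; they are not taken from the experiments.

```python
def point_accuracy(labels, truth):
    """Percentage of spels assigned to the correct object."""
    if not labels or len(labels) != len(truth):
        raise ValueError("labels and truth must be non-empty and equally long")
    correct = sum(1 for a, t in zip(labels, truth) if a == t)
    return 100.0 * correct / len(labels)


def membership_accuracy(labels, grades, truth):
    """Percentage of the total grade of membership carried by correct spels."""
    if not (len(labels) == len(grades) == len(truth)):
        raise ValueError("labels, grades and truth must be equally long")
    total = sum(grades)
    if total <= 0:
        raise ValueError("total grade of membership must be positive")
    correct = sum(g for a, g, t in zip(labels, grades, truth) if a == t)
    return 100.0 * (correct / total)


# Hypothetical 4-spel example: the last spel is misclassified and has a low grade.
truth = [1, 1, 2, 1]
labels = [1, 1, 2, 2]
grades = [0.9, 0.8, 0.7, 0.2]

pa = point_accuracy(labels, truth)               # 3 of 4 spels are correct
ma = membership_accuracy(labels, grades, truth)  # 2.4 of 2.6 total grade
```

In this toy case the misclassified spel carries a small grade of membership, so the membership accuracy comes out higher than the point accuracy.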

The average and the standard deviation of the point accuracy for all 125 segmentations were 97.15 and 4.72, respectively, while the values for their membership accuracy were 97.70 and 3.82. These means and standard deviations are very similar. This is reassuring: the definitions of both accuracies are somewhat ad hoc, so the fact that they yield similar results indicates that the reported figures of merit are not overly sensitive to the precise definition of accuracy. The slightly larger mean for the membership accuracy is due to the fact that misclassified spels tend to have smaller-than-average grades of membership.

The average error (defined as "100 less point accuracy") over all segmentations is less than 3%, comparing quite favorably with the state of the art: in [6] the authors report that a "mean segmentation error rate as low as 6.0 percent was obtained."

The robustness of our procedure was defined based on the similarity of two segmentations. The point similarity of two segmentations is defined as the number of spels that are assigned to the same object in the two segmentations, divided by the total number of spels, and multiplied by 100. The membership similarity of two segmentations is defined as the sum of the grades of membership (in both segmentations) of all the spels that are assigned to the same object in the two segmentations, divided by the sum of the grades of membership (in both segmentations) of all the spels, and multiplied by 100. (For both these measures of similarity, identical segmentations are given the value 100, and two segmentations in which every spel is assigned to a different object are given the value 0.)
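The two similarity measures can be sketched in the same way. This is one possible reading of the definitions, assuming that "in both segmentations" means the two grades of a spel are added together; the function names and the flat-list representation are again assumptions made for illustration.

```python
def point_similarity(labels_a, labels_b):
    """Percentage of spels assigned to the same object in both segmentations."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("segmentations must be non-empty and equally long")
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return 100.0 * same / len(labels_a)


def membership_similarity(labels_a, grades_a, labels_b, grades_b):
    """Share of the combined grade of membership carried by agreeing spels."""
    n = len(labels_a)
    if not (n == len(grades_a) == len(labels_b) == len(grades_b)):
        raise ValueError("all inputs must be equally long")
    # Combined grade per spel: grade in segmentation A plus grade in B.
    combined = [ga + gb for ga, gb in zip(grades_a, grades_b)]
    total = sum(combined)
    if total <= 0:
        raise ValueError("total grade of membership must be positive")
    agree = sum(c for c, a, b in zip(combined, labels_a, labels_b) if a == b)
    return 100.0 * (agree / total)


# Hypothetical example: identical segmentations and fully disagreeing ones.
seg_labels = [1, 1, 2, 2]
seg_grades = [0.9, 0.8, 0.7, 0.6]
swapped = [2, 2, 1, 1]  # every spel assigned to a different object

same_point = point_similarity(seg_labels, seg_labels)
same_memb = membership_similarity(seg_labels, seg_grades, seg_labels, seg_grades)
diff_point = point_similarity(seg_labels, swapped)
diff_memb = membership_similarity(seg_labels, seg_grades, swapped, seg_grades)
```

Under this reading, both measures reproduce the two boundary cases stated in the text: 100 for identical segmentations and 0 when no spel is assigned to the same object.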

Since each user segmented each image five times, there are 10 possible ways of pairing the five segmentations of a given image by a given user. With five images, this gave us 50 pairs of segmentations per user and a total of 250 pairs of segmentations. Because the results for point and membership similarity were so similar for every user and image (for detailed information, see [29]), we decided to use only one of them, the point similarity, as our intrauser consistency measure. The results are quite satisfactory, with an average intrauser consistency of 96.88 and a standard deviation of 5.56.
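The pair counts quoted above can be checked with a short enumeration. The constant names are assumptions introduced for this sketch.

```python
from itertools import combinations

USERS = 5    # users taking part in the experiment
IMAGES = 5   # test images
REPEATS = 5  # segmentations of each image by each user

# Unordered pairs among the five segmentations of one image by one user.
pairs_per_user_image = len(list(combinations(range(REPEATS), 2)))

pairs_per_user = pairs_per_user_image * IMAGES
total_pairs = pairs_per_user * USERS
```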

In order to report on the consistency between users (interuser consistency) we selected, for each user and each image, the most typical segmentation by that user of that image. This is defined as the segmentation for which the sum of membership similarities between it and the other four segmentations by that user of that image is maximal. Thus, we obtained five segmentations for each image; pairing these with one another gave 10 pairs per image, resulting in a total of 50 pairs of segmentations. The average and standard deviation of the interuser consistency (98.71 and 1.55, respectively) were even better than those of the intrauser consistency, mainly because the selection of the most typical segmentation for each user eliminated the influence of relatively bad segmentations.
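The selection of the most typical segmentation can be sketched as follows. The helper `most_typical` and its `similarity` parameter are hypothetical names; the text uses membership similarity for this step, whereas the usage example below substitutes a simple label-agreement measure as a stand-in to keep the sketch short. Ties, which the text does not discuss, are resolved here in favor of the earliest segmentation.

```python
def most_typical(segmentations, similarity):
    """Index of the segmentation whose summed similarity to the others is maximal."""
    if not segmentations:
        raise ValueError("at least one segmentation is required")

    def total_similarity(i):
        return sum(
            similarity(segmentations[i], other)
            for j, other in enumerate(segmentations)
            if j != i
        )

    # max() returns the first index attaining the maximum, so ties go to
    # the earliest segmentation (an assumption of this sketch).
    return max(range(len(segmentations)), key=total_similarity)


def label_agreement(a, b):
    """Stand-in similarity: percentage of spels with the same label."""
    return 100.0 * sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Five hypothetical segmentations of one image by one user.
segs = [
    [1, 1, 2, 2],
    [1, 1, 2, 1],
    [1, 2, 2, 2],
    [2, 2, 1, 1],  # an outlier that disagrees with most of the others
    [1, 1, 1, 2],
]
typical = most_typical(segs, label_agreement)
```

Because the outlier has a low summed similarity to the other four segmentations, it is never selected, which is how this step removes the influence of relatively bad segmentations.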

Finally, we did some calculations of the sensitivity of our approach to M (the predetermined number of objects in the image). The distinction between the objects represented in the top right and bottom left images of Fig. 12.5 and between the objects represented in the bottom images of Fig. 12.7 is artificial; the nature of the regions assigned to these objects is the same. The question arises: if we merge these two objects into one, is the resulting 2-segmentation similar to the one that would be obtained by merging the seed points associated with the two objects into a single set of seed points and then applying our algorithm? (This is clearly a desirable robustness property of our approach.) The average and standard deviation of the point similarity under object merging for a total of 50 readings by our five users on the top-left images of Figs. 12.5 and 12.7 were 99.33 and 1.52, respectively.
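The comparison behind this merging test can be sketched as follows. The segmentation algorithm itself is not reproduced: `seg_merged_seeds` stands for a hypothetical result of running it with the merged seed set, and all names and values are assumptions made for illustration.

```python
def merge_objects(labels, keep, absorb):
    """Relabel every spel of object `absorb` as object `keep`."""
    return [keep if a == absorb else a for a in labels]


def point_similarity(labels_a, labels_b):
    """Percentage of spels assigned to the same object in both segmentations."""
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return 100.0 * same / len(labels_a)


# Hypothetical 3-segmentation in which objects 2 and 3 are of the same nature.
seg_three_objects = [1, 1, 2, 3, 3, 2, 1, 2]

# Route 1: segment with three objects, then merge objects 2 and 3.
seg_merged_after = merge_objects(seg_three_objects, keep=2, absorb=3)

# Route 2 (placeholder values): segmentation obtained with the seeds of
# objects 2 and 3 merged into a single seed set before running the algorithm.
seg_merged_seeds = [1, 1, 2, 2, 2, 2, 2, 2]

similarity_under_merging = point_similarity(seg_merged_after, seg_merged_seeds)
```

A value close to 100 indicates that the two routes lead to nearly the same 2-segmentation.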
