Self-Organizing Maps, k-Means, and Correlation Coefficient Methods

Nonhierarchical clustering methods use various strategies to calculate the distances between data points in multidimensional space. The data points are then grouped into a user-defined number of clusters (Tamayo et al. 1999).
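As an illustration of the distance-then-group idea, here is a minimal k-means sketch in plain Python. It is not taken from the cited work: the Euclidean distance measure, the random choice of starting centroids, the toy `profiles` data, and the names `euclidean` and `kmeans` are all assumptions made for the example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, seed=0, max_iter=100):
    """Group points into k clusters; k is supplied by the user."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: six expression profiles measured in three conditions.
profiles = [
    (0.1, 0.2, 0.1), (0.0, 0.3, 0.2), (0.2, 0.1, 0.0),
    (5.0, 5.1, 4.9), (5.2, 4.8, 5.1), (4.9, 5.0, 5.3),
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # the first three profiles share one label, the last three the other
```

The cluster numbers themselves are arbitrary; only the grouping carries meaning. A correlation-based variant would swap `euclidean` for a distance such as one minus the correlation coefficient.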


1. Self-organizing maps, k-means, and correlation coefficient methods are best applied to large datasets, such as probe sets. Unlike in hierarchical clustering, errors do not propagate, and the calculations are fast.

2. The user must define the number of nodes, so the results are somewhat user-dependent and not fully objective. The gap statistic can provide a principled basis for selecting the number of nodes (Tibshirani et al. 2000).

3. Some techniques, such as self-organizing maps, use a random number generator to determine the order of the distance calculations, so different analysis runs on the same dataset may give slightly different answers. This reflects the true uncertainty in the dataset but can be disconcerting. More recent clustering methods correct this uncertainty, or lack of robustness (Bickel 2003).
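The run-to-run variation described in point 3 can be seen in a small sketch of a one-dimensional self-organizing map. This is a simplified illustration, not the algorithm of any particular package: the linear decay schedule, the fixed neighbor weighting, the random initialization from data points, and the names `train_som`, `lr0`, and `radius0` are assumptions. The random number generator controls the order in which samples are presented, so the seed determines the result.

```python
import random

def train_som(data, n_nodes, seed, epochs=50, lr0=0.5, radius0=2.0):
    """Train a one-dimensional SOM and return the node weight vectors."""
    rng = random.Random(seed)  # all randomness comes from this generator
    dim = len(data[0])
    # Assumption: initialize node weights from randomly chosen data points.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs  # shrinks linearly from 1 toward 0
        lr = lr0 * frac
        radius = radius0 * frac
        rng.shuffle(order)  # random presentation order of the samples
        for i in order:
            x = data[i]
            # Best matching unit: the node whose weights are closest to x.
            bmu = min(
                range(n_nodes),
                key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
            )
            for j in range(n_nodes):
                if abs(j - bmu) <= radius:
                    # Neighbors within the radius move half as far as the BMU.
                    influence = 1.0 if j == bmu else 0.5
                    for d in range(dim):
                        weights[j][d] += lr * influence * (x[d] - weights[j][d])
    return weights

# Hypothetical two-dimensional toy data.
data = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (0.9, 1.0), (2.0, 2.1), (2.1, 1.9)]

run_a = train_som(data, n_nodes=3, seed=1)
run_b = train_som(data, n_nodes=3, seed=1)  # same seed: identical result
run_c = train_som(data, n_nodes=3, seed=2)  # different seed: may differ slightly
print(run_a == run_b)
```

Fixing and recording the seed makes a given run reproducible, but it does not remove the underlying sensitivity to presentation order; comparing several seeds gives a rough sense of how stable the node positions are.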
