## Feature Data Dimensionality Reduction

2.3.3.1 Principal Component Analysis

This method is also known as the Karhunen-Loève method [38]. Component analysis seeks directions, or axes, in the feature space that provide an improved, lower-dimensional representation of the full data space. The method chooses a dimensionality-reducing linear projection that maximizes the scatter of all projected samples. Let us consider a set of $M$ samples $\{x_1, x_2, \ldots, x_M\}$ in an $n$-dimensional space, and a linear transformation that maps the original space into a lower-dimensional space of dimension $m$, $m < n$. The new feature vectors $y_k$ are defined in the following way:

$$y_k = W^T x_k, \qquad k = 1, 2, \ldots, M,$$

where $W$ is an $n \times m$ matrix with orthonormal columns. The total scatter matrix $S_T$ is defined as

$$S_T = \sum_{k=1}^{M} (x_k - \mu)(x_k - \mu)^T,$$

where $M$ is the number of samples and $\mu$ is the mean vector of all samples. Applying the linear transformation $W^T$, the scatter of the transformed feature vectors is $W^T S_T W$. PCA chooses the projection that maximizes the determinant of the scatter of the transformed feature vectors:

$$W_{opt} = \arg\max_{W} \left| W^T S_T W \right| = [\, w_1 \; w_2 \; \cdots \; w_m \,],$$

where $\{w_i \mid i = 1, 2, \ldots, m\}$ is the set of $n$-dimensional eigenvectors of $S_T$ corresponding to the $m$ largest eigenvalues.
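The eigenvector computation above can be sketched with NumPy as follows; the sample count, dimensions, and variable names are illustrative assumptions, not part of the original text:

```python
import numpy as np

# Illustrative data: M = 100 samples in an n = 5 dimensional space.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

mu = X.mean(axis=0)          # mean vector of all samples
Xc = X - mu                  # centered samples
S_T = Xc.T @ Xc              # total scatter matrix, n x n

# eigh returns eigenvalues in ascending order; re-sort descending.
eigvals, eigvecs = np.linalg.eigh(S_T)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2
W_opt = eigvecs[:, :m]       # eigenvectors of the m largest eigenvalues
```

Since `S_T` is symmetric, `np.linalg.eigh` is the appropriate (and numerically stable) eigensolver, and the resulting columns of `W_opt` are orthonormal by construction.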

Therefore, PCA seeks the directions of maximum scatter of the input data, which correspond to the eigenvectors of the covariance matrix having the largest eigenvalues. The $n$-dimensional mean vector $\mu$ and the $n \times n$ covariance matrix $\Sigma$ are computed for the full dataset.

In summary, the eigenvectors and eigenvalues are computed and sorted in decreasing order of eigenvalue. The $m$ eigenvectors having the largest eigenvalues are chosen, and with those vectors an $n \times m$ matrix $W_{opt}$ is built. This transformation matrix defines an $m$-dimensional subspace. Therefore, the representation of the data in this $m$-dimensional space is

$$y_k = W_{opt}^T x_k, \qquad k = 1, 2, \ldots, M.$$
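The full projection step can be sketched as below (a self-contained illustration, with assumed sizes; the samples are mean-centered before projection, as is standard practice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))             # M = 50 samples, n = 4

mu = X.mean(axis=0)
Xc = X - mu                              # centered samples
S_T = Xc.T @ Xc                          # total scatter matrix

eigvals, eigvecs = np.linalg.eigh(S_T)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
m = 2
W_opt = eigvecs[:, order[:m]]            # n x m transformation matrix

Y = Xc @ W_opt                           # row k is the projection of x_k
```

A useful sanity check: the scatter of the projected data, `Y.T @ Y`, equals the diagonal matrix of the retained eigenvalues, which is exactly what maximizing the scatter of the transformed vectors produces.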

PCA is a general method for finding the directions of maximum scatter of a set of samples. This fact, however, does not ensure that such directions are optimal for classification. In fact, it is well known that some distributions of the class samples yield projection directions that deteriorate the discriminability of the data. This effect is shown in Fig. 2.15, in which the loss of information when projecting onto the PCA direction clearly hinders the discrimination process; note that the projections of both clusters onto the PCA subspace overlap.
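This failure mode can be reproduced numerically. The following sketch (not from the source; the class geometry is an assumption chosen to mimic Fig. 2.15) builds two classes that are elongated along one axis but separated along the other, so the first principal direction captures the elongation and discards the class separation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two classes: large scatter along x, class separation along y.
A = rng.normal(size=(200, 2)) * [5.0, 0.3] + [0.0, +1.0]
B = rng.normal(size=(200, 2)) * [5.0, 0.3] + [0.0, -1.0]
X = np.vstack([A, B])

Xc = X - X.mean(axis=0)
S_T = Xc.T @ Xc
eigvals, eigvecs = np.linalg.eigh(S_T)
w = eigvecs[:, np.argmax(eigvals)]       # first principal direction

pa, pb = A @ w, B @ w                    # 1-D PCA projections per class
# The projected class means are much closer than the within-class
# spread, so the two clusters overlap on the PCA axis.
```

In the original 2-D space the classes are well separated along $y$; after projecting onto the PCA direction (essentially the $x$-axis) that separation is lost, which is precisely the motivation for the discriminant analysis of the next subsection.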

2.3.3.2 Fisher Linear Discriminant Analysis
