Genetic Algorithm Approach

The genetic algorithm (GA) is another popular machine learning method with some type of biological paradigm that emulates Darwinian evolution by following the only the strongest survive strategy. The fundamental principle of GA is based on natural selection. To solve an optimization task in feature selection, GA usually involves the following five steps

(1) Initialization. The GA starts from a population of randomly selected chromosomes, which are represented by binary or gray-level digital strings. Each chromosome consists of a number of genes (bits in the string) and corresponds to a possible solution of the problem. For feature selection in medical image processing, a chromosome is typically represented by a binary coded feature string, with 1 indicating the presence of a gene (the feature is used in the classifier) and 0 indicating its absence (the feature is not used in the classifier).

(2) Evaluation. In this step a fitness function is applied to evaluate the fitness of all chromosomes in the population. The type of fitness function or criterion is determined by the specific applications. Since the main purpose of medical image processing is to improve the diagnostic accuracy, and ROC methodology has become a standard to evaluate the diagnostic accuracy, the areas under ROC curve (Az) are often used as a fitness criterion. Because the fitness function must be applied to assess the fitness level of each chromosome in the population, evaluation is typically the most difficult and computationally costly step.

(3) Selection. This step involves rewarding high fitness chromosomes (e.g., the chromosomes that generate high Az value for the classifier) and eliminating the low fitness ones. Thus, using different selection methods, such as roulette wheel selection, tournament selection, and elite selection, the chromosomes with better fitness levels can expand to take up a larger percentage of the population, while those with poor fitness levels decrease in numbers.

(4) Search. After the population has adjusted itself to take advantage of the higher fitness chromosomes, search operators are used to recombine or create a new generation of chromosomes. The two most popular operators in this step are crossover and mutation. The crossover exchanges genes between two chromosomes to produce two offspring in the new generation, and mutation injects random changes to selected genes (from 0 to 1 or vice versa in binary coded chromosomes).

(5) Termination. GA continually evolves until one of some terminating conditions is reached. These conditions occur when (1) GA has found a chromosome that yields a predetermined fitness value, (2) GA has reached a predetermined number of evolution generations, and (3) GA can not find better chromosomes in the new generations.

To write a GA program may require much effort. Fortunately, many free GA software packages are available for the research purpose. A research group at Carnegie Mellon University has collected many software packages of artificial intelligence and machine learning algorithms in the collection of Prime Time Freeware for AI [13]. Users also can find and download an updated GA program package from the World Wide Web site of this research group (http://www.cs.cmu.edu/ Groups/AI/html/air.html). Since GA can be applied to a variety of problems to generate solutions without knowing anything about the problem domain, the GA software package downloaded usually cannot be directly used in a specific application of medical image processing. The program needs to be tested and modified. The users should provide a fitness function and some initial parameters used in the GA for their specific application. For example, selection of the initial population of chromosomes is very important. The size of population should expand as much as possible, constrained by computer resources and time. One study demonstrated that when using an initial population size of 50, the GA could not find the optimal solution. Then, after the population size was increased to 100 while keeping all other conditions unchanged, the same GA algorithm did find the optimal solution [31]. Meanwhile, the diversity of initial population is also important, because the diversity can help find better starting points in the GA. For the better result, a small part of the population (e.g., less than one-third of initial chromosomes) may be assigned by the investigators based on their knowledge to the application problem, and the large part of population should be randomly assigned by the computer program. In general, the bigger population sizes create better opportunity for GA to find a close-to-optimal solution, and diversity in initial chromo somes also makes GA converge more quickly to its searching destination.

Although to a lesser degree than simpler search methods such as the ANN, GA is susceptible to the problems of the "hill-climbing" process. Namely, it does not guarantee finding the best feature subset (or global maximum) in a large multidimensional feature space. However, because GA starts searching from many different places in the feature space simultaneously and uses the only the strongest survive strategy, it is not easily trapped into local maxima. Compared to many other optimization methods (e.g., permutation and stepwise methods), GA is much more efficient in computation. GA has demonstrated the ability to find good (or close-to-optimal) solutions for a wide variety of applications. As a result, GA has also gained popularity in medical image processing for feature selection and system optimization problems. In the study reported by Sahiner et al., a database consisted of 168 biopsy-proven mammographic mass regions and 504 normal breast tissue regions. From each region, 572 texture features and 15 morphological features were initially extracted. Using 10 different partitions in the training and testing data, the GA selected on the average 20 features and yielded an average Az value of 0.90. If 20 features were randomly selected, the average Az value was 0.82 [13]. In another study aimed at improving performance of a computer-assisted diagnosis scheme for detecting microcalcification clusters in digitized mammo-grams, Anastasio et al. utilized GA for searching through a "parameter space" that consists of all possible combinations of parameter values. These parameters included not only the image-based features extracted from suspicious microcalcification clusters, but also the processing values used in the scheme, such as filter weights and threshold levels. Compared with the empirical selection of these parameter values, the study demonstrated that using the GA selection approach, the sensitivity of the detection scheme could increase from 80% to 87% at a false-positive rate of 1.0 per image based on a jackknife testing method involving 89 digitized mammograms [1].

Was this article helpful?

0 0
Essentials of Human Physiology

Essentials of Human Physiology

This ebook provides an introductory explanation of the workings of the human body, with an effort to draw connections between the body systems and explain their interdependencies. A framework for the book is homeostasis and how the body maintains balance within each system. This is intended as a first introduction to physiology for a college-level course.

Get My Free Ebook


Responses

Post a comment