Adequacy of Training Samples

Minimizing training bias in the ANN or the BBN is another important issue in database selection. Because of the complex and noisy nature of medical images, there is no way to know whether a particular finite training database is sufficient for training the network. However, several studies have demonstrated that increasing the size of the training database is an effective way to minimize training bias in the networks [26,29]. One study systematically investigated the relationship between the size of the training database and the performance of an ANN-based computer-assisted diagnosis scheme [29]. In this study, an ANN with 24 input neurons, 8 hidden neurons, and 1 output neuron was built and tested. The input neurons represented 24 features measured from topographic growth layers of suspicious mass regions in digitized mammograms. The database contained 368 positive mass regions and 1778 suspicious but negative regions. As a testing data set, 120 positive and 400 negative regions were randomly selected. This testing data set was used solely for testing and was not used in any of the training protocols. The remaining 248 positive and 1378 negative regions made up the pool from which the required number of positive and negative regions was randomly selected for each training experiment. At the completion of each training cycle, which used a 1:1 ratio of positive to negative regions, the performance of the ANN was evaluated on the same testing data set. With the ANN output used as a summary index, the area under the receiver operating characteristic (ROC) curve (the Az value) [18] was computed for each test at each training sample size. Figure 4 indicates that the performance of the ANN on the testing data set continued to improve as the number of training regions increased. The training bias, represented by the difference between the Az values for the training and testing databases, decreased monotonically as the number of training samples increased from 60 to 496, as shown in Fig. 4. This study indicated that many of the features extracted from medical images are continuous and span a wide range of values,
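
The bookkeeping of this experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in [29]: all function names are hypothetical, the regions are represented only by integer indices, and the empirical (Mann-Whitney) ROC area is used as a simple stand-in for the Az value, which the original study may have obtained from a fitted ROC curve. The region counts are those reported above. The largest training set of 496 regions is consistent with drawing all 248 remaining positive regions at a 1:1 ratio.

```python
import random

# Region counts reported in the text.
TOTAL_POS, TOTAL_NEG = 368, 1778
TEST_POS, TEST_NEG = 120, 400


def training_pool_sizes():
    """Regions left for training once the fixed testing set is held out."""
    return TOTAL_POS - TEST_POS, TOTAL_NEG - TEST_NEG


def sample_balanced(pos_pool, neg_pool, n_total, rng):
    """Draw n_total regions at a 1:1 positive:negative ratio, without replacement."""
    half = n_total // 2
    return rng.sample(pos_pool, half), rng.sample(neg_pool, half)


def empirical_auc(pos_scores, neg_scores):
    """Empirical ROC area: fraction of (positive, negative) pairs in which the
    positive region scores higher, with ties counted as one half.
    Assumption: used here in place of a fitted Az value."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def training_bias(az_train, az_test):
    """Training bias as described in the text: training Az minus testing Az."""
    return az_train - az_test


if __name__ == "__main__":
    pool_pos, pool_neg = training_pool_sizes()
    print(pool_pos, pool_neg)  # 248 1378

    rng = random.Random(0)  # fixed seed so the draw is repeatable
    pos_ids = list(range(pool_pos))
    neg_ids = list(range(pool_neg))
    # Largest training set in the study: 496 regions in total.
    train_pos, train_neg = sample_balanced(pos_ids, neg_ids, 496, rng)
    print(len(train_pos), len(train_neg))  # 248 248
```

In an actual experiment, `empirical_auc` would be applied once to the network outputs on the sampled training regions and once to the outputs on the fixed testing set, and `training_bias` would be recorded for each training sample size.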
