Nishikawa et al. demonstrated that the accuracy of a computer-assisted diagnosis scheme was inversely related to the percentage of difficult or subtle medical images in the database. In principle, when a limited number of images is used to train an ANN or other classifier for medical image diagnosis, the classifier can achieve any accuracy from 0 to 100% depending on the nature or difficulty of the database. For example, a change in only 10% of the composition of the database caused the sensitivity of an ANN-based computer-assisted diagnosis scheme for detecting microcalcification clusters in digitized mammograms to drop from 100% to 77% at a false-positive rate of 1.0 per image. As a result, the performance of two machine learning classifiers trained and tested on different medical image databases is usually not comparable. To solve or minimize this problem, researchers have suggested several avenues, such as establishing a
common database on which to test different schemes and using standard methods to measure case difficulty, such as the size, contrast, and conspicuity of medical abnormalities. However, until a standard method of measuring case characteristics is established and agreed upon in the field, investigators should report their case-selection procedures and measurement protocols in enough detail that others can reproduce the methodology. Without a detailed description of the case difficulty in the training and testing databases, the performance reported for a specific classifier used in medical image diagnosis may be meaningless to other researchers.
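The arithmetic behind this composition effect can be sketched as a weighted average: if a scheme has a different sensitivity on easy cases than on subtle ones, the aggregate sensitivity reported for a database is just the mixture of the two, weighted by how many cases of each type the database contains. The sketch below uses hypothetical per-stratum sensitivities (the function name and the numbers are illustrative assumptions, not the figures from the cited study) to show how shifting the subtle-case fraction alone moves the reported result.

```python
def overall_sensitivity(fraction_subtle, sens_easy, sens_subtle):
    """Aggregate sensitivity of a detection scheme on a mixed database.

    fraction_subtle : fraction of cases in the database that are subtle/difficult
    sens_easy       : per-case sensitivity on easy (obvious) cases
    sens_subtle     : per-case sensitivity on subtle (difficult) cases

    The reported sensitivity is the mixture-weighted average of the two strata.
    """
    return (1.0 - fraction_subtle) * sens_easy + fraction_subtle * sens_subtle


# Hypothetical scheme: near-perfect on easy cases, weak on subtle ones.
sens_easy, sens_subtle = 1.0, 0.2

# Same classifier, two databases differing only in composition:
db_a = overall_sensitivity(0.10, sens_easy, sens_subtle)  # 10% subtle cases
db_b = overall_sensitivity(0.20, sens_easy, sens_subtle)  # 20% subtle cases

print(f"10% subtle: {db_a:.2f}")  # 0.92
print(f"20% subtle: {db_b:.2f}")  # 0.84
```

Nothing about the classifier changed between the two runs; only the case mix did. This is why, without a description of case difficulty, two sensitivity figures from different databases cannot be compared directly.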