Diagnostic Aid Based on Human Perceptual Features

The first observer performance study of computer classification was done by Getty and colleagues [53]. As described earlier in this chapter, their mammogram reading and decision-making aid consisted of a checklist of 13 features that a radiologist must extract from the mammogram, together with an LDA classifier. The observer study involved six general radiologists who practiced in community hospitals in the Boston area and who had read at least five mammograms a week in the preceding 5 years. The study used 118 mammograms with biopsy-confirmed diagnoses: 58 cases were malignant and 60 were benign. The readers first read the 118 cases in the standard condition, which was designed to simulate usual clinical practice. For each case, the readers rated their confidence that the lesion localized for them on the mammogram was malignant using a 5-point scale (1 = definitely or almost definitely malignant; 2 = probably malignant; 3 = possibly malignant; 4 = probably benign; and 5 = definitely or almost definitely benign).

After this baseline reading, the readers went through a training session on the computer aid and then read the same cases again, this time using the aid. The training had three purposes: to familiarize the readers with the 13 features required for the computer analysis, to let them practice rating those features, and to familiarize them with the computer-estimated probability of malignancy. The readers were asked to rate the 13 features in absolute numbers (e.g., mass size, calcification number) or on a scale of 1 to 10. For the spiculation feature, for example, 1 represented definitely no spiculation and 10 represented definitely some spiculation. The training consisted of a practice run on 44 separate cases. After every 15 cases, the readers were shown the average responses given by five mammography specialists on those same cases for comparison. Subsequently, in the enhanced condition, the readers first provided scaled rating values of the 13 features, then received the computer-estimated probability of malignancy, and finally rated their confidence that the lesion was malignant using the same 5-point scale.
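To make the enhanced condition concrete, the following minimal sketch shows how a linear discriminant could turn reader-supplied feature ratings into a probability of malignancy. It rests on assumptions not found in the study description: the feature names, weights, and intercept are hypothetical, only three of the 13 features are shown, and the logistic conversion is the standard posterior for a two-class LDA with Gaussian classes of equal covariance, which may differ from the conversion Getty et al. actually used.

```python
import math

# Hypothetical weights for three of the 13 features. The source does not
# report the fitted coefficients, so these numbers are illustrative only.
WEIGHTS = {
    "spiculation": 0.40,          # rated on the 1-to-10 scale
    "mass_size_mm": 0.05,         # absolute number (unit assumed here)
    "calcification_count": 0.02,  # absolute number
}
BIAS = -3.0  # hypothetical intercept


def discriminant_score(ratings):
    """Linear discriminant score: intercept plus weighted sum of the ratings."""
    return BIAS + sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def probability_of_malignancy(ratings):
    """Map the linear score to a probability with the logistic function.

    For two Gaussian classes with a shared covariance matrix, the LDA
    posterior probability takes exactly this form.
    """
    return 1.0 / (1.0 + math.exp(-discriminant_score(ratings)))


if __name__ == "__main__":
    case = {"spiculation": 8, "mass_size_mm": 20, "calcification_count": 5}
    print(round(probability_of_malignancy(case), 3))
```

In this sketch the reader supplies the ratings and the computer returns only the probability; the final 5-point confidence rating remains the reader's own judgment, as in the study.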

The average Az of the six readers was 0.83 in the standard condition and 0.88 in the enhanced condition. The Az value of the computer classifier alone was 0.86. Five readers achieved a higher Az value in the enhanced condition than in the standard condition. The one reader who did not improve performed at high Az values, above the group average, in both conditions. The overall improvement in Az from 0.83 to 0.88 was statistically significant.
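As a rough illustration of how an area under the ROC curve follows from 5-point confidence ratings, the sketch below computes the nonparametric (Mann-Whitney) estimate. This is a swapped-in, simpler technique: Az conventionally denotes the area under a fitted binormal ROC curve, which generally differs somewhat from this empirical value. The function name and the example ratings are hypothetical and are not data from the study.

```python
def empirical_auc(malignant_ratings, benign_ratings):
    """Nonparametric area under the ROC curve from ordinal ratings.

    Uses the study's scale direction: 1 = almost definitely malignant,
    5 = almost definitely benign, so a LOWER rating is more suspicious.
    The result is the fraction of (malignant, benign) case pairs in which
    the malignant case received the more suspicious rating, with ties
    counted as one half.
    """
    if not malignant_ratings or not benign_ratings:
        raise ValueError("both groups need at least one rating")
    wins = 0.0
    for m in malignant_ratings:
        for b in benign_ratings:
            if m < b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ratings) * len(benign_ratings))


if __name__ == "__main__":
    # Hypothetical ratings for a handful of cases, for illustration only.
    malignant = [1, 2, 3]
    benign = [2, 4, 5]
    print(empirical_auc(malignant, benign))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of malignant from benign cases, which is the scale on which the reported values of 0.83, 0.86, and 0.88 should be read.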

In addition to the performance of the six general radiologists, the performance of five mammography specialists was measured on the same set of mammograms. The specialists did not use the computer aid; their interpretations of the mammograms served as a baseline for the purpose of comparison. Interestingly, the baseline performance of the specialists was an Az of 0.88, the same value achieved by the general radiologists when using the computer aid. Getty et al. therefore concluded that the computer aid enhanced the general radiologists' performance to the level of the specialists'.
