Accuracy and Reproducibility

The accuracy is defined as the deviation between the registration obtained with the algorithm under test and the ground truth. It may be expressed as the mean and standard deviation of the differences in transformation parameters, or as the average vector displacement of points of interest (the target registration error). Because ground truth data are generally unavailable, the reproducibility of an algorithm is often tested instead, e.g., by restarting the automatic registration from random starting positions. In an extensive perturbation study, a number of artificial and natural artifacts (for instance, as a model for CT-MR matching) were introduced or suppressed in pairs of clinical pelvic CT scans, so that reliability and accuracy could be determined in a situation with known ground truth [36]. In this study, chamfer matching turned out to be extremely robust against missing data, low resolution, and poor segmentation of the images. In the presence of artifacts, minimization of the average distance outperformed minimization of the root-mean-square distance. Outliers in the scan from which the point list is obtained must be avoided; for example, rotation of the femurs reduces CT-CT registration accuracy by 1-2 mm. In other situations, the reproducibility is extremely good, with average vector displacements of less than 0.5 mm, i.e., at subpixel level (the pixel size of the scans used was 1.6 mm, with an average slice distance of 4 mm).
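
As an illustration, the following Python sketch computes such an average vector displacement of points of interest under two rigid transformations, e.g., the result of the algorithm under test versus the ground truth, or two restarts from random starting positions. It is not part of the original study; the point coordinates and transformations are hypothetical.

```python
# Sketch (hypothetical data): average vector displacement of points of interest
# under two rigid transformations, a simple target-registration-error-style measure.
import numpy as np

def rigid_transform(points, rotation, translation):
    """Apply a 3x3 rotation matrix and a translation vector to an (N, 3) point array."""
    return points @ rotation.T + translation

def average_vector_displacement(points, rot_a, t_a, rot_b, t_b):
    """Mean Euclidean distance between the two mappings of the points of interest."""
    displaced_a = rigid_transform(points, rot_a, t_a)
    displaced_b = rigid_transform(points, rot_b, t_b)
    return np.linalg.norm(displaced_a - displaced_b, axis=1).mean()

# Example with hypothetical values: identity versus a 1 mm shift along x.
points = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [0.0, 50.0, 100.0]])
identity = np.eye(3)
tre = average_vector_displacement(points, identity, np.zeros(3),
                                  identity, np.array([1.0, 0.0, 0.0]))
print(f"average vector displacement: {tre:.2f} mm")  # 1.00 mm
```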

FIGURE 12 (a) Example of tumor regression. Two CT scans of the same patient with 3 months in between have been matched based on the skull. The outer enhancement ring is from the first scan, the inner ring is from the second scan. (b) Lung damage visible in a follow-up scan. (c) By overlaying the dose distribution (defined on the planning CT) on the follow-up scan, the given dose at the area of damage can be estimated. A complicating factor for this match was that the scans were made with the arms in a different orientation. Consequently, only the lung tops were used for matching. See also Plate 74.

An obvious way to test the accuracy of a registration algorithm in the absence of ground truth data is to compare the results of two or more registration methods. When these registration algorithms are statistically independent, the differences between them, which are attributed to the inaccuracies of both algorithms, can be used to estimate the inaccuracy of each individual algorithm. Using three independent registration results (e.g., one by the automatic system under test and two by independent human observers), Gilhuijs et al. estimated the accuracy of 2D chamfer matching using analysis of variance [17]. In this study, the accuracy of the chamfer matching algorithm for registration of portal images was shown to be similar to that of a well-trained human observer (0.5 mm SD) for large anterior-posterior radiographs of the pelvis.
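
The reasoning behind such a three-way comparison can be sketched as follows: if the errors of the three methods are mutually independent, the variance of each pairwise difference is the sum of the two individual variances, so the three observed pairwise variances determine the three individual variances. The Python sketch below illustrates this; the numbers are hypothetical, and this is a simplified illustration rather than the exact analysis-of-variance procedure of [17].

```python
# Sketch (assumed formulation): for independent methods A, B, C,
#   Var(A - B) = var_A + var_B,  Var(A - C) = var_A + var_C,  Var(B - C) = var_B + var_C.
import numpy as np

def individual_variances(var_ab, var_ac, var_bc):
    """Solve the three pairwise-variance equations for the individual variances."""
    var_a = 0.5 * (var_ab + var_ac - var_bc)
    var_b = 0.5 * (var_ab + var_bc - var_ac)
    var_c = 0.5 * (var_ac + var_bc - var_ab)
    return var_a, var_b, var_c

# Hypothetical pairwise SDs of 0.7 mm for one translation parameter.
var_a, var_b, var_c = individual_variances(0.49, 0.49, 0.49)
print([float(np.sqrt(v)) for v in (var_a, var_b, var_c)])  # each about 0.5 mm SD
```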

To give an impression of the accuracy of both matching procedures for CT-MRI registration, chamfer matching and volume matching have been compared for 12 brain cases and 14 pelvic cases from our image database. In all cases, the matched MRI scans were proton density scans. The chamfer matching results had been used clinically for treatment planning and had been verified visually. For the head, the differences are extremely small (Table 3). Subsequent visual verification of the volume matching results did not show conclusively that one method was better than the other. Assuming that both registration methods are independent, the errors in either method are smaller than the listed numbers; the largest observed difference is a few millimeters. For the pelvis, very large differences were found, and visual scoring of the matches showed that the volume registration method did not perform well. We attribute the poor performance of volume registration for the pelvis to the influence of organ motion and shape changes on the registration. In these circumstances, it is better to limit the registration to the bony anatomy, i.e., to use a technique such as chamfer matching that works on segmented features.

We have noted that the reliability of the registration depends strongly on the MRI and CT acquisition protocols that are followed. For example, the reliability of the matching procedure is reduced if a scan does not cover the complete head.

Another method to test registration accuracy is to compare (triangulate) registrations of a single CT scan with MR scans in different orientations in a "full circle": CT is first matched on transverse MR, next transverse MR is matched independently on coronal MR, and finally coronal MR is matched independently on CT. The product of the three transformations is the identity if all matching steps are perfect; deviations from identity arise both from random errors and from some types of systematic errors. MR was registered on MR (to close the "circle") by minimization of RMS voxel value differences. This method provides an estimate of the registration accuracy on clinical data and can detect random errors and some kinds of systematic errors, such as errors induced by chemical shift in MRI. For this particular problem, random errors were on the order of 0.5 mm SD, and a systematic shift of 1 mm due to chemical shift was demonstrated [37].
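
The sketch below illustrates such a consistency check, assuming the three registrations are available as 4x4 homogeneous matrices; the matrices shown are hypothetical. The residual rotation angle and translation of the composed transformation quantify the combined error of the three matching steps.

```python
# Sketch (assumed 4x4 homogeneous matrices): "full circle" consistency check.
# If T1 maps CT to transverse MR, T2 maps transverse MR to coronal MR, and
# T3 maps coronal MR back to CT, then T3 @ T2 @ T1 should be the identity.
import numpy as np

def circle_residual(t1, t2, t3):
    """Return (residual rotation angle in degrees, residual translation norm) of T3 @ T2 @ T1."""
    composed = t3 @ t2 @ t1
    rotation = composed[:3, :3]
    translation = composed[:3, 3]
    # Rotation angle recovered from the trace of the residual rotation matrix.
    cos_angle = np.clip((np.trace(rotation) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)), np.linalg.norm(translation)

# Example with hypothetical, nearly consistent pure translations.
t1 = np.eye(4); t1[:3, 3] = [10.0, 0.0, 0.0]
t2 = np.eye(4); t2[:3, 3] = [0.0, 5.0, 0.0]
t3 = np.eye(4); t3[:3, 3] = [-10.0, -4.5, 0.0]   # closes the circle up to 0.5 mm
print(circle_residual(t1, t2, t3))                # approximately (0.0 degrees, 0.5 mm)
```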
