Evaluation

Three properties of the proposed method are analyzed: accuracy (agreement with the gold standard), reproducibility (repeatability), and robustness (degree of automation of the measurement without apparent failure). To evaluate accuracy, a number of expert manual measurements are analyzed and used as gold-standard measurements. In addition, the accuracy and reproducibility of these manual measurements are calculated to provide a baseline against which our method can be compared.

Figure 5.6: Registration examples for four sequences of different image quality. Each figure shows the flow-mediated dilation frame where maximum vasodilation occurs (left half) registered to the reference B1 frame (right half).

We use %FMD as the measurement unit for all the sequence values analyzed. It represents the diameter of a given frame in the sequence relative to the mean diameter over phase B1, expressed as a percentage.
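The definition of %FMD can be illustrated with a minimal sketch. The function name `percent_fmd`, the diameter values, and the choice of which frames belong to phase B1 are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def percent_fmd(diameters, baseline_indices):
    """Express each frame's diameter relative to the mean B1 diameter, in percent."""
    baseline = mean(diameters[i] for i in baseline_indices)
    return [100.0 * d / baseline for d in diameters]

# Hypothetical diameters in mm; the first three frames are assumed to lie in phase B1.
diameters_mm = [4.00, 4.02, 3.98, 4.10, 4.20]
fmd = percent_fmd(diameters_mm, baseline_indices=[0, 1, 2])
```

With this definition, frames in phase B1 lie around 100% and a dilated frame lies above 100%.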

Two statistical methods are used. First, we used Bland-Altman plots [25], a classical method to define the limits of agreement between two measurement techniques, given by d ± 1.96 SD, where d is the mean difference (bias) and SD is the standard deviation of the differences.
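As a minimal sketch of the limits-of-agreement computation, the following snippet derives the bias, the SD of the differences, and the d ± 1.96 SD limits from paired readings. The function name `bland_altman`, the sample data, and the use of the sample SD (n - 1 denominator) are assumptions made for illustration.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return per-pair averages, bias, SD of differences, and limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator); an assumption
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return averages, bias, sd, limits

# Hypothetical paired %FMD readings from two measurement techniques.
observer = [101.0, 103.0, 105.0, 107.0]
gold = [100.0, 104.0, 104.0, 108.0]
averages, bias, sd, limits = bland_altman(observer, gold)
```

The `averages` and the per-pair differences are the quantities plotted on the horizontal and vertical axes of a Bland-Altman plot.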

Second, we used analysis of variance to estimate the variability (reproducibility) of repeated measurements on every frame. We also expressed these as a coefficient of variation (CV), obtained from the mean value (m%FMD) and the standard deviation (SD%FMD) of the %FMD measurements as

$$\mathrm{CV} = \frac{SD_{\%FMD}}{m_{\%FMD}}$$
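A minimal sketch of the CV computation follows. It assumes the sample SD and a factor of 100 so that the CV is reported in percent; the function names and the repeated %FMD readings are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: SD of the repeated %FMD readings over their mean."""
    return 100.0 * stdev(values) / mean(values)

def sequence_cv(frames):
    """Mean and SD of the per-frame CVs of one sequence."""
    cvs = [coefficient_of_variation(readings) for readings in frames]
    return mean(cvs), stdev(cvs)

# Hypothetical repeated %FMD readings for two frames of one sequence.
frames = [[100.0, 101.0, 99.0], [104.0, 106.0, 105.0]]
cv_mean, cv_sd = sequence_cv(frames)
```

Averaging the per-frame CVs within a sequence gives one summary value per sequence, which can then be averaged across sequences.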

5.4.2.1 Manual Measurements

Manual measurements of arterial diameter were performed in 117 frames corresponding to four sequences of different image quality. Three experts assessed each frame twice in independent sessions. In each sequence several frames were measured: 1 out of 10 in phase B1 (frame number 1, 11... 61) and 1 out of 50 during the rest of the test (frame number 101, 151...). Depending on the duration of the sequence the total number of measured frames was between 28 and 30 per sequence.

Each diameter measurement was obtained by manually fitting a spline to the inner contour of each arterial wall. The diameter was defined as the average distance between both spline curves (see Appendix 5.8). The vasodilation measurements were obtained by dividing the manually obtained diameters by the average diameter over phase B1; the dilation was finally expressed as a percentage, getting %FMD values as defined before.

Gold-standard measurements were derived from these 117 frames. The grand-average of the six diameter measurements done by the three observers is considered the gold-standard arterial diameter estimate for each frame. The gold-standard dilation measurements are obtained by dividing these estimated diameters by the grand-average diameter over phase B1 for each sequence, to get %FMD values. Accuracy, reproducibility, and intra- and interobserver variability of manual measurements were analyzed:

(i) Accuracy. Figure 5.7 shows the Bland-Altman plots comparing the intersession average measurement for each observer and the gold-standard measurements. The biases and standard deviations of the differences for the three observers are given in Table 5.3. Standard deviations are corrected to take into account repeated measurements according to the method proposed by Bland and Altman [26].

Figure 5.7: Bland-Altman plots (one panel each for Observers I, II, and III) comparing the intersession measurement average versus the gold-standard measurements. The horizontal and vertical axes indicate the average %FMD and the difference %FMD, respectively.

Table 5.3: Accuracy of manual measurements

                Obs I    Obs II   Obs III
Bias (%FMD)     -0.16    -0.18     0.34
SDa (±%FMD)      0.68     0.94     0.68
SDw (±%FMD)      0.74     1.14     1.41
SDc (±%FMD)      0.86     1.24     1.20

Bias and standard deviation of the differences (SDc), corrected for repeated measurements, between manual and gold-standard %FMD measurements. SDa and SDw stand for the SD of the differences of the intersession average and the within-observer variability, respectively.

(ii) Reproducibility. The CV of each group of six measurements is calculated for each of the 117 manually measured frames. These CVs are averaged over all the frames of each of the four sequences, and the result is taken as the CV of the manual measurement for that sequence. Finally, the four values are averaged to obtain an overall reproducibility value for manual measurements in our study. The results are shown in Table 5.4.

Table 5.4: Reproducibility of manual and automated measurements

CV (m ± SD)        Seq A         Seq B         Seq C         Seq D         Overall
Manual (%)         0.95 ± 0.5    1.20 ± 0.4    0.71 ± 0.6    1.35 ± 0.6    1.04 ± 0.6
Computerized (%)   0.23 ± 0.1    0.26 ± 0.1    0.32 ± 0.3    0.84 ± 0.4    0.40 ± 0.3

Mean and SD of CV (%) measured with respect to %FMD value.

Figure 5.8: Bland-Altman plots comparing the two manual sessions of dilation measurements of each observer. The horizontal and vertical axes indicate the average %FMD and the difference %FMD of the two sessions, respectively.

(iii) Inter- and intraobserver variability. Figure 5.8 shows Bland-Altman plots comparing both sessions of each observer. To estimate the overall inter- and intraobserver variability of manual measurements (with correction for repeated measurements), we followed the procedure proposed by Bland and Altman [27]. To this end, a two-way analysis of variance (ANOVA) with repeated measurements was performed using Analyse-it v 1.68 (Analyse-it Software Ltd, Leeds, UK). The two-way ANOVA used observer and measurement frame as fixed factors and the session number as a random factor (Table 5.5). From this analysis, the inter- and intraobserver within-frame %FMD standard deviations were 1.20% and 1.13%, respectively.
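The repeated-measurement corrections in items (i) and (iii) can be checked numerically against the tabulated values. The sketch below is not taken from the text: it assumes that the corrected SD combines the SD of the intersession average with half the within-observer variance (two sessions), and that the interobserver variance is the sum of the usual variance components estimated from the mean squares of Table 5.5. Both assumptions are consistent with the reported figures to within rounding, but the function and variable names are hypothetical.

```python
from math import sqrt

def corrected_sd(sd_avg, sd_within, n_repeats=2):
    """SD of the differences for a single measurement, given the SD of the
    differences of the n-repeat average and the within-observer SD (assumed form)."""
    return sqrt(sd_avg ** 2 + (1 - 1 / n_repeats) * sd_within ** 2)

# (SDa, SDw) per observer, copied from Table 5.3.
table_5_3 = {"Obs I": (0.68, 0.74), "Obs II": (0.94, 1.14), "Obs III": (0.68, 1.41)}
sdc = {obs: corrected_sd(sda, sdw) for obs, (sda, sdw) in table_5_3.items()}

# Mean squares copied from Table 5.5; 117 frames, 2 sessions per observer.
ms_observer, ms_interaction, ms_session = 9.708, 1.551, 1.280
n_frames, n_sessions = 117, 2

# Variance components from expected mean squares (assumed model).
var_within = ms_session
var_interaction = (ms_interaction - ms_session) / n_sessions
var_observer = (ms_observer - ms_interaction) / (n_frames * n_sessions)

sd_intra = sqrt(var_within)
sd_inter = sqrt(var_observer + var_interaction + var_within)
```

Under these assumptions the computed `sdc` values agree with the SDc row of Table 5.3 to within about 0.01, and `sd_intra` and `sd_inter` round to the 1.13% and 1.20% reported above.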

5.4.2.2 Computerized Measurements

Table 5.5: A two-way ANOVA of manual measurements of %FMD

Source of variation    SSq        DOF    MSq       F        p
Frame                  9329.1     116    80.423    62.82    <0.0001
Observer               19.4       2      9.708     7.58     0.0006
Observer x Frame       359.7      232    1.551     1.21     0.0529
Session                449.4      351    1.280
Total                  10157.6    701

SSq: sum of squares; DOF: degrees of freedom; MSq: mean squares; F: F of Snedecor; p: Snedecor test significance.

Table 5.6: Comparison between different similarity metrics

               NMI      MI       GCC      JE       CC       SSD
Bias (%FMD)    +0.05    +0.11    +0.25    -1.00    +1.03    +1.68
SD (±%FMD)     1.05     1.08     2.02     2.49     2.55     3.92

Bias and difference SD in the comparison between the gold-standard measurements and the automatic dilation obtained with different similarity measures. Values reported correspond to %FMD values. NMI: normalized mutual information; MI: mutual information; GCC: gradient image cross correlation; JE: joint entropy; CC: cross correlation; SSD: sum of squared differences.

The scaling factor in the direction normal to the vessel axis that relates each frame to the reference frame constitutes the vasodilation parameter output by the automatic method. As a consequence, the measurements are normalized to the arterial diameter of the reference frame. This normalization is different from that of the gold-standard dilation measurements, which, as described before, were normalized for each sequence to the grand-average diameter over phase B1. To make the computerized measurements comparable to the gold standard, a new normalization of the former measurements is necessary. To this end, the values measured at each frame are divided by the average value over all measurements of phase B1 and multiplied by a factor of 100 to obtain %FMD values.
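Why this renormalization makes the two kinds of measurement comparable can be seen in a short derivation. As a sketch, assume idealized conditions in which the scaling factor s_i of frame i equals the ratio of that frame's diameter d_i to the reference-frame diameter d_ref. The symbols s_i, d_i, d_ref, and N_B1 (the number of frames in phase B1) are introduced here for illustration only.

```latex
s_i = \frac{d_i}{d_{\mathrm{ref}}}, \qquad
\%\mathrm{FMD}_i
  = 100 \cdot \frac{s_i}{\frac{1}{N_{B1}} \sum_{j \in B1} s_j}
  = 100 \cdot \frac{d_i / d_{\mathrm{ref}}}{\frac{1}{N_{B1}} \sum_{j \in B1} d_j / d_{\mathrm{ref}}}
  = 100 \cdot \frac{d_i}{\frac{1}{N_{B1}} \sum_{j \in B1} d_j}
```

Under this assumption the reference-frame diameter cancels, so the renormalized value has the same form as the %FMD defined for the manual measurements and does not depend on which B1 frame was chosen as reference.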

(i) Choosing a similarity measure. Several similarity measures traditionally used in image registration were compared to select the most appropriate one. To this end, the four sequences for which gold-standard measurements were available were processed using the six similarity measures introduced in Table 5.1. The gold-standard vasodilations were then compared to the automated vasodilations computed using each similarity measure. Table 5.6 indicates that NMI yields the most accurate estimates, although the results are only marginally better than those obtained with MI. NMI is therefore the similarity measure selected.

(ii) Accuracy. Figure 5.9 shows a Bland-Altman plot comparing the automated versus the gold-standard measurements. The SD of the differences is 1.05%. The dilation curves obtained by the proposed method are superimposed on the gold-standard measurements in Fig. 5.10, where we also include the 95% confidence interval of the gold-standard measurements for comparison [26].

(iii) Robustness. The whole set of 195 sequences was processed with the proposed method (more than 280,000 frames). The overall result was ranked according to the ability to recover the clinically relevant information from the corresponding vasodilation curve. The results were classified as good, useful, and bad, depending on the amount and severity of the artifacts present in the curve. When, in the opinion of an expert sonographer, there were no evident artifacts in the vasodilation curve, the result was scored as good (77.3% of sequences). Artifacts considered were, for instance, lack of convergence or unusual vasodilation evolution. A vasodilation curve was ranked as useful (5.2% of sequences) when artifacts appeared only in the DI phase (Fig. 5.2), where no medical information is to be extracted, so that it would still be possible to obtain clinical information from the other phases. When artifacts appeared in any of the other phases, from which clinical information should be derived, the result was ranked as bad (17.5% of sequences). The vasodilation curves were extracted in a fully automatic fashion, with the preprocessing of the reference frame as the only manual intervention from the operator.

Figure 5.9: Bland-Altman plot comparing the automatic measurements (using normalized mutual information as similarity measure) versus the gold standard. The horizontal and vertical axes indicate the average %FMD and the difference %FMD of the automatic measurements and the gold-standard measurements, respectively. Horizontal lines mark the bias and the bias ± 1.96 SD.

Figure 5.10: FMD curves for Sequences A, B, C, and D obtained by the proposed automated method (-) and by the gold-standard measurements (•). Error bars show the 95% confidence interval of the gold-standard measurements for comparison.

(iv) Reproducibility. The four sequences with gold-standard measurements were analyzed with the automatic method in six independent runs. Each time, a different reference frame was randomly chosen from within phase B1 and manually preprocessed (horizontal repositioning of the vessel and removal of extraluminal structures). The CV was computed from the six dilation measurements for each frame of each sequence. Subsequently, the mean CV in each sequence was obtained by averaging the CV values of the frames where manual measurements were also carried out. These four values are presented in Table 5.4.
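To make the entropy-based similarity measures compared in item (i) more concrete, the following sketch computes JE, MI, and NMI from the joint histogram of two small quantized images. It uses the commonly cited definitions, MI = H(A) + H(B) - H(A,B) and NMI = (H(A) + H(B)) / H(A,B), which are assumed here and may differ in detail from those of Table 5.1. The function names and the toy images are hypothetical, and this is not the registration implementation used in the study.

```python
from collections import Counter
from math import log2

def entropy(counts, total):
    """Shannon entropy (bits) of a histogram given as bin counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def similarity_measures(image_a, image_b):
    """image_a, image_b: equal-length flat sequences of quantized intensities."""
    n = len(image_a)
    h_a = entropy(Counter(image_a).values(), n)
    h_b = entropy(Counter(image_b).values(), n)
    h_ab = entropy(Counter(zip(image_a, image_b)).values(), n)  # joint entropy
    # NMI is undefined when both images are constant (h_ab == 0).
    return {"JE": h_ab, "MI": h_a + h_b - h_ab, "NMI": (h_a + h_b) / h_ab}

# Toy 4-pixel "images": one identical pair and one statistically independent pair.
aligned = similarity_measures([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = similarity_measures([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy case the identical pair gives MI = 1 bit and NMI = 2, whereas the independent pair gives MI = 0 and NMI = 1, which illustrates why a registration would seek to maximize these measures.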
