Note. pj+ and p+j represent the marginal proportions of Raters A and B, respectively. Agreement by chance is computed as the product of the row and column marginals.
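The computation described in the note can be sketched for a 2x2 table of two raters' binary codings; the table values below are illustrative, not taken from the text:

```python
# Illustrative 2x2 contingency table: rows = Rater A (yes/no),
# columns = Rater B (yes/no). Values are made up for the sketch.
table = [[40, 5],
         [10, 45]]

n = sum(sum(row) for row in table)

# Observed proportion agreement p0: sum of the diagonal cells over n.
p0 = (table[0][0] + table[1][1]) / n

# Marginal proportions: pj+ for Rater A (rows), p+j for Rater B (columns).
row_marg = [sum(row) / n for row in table]
col_marg = [(table[0][j] + table[1][j]) / n for j in range(2)]

# Chance agreement pc: sum over categories of the product of marginals.
pc = sum(r * c for r, c in zip(row_marg, col_marg))

print(round(p0, 3), round(pc, 3))
```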


Lee, 1985), whereas others have supported its use because it is an intuitive, very simple, and easy-to-calculate concept (Baer, 1977). As demonstrated by Suen and Lee (1985), applied behavior analyses often include extreme prevalence rates that lead to considerably inflated agreement rates. Consequently, the proportion agreement index seems to be inflated by chance in most applications.

To overcome this problem, Birkimer and Brown (1979) suggested three methods to test the significance of an observed proportion agreement index against the possible percentage agreement by chance. These methods are approximations of the conventional chi-square (χ²) test (Hartmann & Gardner, 1979). This index will be presented later in this chapter. Kelly (1977) suggested another method for avoiding the problem of inflation by chance. He postulated that the prevalence of the critical symptom should exceed .20 and should be less than .80 to compute the proportion agreement index. In addition, the computed proportion agreement value should be .90 or higher to indicate an acceptable agreement. Unfortunately, there is commonly no prior knowledge about prevalence rates that would enable a theoretically founded application of the proportion agreement index. Nevertheless, it differs significantly from agreement by chance if (a) both conditions mentioned by Kelly are met and (b) there are more than 15 observations (Ary & Suen, 1985).
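Kelly's two conditions and the sample-size requirement of Ary and Suen can be combined into a simple screening check; the function name and structure below are my own, a minimal sketch of the rules as stated in the text:

```python
def proportion_agreement_usable(p0, prevalence, n_observations):
    """Screen whether the proportion agreement index may be interpreted
    as acceptable agreement, per the conditions described in the text."""
    prevalence_ok = 0.20 < prevalence < 0.80   # Kelly (1977): prevalence bounds
    agreement_ok = p0 >= 0.90                   # Kelly (1977): minimum agreement
    sample_ok = n_observations > 15             # Ary & Suen (1985): n > 15
    return prevalence_ok and agreement_ok and sample_ok

print(proportion_agreement_usable(0.92, 0.50, 30))  # True
print(proportion_agreement_usable(0.92, 0.85, 30))  # False: extreme prevalence
```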

Hence, the proportion agreement index should only be used if the two conditions stated previously are met, and it is strongly recommended to test its significance by using the χ² test. Nevertheless, the fact that the proportion agreement index cannot be easily compared between studies remains an unsolved problem. For example, agreement of p0 = .70 with a prevalence of about .50 reflects much better interobserver agreement than agreement of p0 = .90 with a prevalence of .85.
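The closing example can be made concrete. Assuming, for the sketch, that both raters share the stated prevalence p, chance agreement is pc = p² + (1 − p)², and the excess of p0 over pc shows why the lower p0 reflects the better agreement:

```python
def excess_over_chance(p0, prevalence):
    """Observed agreement minus chance agreement, assuming both raters
    code the critical symptom at the same prevalence rate."""
    pc = prevalence ** 2 + (1 - prevalence) ** 2  # chance agreement
    return p0 - pc

# p0 = .70 at prevalence .50 exceeds chance by more than
# p0 = .90 at prevalence .85, matching the example in the text.
print(round(excess_over_chance(0.70, 0.50), 3))  # 0.2
print(round(excess_over_chance(0.90, 0.85), 3))  # 0.155
```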
