## Average Weighted Model (AWM) Classifier Combination

In our proposed approach, the expert decisions are modeled as a probability density function (pdf). From a linear opinion pool of $R$ experts, assume that the $r$th segmentation expert provides an estimate of the a posteriori probability, $p(y \mid r, x)$.

We assume that accompanying this pdf is a linear weight or mixing coefficient, $p(r)$, indicating the contribution of the $r$th expert to the joint pdf, $p(y \mid x, \Theta)$, resulting from the combination of experts. The vector $\Theta$ is the complete set of parameters describing the combined pdf. Hence, following the expert combination, the complete pdf can be written as

$$p(y \mid x, \Theta) = \sum_{r=1}^{R} p(r)\, p(y \mid r, x)$$

given that the mixing coefficients satisfy the following constraints: $\sum_{r=1}^{R} p(r) = 1$ and $0 < p(r) < 1$. If we treat the weighted contribution of each expert in the unconditional distribution as probabilities, then statistical models such as the mixture of experts (MOE) framework [34] can be trained to learn the individual classifier and weight contribution distributions. For this we propose using the GMM trained with the EM algorithm. We now present a method for identifying the weights in a probabilistic manner motivated by the MOE framework. Our proposed approach is, however, different from the conventional MOE method in two ways: (i) first, the a posteriori pdf from each segmentation expert remains fixed, having been generated during segmentation; (ii) second, the mixing coefficients for each expert, $p(r)$, are determined in an unsupervised manner through statistical methods.
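To make the linear opinion pool concrete, the following is a minimal sketch of combining fixed expert estimates with mixing coefficients for a single data point. The function name `combine_experts` and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def combine_experts(weights: Sequence[float], expert_probs: Sequence[float]) -> float:
    """Linear opinion pool for one data point: sum over r of p(r) * p(y | r, x)."""
    if len(weights) != len(expert_probs):
        raise ValueError("one weight is needed per expert")
    # The mixing coefficients must be non-negative and sum to one.
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, expert_probs))


# Hypothetical example: three experts' a posteriori estimates for one pixel.
combined = combine_experts([0.5, 0.3, 0.2], [0.9, 0.6, 0.4])
```

Because the weights sum to one, the combined value always lies between the smallest and largest expert estimate.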

11.4.4.3.1 Maximum Likelihood Solution. The mixing coefficient parameter values for each expert can be determined using the ML principle by forming a likelihood function. Assume that we have the complete dataset, $\mathcal{Y}$, of combined decisions from segmentation experts for each data point, where $\mathcal{Y} = \{y_1, \ldots, y_N\}$, and that it is drawn independently from the complete distribution $p(y \mid x, \Theta)$. Then the joint occurrence of the whole dataset is given as

$$p(\mathcal{Y} \mid \Theta) = \prod_{n=1}^{N} \sum_{r=1}^{R} p(r)\, p(y_n \mid r, x_n) = \mathcal{L}(\Theta) \tag{11.30}$$

For simplicity, the above likelihood function can be rewritten and expressed as a log likelihood as follows:

$$\log \mathcal{L}(\Theta) = \log p(\mathcal{Y} \mid \Theta) = \sum_{n=1}^{N} \log \sum_{r=1}^{R} p(r)\, p(y_n \mid r, x_n) \tag{11.31}$$

For the above equation, it is not possible to find the ML estimate of the parameter values $\Theta$ directly because of the inability to solve $\partial \mathcal{L} / \partial \Theta = 0$ [23]. Our approach to maximising the likelihood $\log \mathcal{L}(\Theta)$ is based on the EM algorithm, proposed in the context of missing data estimation [35].
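As an illustration of the quantity being maximised, the sketch below evaluates the log likelihood of Eq. (11.31) for a given set of mixing coefficients. The function name `log_likelihood` and the toy values are assumptions made for this example; `expert_probs[n][r]` stands for the fixed expert estimate for data point n from expert r.

```python
import math
from typing import Sequence


def log_likelihood(weights: Sequence[float],
                   expert_probs: Sequence[Sequence[float]]) -> float:
    """Sum over data points of the log of the weighted sum of expert estimates."""
    total = 0.0
    for probs_n in expert_probs:
        # Inner sum over experts for the n-th data point.
        mixture = sum(w * p for w, p in zip(weights, probs_n))
        total += math.log(mixture)
    return total


# Hypothetical estimates from two experts for three data points.
toy_probs = [[0.9, 0.6], [0.2, 0.4], [0.7, 0.8]]
ll_equal = log_likelihood([0.5, 0.5], toy_probs)
```

The logarithm sits outside the sum over experts, which is why setting the derivative to zero does not yield a direct solution and an iterative scheme is used instead.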

11.4.4.3.2 AWM Parameter Estimation Using EM Algorithm. The EM algorithm attempts to maximize an estimate of the log likelihood that expresses the expected value of the complete data log likelihood conditional on the data points. By evaluating an auxiliary function, $Q$, in the E-step, an estimate of the log likelihood can be iteratively maximized using a set of update equations in the M-step. Using the AWM likelihood function from Eq. (11.30), the auxiliary function for the AWM is defined as

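$$Q(\Theta^{\text{new}}, \Theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{r=1}^{R} p^{\text{old}}(r \mid y_n) \log\left( p^{\text{new}}(r)\, p(y_n \mid r, x_n) \right) \tag{11.32}$$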

It should be noted that the a posteriori estimate $p(y_n \mid r, x_n)$ for the $n$th data point from the $r$th segmentation expert remains fixed. The conditional density function $p^{\text{old}}(r \mid y_n)$ is computed using the Bayes rule as

$$p^{\text{old}}(r \mid y_n) = \frac{p(y_n \mid r, x_n)\, p(r)}{\sum_{j=1}^{R} p(y_n \mid j, x_n)\, p(j)} \tag{11.33}$$
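The Bayes rule step of Eq. (11.33) can be sketched for a single data point as follows. The function name `posterior_over_experts` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def posterior_over_experts(weights: Sequence[float],
                           probs_n: Sequence[float]) -> List[float]:
    """Posterior over experts for one data point under the current weights."""
    # Numerator of the Bayes rule for each expert r.
    joint = [w * p for w, p in zip(weights, probs_n)]
    # Denominator: the same product summed over all experts j.
    evidence = sum(joint)
    if evidence <= 0.0:
        raise ValueError("at least one expert must give a positive estimate")
    return [j / evidence for j in joint]


# Hypothetical example: two equally weighted experts.
resp = posterior_over_experts([0.5, 0.5], [0.9, 0.3])
```

With equal weights, the posterior simply reflects the relative size of the two expert estimates, here 0.75 and 0.25.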

In order to maximize the estimate of the likelihood function given by the auxiliary function, update equations are required for the mixing coefficients. These can be obtained by differentiating $Q$ with respect to the parameters and setting the result equal to zero. For the AWM, the update equations are taken from [27]. For the $r$th segmentation expert, the update is

$$p^{\text{new}}(r) = \frac{1}{N} \sum_{n=1}^{N} p^{\text{old}}(r \mid y_n)$$
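The mixing-coefficient update can be derived from the auxiliary function of Eq. (11.32) in a few lines. The sketch below assumes the standard Lagrange multiplier treatment of the constraint $\sum_r p^{\text{new}}(r) = 1$; the multiplier $\lambda$ and the Lagrangian $\mathcal{J}$ are symbols introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\mathcal{J} &= \sum_{n=1}^{N} \sum_{r=1}^{R} p^{\text{old}}(r \mid y_n)
   \log\left( p^{\text{new}}(r)\, p(y_n \mid r, x_n) \right)
   + \lambda \left( 1 - \sum_{r=1}^{R} p^{\text{new}}(r) \right) \\
\frac{\partial \mathcal{J}}{\partial p^{\text{new}}(r)}
  &= \frac{1}{p^{\text{new}}(r)} \sum_{n=1}^{N} p^{\text{old}}(r \mid y_n) - \lambda = 0
  \quad\Rightarrow\quad
  p^{\text{new}}(r) = \frac{1}{\lambda} \sum_{n=1}^{N} p^{\text{old}}(r \mid y_n) \\
\sum_{r=1}^{R} p^{\text{new}}(r) = 1
  &\quad\Rightarrow\quad
  \lambda = \sum_{n=1}^{N} \sum_{r=1}^{R} p^{\text{old}}(r \mid y_n) = N \\
p^{\text{new}}(r) &= \frac{1}{N} \sum_{n=1}^{N} p^{\text{old}}(r \mid y_n)
\end{align*}
```

The fixed expert term $p(y_n \mid r, x_n)$ drops out of the derivative because it does not depend on $p^{\text{new}}(r)$, so each new weight is the average posterior of that expert over the dataset.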

The complete AWM algorithm is shown below.

Algorithm 2: AWM Algorithm

2. Iterate: Perform the E-step and M-step until the change in the Q function, Eq. (11.32), between iterations is less than some convergence threshold, $\text{AWM}_{\text{converge}} = 25$.

(b) Evaluate the Q function, the expectation of the log-likelihood of the complete training data samples given the observation, $x_n$, and the current estimate of the parameters, using Eq. (11.32).

4. EM M-step: This consists of maximising Q with respect to each parameter in turn:

1. The new estimate of the segmentation expert weighting for the $r$th component, $p^{\text{new}}(r)$, is given by the mixing-coefficient update equation, which uses the posteriors from Eq. (11.33).
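The E-step and M-step described above can be sketched as a short loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: the uniform initialisation, the `tol` and `max_iter` defaults, and the function name `fit_awm_weights` are all assumptions, and the expert estimates are assumed to be strictly positive.

```python
import math
from typing import List, Sequence


def fit_awm_weights(
    expert_probs: Sequence[Sequence[float]],
    tol: float = 1e-8,     # illustrative threshold, not the value quoted in the text
    max_iter: int = 200,   # illustrative safety cap on the number of iterations
) -> List[float]:
    """Estimate mixing coefficients by EM with the expert outputs held fixed.

    expert_probs[n][r] is the fixed estimate for data point n from expert r.
    """
    n_points = len(expert_probs)
    n_experts = len(expert_probs[0])
    weights = [1.0 / n_experts] * n_experts  # assumed uniform initialisation
    prev_q = -math.inf

    for _ in range(max_iter):
        # E-step: posterior over experts for every data point (Bayes rule).
        resp = []
        for probs_n in expert_probs:
            joint = [w * p for w, p in zip(weights, probs_n)]
            evidence = sum(joint)
            resp.append([j / evidence for j in joint])

        # M-step: each new weight is the mean posterior of that expert.
        weights = [sum(row[r] for row in resp) / n_points for r in range(n_experts)]

        # Auxiliary function Q evaluated at the updated weights.
        q = sum(
            g * math.log(weights[r] * probs_n[r])
            for row, probs_n in zip(resp, expert_probs)
            for r, g in enumerate(row)
            if g > 0.0  # treat 0 * log(0) as 0
        )
        if abs(q - prev_q) < tol:
            break
        prev_q = q

    return weights


# Hypothetical data: the first expert is consistently more confident.
toy = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.9, 0.4]]
weights = fit_awm_weights(toy)
```

On this toy data the first expert receives the larger weight, since its estimates are higher for every data point. Only the mixing coefficients change between iterations, which keeps each pass linear in the number of data points and experts.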

11.4.4.3.3 Estimating the A Posteriori Probability. Using the AWM combination strategy in mammographic CAD, a posteriori estimates are required for each data point following the experts' combination (one for the normal and one for the suspicious class). To determine these estimates, the AWM model is computed for the first class, thereby obtaining the a posteriori estimate $p(y_n = \omega_1 \mid x_n, \Theta)$. From this, the estimate for the second class is determined as $p(y_n = \omega_2 \mid x_n, \Theta) = 1 - p(y_n = \omega_1 \mid x_n, \Theta)$. We now proceed to the results section to evaluate our novel contributions of weighted GMM segmentation experts and the novel AWM combination strategy.
