Bayesian Networks

A Bayesian network (BN) model represents the joint distribution of a set of variables of interest. For example, over all possible joint states of the variables in E and H, a BN would represent P(E, H). From such a joint distribution, we can derive any probability of interest, such as P(H\E).

A BN has two parts: a structure and a set of parameters. The structure contains nodes, which represent model variables,2 as well as arcs, which connect nodes that are related probabilistically. The resulting graph is required to contain no directed cycles, meaning that it is not possible to start at some node and follow a path of arcs that leads back to that same node. The parents of a node (variable) Xi are those nodes that have arcs into Xi. A descendant of Xi is a node Xj for which there is a directed path from Xi to Xj. The following Markov condition specifies the independence that is expressed in a BN: A node is independent of its nondescendants, given just the state of its parents.

The Markov condition is the most important key to understanding Bayesian networks. It tells us that if we want to predict the value of some variable in the Bayesian network, and we already know the values of all of its parents, then no variable (except possibly some of our node's descendents) could possibly give us any useful predictive information beyond that supplied by its parents. The BN structure in Figure 18.1 contains five nodes. The node season could be modeled as having the values spring, summer, fall, and winter. Age represents a patient's age, perhaps discretized into ranges of years. The nodes influenza, cough, and fever could be modeled as being present or absent.

As an example of the BN Markov condition, we see that the structure in Figure 18.1 specifies that cough is independent of season given the state of influenza. If the designer of the network had decided that this independence assumption was unreasonable they could have added an arc from season to cough.

Someone who designs a Bayesian network first chooses the structure. But more work is needed after that. The network must be populated with numerical parameters. For each node Xi, there is a probability distribution P(Xi I parents(Xi)), where parents(Xi) are the parents of Xi in the BN. For example, we might have P(cough = present I influenza = present) = 0.90. If Xi contains no parents, then the probability P(Xi) figure 18.1 An example of a Bayesian network structure.

is specified. The BN Markov condition implies that we can factor the joint distribution of the variables in the model as follows (Neapolitan, 2004):

The fewer the number of parents per node, the fewer the number of parameters needed to specify each conditional probability P(Xi I parents(Xi)), and thus, the fewer the number of parameters in the model. Therefore, a BN with relatively fewer arcs will require relatively fewer model parameters. A BN can thereby represent parsimoniously the joint distribution over n variables.

If the arcs are interpreted as being direct causal relationships (relative to the variables being modeled) then the model is called a causal Bayesian network. For a causal BN, the Markov condition states (in part) that effects are independent of distant causes, given the state of all the proximal causes. For example, fever is independent of the distant influence of season and age, given that we know the state (present or absent) of influenza, which is the proximal cause of fever in the model in Figure 18.1.

Equation 1 specifies a complete joint probability distribution over all n variables in the BN model. From such a joint distribution, we can derive the probability of any subset of nodes conditioned of the state of another subset of nodes. Thus, for the example BN, we could derive P(influenza I cough = present, season = winter, fever = present). Note that information about the patient's age is missing in this conditional probability; in general, we need only condition on a subset of the variables in the model.

Researchers have developed exact BN inference algorithms that take advantage of the independence among the variables that follows from the BN Markov condition when some arcs are missing. These algorithms are often able to derive conditional probabilities relatively efficiently (Neapolitan, 2004). When exact inference would require too much computation time, approximate algorithms are available (Neapolitan, 2004).

2 We use the terms node and variable interchangeably.

0 0