As many of the structural studies reviewed in the previous section illustrate, brain networks (like other biological networks) are neither completely random nor completely regular. Instead, their local and global structure exhibits significant departures from randomness. A key question concerns how these nonrandom features of brain structural connectivity relate to brain function or dynamics. A consideration of brain evolution may guide our answer. In the course of evolution, brain connectivity is one of the prime substrates, the gradual modification of which in an adaptive context contributes to enhanced fitness and survival. Biological structure/function relationships often become more comprehensible when viewed in the context of evolution, for example when we consider the structure and function of proteins, cellular organelles, or entire body plans. The evolutionary history of the primate and especially human brain may ultimately hold the key for understanding the structural basis of cognition (for a modern review of brain evolution, see Striedter, 2005).

As we approach the question of how structure determines function in the brain, we turn next to measures of brain dynamics based on functional connectivity. As outlined in other chapters of this volume (e.g. Jirsa and Breakspear, this volume) there are numerous approaches to quantifying or measuring brain dynamics. In this chapter, we will focus on measures that attempt to capture global aspects of functional connectivity, i.e. patterns of statistical dependence between often remote neural units or brain regions (Friston, 1993), building on the firm foundation offered by statistical information theory (Cover and Thomas, 1991; Papoulis, 1991). In its most general form, statistical dependence is expressed as an estimate of mutual information. Unlike correlation, which is a linear measure of association, mutual information captures all linear or nonlinear relationships between variables. While the mathematical definition of mutual information is quite straightforward, the actual derivation of valid estimates for entropy and mutual information for any given application can be challenging and is the subject of much ongoing research.

Mutual information between two units A and B is defined as the difference between the sum of their individual entropies and their joint entropy:

$$MI(A, B) = H(A) + H(B) - H(AB) \qquad (4.1)$$

Note that MI(A, B) ≥ 0 and MI(A, B) = MI(B, A). The mutual information MI(A, B) will be zero if no statistical relationship exists between A and B, i.e. if A and B behave statistically independently such that H(AB) = H(A) + H(B). The upper bound for mutual information between A and B is given by the lesser of the two entropies. Mutual information expresses the amount of information that the observation of one unit conveys about the other unit. Any reduction of the joint entropy H(AB) such that H(AB) < H(A) + H(B) indicates some degree of statistical dependence between A and B and will result in a positive value for the mutual information.
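As a minimal illustration of this definition, the following Python sketch computes mutual information (in bits) for two discrete units from a joint probability table. The function names and the three toy tables are illustrative assumptions, not taken from the text, and the sketch presumes that the joint distribution is known exactly rather than estimated from data.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """MI(A, B) = H(A) + H(B) - H(AB) for a joint table joint[a][b]."""
    p_a = [sum(row) for row in joint]          # marginal of A (row sums)
    p_b = [sum(col) for col in zip(*joint)]    # marginal of B (column sums)
    p_ab = [p for row in joint for p in row]   # flattened joint distribution
    return entropy(p_a) + entropy(p_b) - entropy(p_ab)

# Hypothetical toy distributions over two binary units.
independent = [[0.25, 0.25], [0.25, 0.25]]  # A and B are independent fair bits
identical = [[0.5, 0.0], [0.0, 0.5]]        # B is always a copy of A
noisy = [[0.4, 0.1], [0.1, 0.4]]            # B agrees with A 80% of the time

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
mi_noisy = mutual_information(noisy)
```

In this sketch the independent table yields zero mutual information, the identical table reaches the upper bound of one bit (the entropy of either unit), and the noisy table falls strictly between the two.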

Mutual information has certain limitations. First, we note that the existence of positive mutual information between A and B does not express causal influences from A on B or vice versa. Hence, mutual information is informative in the context of functional connectivity, but does not allow (by itself) the inference of effective connectivity. Furthermore, in any real or simulated system, the estimation of mutual information critically depends on correct estimates for the individual and joint entropies, which in turn are often derived from their respective state probability distributions. As mentioned above, these estimates can be difficult to derive from small or sparse data sets such as those often encountered in neurobiological applications and, in many cases, additional statistical assumptions have to be made (e.g. Paninski, 2003; Pola et al., 2003).

In (4.1), A and B refer to individual variables representing individual neurons or brain regions. Mutual information can also be defined within larger systems. For example, let us consider a system X composed of n elements that is partitioned into two complementary subsets of elements. One subset consists of k elements and is denoted as $X_k$, while its complement contains the remaining n − k elements and is denoted as $X - X_k$. The mutual information between these two subsets is

$$MI(X_k; X - X_k) = H(X_k) + H(X - X_k) - H(X) \qquad (4.2)$$

While mutual information captures the degree of statistical dependence between two elements (or subsets), the integration I(X) of a system measures the total amount of statistical dependence among an arbitrarily large set of elements (Tononi et al., 1994). As the definition (4.3) illustrates, integration can be viewed as the multivariate generalization of mutual information. Considering a system X composed of a set of elements $\{x_i\}$, its integration I(X) is then defined as the difference between the sum of the entropies of the individual elements and their joint entropy:

$$I(X) = \sum_{i=1}^{n} H(x_i) - H(X) \qquad (4.3)$$

Given this definition, integration essentially quantifies the divergence between the joint probability distribution of the system X and the product of the marginal distributions of the individual elements (Schneidman et al., 2003a; McGill, 1954). This measure has also been called the multi-information, as it expresses the total information shared by two or more elements. Integration (multi-information) differs from another multivariate informational measure called the co-information (Bell, 2003), which captures only the information shared by all elements. Venn diagrams illustrating the relationship between mutual information, integration and co-information are shown in Fig. 4. Similar to mutual information, integration may be viewed as the amount of error one makes given the assumption of independence between all variables. Note further that, like mutual information, I(X) ≥ 0. If all elements are statistically independent, their joint entropy is exactly equal to the sum of the elements' individual entropies and I(X) = 0. Any amount of statistical dependence between the elements will express itself in a reduction of the elements' joint entropy and thus in a positive value for I(X). As is the case for mutual information, an upper bound for I(X) can be calculated from the spectrum of the individual entropies. In summary, integration quantifies the total amount of statistical structure or statistical dependencies present within the system.
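To make the definition of integration concrete, the sketch below evaluates I(X) for small discrete systems whose joint distribution is given explicitly as a mapping from state tuples to probabilities. This representation, the helper names, and the three toy systems are assumptions made for illustration only; with empirical data the entropies would have to be estimated, with the difficulties noted earlier.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution {state_tuple: prob} onto the elements in idx."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def integration(joint, n):
    """I(X): sum of the individual entropies H(x_i) minus the joint entropy H(X)."""
    return sum(entropy(marginal(joint, (i,))) for i in range(n)) - entropy(joint)

# Hypothetical toy systems of three binary elements.
independent = {s: 1 / 8 for s in product((0, 1), repeat=3)}   # three independent fair bits
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}                     # three perfectly coupled bits
parity = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}  # third bit is XOR of the first two

i_independent = integration(independent, 3)
i_copies = integration(copies, 3)
i_parity = integration(parity, 3)
```

Here the independent system has zero integration, the fully coupled system has approximately 2 bits, and the parity system has approximately 1 bit even though every pair of its elements is statistically independent, which shows that integration registers dependencies that pairwise measures miss.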

Given a system of size n, we can define integration not only for the entire system but also for all hierarchical levels k ≤ n within it. We denote the average integration of subsets of size k as the hierarchical integration $\langle I(X_k) \rangle$, noting that under random sampling the average is taken over all k-out-of-n subsets. Thus, $\langle I(X_n) \rangle = I(X)$ and $\langle I(X_1) \rangle = 0$. It can be proven that for any given system the spectrum of average integration for all values of k (1 ≤ k ≤ n) must increase monotonically, i.e. $\langle I(X_{k+1}) \rangle \geq \langle I(X_k) \rangle$. The difference between successive levels $\langle I(X_{k+1}) \rangle - \langle I(X_k) \rangle$ increases and approaches a constant value, indicating that the amount of integration (statistical dependence) gained by adding further elements to the system approaches a limit. Intuitively, this characteristic of hierarchical integration reflects similar properties described for informational measures of population redundancy (Schneidman et al., 2003b; Puchalla et al., 2005).
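The hierarchical integration spectrum can be computed by brute force for small systems. The following sketch enumerates all k-out-of-n subsets exhaustively (feasible only for small n) and averages their integration at each level. The toy system, a chain of noisy copies, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations, product

def entropy(dist):
    """Shannon entropy in bits of a distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution {state_tuple: prob} onto the elements in idx."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def integration(joint, idx):
    """Integration of the subset idx: sum of element entropies minus subset entropy."""
    return (sum(entropy(marginal(joint, (i,))) for i in idx)
            - entropy(marginal(joint, idx)))

def hierarchical_integration(joint, n):
    """Return [<I(X_1)>, ..., <I(X_n)>], each averaged over all k-out-of-n subsets."""
    levels = []
    for k in range(1, n + 1):
        values = [integration(joint, idx) for idx in combinations(range(n), k)]
        levels.append(sum(values) / len(values))
    return levels

def noisy_chain(n, flip=0.1):
    """Toy system (an assumption): x_0 is a fair bit and each x_{i+1}
    copies x_i but flips with probability `flip`."""
    joint = {}
    for state in product((0, 1), repeat=n):
        p = 0.5
        for a, b in zip(state, state[1:]):
            p *= flip if a != b else 1 - flip
        joint[state] = p
    return joint

n = 4
joint = noisy_chain(n)
levels = hierarchical_integration(joint, n)  # levels[k - 1] holds <I(X_k)>
```

For this example the first entry of `levels` is zero, the last entry equals the integration of the whole system, and the values in between never decrease, in line with the properties stated above.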

The characterization of the spectrum of average integration across all levels of scale (subset size k) within a given system allows us to examine how and where statistical structure within the system is distributed. How is this possible? Let us say that we find that a system as a whole has a certain amount of statistical structure, measured by its integration I(X) > 0. This means that some statistical dependencies exist somewhere, at some spatial scale, within the system X. But the global estimate of I(X) does not provide information as to whether this structure is homogeneously distributed throughout the system, or whether this structure is localized or "concentrated" among specific units or subsets. If statistical dependencies are homogeneously distributed, the system would be, in terms of its functional connectivity, totally undifferentiated, essentially presenting the same view to an observer zooming in on different levels of scale. We might say that such a system lacks any functional segregation. If statistical dependencies exist predominantly within subsets of specific size, there would be parts of the system that are more integrated than others and these integrated subsets would represent local structure. Such a system contains functional segregation in addition to the global functional integration expressed by I(X) at the highest level.

To differentiate between these possibilities, we need a measure that takes into account the full distribution of integration across levels of scale (Fig. 5). Such a measure, which captures the extent to which a system is both functionally segregated (small subsets of the system tend to behave independently) and functionally integrated (large subsets tend to behave coherently), was proposed by Tononi et al. (1994). This statistical measure, called neural complexity CN(X), takes into account the full spectrum of subsets and can be derived either from the ensemble average of integration for all subset sizes 1 to n, or (equivalently) from the ensemble average of the mutual information between subsets of a given size (ranging from 1 to n/2) and their complement. CN(X) is defined as:

$$C_N(X) = \sum_{k=1}^{n} \left[ \frac{k}{n} I(X) - \langle I(X_k) \rangle \right] = \sum_{k=1}^{n/2} \langle MI(X_k; X - X_k) \rangle$$

Fig. 5. Complexity CN(X) and C(X) for an example network of 32 units, generated by optimizing C(X) as described in Sporns et al., 2000a. (A) Structural and functional connectivity pattern. (B) Full spectrum of hierarchical integration (levels 1 to n), with neural complexity CN(X) corresponding to the shaded area. Inset at right shows a magnified part of the spectrum around levels 29 to 32, with C(X) corresponding to the difference between hierarchical integration profiles at level n − 1. Rightmost plots show an alternative way of plotting complexity emphasizing a maximal difference in hierarchical integration profiles at a specific level (top), and the accelerating increase in hierarchical integration between successive levels (bottom).

As is evident from the second expression for CN(X), the complexity of a system is high when, on average, the mutual information between any subset of the system and its complement is high. The hierarchical nature of this measure of complexity spanning all levels of scale within the system is inherently well suited for a system such as the brain, which is characterized by modularity at several different levels, ranging from single neurons to brain regions. Thus, complexity is complementary to recent approaches that investigate brain dynamics in the context of a nested multilevel, multiscale architecture (Breakspear and Stam, 2005).
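The two routes to neural complexity described above, one through hierarchical integration and one through the mutual information between subsets and their complements, can be compared numerically on a small discrete system. The sketch below is a brute-force illustration under stated assumptions: the joint distribution is given explicitly, the toy system and all function names are hypothetical, and for even n the middle level k = n/2 is given half weight in the mutual-information form. That weighting is our own reading of what is needed for the two sums to agree exactly, not something the text specifies.

```python
import math
from itertools import combinations, product

def entropy(dist):
    """Shannon entropy in bits of a distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def subset_entropy(joint, idx):
    """Entropy of the marginal distribution over the elements in idx."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return entropy(out)

def integration(joint, idx):
    """Sum of element entropies minus the entropy of the subset idx."""
    return sum(subset_entropy(joint, (i,)) for i in idx) - subset_entropy(joint, idx)

def mean_over_subsets(fn, n, k):
    """Average fn(subset) over all k-out-of-n subsets."""
    subsets = list(combinations(range(n), k))
    return sum(fn(s) for s in subsets) / len(subsets)

def complexity_from_integration(joint, n):
    """Sum over k of (k/n) I(X) - <I(X_k)>."""
    total_i = integration(joint, tuple(range(n)))
    return sum((k / n) * total_i
               - mean_over_subsets(lambda s: integration(joint, s), n, k)
               for k in range(1, n + 1))

def complexity_from_mi(joint, n):
    """Sum over k up to n/2 of <MI(X_k; X - X_k)>."""
    h_x = entropy(joint)
    def mi_with_complement(s):
        rest = tuple(i for i in range(n) if i not in s)
        return subset_entropy(joint, s) + subset_entropy(joint, rest) - h_x
    total = 0.0
    for k in range(1, n // 2 + 1):
        weight = 0.5 if 2 * k == n else 1.0  # assumption: half weight at k = n/2
        total += weight * mean_over_subsets(mi_with_complement, n, k)
    return total

def noisy_chain(n, flip=0.1):
    """Toy system (an assumption): each bit copies its predecessor, flipping with probability `flip`."""
    joint = {}
    for state in product((0, 1), repeat=n):
        p = 0.5
        for a, b in zip(state, state[1:]):
            p *= flip if a != b else 1 - flip
        joint[state] = p
    return joint

chain = noisy_chain(5)
cn_integration = complexity_from_integration(chain, 5)
cn_mi = complexity_from_mi(chain, 5)
```

For the five-element chain the two functions return the same value up to floating-point error, and both return zero for a system of independent elements.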

Another closely related but nonidentical measure of complexity expresses the portion of the entropy that is accounted for by the interactions among all the components of a system (Tononi et al., 1998; Tononi et al., 1999; Fig. 5). There are three mathematically equivalent expressions for this measure, called C(X):

$$C(X) = H(X) - \sum_{j=1}^{n} H(x_j \mid X - x_j) = \sum_{j=1}^{n} MI(x_j; X - x_j) - I(X) = (n - 1)\, I(X) - n\, \langle I(X - x_j) \rangle$$

H(xj|X - xj) denotes the conditional entropy of each element xj, given the rest of the system X - xj. We note that CN(X) as well as C(X) are always greater than or equal to zero. Both CN(X) and C(X) will be exactly zero for systems with zero integration (no statistical dependence at any level), and they will be small (but non-zero) for systems that have nonzero integration, but for which this integration is homogeneously distributed within the system.

While the third formulation of C(X) has a straightforward graphical interpretation (Fig. 5), the second formulation of C(X) is perhaps most useful to provide an intuitive computational basis for this measure. C(X) is obtained as the difference of two terms: the sum of the mutual information between each individual element and the rest of the system minus the total amount of integration. Thus, C(X) takes on large values if single elements are highly informative about the system to which they belong, while not being overly alike (as they would tend to be if their total integration, or total shared information, is high). CN(X) and C(X) are closely related (Fig. 5B), but not mathematically equivalent.
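The equivalence of the three formulations of C(X) can be checked numerically on a small discrete system. The sketch below assumes an explicitly given joint distribution and uses the identity H(xj|X - xj) = H(X) - H(X - xj) for the conditional entropies. The toy system and the function names are hypothetical and serve only as an illustration.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def subset_entropy(joint, idx):
    """Entropy of the marginal distribution over the elements in idx."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return entropy(out)

def integration(joint, idx):
    """Sum of element entropies minus the entropy of the subset idx."""
    return sum(subset_entropy(joint, (i,)) for i in idx) - subset_entropy(joint, idx)

def complexity_c(joint, n):
    """Return C(X) computed from each of the three formulations."""
    h_x = entropy(joint)
    total_i = integration(joint, tuple(range(n)))
    # rest[j] lists the indices of X - x_j, the system without element j
    rest = [tuple(i for i in range(n) if i != j) for j in range(n)]
    # 1) entropy minus the summed conditional entropies H(x_j | X - x_j)
    form1 = h_x - sum(h_x - subset_entropy(joint, r) for r in rest)
    # 2) summed mutual information of each element with the rest, minus integration
    form2 = sum(subset_entropy(joint, (j,)) + subset_entropy(joint, r) - h_x
                for j, r in enumerate(rest)) - total_i
    # 3) (n - 1) I(X) minus n times the average integration at level n - 1
    form3 = (n - 1) * total_i - sum(integration(joint, r) for r in rest)
    return form1, form2, form3

def noisy_chain(n, flip=0.1):
    """Toy system (an assumption): each bit copies its predecessor, flipping with probability `flip`."""
    joint = {}
    for state in product((0, 1), repeat=n):
        p = 0.5
        for a, b in zip(state, state[1:]):
            p *= flip if a != b else 1 - flip
        joint[state] = p
    return joint

c1, c2, c3 = complexity_c(noisy_chain(4), 4)
```

The three returned values agree up to floating-point error and are non-negative for this example. For a system of independent elements all three are zero.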

Within the context of applications of brain functional connectivity, it is essential to underscore that complexity captures the degree to which a neural system combines functional segregation and functional integration. Extensive computational explorations (Tononi et al., 1994; 1998; Sporns et al., 2000a; 2000b; Sporns and Tononi, 2002) have shown that complexity is high for systems that contain specialized elements capable of global (system-wide) interactions. On the other hand, complexity is low for random systems, or for systems that are highly uniform, corresponding to systems that lack either global integration or local specialization, respectively. The relation of connectivity topology and complexity has recently been analytically investigated (De Lucia et al., 2005).
