Correlations between structure and random walk dynamics in directed complex networks

In this letter the authors discuss the relationship between structure and random walk dynamics in directed complex networks, with an emphasis on identifying whether a topological hub is also a dynamical hub. They establish the necessary conditions for networks to be topologically and dynamically fully correlated (e.g., word adjacency and airport networks), and show that in this case Zipf’s law is a consequence of the match between structure and dynamics. They also show that real-world neuronal networks and the world wide web are not fully correlated, implying that their more intensely connected nodes are not necessarily highly active.

directed edges with associated weights, which are represented in terms of the weight matrix W.
The N nodes in the network are numbered i = 1, 2, …, N, and a directed edge with weight M extending from node j to node i is represented as W(i, j) = M. No self-connections (loops) are considered. The in- and outstrength of a node i, abbreviated as is(i) and os(i), correspond to the sum of the weights of its inbound and outbound connections, respectively. The stochastic matrix S for such a network is
S(i, j) = W(i, j) / os(j)    (1)
The matrix S is assumed irreducible, i.e. any node can be reached from any other node, which allows the definition of a unique and stable steady state. An agent placed at any initial node j chooses among the outbound edges of node j, moving to node i with probability S(i, j).
This step is repeated a large number of times T, and the frequency of visits to each node i is calculated as v(i) = (number of visits to node i during the walk)/T. In the steady state (i.e. after a long time period T), $\vec{v} = S\vec{v}$, so the frequency of visits to each node along the random walk may be calculated from the eigenvector of S associated with the unit eigenvalue. For proper statistical normalization we set $\sum_i v(i) = 1$. The dominant eigenvector of the stochastic matrix has been verified, theoretically and experimentally, to be remarkably similar to the corresponding eigenvector of the weight matrix, implying that the adopted random walk model shares several features with other types of dynamics, including linear and non-linear summation of activations and flow in networks.
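As a minimal sketch of the procedure described above, the following Python code builds S from a weight matrix, extracts the eigenvector associated with the unit eigenvalue, and compares it with the visit frequencies of a simulated walk. The function names, the 3-node weight matrix, the starting node, and the random seed are illustrative assumptions, not taken from the original work.

```python
import numpy as np

def stochastic_matrix(W):
    """Column-normalize W so that S[i, j] = W[i, j] / os(j)."""
    W = np.asarray(W, dtype=float)
    out_strength = W.sum(axis=0)  # os(j): total weight leaving node j
    if np.any(out_strength == 0):
        raise ValueError("every node needs at least one outbound edge")
    return W / out_strength       # broadcasting divides column j by os(j)

def visit_frequencies(S):
    """Eigenvector of S for the eigenvalue closest to 1, scaled to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(S)
    k = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, k])
    return v / v.sum()

def simulate_walk(S, T, seed=0):
    """Estimate v(i) as (number of visits to node i) / T for a walk of T steps."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    visits = np.zeros(n)
    node = 0  # arbitrary starting node (assumption)
    for _ in range(T):
        node = rng.choice(n, p=S[:, node])  # column `node` holds the jump probabilities
        visits[node] += 1
    return visits / T

# Hypothetical 3-node example: W[i, j] is the weight of the edge from j to i.
W = np.array([[0, 1, 2],
              [3, 0, 1],
              [1, 2, 0]])
S = stochastic_matrix(W)
v = visit_frequencies(S)
v_sim = simulate_walk(S, 20000)
```

For an irreducible S the two estimates should agree up to sampling noise, which shrinks as T grows.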
With the frequency of visits to nodes (i.e. the 'activity' of the nodes) obtained as above, the correlation between activity and topology can be quantified in terms of the Pearson correlation coefficient, r. For full correlation, i.e. r = 1, two conditions must be jointly satisfied: (i) the network must be strongly connected, i.e. S is irreducible, and (ii) for any node, the instrength must be equal to the outstrength.
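The proof of the statement above is as follows. Because S is irreducible, it has a unique eigenvector associated with the unit eigenvalue, which gives the steady state, i.e. $\vec{v} = S\vec{v}$. As S(i, j) = W(i, j)/os(j), the i-th element of the vector $S\,\vec{os}$ is given as
$$\left(S\,\vec{os}\right)(i) = \sum_{j} S(i,j)\,os(j) = \sum_{j} \frac{W(i,j)}{os(j)}\,os(j) = \sum_{j} W(i,j) = is(i).$$
Since, by hypothesis, is(i) = os(i) for any i, both $\vec{os}$ and $\vec{is}$ are eigenvectors of S associated with the unit eigenvalue and, once normalized to unit sum, coincide with $\vec{v}$. An implication of this derivation is that for perfectly correlated networks, the frequency of symbols produced by random walks will be equal to the outstrength/instrength distributions.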
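The argument can be checked numerically. The sketch below assumes a small random network with symmetric weights, which is one simple way (chosen here for illustration only) to guarantee is(i) = os(i) for every node; the size, seed, and weight range are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.integers(1, 10, size=(n, n)).astype(float)
W = A + A.T               # symmetric weights, hence is(i) == os(i)
np.fill_diagonal(W, 0.0)  # no self-connections

in_strength = W.sum(axis=1)   # is(i): row sums
out_strength = W.sum(axis=0)  # os(j): column sums
S = W / out_strength          # S[i, j] = W[i, j] / os(j)

# S applied to the outstrength vector returns the instrength vector,
# so the normalized outstrength is the steady-state visit frequency.
v = out_strength / out_strength.sum()
r = np.corrcoef(v, in_strength)[0, 1]  # Pearson correlation coefficient
```

Because every pair of nodes is linked, S is irreducible here, and r equals 1 up to floating-point error.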
Therefore, an outstrength scale-free [3] network must produce sequences obeying Zipf's law [6], and vice versa. If, on the other hand, the node strength distribution is Gaussian, the frequency of visits to nodes will also follow a Gaussian distribution; that is to say, the distribution of node strengths is replicated in the node activation. A fully correlated network will have r = 1. This is illustrated in Fig. 1a and b, which show r = 1 for a text by Darwin [7] and the network of airports in the USA [8]. Zipf's law is known to apply to the former type of network [9]. The word adjacency network was established by the sequence of immediately adjacent words in the text after the removal of stopwords [10] and lemmatization [11]. More specifically, the fact that word U has been followed by word V M times in the text is represented as W(V, U) = M. The airport network has a link between two airports if there is at least one flight between them, and the number of flights performed in one month was used as the strength of the edge.
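The word adjacency construction described above can be sketched as follows. The token list is a made-up example and is assumed to be already stripped of stopwords and lemmatized; the function names and the choice to skip immediately repeated words (so that no self-connections arise) are assumptions of this sketch.

```python
from collections import Counter

def word_adjacency(tokens):
    """Return W as a Counter keyed by (following_word, preceding_word)."""
    W = Counter()
    for prev, nxt in zip(tokens, tokens[1:]):
        if prev != nxt:  # skip immediate repetitions: no self-connections
            W[(nxt, prev)] += 1
    return W

def strengths(W):
    """Instrength and outstrength of every word appearing in W."""
    in_s, out_s = Counter(), Counter()
    for (i, j), w in W.items():
        in_s[i] += w   # edge j -> i adds to the instrength of i
        out_s[j] += w  # and to the outstrength of j
    return in_s, out_s

# Hypothetical preprocessed token sequence.
tokens = ["origin", "species", "natural", "selection",
          "species", "natural", "origin"]
W = word_adjacency(tokens)
in_s, out_s = strengths(W)
```

In a network built this way, every occurrence of a word other than the first and last tokens contributes one inbound and one outbound unit of weight, so the in- and outstrength of a word can differ by at most one. In this example the sequence starts and ends with the same word, so the two coincide for every word.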
We obtained r for various real networks (Table 1), including the fully correlated networks mentioned above. To interpret these data, we recall that a small r means that a hub (large in- or outstrength) in topology is not necessarily a center of activity. Notably, in all cases considered r is greater for the in- than for the outstrength. This may be understood with a trivial example of a node from which a high number of links emerge (implying large outstrength) but which has only very few inbound links. This node, in a random walk model, will be rarely occupied and thus cannot be a center of activity, though it will strongly affect the rest of the network by sending activation to many other targets. Understanding why a hub in terms of instrength may fail to be very active is more subtle. Consider a central node receiving links from many other nodes arranged in a circle, i.e. the central node has a large instrength, but the surrounding nodes have small instrength. The surrounding nodes then tend to be rarely visited and can pass little activity on to the central node. In other words, if a node i receives several links from nodes with low activity, this node i will likewise be fairly inactive.
In order to further analyze the latter case, we may examine the correlations between the frequency of visits to each node i and the cumulative hierarchical in- and outstrengths of that node. The hierarchical degrees [12][13][14] of a network node provide a natural extension of the traditional concept of node degree. The cumulative hierarchical outstrength of a node i at hierarchical level h corresponds to the sum of the weights of the edges extending from hierarchical level h to the subsequent level h+1, plus the outstrengths obtained from hierarchies 1 to h-1. Similarly, the cumulative hierarchical instrength of a node i at hierarchical level h is the sum of the weights of the edges from hierarchical level h+1 to the previous level h, plus the instrengths obtained from hierarchies 1 to h-1. The traditional in- and outstrength are the cumulative hierarchical in- and outstrength at hierarchical level 1 (see Supplementary Methods for a more detailed definition of cumulative hierarchical degree). Because complex networks are often also small-world structures, it suffices to consider hierarchies up to 2 or 3 edges.
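To make the cumulative hierarchical instrength concrete, the sketch below implements one possible reading of the definition, stated here as an assumption: level 0 is the reference node itself, level k contains the nodes whose shortest directed path into the reference node has k edges, and the cumulative value at level h adds up the weights of all edges pointing from level k into level k-1 for k = 1, …, h. Edges within a level are ignored under this reading. The function name and the small example network are hypothetical.

```python
import numpy as np

def cumulative_hierarchical_instrength(W, i, h_max):
    """Cumulative hierarchical instrengths of node i for h = 1..h_max.

    W[a, b] is the weight of the directed edge from b to a.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    level = np.full(n, -1)  # -1 marks nodes not yet assigned to a level
    level[i] = 0
    frontier = [i]
    totals, running = [], 0.0
    for h in range(1, h_max + 1):
        # unassigned nodes that send at least one edge into the previous level
        nxt = [b for b in range(n)
               if level[b] == -1 and any(W[a, b] > 0 for a in frontier)]
        for b in nxt:
            level[b] = h
        # add the weights of the edges from level h into level h-1
        running += sum(W[a, b] for a in frontier for b in nxt)
        totals.append(running)
        frontier = nxt
    return totals

# Hypothetical example: nodes 1, 2, 3 point to node 0; nodes 4 and 5 point to node 1.
W = np.zeros((6, 6))
W[0, 1] = W[0, 2] = W[0, 3] = 1.0
W[1, 4] = 1.0
W[1, 5] = 2.0
levels = cumulative_hierarchical_instrength(W, i=0, h_max=2)
```

Here the first value is the ordinary instrength of node 0 (three edges of weight 1), and the second adds the weight arriving at its neighbors from the next level.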
For the least correlated network analyzed, viz. the largest connected cluster in the network of WWW links between the pages contained in the massey.ac.nz domain (Massey University, New Zealand) [15,16] (Fig. 2), activity could not be related to instrength at any hierarchical level.
Because the Pearson coefficient is a single real value, it cannot adequately express the coexistence of the many relationships between activity and degrees present in this specific network, nor its possibly heterogeneous topology. Very similar results were obtained for other WWW networks, which indicates that the reasons why topological hubs are not highly active cannot be identified at present (see, however, the discussion of more highly correlated networks below). However, for the two neuronal structures of Table 1 that are not fully correlated, activity was shown to increase with the cumulative 1st and 2nd hierarchical instrengths, as illustrated by comparing Fig. 3a and 3b for the network defined by the interconnectivity between cortical regions of the cat [17]. Similar results were obtained for the network of synaptic connections in C. elegans [18]. In the cat cortical network, each cortical region is represented as a node, and the interconnections are reflected by the network edges. Notably, a previous paper [19] showed that when connections between cortex and thalamus were included, the correlation between activity and outdegree increased significantly. This could be interpreted as a result of increased efficiency, with the topological hubs becoming highly active.
Because many factors may cause hubs to fail to be active, one may gain insights into the importance of cumulative hierarchical instrengths by considering the model shown in Fig. 4a.
Here, a star-shaped subnetwork (left-hand side of the figure) has been randomly attached, through P incoming edges, to a larger random network containing 50 nodes and 250 directed edges (right-hand side of the figure). Note that the cumulative hierarchical indegree of node i at hierarchy 1 is equal to M (the number of incoming neighbors), and its cumulative hierarchical indegree at hierarchy 2 is equal to M + P. We quantified the activity at node i for several values of P over 100 different realizations (i.e. varying the targets of the interconnections), obtaining the graph shown in Fig. 4b. The graph points to two regimes: a brief initial portion observed for small values of P, indicating strong positive correlation between the activity and the cumulative hierarchical indegree, followed by a region of saturation, where the correlation is greatly diminished. The region with small correlation grows when the number of connections P is increased relative to the number of nodes in the larger network.
This construction involving two attached networks has also been considered for studying the effect of varying the clustering coefficient of the reference node i. The results, also obtained after 100 realizations of each case, are shown in Fig. 4c, from which it is clear that an increase of the clustering coefficient reduces the activity of node i.
This model shows that a node linked to a set of highly connected nodes with large cumulative hierarchical instrengths is likely to be more active than another node with the same instrength (cumulative 1st hierarchical instrength) but lower cumulative 2nd hierarchical instrength, as indicated by the results for the neuronal networks already discussed. Nevertheless, a more thorough analysis of such dependencies should consider the activation flow between nodes.
Furthermore, for the fully correlated networks, such as the word associations obtained for texts by Darwin [7] and Wodehouse [20], activity increased approximately with the square of the cumulative 2nd hierarchical instrength (see Supplementary Fig. 2). In addition, the correlations obtained for these two authors are markedly distinct, as the Wodehouse work is characterized by a substantially steeper increase in the frequency of visits for large instrength values (see Supplementary Fig. 3).
Therefore, the results considering higher cumulative hierarchical degrees may serve as a feature for authorship identification.
In conclusion, we have established a set of conditions for full correlation between topological and dynamical features of complex networks, and demonstrated that Zipf's law can be naturally derived for fully correlated networks. In the cases where the network is not fully correlated, the Pearson coefficient may be used as a characterizing parameter. For a network with very small correlation, such as the WWW links between the pages in a New Zealand domain analyzed here, the reasons for hubs failing to be active could not be identified, probably because of the substantially higher complexity and heterogeneity of this network, including varying levels of clustering coefficients, as compared to the neuronal networks. For the latter, it was possible to verify that the correlation between activity and topology is enhanced if higher cumulative hierarchical instrengths are taken into account. With an artificially constructed network, we demonstrated that this should indeed be expected and found that higher clustering coefficients contribute to decreasing activity.

Cumulative Hierarchical Degrees
The concepts of cumulative hierarchical in- and outdegree are defined for a given node i taking into consideration its hierarchical level. The cumulative 1st hierarchical indegree is the number of incoming edges extending from the immediate neighbors of node i into that node, i.e. it corresponds to the traditional indegree of node i. These neighbors constitute the 1st hierarchical level (see Supplementary Fig. 1). This number of connections between hierarchical levels 0 and 1, plus those between levels 1 and 2, defines the cumulative 2nd hierarchical indegree.
More generally, the cumulative hierarchical indegree of node i at hierarchy h is equal to the number of directed edges extending from the nodes at hierarchy h+1 to the nodes at hierarchical level h, plus the indegrees obtained from hierarchies 1 to h-1. The cumulative hierarchical outdegrees are calculated similarly, but considering opposite edge directions. The cumulative hierarchical in- and outstrengths are defined analogously, taking into account the weights of the edges (instead of only the number of edges).
correlation between frequency of visits and corresponding node strengths.

Fig. 1 - The activity (frequency of visits) of each node in terms of its instrength (also called cumulative 1st hierarchical instrength, as explained below) for the networks obtained from Darwin's text (a) and American airports (b). The word association network was obtained by representing each distinct word as a node, while the edges were established by the sequence of immediately adjacent words in the text.

Fig. 2 - Analysis of the correlation between activity and node strengths for the WWW links between the pages in a New Zealand domain. No clear correlation exists with respect to the cumulative 1st hierarchical in- (a) or outstrengths (b). This result is a consequence of the great intricacy of this large network, which involves several correlation substructures that cannot be expressed by a single Pearson correlation coefficient.

Fig. 3 - The correlations between the frequency of visits to nodes and the cumulative 1st hierarchical instrength (a) and the cumulative 2nd hierarchical instrength (b), obtained from the cat cortical network.

Fig. 4 - Panel (c) shows that an increase of the clustering coefficient reduces the activity of the reference node i.

Table 1 - Number of nodes (#Nodes), number of edges (#Edges), means and standard deviations of the clustering coefficient (CC), cumulative hierarchical instrengths for levels 1 to 4 (IS1 to IS4), cumulative hierarchical outstrengths for levels 1 to 4 (OS1 to OS4), and the Pearson correlation coefficient between the activation and all cumulative hierarchical instrengths and outstrengths (C IS1 to C OS4) for the complex networks considered in the present work.