Reconstructing dynamical networks via feature ranking

Empirical data on real complex systems are becoming increasingly available. Parallel to this is the need for new methods of reconstructing (inferring) the topology of networks from time-resolved observations of their node-dynamics. Methods based on physical insights often rely on strong assumptions about the properties and dynamics of the scrutinized network. Here, we use insights from machine learning to design a new method of network reconstruction that essentially makes no such assumptions. Specifically, we interpret the available trajectories (data) as features, and use two independent feature ranking approaches -- Random Forest and RReliefF -- to rank the importance of each node for predicting the value of each other node, which yields the reconstructed adjacency matrix. We show that our method is fairly robust to coupling strength, system size, trajectory length and noise. We also find that the reconstruction quality strongly depends on the dynamical regime.


Introduction
A foremost problem in modern network science is how to reconstruct (infer) the unknown network topology from the available data [4,9,12,14,41,44]. Namely, while the functioning of real complex networks can often be to some degree observed and measured, their precise topologies (the organization of connections among the nodes) are almost never accessible [36]. Yet understanding the architectures of real complex networks is key, not just for applied purposes, but also for better grasping their actual functioning [22,52]. For this reason, the development of new and efficient methods of network reconstruction has gained ground within network science [61]. Many existing methods take into account the (potential) influence of all the other nodes in the network. In contrast, the relevance network approach (RNA) takes a univariate stance on correlation, measuring it against the time series of each network node separately. Additionally, each measure of pairwise correlation used in RNA often assumes a known influence model; e.g., the Pearson correlation coefficient (one of the measures used in RNA) assumes that the interdependence between a given pair of nodes is linear.
Among the core techniques in machine learning is supervised learning: one tries to learn from observed data how a dependent variable (the target) depends on a given set of independent variables (the features). To this aim, one searches for a (predictive) mathematical model that captures this dependence. Such a model can also be used to predict the value of the target given the values of the features. In such a model, not all features will play the same role: the target variable will in general depend more on some features than on others. We can therefore rank the features according to their influence on the target, which is what the machine learning literature calls feature ranking [18]. There is a range of different feature ranking methods, such as RReliefF [49] and Random Forest [7], and with a ranking produced by one of these methods, one can improve the learned model in several ways. The simplest is to ignore the features (independent variables) with low ranks, as they have little or no influence on the target. Such features often complicate the model without contributing to its accuracy. In fact, a simplified model without such features can be even more accurate (due to a phenomenon called over-fitting) [18]. Crucial then is to set the best threshold deciding which features to ignore and which to keep, in order to obtain the most accurate model.
In this paper we propose a new method for reconstructing a dynamical network of physical interest from discrete time series of node dynamics. In contrast to the usual formulations of this problem in the physics literature, we build our reconstruction method on the concept of feature ranking. Specifically, we treat the dynamical state of a given node as the target variable, while the previous states of all other nodes are treated as features. We use the dynamical data to quantify how much each feature influences the target, and compute the feature ranking accordingly. Some features will have a strong influence on the target, so it is reasonable to assume that the corresponding nodes are linked to the studied node. It is also safe to assume that low-ranked features (nodes) are not connected to the studied node. Of course, the key issue is how to set the best threshold.
Note that in the formulation of our method we made no assumptions about the knowledge of the interaction functions or the dynamical equations of the network dynamics. Therefore, our method relies solely on the time series and their properties, such as length and the possible presence of observational noise (we assume that the time series come from some empirical measurement/observation of the system).
The rest of the paper is organized as follows. In the next Section we first explain some basic concepts from machine learning and feature ranking, and then explain and motivate our method. In Section Results we illustrate the performance of our method using several examples and study its response to diverse properties of the system. We close the paper with a discussion of our findings and the limitations of our method, emphasizing its potential for practical use.

The reconstruction method
In this section we explain our reconstruction method. For clarity we build it step by step, first explaining the relevant concepts of its machine learning background.
Machine learning, features and feature ranking. Machine learning studies algorithms whose performance improves with 'experience' [40]. Such improvement is typically gained by making the algorithm 'learn' from that experience, which comes in the form of many examples of data [20,70]. To 'learn' means to look for patterns in the data and extract them: For example, by making a Fourier decomposition of various sound signals, one can "learn" to differentiate between human speech and birdsong. Machine learning can be seen as an approach to data-driven modeling suited for circumstances when our knowledge about the studied system is limited. This is the core reason why machine learning is being increasingly used in a variety of scientific disciplines, ranging from medicine and biology [16,19,57], to stock market analysis [24], text classification [55,64] and image identification [50].
The physics community has over the past decade recognized this ability of machine learning, which has triggered an array of novel results in diverse fields of physics [23,54,56,66,72], including complex networks [68] and dynamical systems [71]. In particular, machine learning has also been used to formulate the network reconstruction problem in several domain sciences [25,58].
In the most common setting of supervised learning, an algorithm uses existing examples of data as inputs and produces a set of patterns or a predictive model as the output. Examples of data are typically given in the attribute-value representation [73], which means that each data example is described via a series of values of attributes (in machine learning also called features or independent variables). Hence, one can use the input data to create a predictive model describing how the target variable depends on the features. The model can then be used to predict the value of the target given any values of the features, even ones not included in the training data. Furthermore, the model can be used to determine the importance of the features, i.e., the feature ranks.
To illustrate the idea of feature ranking, say we are given the equation y = f(x_1, x_2) (Eq. 1), and let us assume that the function f is not known, but it is known that y depends on several variables x_i. In other words, y is the target variable and the x_i are the features. This type of task is referred to in machine learning as a regression task, and it is solved using a machine learning algorithm M, i.e., f ≈ f̂ = M(D), where D is the data set and f̂ is the prediction model that, for any given observation (x_1, x_2, x_3), can be used to predict the value of y: ŷ = f̂(x_1, x_2, x_3). Our goal is to reconstruct (or infer) f. Note that in the data we also have the feature x_3, which actually does not influence y, but we assume not to know that a priori. This situation is very common in various scientific domains, as the inspected system is often poorly understood, and the only available data are collected via features that may or may not influence the target variable. One example of a machine learning algorithm M for regression is Random Forest [7]. A Random Forest model is an ensemble of piece-wise constant models, regression trees [8], where model segments correspond to intervals of feature values: the algorithm for learning regression trees uses training data to approximate the optimal splits of the data space into segments with a constant value of the target (a regression tree is a hierarchical splitting of the feature space into segments). Each tree in the Random Forest ensemble is learned on a random sample of the learning data set D, and each split in the tree is chosen from a random sample of the features x_i. The prediction ŷ of the ensemble is the average of the predictions of all the trees. So, each random sample of the input data gives a new tree (an independent splitting scheme), which we average over. While learning a single tree is prone to over-fitting the training data, the ensemble of trees is provably more robust, thus leading to accurate predictive models.
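To make this concrete, the regression task above can be sketched in a few lines, assuming scikit-learn is available; the data-generating function below is our own hypothetical choice, constructed so that x_3 plays no role:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))       # features x_1, x_2, x_3
y = 2.0 * X[:, 0] + np.sin(X[:, 1])        # y = f(x_1, x_2); x_3 is irrelevant

# f_hat = M(D): learn an ensemble of regression trees from the data set D
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# y_hat = f_hat(x_1, x_2, x_3) for a few observations
y_hat = model.predict(X[:5])
```

Any other regression learner could stand in for Random Forest here; the point is only that M maps a data set D to a predictive model f̂.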
The Random Forest machine learning algorithm can also be used for feature ranking. One can compare the prediction error of (i) the Random Forest model learned on the original training data set with the prediction error of (ii) the Random Forest model learned on randomized training data, where the values of the feature of interest are randomly permuted between the data points. Intuitively, if the errors of the two models differ a lot, then the importance of the feature is high: in this case, the randomly permuted values of the feature cause the model error to increase, hence the feature contributes a lot to the model accuracy. And vice versa: the feature importance is small if the observed difference is low.
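A minimal version of this permutation test might look as follows (synthetic data; the generating function is again our hypothetical example, and for brevity the permuted error is measured on the trained model rather than on a retrained one):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1])        # x_3 does not influence y

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
base_err = np.mean((model.predict(X) - y) ** 2)

importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy the feature-target relation
    perm_err = np.mean((model.predict(Xp) - y) ** 2)
    importances.append(perm_err - base_err)  # large error increase => important
```

As expected, permuting x_1 hurts the model the most and permuting x_3 barely changes anything.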
Another example of an algorithm for regression is Nearest Neighbor [1]. Given a data point x, the Nearest Neighbor algorithm finds its nearest neighbors in the learning data set D, with respect to a distance measure (e.g., Euclidean) in the feature space, and then predicts the target value of x as the average of the target values of those nearest neighbors. RReliefF [27] is an extension of the simple nearest-neighbor idea to feature ranking. It ranks the importance of features based on the detected differences between pairs of nearest-neighbor input data examples: if there is a feature value difference in a pair with similar target values, the feature importance is decreased. In contrast, if there is a feature value difference in a pair with dissimilar target values, the feature importance is increased.
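The underlying nearest-neighbor regression can be sketched as follows (RReliefF itself additionally turns the neighbor comparisons into per-feature scores; the data here are synthetic):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 2))
y = X[:, 0] + X[:, 1]

# predict as the average target of the 5 nearest neighbors (Euclidean distance)
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)
pred = knn.predict(np.array([[0.5, 0.5]]))  # should be close to 0.5 + 0.5
```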
Let us assume now that we have applied a feature ranking algorithm R (such as Random Forest or RReliefF) to the above data set D and obtained ranking scores for the features, say 0.8 for x_1, 0.3 for x_2 and 0.02 for x_3 (the values are illustrative). The exact values of the ranking scores are not important; what is important are their relative values. In this case x_1 has the largest score, which means that it is ranked as the most important feature for the target variable y. x_3, on the other hand, has the lowest score, and its influence on the value of y is small, if such influence exists at all (from Eq. 1 we know that it actually does not). This ranking can now be used as input for modeling y with one of the standard regression methods: instead of fitting on all three features, we fit only on x_1 and x_2. However, in practice, deciding where to draw the line and which features to ignore is far from trivial and often depends on the particularities of the problem at hand. In machine learning, the issue of identifying the relevant features is a classic problem in its own right, called feature selection, and can be approached in several ways, including via feature ranking.
Our reconstruction method. Armed with the above insights, we now proceed to our reconstruction method. As already mentioned, feature ranking methods can be naturally applied to the problem of reconstructing a dynamical network from observations (time series measurements) of its node dynamics. Assuming that the state of a selected node of a dynamical network represents the target, and that its state is influenced by the connected network nodes, which represent the features, one can define a supervised learning problem for this node: learn a regression model for predicting the state of the selected node from the states of all the other nodes in the network. Note that our aim here is not the predictive model; instead, we are interested in the feature ranking only: we rank the importance of the other network nodes for the selected one, because a highly ranked node is likely to be connected to it. We then only need to repeat this procedure for all the nodes to reconstruct the entire network structure.
Note that this articulation of the reconstruction problem includes no assumptions about the network dynamical model or general properties of the dynamics.
Let us now formally present our method. Although we have developed it independently, it is very similar to the method presented in [25]. We start with a general network with N nodes. The state of node i at time t + 1 is x_i(t + 1), and its dynamics is influenced by the states of all nodes connected to i at the earlier time t:

x_i(t + 1) = f_i(x_1(t), x_2(t), . . . , x_N(t)). (Eq. 2)

Note that this is the most general possible formulation of the network dynamics: each node's behavior is governed by an unknown node-specific interaction function f_i, dependent on all other nodes. We assume total observability of the system, meaning we have access to the trajectories of all nodes at all times. (The above formulation is suitable for discrete systems described with difference equations. For continuous systems described with differential equations, the analogous equation would be ẋ_i(t) = f_i(x_1(t), . . . , x_N(t)). Our method is general enough to work also in this case, provided that the observation data are measured at a high enough sampling rate to enable reasonably accurate computation of derivatives.) When reconstructing the network, the interaction function f_i is not known, but we can use the observation data to model it. The observation data consist of state trajectories (x_i(1), x_i(2), . . . , x_i(L)) for all the network nodes. Eq. 2 therefore represents our regression modeling problem for node i, where the state variable x_i(t + 1) is the target and the state variables x_j(t), j = 1 . . . N, are the features. From the observation data we construct the training data set D_i with L − 1 examples, and a suitable machine learning algorithm for regression M could be used to compute an approximation of f_i. However, we are not really interested in solving these N regression problems; we only perform feature ranking for them, since it is these rankings that contain information on the connections among the nodes. In other words, we are not interested in reconstructing the interaction functions f_i, but only the network topology.
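As a sketch, turning the observed trajectories into the N training data sets D_i could look like this (our own helper, not code from the paper):

```python
import numpy as np

def build_training_data(trajectories):
    """trajectories: array of shape (N, L), the state of each node over time.
    Returns one (features, target) pair per node, each with L - 1 examples:
    the features are the states x_j(t) of all nodes, the target is x_i(t + 1)."""
    N, L = trajectories.shape
    datasets = []
    for i in range(N):
        X = trajectories[:, :-1].T         # x_j(t) for t = 1 .. L - 1
        y = trajectories[i, 1:]            # x_i(t + 1)
        datasets.append((X, y))
    return datasets
```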
At this stage, feature ranking can be done with any of the existing feature ranking algorithms; for the purposes of this paper, we consider the two already mentioned algorithms, Random Forest [7] and RReliefF [27]. By applying a feature ranking algorithm R to the training data D_i, we obtain feature ranks (importance scores) F_i1, F_i2, . . . , F_iN for node i, where F_ij is the estimated importance of node j for node i. Note that the values F_ij are relative and do not have any physical meaning. By extracting these feature importance scores for all N regression problems, we construct a matrix F of dimension N × N, each element F_ij of which quantifies how important node j is for node i. Our assumption is that the higher the value F_ij (relative to the other matrix elements), the more likely it is that the link i-j exists. Hence, we simply extract the reconstructed adjacency matrix Â from F by setting a threshold θ and assuming that links exist only for values of F above the threshold. In general, we can construct N^2 different reconstructed adjacency matrices Â_n from F by using each of its elements as a threshold θ_n, with n = 1, 2, . . . , N^2. The threshold θ is a parameter of the method that controls the sensitivity of detecting links in a network; its setting in general depends on the investigated system.
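Putting the pieces together, a minimal sketch of the whole pipeline might look as follows (here scikit-learn's built-in Random Forest importances stand in for the ranking scores F_ij; this is an illustration of the idea, not the paper's exact implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def importance_matrix(trajectories, n_estimators=50, seed=0):
    """F[i, j]: estimated importance of node j for predicting node i."""
    N, L = trajectories.shape
    F = np.zeros((N, N))
    for i in range(N):
        X = trajectories[:, :-1].T         # features: x_j(t)
        y = trajectories[i, 1:]            # target: x_i(t + 1)
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        F[i, :] = rf.fit(X, y).feature_importances_
    return F

def threshold_adjacency(F, theta):
    """A_hat[i, j] = 1 iff F[i, j] >= theta; self-loops are discarded."""
    A_hat = (F >= theta).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_hat
```

For a toy system in which node 0 drives node 1, row 1 of F singles out column 0, and thresholding recovers that link.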
Measuring reconstruction quality. To evaluate how the reconstructed adjacency matrix compares to the real one, we compute a confusion matrix as presented in Table 1. This confusion matrix tells us much more than the simple accuracy of the reconstruction of all links. For instance, we see not only the number of correctly predicted links and no-links (a and d, respectively), but also the number of no-links predicted as links (false positives, c) and the number of links predicted as no-links (false negatives, b). We evaluate the performance of the reconstruction in terms of the Sensitivity or True Positive Rate (TPR) and the Fall-Out or False Positive Rate (FPR) [15], defined as TPR = a/(a + b) and FPR = c/(c + d). The TPR is the ratio between the correctly predicted links and the total number of true links, while the FPR is the ratio between the predicted links that are actually no-links and the total number of no-links in the network. From these two quantities we can further construct the Receiver Operating Characteristic (ROC) curve [15] by computing the TPR and the FPR for different thresholds θ_n. The ROC curve enables us to evaluate our method without pre-selecting a specific threshold value: it includes the results for all possible threshold values. The ROC curve is namely a plot in the TPR-FPR space which presents all the network reconstructions that our method produces, and by connecting all the dots in the plot one can compute the Area Under the ROC curve (AUC). The larger the AUC (max = 1), the better the method's performance: AUC = 1 represents ideal network reconstruction, whereas AUC = 0.5 represents reconstruction equivalent to random guessing of link presence. AUC is a measure frequently used in machine learning for binary prediction, of which our network link prediction is an example.
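With the true adjacency matrix A in hand, the threshold-free AUC evaluation can be sketched as follows (scikit-learn's roc_auc_score sweeps all thresholds internally; excluding the diagonal is the paper's convention of ignoring self-loops):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def reconstruction_auc(F, A):
    """AUC of the score matrix F against the true adjacency matrix A,
    with the diagonal excluded (self-loops are not considered)."""
    mask = ~np.eye(A.shape[0], dtype=bool)
    return roc_auc_score(A[mask].ravel(), F[mask].ravel())
```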

Results
In this Section we examine the performance of our reconstruction method. We begin by defining the dynamical system that we will employ (though, of course, the method will not use that information). The above formulation of our method is based on discrete-time systems defined via difference equations. However, as we already noted, the method also works for continuous systems, provided we can observe and measure the trajectory. Given this, we decided to utilize a discrete-time dynamical system (map) for studying the performance of our method. The key benefit is that we need not worry about measurement resolution and its implications for the precision of derivative estimates. The results presented in this Section are fully applicable to the case of continuous-time dynamical systems as well. In the last Section (Discussion) we devote more attention to the generalization to continuous-time dynamical systems.
For studying the performance of our reconstruction method we choose the logistic map, defined as x(t + 1) = rx(t)(1 − x(t)) (Eq. 9). The logistic map is a textbook example of a discrete-time chaotic dynamical system [17,60]. The nature of the dynamics, primarily the chaoticity of the behavior, depends on the parameter r. Specifically, for r = 4 the dynamics of Eq. 9 can be solved analytically, and the map is strongly chaotic.
To design a dynamical network, we attach a logistic map to each node i and couple them as done in [38,39,51]. The joint equation reads

x_i(t + 1) = (1 − ε) f(x_i(t), r) + (ε / d_i) Σ_j A_ij f(x_j(t), r), (Eq. 10)

where the function f(x_i(t), r) = rx_i(t)(1 − x_i(t)) stands for the logistic map in the chaotic regime r = 4, A_ij is the adjacency matrix, the parameter ε denotes the coupling strength between the nodes, and d_i is the in-degree of node i. We use random networks defined by the link probability ρ [10,13]: each directed link is generated with probability ρ. Throughout our study, we keep the probability at ρ = 0.1.
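A simulation of coupled logistic maps of this type can be sketched as follows (our own sketch: A[i, j] = 1 means node j drives node i, and letting driver-less nodes run the uncoupled map is our edge-case choice):

```python
import numpy as np

def f(x, r=4.0):
    return r * x * (1.0 - x)               # logistic map, chaotic at r = 4

def simulate(A, eps, L, seed=0):
    """Iterate x_i(t+1) = (1 - eps) f(x_i(t)) + (eps / d_i) sum_j A_ij f(x_j(t))."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    d = A.sum(axis=1)                      # in-degree of each node
    x = np.zeros((N, L))
    x[:, 0] = rng.uniform(0, 1, N)
    for t in range(L - 1):
        fx = f(x[:, t])
        coupling = A @ fx / np.maximum(d, 1)   # mean input from driver nodes
        x[:, t + 1] = (1.0 - eps) * fx + eps * coupling
    return x
```

Since f maps [0, 1] into itself and each update is a convex combination, all trajectories stay in [0, 1].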

An illustrative example
To illustrate our method we examine an example of time series obtained from Eq. 10. We consider a random network with N = 25 nodes and set the coupling strength to ε = 0.5. We run it for a random selection of initial conditions and store the obtained time series for each node. The reconstruction procedure is illustrated in Fig. 1. On the left-hand side, the plots show the time series, illustrating the nature of the signals we are dealing with. On the right-hand side, we show the matrix of feature importance scores F computed with the Random Forest method and the corresponding true adjacency matrix A. We can see that F attains its maximum values along the diagonal, corresponding to the high self-dependence of all the nodes. This makes sense, since at ε = 0.5 self-dependence is still high. However, the diagonal elements of F are not taken into account in the calculation of the ROC curve, since we do not consider self-loops in the network. Finally, the bottom right part of the figure shows the ROC curve and the corresponding area under it, AUC = 0.92, which we use to measure the performance of the network reconstruction. The procedure is equivalent in the case of RReliefF.

Dependence on the size and coupling strength
Next we examine systematically how the performance depends on the size of the network (number of nodes) and the coupling strength. To this end, we make a grid of parameters (N, ε). For each pair (each combination) we draw four different realizations of random adjacency matrices and random initial conditions chosen from [0, 1]. For each of these random realizations, we generate L = 12 800 data points, to which we apply the feature ranking method via both algorithms.
First, we study the performance of the method for a range of coupling strengths ε and network sizes N, while keeping the length of the input time series constant and relatively large (L = 12 800). In Fig. 2 we present the performance of the method via the AUC, averaged over 4 independent realizations of the adjacency matrix A. We see that at very low coupling strength (ε = 0.01), for all network sizes, neither method (RReliefF, left-hand side; Random Forest, right-hand side) is able to reconstruct the underlying network; the performance is comparable to random reconstruction (i.e., AUC ≈ 0.5).
As the coupling strength increases, Random Forest performs better than RReliefF, especially at large network sizes. This shows that as we increase the network size, RReliefF needs a higher coupling strength to detect network interactions. This is not the case for Random Forest, with which we find better-than-random (i.e., AUC > 0.5) reconstruction in areas where RReliefF performs almost as a random classifier. However, around ε = 0.5, RReliefF starts to improve. Moreover, the impact of increasing network size on the performance is lower at this coupling strength, and we find very good reconstruction performance for both feature ranking algorithms at N = 100 nodes (AUC_RReliefF = 0.91, AUC_RF = 0.82). For ε > 0.5 we find very good reconstruction for sizes ranging from N = 12 to N = 50, especially for RReliefF, which still outperforms Random Forest at these high couplings. Finally, at ε = 0.8 and N = 100, neither method is able to correctly detect network interactions and we end up with reconstruction performance close to random. However, Fig. 2 hides one peculiarity worth examining further. Looking at the case of the largest considered network, N = 100, we see that for both algorithms the performance improves with growing ε until approximately ε = 0.5, but then deteriorates and actually reaches its minimum (AUC = 0.5) at ε = 0.8. An intuitive explanation is the following: for small coupling strengths there is not enough interaction among the nodes (logistic maps), so their individual chaoticity prevails, and no useful information can be extracted from the trajectories. In the opposite extreme, for very large coupling strengths the interaction is strong enough to induce very correlated dynamics of the nodes/maps (synchronization), and such trajectories also fail to reveal useful information about the underlying network.
But, between these two extremes, for intermediate coupling strengths, the interaction might be generating peculiar collective effects that do reveal details about the underlying network topology, which are detected by our reconstruction method.
To test this hypothesis we compute the average pair-wise correlation C_ij between trajectories. Strongly chaotic trajectories will have C_ij (close to) zero, whereas fully regular (synchronized) trajectories will have C_ij equal (or close) to one. In Fig. 3 we scatter plot the value of C_ij against the value of the AUC for all the ε's and realizations (Fig. 3 uses L = 12 800 and all the combinations of system size N and coupling strength ε from Fig. 2; points with the same system size include all values of ε). And indeed, for small values of C_ij, as well as for large values of C_ij, the performance is poor for both Random Forest and RReliefF. However, when C_ij is in the intermediate range the performance is good, and it is in fact excellent for a rather wide range of C_ij between roughly 0.1 and 0.9. This indicates that the dynamical regime is intimately related to the "reconstructibility" of networks: the reconstruction is clearly best when the coupling strength is in the intermediate "Goldilocks" zone, where the interaction is "just right" for the system to reveal important details about its internal structure, which are captured by our reconstruction method. We also note that the system reaches the synchronous state (correlations close to 1) only for the largest system size. This is due to the increase of the average number of links per node in larger systems (since we kept the link density constant). This increased number of links allows the system to be more 'synchronous', and thus exhibit stronger correlations.
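The correlation diagnostic C_ij used above can be sketched as follows (averaging the absolute Pearson correlation over all node pairs is our own reading of the measure):

```python
import numpy as np

def mean_pairwise_correlation(trajectories):
    """Average |Pearson correlation| over all distinct node pairs."""
    C = np.corrcoef(trajectories)          # N x N correlation matrix
    iu = np.triu_indices_from(C, k=1)      # strict upper triangle: each pair once
    return float(np.abs(C[iu]).mean())
```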

Dependence on the length of the input time series
Next we investigate the influence of the input time series length on the performance of the method. In Fig. 4 we present the performance as a function of the time series length for both RReliefF (left-hand side) and Random Forest (right-hand side). We keep the network size constant at N = 25 and plot the performance for different characteristic coupling strengths ε. When the coupling strength is low (ε = 0.01), increasing the input time series length does not improve the reconstruction performance for either method, and the performance remains close to that of random reconstruction. At ε = 0.05 we start to get better-than-random performance at input length L = 800 for both methods, which perform similarly. Then, at ε = 0.25, as L increases, the quality of the reconstruction increases at a higher rate for Random Forest than for RReliefF. Finally, at ε = 0.6, we have high reconstruction performance even for the very short input length L = 50. Here, adding more input data points does not improve the reconstruction performance dramatically, and beyond L = 800 the performance increases only slowly.
To gain further insight into the reconstruction performance for short input time series, we now keep the coupling strength constant at ε = 0.6 and in Fig. 5 plot the performance of the reconstruction as a function of L for different network sizes N. We see that for both feature ranking methods, at N = 12, the performance is almost perfect and additional time series data points only slightly improve the reconstruction. We find similar behavior at N = 25, where for time series length L = 200 the performance is already very high (AUC > 0.9) and only slightly increases with longer time series. As the network size increases (N = 50, 100), the performance for short input time series (L = 50) decreases significantly. However, our method still performs better than random even for small values of L, when N(N − 1)/2 ≈ LN, that is, when the total number of links to be predicted, N(N − 1)/2, is similar to the total number of data points, LN, that we use for the prediction. This suggests that we can use our method even when the available time series is very short.

Influence of the noise
Finally, we investigate the impact of noise on the performance of our reconstruction method. We select a realistic setup using observational white noise, which, as opposed to dynamical noise, does not affect the evolution of the network node dynamics. We use the original system from Eq. 10 and, once it is simulated, add noise to the observed trajectories:

x̃_i(t) = x_i(t) + ξ,

with ξ the observational Gaussian white noise with zero mean and standard deviation σ. We now repeat all the computations on x̃_i(t) instead of x_i(t). We have to keep in mind that the addition of noise changes both the target value x̃_i(t + 1) and the features (x̃_1(t), x̃_2(t), . . . , x̃_N(t)). In Fig. 6 we present the average area under the ROC curve (AUC) for RReliefF (left-hand side) and Random Forest (right-hand side) as a function of the amplitude σ of the Gaussian white noise. We keep N = 25 and plot the performance for different coupling strengths ε. At ε = 0.01, noise does not really influence our results and the reconstruction performance remains close to random. At higher coupling strengths, the change in performance as the noise amplitude grows is smaller. For instance, at ε = 0.06 the performance of both feature ranking methods barely changes up to σ = 0.5. At σ = 0.5, RReliefF performs close to random for all the coupling strengths. However, Random Forest is more robust, as even at σ = 0.5 we obtain a better-than-random reconstruction for ε = 0.25, 0.6. This effect can be explained by the fact that noise is added to both sides of Eq. 10. Finally, at σ = 1 the performance decreases until we get close to random performance. We have to keep in mind that the last two noise amplitudes considered (σ = 0.5, 1) are comparable to or larger than the amplitude of the original time series, which overestimates real-world noise levels. Therefore, for both feature ranking methods (and especially for Random Forest, which is itself known to be robust to noise), our method is very robust to observational noise.
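The noise experiment amounts to perturbing the stored trajectories before feeding them to the ranking step; a minimal sketch:

```python
import numpy as np

def add_observational_noise(trajectories, sigma, seed=0):
    """x_tilde_i(t) = x_i(t) + xi, with xi ~ N(0, sigma^2).
    Only the observations are perturbed; the dynamics itself is untouched."""
    rng = np.random.default_rng(seed)
    return trajectories + rng.normal(0.0, sigma, size=trajectories.shape)
```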

Discussion
We designed a novel method for reconstructing (inferring) networks of dynamical units from observations (measurements) of their node trajectories. It is based on feature ranking, a common methodology in machine learning. By ranking the "features", which are the values of the trajectories of the other nodes, we can extract information on which other nodes are most likely to be connected with the considered node. We tested the performance of our method using networks of coupled logistic maps and obtained good results for a range of coupling strengths and network sizes. Also, our method is able to perform well even for relatively short trajectories, and it is fairly robust to noise.
The key novelty of our method is that it requires no assumptions on the knowledge of the interaction functions or the dynamical model of the network. Also, we make no hypotheses about the nature of the available trajectories (data). We consider this the main contribution of our paper, since most similar methods in the current (physics) literature make assumptions about the above-mentioned details that can sometimes be rather strong. So, while our method is not based on physical insight into the collective dynamics, it is immediately applicable to practically any complex dynamical system (of physical interest or otherwise), requiring no prior knowledge of the system's internal details. Note that the information on the interaction function (the logistic map) was not used in testing the method's performance. However, that information was also not revealed by the method; the method reconstructs only the topology.
Still, our method does have some limitations. Specifically, the performance deteriorates as the system size increases: reconstructing large networks accurately requires long observation trajectories. This could hinder the applicability to the large systems that abound in applications. Despite this, for certain dynamical regimes (ranges of coupling strength), the performance remains good independently of system size, which offers hope for applications to large systems. In contrast, for other dynamical regimes (too weak or too strong coupling), the performance worsens. This cannot be helped, since in those dynamical regimes (full chaos or full regularity) the system reveals nothing about its underlying structure. Also, while our method in general copes well with noise, excessive noise deteriorates the performance. On the other hand, as discussed above, our method is not limited by any assumptions about prior knowledge of the system under scrutiny.
Another issue revolves around the generalization to continuous systems. This paper's presentation was based on maps, but with minor modifications the method can be generalized to continuous-time dynamical systems (defined via differential rather than difference equations). To this end, one would need to replace the right-hand side of Eq. 2 with ẋ_i(t), the time derivative (rate of change) of the state of node i at time t. Since we assume total observability of the system, one can compute the derivatives of the state trajectories numerically [48] and use the same regression and feature ranking algorithms to infer the network structure. However, further empirical evaluation of the proposed method is needed in such a setting.
The core purpose of this paper is to present the concept and theoretical underpinnings of our method. The next step is to apply it to real systems, in two stages. First, one should test the performance on a scenario where the ground truth is available. Second, our method can be used to actually reveal the topologies of real networks whose topologies are still unknown. We envisage one such possible application in functional brain networks, where the interest is in large-scale brain connectivity obtained without knowing (or inferring) the interaction function(s). Similarly, data on gene expression can be used to reconstruct gene regulation networks under given circumstances.
An open question in the context of network reconstruction methods is their comparison. This is not trivial, since various methods depart from different hypotheses and knowledge about the system, which makes their merits harder to compare. Another distinction is what different methods reconstruct: interaction functions and network topology, or just the topology? Hence, our method can be meaningfully compared only to methods that (i) reconstruct only the topology, (ii) make no assumptions about interaction functions, and (iii) rely on discrete measurements of the dynamical trajectories of nodes. While such a comparison would be of interest to the field, it is beyond the scope of the present paper. We also mention that our method requires no interference with the system (such as, e.g., random resets), and is as such non-invasive. However, such interference could further improve its performance.
We close the paper by discussing possible avenues of future work. These primarily include improving the method by combining the rankings obtained with different feature ranking methods. Here, we have only compared the performance of our method when using the Random Forest or RReliefF algorithms for ranking. In further work, one could combine an ensemble of rankings obtained with different feature ranking algorithms [45]. Furthermore, we focus here solely on the ranking of features. Instead, one could explore the use of the predictive models learned with Random Forest or other supervised learning algorithms. In particular, the predictive models can reveal various relevant aspects of the mechanisms of interaction between network nodes, in the many real-world cases where these mechanisms are not known.
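One simple way to combine rankings from different feature ranking methods, shown here as an illustration of the idea rather than a procedure from the paper, is to convert each method's score matrix to per-target ranks and average them:

```python
import numpy as np


def rank_rows(scores):
    """Per-target ranks: in each row, the largest score gets the largest rank."""
    return scores.argsort(axis=1).argsort(axis=1) + 1


def combine_rankings(score_matrices):
    """Average the rank matrices produced by several feature ranking methods;
    a higher combined value means a more strongly supported edge."""
    return np.mean([rank_rows(s) for s in score_matrices], axis=0)


# Toy example: hypothetical score matrices from two different rankers
# (rows = target nodes, columns = candidate source nodes).
s1 = np.array([[0.0, 0.9, 0.1],
               [0.8, 0.0, 0.2],
               [0.1, 0.7, 0.0]])
s2 = np.array([[0.0, 0.6, 0.4],
               [0.9, 0.0, 0.1],
               [0.3, 0.5, 0.0]])
combined = combine_rankings([s1, s2])
```

Working on ranks rather than raw scores sidesteps the fact that different ranking methods (e.g. Random Forest importances and RReliefF weights) produce scores on incomparable scales.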