Multifunctionality in a Reservoir Computer
arXiv:2008.06348v1 [cs.NE] 10 Aug 2020

Multifunctionality is a well-observed phenomenological feature of biological neural networks and is considered to be of fundamental importance to the survival of certain species over time. These multifunctional neural networks are capable of performing more than one task without changing any network connections. In this paper we investigate how this neurological idiosyncrasy can be achieved in an artificial setting with a modern machine learning paradigm known as 'Reservoir Computing'. A training technique is designed to enable a Reservoir Computer to perform tasks of a multifunctional nature. We explore the critical effects that changes in certain parameters can have on the Reservoir Computer's ability to express multifunctionality. We also expose the existence of several 'untrained attractors': attractors which dwell within the prediction state space of the Reservoir Computer but were not part of the training. We conduct a bifurcation analysis of these untrained attractors and discuss the implications of our results.


I. INTRODUCTION
Multifunctionality is an essential element of biological neural networks [1][2][3]. These multifunctional networks are distinct pools of neurons capable of performing a multitude of mutually exclusive tasks. To elaborate with an example, it was found that a subset of the same bundle of neurons in the brain of the medicinal leech (Hirudo medicinalis) can switch their activity pattern once the leech senses a change in its surroundings, to drive either a swimming or a crawling motion 4. It is reported that a cluster of neurons located in the pre-Bötzinger complex (a region of the mammalian brain stem) is responsible for regulating a switching between different respiratory patterns 5. Depending on a particular input, the neurons in this region of the brain can alter their activity pattern accordingly to elicit a switching between eupneic (regular) breathing, sighing, or gasping. Furthermore, it is argued that multifunctionality in neural networks may naturally emerge from an efficient use of limited resources (in this case, neurons) and is thus an evolutionary advantage in enduring environmental changes, reflecting the developmental history of certain organisms 6.

a) Electronic mail: andrew_flynn@umail.ucc.ie
Nevertheless, what is ubiquitous amongst these multifunctional neural networks is that they in principle resemble a system with more than one modus operandi. Based on a particular input, there is a distinct activity pattern expressed by the neurons in the network in order to perform one of the many tasks required of it. When needed, these multifunctional neurons switch to another activity pattern to collectively execute a different task while the network connections remain fixed. Therefore, if an artificial neural network were trained to sustain more than one desired activity pattern, it would in this sense be multifunctional. From a dynamical systems perspective, this type of behaviour is akin to a multistable system, i.e. a system with a coexistence of attractors. For further reading on multistable systems see Pisarchik and Feudel 7.
There is much to be gained in the pursuit of artificial intelligence by articulating our current knowledge of biological neural networks and dynamical systems in machine learning environments. In this paper we employ such a bilateral rationale to encapsulate multifunctionality in an artificial neural network by training a 'Reservoir Computer' [8][9][10] (RC) to facilitate the coexistence of more than one desired activity pattern.
The RC approach to machine learning has been successfully applied to a number of problems, for example, time series prediction 11, visual identification tasks 12, real-time detection of epileptic seizures 13, and inferring, from limited time series data, unmeasured state variables 14, Lyapunov exponents 15, and causal dependencies between variables 16.
The basis of our research involves using a recent formulation of a continuous-time RC, presented by Lu, Hunt, and Ott 17. Here the RC was trained to perform short-term predictions of a chaotic Lorenz system 18 and reconstruct the 'climate' (qualitatively similar dynamical behaviour) of its famous butterfly-shaped chaotic attractor. Taking this result into account, we train a RC to promote a coexistence of reconstructed chaotic attractors in its prediction state space, thus becoming multifunctional. In order to demonstrate the flexibility of our approach, we consider training scenarios in which the climate of these chaotic attractors is reconstructed from a variety of sources. For example, we consider the case where these chaotic attractors are generated from two entirely different systems. We devise a training technique to 'blend' data from these chaotic attractors with a certain weighting parameter. The choice of this weighting, along with a parameter involved in tuning the memory of the RC, is critical to achieving multifunctionality. We investigate the optimal setting of these parameters, from which we infer the regions in this parameter space where the RC achieves multifunctionality.
However, while we train the RC to realise more than one chaotic attractor in its prediction state space, we find several 'untrained attractors' also residing here. These attractors inhabit the prediction state space but were not part of the training, and they limit the regions in which multifunctionality is obtained. A bifurcation analysis of these untrained attractors reveals some interesting dynamics where, for example, one of these attractors undergoes a period-doubling route to chaos.
The structure of the rest of the paper is as follows. In Section II we provide details of the RC approach to attractor reconstruction and present the training procedure we use to achieve multifunctionality in a RC. Next, in Section III, the problem of epitomising multifunctionality in a RC is further conceptualised. The trajectories on the chaotic attractors we use as our training data are also given here. In Section IV we present our main findings and then provide an extended discussion of our results in Section V.

II. RESERVOIR COMPUTING
Echo-State Networks 8 (ESNs) and Liquid-State Machines 9 (LSMs) are two independently proposed designs of artificial neural networks with recurrent connections that can be trained to provide a self-sustained activity pattern. While ESNs were engineered in the context of machine learning and LSMs were developed from a computational neuroscience perspective, they share a similar philosophy and as a result have become increasingly synonymous under the umbrella term of 'Reservoir Computing', or indeed a 'Reservoir Computer' (RC) 10. Both are based upon the notion that, as long as the internal connections, or in this case the 'reservoir', possess certain characteristics, it is not necessary to adapt the internal weights of the network in order to achieve a desired learning outcome. Instead, it is sufficient to find an appropriate readout layer for a given task. Consequently, as the internal connections of the reservoir remain unaltered throughout the training, the reservoir can be represented physically. There are many interesting constructions of these 'Physical Reservoir Computers', using, for example, an optoelectronic system 19 or an octopus-inspired soft robotic arm 20. For a recent review see Tanaka et al. 21.
For the purpose of keeping this paper self-contained, we now outline the anatomy and implementation stages of the RC setup we use, which was proposed by Lu, Hunt, and Ott 17. We also present the technique we devised in order to train the RC to reconstruct the climate of more than one chaotic attractor based on a given initial condition.

FIG. 1: Listening reservoir: the input signal u(t) is projected by σ W_in to drive a response from the reservoir state, r(t).

A. Listening Stage
In the listening stage, as illustrated in Fig. 1, the input data is used to drive the 'listening reservoir' away from its initial state. This drive-response system evolves according to,

dr(t)/dt = γ [ −r(t) + tanh( M r(t) + σ W_in u(t) ) ].    (1)

Here r(t) ∈ R^N is the state of the reservoir at a given time t and N denotes the number of artificial neurons. γ is a parameter arising from the transformation of the discrete-time formulation in Jaeger and Haas 11 to continuous-time. M ∈ R^{N×N} is the adjacency matrix representing the internal network connections of the reservoir. The input strength parameter, σ, and the input matrix, W_in ∈ R^{N×D}, when multiplied together represent the weight given to the input vector u(t) ∈ R^D as it is projected into the reservoir. Here D is the dimension of the input data. We compute trajectories of Eq. (1) using the 4th-order Runge-Kutta method with time step τ = 0.01. In the listening stage, we allow Eq. (1) to evolve from t = 0 to t = t_listen = 200 in order to remove any dependency the RC may have on its initial condition and synchronise it to the input.
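For concreteness, the listening stage can be sketched as follows. This is an illustrative Python sketch, not the authors' code; `reservoir_deriv` implements the drive-response equation above, and the input is held fixed over each 4th-order Runge-Kutta step of size τ = 0.01.

```python
import numpy as np

def reservoir_deriv(r, u, M, W_in, gamma=10.0, sigma=0.2):
    # Eq. (1): dr/dt = gamma * (-r + tanh(M r + sigma * W_in u))
    return gamma * (-r + np.tanh(M @ r + sigma * (W_in @ u)))

def rk4_step(r, u, M, W_in, tau=0.01):
    # classical 4th-order Runge-Kutta step; the input u is held fixed over the step
    k1 = reservoir_deriv(r, u, M, W_in)
    k2 = reservoir_deriv(r + 0.5 * tau * k1, u, M, W_in)
    k3 = reservoir_deriv(r + 0.5 * tau * k2, u, M, W_in)
    k4 = reservoir_deriv(r + tau * k3, u, M, W_in)
    return r + (tau / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def listen(u_series, M, W_in, tau=0.01):
    # drive the reservoir from r(0) = 0 with an input time series (D x T),
    # recording the reservoir state at each sampling time
    N = M.shape[0]
    r = np.zeros(N)
    states = np.empty((N, u_series.shape[1]))
    for k in range(u_series.shape[1]):
        states[:, k] = r
        r = rk4_step(r, u_series[:, k], M, W_in, tau)
    return states
```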

B. Training Stage
In the training stage, we focus on the data generated from Eq. (1) and u(t) for t_listen ≤ t ≤ t_train = 400. The aim of training is to determine a post-processing function of the reservoir state, ψ(r(t)), which can replace the input. Like in Lu, Hunt, and Ott 17, we consider post-processing functions of the following form,

ψ(r(t)) = W_out q(r(t)).    (2)

Here q(r(t)) ∈ R^{2N} is a function which returns a vector whose first N elements are r(t) and whose second N elements are r²(t) (the elementwise square). While this step differs from many other manifestations of the readout layer, it is said to be advantageous over linear functions of r(t) as it embeds further nonlinearity in the network. The 'output matrix', W_out ∈ R^{D×2N}, is determined by a ridge regression procedure which we now outline. Each evaluation of q(r(t)) during the training time is stored in the columns of the regularization matrix, X,

X = ( q(r(t_listen))  q(r(t_listen + τ))  ⋯  q(r(t_train)) ).    (3)

The target data matrix, Y, is constructed in a similar manner from the corresponding input vectors,

Y = ( u(t_listen)  u(t_listen + τ)  ⋯  u(t_train) ).    (4)

Finally, W_out is calculated as,

W_out = Y X^T ( X X^T + β I )^{−1},    (5)

where β is the regularization parameter; its role is to discourage overfitting and to penalise large elements of W_out. I is the identity matrix of the appropriate dimension.

FIG. 2: Predicting reservoir: the readout layer W_out q(r̂(t)) replaces the external input to the reservoir.
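The regression described above amounts to a single linear solve. A minimal sketch (illustrative, not the authors' code), where `R` holds the listening-stage reservoir states column-wise:

```python
import numpy as np

def quad_readout(R):
    # q(r): stack r on top of its elementwise square, giving a 2N-vector per column
    return np.vstack([R, R**2])

def train_readout(R, U, beta=1e-2):
    # R: N x T reservoir states, U: D x T target inputs over the training window
    X = quad_readout(R)   # regularization matrix, 2N x T
    Y = U                 # target data matrix, D x T
    # ridge regression: W_out = Y X^T (X X^T + beta I)^{-1}
    return Y @ X.T @ np.linalg.inv(X @ X.T + beta * np.eye(X.shape[0]))
```

In practice, solving the regularised normal equations with `np.linalg.solve` is numerically preferable to forming an explicit inverse; the inverse is kept here only to mirror the ridge formula.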

C. Predicting Stage
In the predicting stage we compute solutions of the 'predicting reservoir' described by the following equation,

dr̂(t)/dt = γ [ −r̂(t) + tanh( M r̂(t) + σ W_in W_out q(r̂(t)) ) ],    (6)

with r̂(0) = r(t_train). This setup is illustrated in Fig. 2.
If the training was successful, the readout from the reservoir, W_out q(r̂(t)), should be an approximation of the original input, which we denote as û(t) ≈ u(t).
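Closing the loop amounts to reusing the same integrator with the readout fed back in place of the external input; a minimal, self-contained sketch (illustrative, with small dimensions; not the authors' code):

```python
import numpy as np

def predict_deriv(r, M, W_in, W_out, gamma=10.0, sigma=0.2):
    # the external input u(t) is replaced by the readout W_out q(r)
    u_hat = W_out @ np.concatenate([r, r**2])
    return gamma * (-r + np.tanh(M @ r + sigma * (W_in @ u_hat)))

def predict(r0, M, W_in, W_out, steps, tau=0.01):
    # integrate the autonomous predicting reservoir with RK4 from r(0) = r(t_train),
    # returning the predicted input u_hat(t) as a D x steps array
    r, out = r0, []
    for _ in range(steps):
        k1 = predict_deriv(r, M, W_in, W_out)
        k2 = predict_deriv(r + 0.5 * tau * k1, M, W_in, W_out)
        k3 = predict_deriv(r + 0.5 * tau * k2, M, W_in, W_out)
        k4 = predict_deriv(r + tau * k3, M, W_in, W_out)
        r = r + (tau / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out.append(W_out @ np.concatenate([r, r**2]))
    return np.array(out).T
```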
In our numerical experiments we set N = 1000 and the reservoir state is initialised as r(0) = (0, 0, . . ., 0)^T = 0^T, where T denotes the transpose operation. M is constructed as a random sparse matrix of Erdős–Rényi connectivity with a specific spectral radius, ρ. To elaborate, the matrix is designed such that each element is chosen independently to be nonzero with probability P = 0.04 (i.e. sparsity = 0.04, or average degree = 40) and these nonzero elements are chosen uniformly from (−1, 1). This random sparse matrix is rescaled such that the magnitude of its largest eigenvalue is ρ. If ρ is close to 1, the input in effect takes a long time to die out within the reservoir, which is preferable in tasks requiring a large memory 22. The W_in matrix is designed such that each row has only one nonzero randomly assigned element, chosen uniformly from (−1, 1). The time damping factor is kept constant at γ = 10 while ρ, σ, and β are varied in order to achieve a desired learning outcome.
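The construction of M and W_in described above can be sketched as follows (a minimal illustration; the function name and seeding are ours):

```python
import numpy as np

def build_reservoir(N, D, p=0.04, rho=0.7, seed=0):
    rng = np.random.default_rng(seed)
    # Erdos-Renyi adjacency: each element nonzero with probability p,
    # nonzero values drawn uniformly from (-1, 1)
    M = np.where(rng.random((N, N)) < p, rng.uniform(-1.0, 1.0, (N, N)), 0.0)
    # rescale so the magnitude of the largest eigenvalue equals rho
    M *= rho / np.max(np.abs(np.linalg.eigvals(M)))
    # W_in: exactly one nonzero element per row, drawn uniformly from (-1, 1)
    W_in = np.zeros((N, D))
    W_in[np.arange(N), rng.integers(0, D, size=N)] = rng.uniform(-1.0, 1.0, size=N)
    return M, W_in
```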
We remark that it is difficult to choose ρ, σ, γ, and β for a specific task. To combat this, algorithms have been developed which optimise these parameters to reduce the error in time series prediction problems 23. As we will see, a multifunctional RC requires that, for a given set of parameters, it can reconstruct the climate of more than one attractor. However, certain attractors may require different parameter settings to be reconstructed. In this paper we investigate the effect that changing ρ has on the performance of the RC in reconstructing a pair of attractors and therefore achieving multifunctionality. We choose to work with a random Erdős–Rényi topology in order to provide the RC with enough dynamical flexibility to elicit multistable dynamics.
Next, we present the training technique we designed that combines data from different sources in order to construct a single output matrix that allows for the reconstruction of more than one chaotic attractor.

D. Training with the 'Blending Technique'
We adapt the regression procedure from Sec. II B to instead use data from two input sources and the corresponding reservoir output in both training stages. For consistency, it is necessary that the matrices M and W_in and the parameters ρ, σ, and β remain identical when generating both training data sets. This data is stitched together in what we call the 'blending technique' with the 'blending' or 'weighting' parameter, α ∈ [0, 1]. The particular constructions of the regularization matrix, X, and target data matrix, Y, used in the regression are now outlined.
First, the training data collected from the reservoir regarding each individual attractor, X_S1 and X_S2, for S_1 and S_2 some arbitrary attractors, are grouped together and 'blended' in the following concatenation,

X_C = ( α X_S1   (1 − α) X_S2 ).    (7)

Based on this construction, when α = 0 or α = 1, one data set completely dominates the training. The same procedure is applied to the corresponding target data matrices in order to obtain the equivalent Y_C. To avoid any biases in the concatenation step, we take advantage of the memoryless readout layer by randomly reordering the columns of the matrices X_C and Y_C (jointly, so that each column of X_C remains paired with the column of Y_C corresponding to the input and reservoir state at the same time). These matrices are used in the ridge regression formula in Eq. (5).
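One way to realise the weighted concatenation and joint column shuffle described above (a sketch; the weighting convention, α on the first data set and 1 − α on the second, follows our reading of the text):

```python
import numpy as np

def blend(X1, Y1, X2, Y2, alpha, seed=0):
    # weighted concatenation: X_C = (alpha X_S1 | (1 - alpha) X_S2), likewise Y_C
    X_C = np.hstack([alpha * X1, (1.0 - alpha) * X2])
    Y_C = np.hstack([alpha * Y1, (1.0 - alpha) * Y2])
    # the readout is memoryless, so the columns (snapshots in time) can be
    # reordered at random, jointly in X_C and Y_C, to avoid ordering biases
    perm = np.random.default_rng(seed).permutation(X_C.shape[1])
    return X_C[:, perm], Y_C[:, perm]
```

At α = 0 the S_1 block is zeroed out, so the S_2 data completely dominates the training, and vice versa at α = 1.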
The aim is to find an α that will give rise to an output matrix, W_out^α, which, depending on the initial condition (IC), allows the RC to reconstruct one or the other attractor. We provide a schematic of the desired outcome in Fig. 3.

III. MULTIFUNCTIONALITY AND DYNAMICAL SYSTEMS
The characterisation of neuronal behaviour regularly invokes the language of dynamical systems theory 6,24. Conceptualising certain traits of neurons from this perspective can act as a bridge between biological and artificial neural networks, where advancements in one can contribute to the other.
Considering the claim made in Section I, that a multifunctional neural network in principle resembles a system with a coexistence of attractors, and that a RC can be trained to reconstruct the climate of a chaotic attractor, we pose the following: can a RC be trained to permit a coexistence of reconstructed attractors? Such a RC can be said to, in the spirit of our claim, express multifunctionality.
To demonstrate this we consider the following tasks, which require multifunctionality, by training a RC to reconstruct a coexistence of chaotic attractors from:

• Case I: a multistable system.
• Case II: a system with different parameter settings.
• Case III: two different systems entirely.
In order to provide an adequate testing ground, we require input data from a system that allows for the generation of multiple coexisting chaotic attractors under a wide range of parameters. An example of such a system was presented in Guan et al. 25 and is described by the set of equations given there, which we refer to as Eq. (8). Here, x(t) = (x_1(t), x_2(t), x_3(t))^T defines the state of the system at a given time t, and a, b, c, and d are the system parameters. For example, when setting (a, b, c, d) = (5, 15, 3, 12) there is a coexistence of two single-scroll chaotic attractors, as illustrated in Fig. 4. Initialising Eq. (8) with x_A1(0) = (1, 1, 1)^T results in the state of the system settling on the attractor A_1, seen as the blue trajectory in Fig. 4. The other attractor, A_2, indicated by the orange trajectory in Fig. 4, is arrived at once Eq. (8) is initialised from a different initial condition. Trajectories on these attractors, A_1 and A_2, are considered as the training data for Case I.
In Case II we set the problem of reconstructing a double-scroll chaotic attractor in addition to the attractor A_2 from Fig. 4. This particular double-scroll chaotic attractor, which we call B_1, is reached by initialising Eq. (8) with x_B1(0) = (1, 1, 1)^T and setting the system parameters as (a, b, c, d) = (5, 8, 2, 2), as shown in Fig. 5.
In Case III, we consider reconstructing A_2 and the chaotic butterfly attractor, L, generated by the Lorenz system 18.
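A trajectory on L can be generated by integrating the Lorenz system; a sketch at the classic chaotic parameter values (10, 28, 8/3), which we assume here since the paper does not restate its exact Lorenz parameters in this section:

```python
import numpy as np

def lorenz_deriv(x, s=10.0, r=28.0, b=8.0 / 3.0):
    # classic Lorenz system; s, r, b are the Lorenz parameters
    # (not the reservoir's sigma, rho, beta)
    return np.array([s * (x[1] - x[0]),
                     x[0] * (r - x[2]) - x[1],
                     x[0] * x[1] - b * x[2]])

def lorenz_trajectory(x0, steps, tau=0.01):
    # 4th-order Runge-Kutta integration, matching the paper's time step
    x, out = np.asarray(x0, dtype=float), []
    for _ in range(steps):
        k1 = lorenz_deriv(x)
        k2 = lorenz_deriv(x + 0.5 * tau * k1)
        k3 = lorenz_deriv(x + 0.5 * tau * k2)
        k4 = lorenz_deriv(x + tau * k3)
        x = x + (tau / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out.append(x)
    return np.array(out).T  # 3 x steps
```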
We remark that the Lorenz attractor L and the attractors chosen from Eq. (8) share no common region in state space. Were they to overlap, this would add a further level of difficulty in training a RC to distinguish which attractor a given trajectory belongs to. Alternatively, to expose further criteria needed for the RC to exhibit multifunctionality, we suggest studying the effect of decreasing the distance between two disjoint attractors.

IV. RESULTS
In this section we illustrate the events in which a RC was trained to exhibit multifunctionality by successfully reconstructing the coexistence of attractors as specified in Cases I, II, and III. We focus on establishing the optimal blending of the training data with respect to ρ, so as to determine the regions in which multifunctionality was achieved. Following these observations, we examine the circumstances in which the reconstruction of a given attractor fails and detect a number of 'untrained attractors': attractors residing within the prediction state space that were not part of the training. We then track the evolution of these attractors with respect to α and ρ and uncover a number of 'behind-the-scenes' bifurcations.

A. Attractor Reconstruction
There are many means of assessing the accuracy of a predicted time series. In this work we choose to calculate θ_S(W_out^S), the Normalised Root Mean Square Error (NRMSE) of the prediction in comparison to the target time series, averaged over all state variables of a given attractor S. The component θ_S(W_out^S)_i for the i-th state variable is calculated as,

θ_S(W_out^S)_i = sqrt( ⟨ (u_i(t) − û_i(t))² ⟩ ) / ( max(u_i(t)) − min(u_i(t)) ),    (9)

where ⟨·⟩ denotes the time average over t_predict − t* ≤ t ≤ t_predict. In our results we set t_predict = 600 as the prediction end time, and t_predict − t* is the time from which error sampling begins. u_i(t) and û_i(t) are the i-th state variables of the target and predicted time series. The max(·) and min(·) functions are measures of the maximum and minimum values of the time series evaluated from t_predict − t* to t_predict. The closer θ_S(W_out^S) is to 0, the more accurate the prediction of S. It was found empirically that for θ_S(W_out^S) > δ = 0.35 attractor reconstruction fails. Throughout our work we keep β = 10^−2, σ = 0.2 in Cases I and III, and let σ = 0.4 in Case II.
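The NRMSE computation can be sketched as follows (illustrative; `nrmse` operates on the target and predicted series already restricted to the sampling window):

```python
import numpy as np

def nrmse(u, u_hat):
    # u, u_hat: D x T target and predicted series over the sampling window
    # [t_predict - t*, t_predict]; each variable is normalised by its range
    span = u.max(axis=1) - u.min(axis=1)
    per_var = np.sqrt(np.mean((u - u_hat) ** 2, axis=1)) / span
    return per_var.mean()  # average over the D state variables
```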
First, we conduct an error analysis of the predicted time series when using the task-specific matrices, W_out^A1, W_out^A2, W_out^B1, and W_out^L, to determine if the attractors can be reconstructed in the usual sense. In Fig. 6 we provide a picture where, having set t* = t_train, the error analysis indicates attractor reconstruction was achieved for all ρ ∈ [0.1, 0.9] using each of the task-specific matrices, as θ_S(W_out^S) < δ in Cases I-III. With this established, we now search for values of α which give rise to a single readout matrix, W_out^α, that allows the RC to reconstruct either of the attractors specified in Cases I-III. After applying the blending technique, we calculate the following error function of the particular readout matrix used to reconstruct a given attractor,

ε_S(α) = θ_S(W_out^α) / δ.    (10)

Here θ_S(W_out^α) is the NRMSE when using the blending technique to reconstruct an attractor S with a certain α. This measure of error implies that if ε_S(α) < 1, the prediction of S was successful. Starting with the task of reconstructing attractors from a multistable system in Case I, we set ρ = 0.7, with the resulting ε_S(α) vs. α plot shown in Fig. 7a.
Here we see that, on one hand, as α is increased from 0 there is a large decrease in ε_A1, which stays relatively close to 1 for α ≳ 0.22, while on the other hand, ε_A2 remains relatively near 1 for α ≲ 0.88 and then grows as α is increased thereafter. From this we deduce that for 0.22 ≲ α ≲ 0.88 multifunctionality is expressed by the RC. We illustrate in Figs. 7b-7c a successful implementation of the blending technique for α = 0.5. The predicted trajectories on A_1 and A_2 when using the task-specific matrices, W_out^A1 and W_out^A2, are plotted in orange, and the predictions using the multifunctional output matrix, W_out^α, are plotted in green. A trajectory on the actual attractors from Eq. (8) in each figure is plotted in blue.
We conduct a similar analysis for Cases II and III, with Fig. 8 illustrating examples where the RC was successfully trained to be multifunctional. We see in Fig. 8a that when training the RC to reconstruct both B_1 and A_2 in Case II, the desired coexistence of chaotic attractors is achieved for ρ = 0.3 and α = 0.5. We find that in Case III, when setting ρ = 0.65, the RC is successfully trained to express multifunctionality as it can reconstruct both L and A_2 for α = 0.85, as seen in Fig. 8b.
The results illustrated in Figs. 7-8 show that a RC can be trained to exhibit multifunctionality. Furthermore, this broadens the current set of applications a RC is capable of. We show that, instead of having to change parameters in a system for it to exhibit a different behaviour, one can merge separate modes of operation from various parameter choices of the system to coexist in the prediction state space of a multifunctional RC. We also demonstrate that the combination of attractors is not limited to a single system, as it is possible to combine attractors from different systems to coexist. We remark that, given this ability, there is the prospect of designing an appropriate controller for Eq. (6) to switch between attractors. For further reading on 'inter-attractor control' see Richter 26.
Although we have illustrated instances in which appropriate values of α were found to give rise to multifunctionality, this cannot be said for all α values, nor for other choices of γ, ρ, σ, and β. Moreover, relying on an error analysis alone is not sufficient to classify the parameter regions in which multifunctionality is achieved. This RC is designed such that if attractor reconstruction fails then the predicted trajectory will not blow up to infinity in a finite amount of time. However, the prediction can decay toward some other stable attractor. Furthermore, given the nature of our study, it is also possible that the RC's predicted trajectory on one chaotic attractor can switch to the other chaotic attractor.
Following this argument, in the next section we employ a means of characterising the resultant attractor that the prediction settles to, and from this we identify the regions in the (α, ρ)-plane where multifunctionality is achieved.

B. Exploring multifunctionality in the (α, ρ)-plane
In this section we analyse the long-term behaviour of the RC initialised with r(0) corresponding to A_1, A_2, B_1, or L (for the appropriate Case), and trained for a given α ∈ [0, 1] and ρ ∈ [0.1, 0.9].
We choose to assign a colour to each point in the (α, ρ)-plane that characterises the prediction of a given attractor.
More specifically, a point in the (α, ρ)-plane is coloured:

1. Yellow, if the predicted time series is periodic for t_predict − 40 ≤ t ≤ t_predict; we say the prediction has decayed to some limit cycle.

2. Green, if the predicted time series remains constant for t_predict − 10 ≤ t ≤ t_predict; we say the prediction has decayed to some fixed point.

3. Purple, if for t* = 40, θ_S(W_out^α) ≤ δ for the predicted time series in comparison to the target time series of the other chaotic attractor; we then say that the prediction has switched from one chaotic attractor to the other.

4. Blue, if conditions 1-3 are not fulfilled and if for t* = 40, θ_S(W_out^α) ≤ δ for a given attractor S; we then say that the climate of S was well reconstructed.

5. Red, if otherwise, to signal that closer inspection of the prediction is needed.
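The colouring rules above can be summarised as an ordered decision procedure (a schematic sketch of our own; the periodicity and constancy tests themselves are left abstract as boolean inputs):

```python
def classify(is_periodic, is_constant, err_other, err_self, delta=0.35):
    # ordered tests mirroring conditions 1-5; the errors are NRMSE values
    # computed with t* = 40
    if is_periodic:          # 1: decayed to some limit cycle
        return "yellow"
    if is_constant:          # 2: decayed to some fixed point
        return "green"
    if err_other <= delta:   # 3: switched to the other chaotic attractor
        return "purple"
    if err_self <= delta:    # 4: climate of S well reconstructed
        return "blue"
    return "red"             # 5: closer inspection needed
```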
The result of this analysis for Case I is shown in Fig. 9.
As expected, we see that attractor reconstruction is impossible when initialising the RC with r(0) corresponding to A_1 in Fig. 9a (A_2 in Fig. 9b) for α = 0 (1). We now know that the prediction in this scenario consistently decays towards some fixed point for all values of ρ. However, when increasing (decreasing) α from 0 (1), there is a more varied sequence of events for a given ρ. Prior to achieving attractor reconstruction, the predicted trajectory on A_1 (A_2) at times tends towards some limit cycle or switches to A_2 (A_1). Fig. 9a also shows that for large ρ the RC favours the reconstruction of A_2 over A_1, where the prediction of A_1 mainly switches to A_2 until some critical α value where attractor reconstruction is achieved. Nevertheless, there is some middle ground where both attractors can be successfully reconstructed for a given pair of α and ρ; the common blue area between Figs. 9a-9b marks the regions in which we say multifunctionality was achieved. We provide a separate picture in Fig. 9c to explicitly show these regions. Through the same reasoning, we show in Fig. 10a and Fig. 10b the regions in the (α, ρ)-plane in which the RC was successfully trained to express multifunctional behaviour for both Case II and Case III.
We see a relatively large common blue region in the (α, ρ)-plane for Case I in Fig. 9c. However, for Cases II and III the regions of multifunctionality in Figs. 10a-10b are much smaller. Cases II and III require a higher level of dynamical flexibility from the RC, as the chaotic attractors not only come from different settings of Eq. (8), or from different systems entirely, but also vary characteristically, i.e. single-scroll versus double-scroll. A particular setup of the RC may happen to favour reconstructing one flavour of attractor over the other, which is thus a contributing factor towards the reduced regions of multifunctionality seen here.
Focusing on the prediction of A_1 for ρ = 0.7 as α is increased from 0 in Fig. 9a, we see that before multifunctionality is achieved the prediction repeatedly falls to a fixed point. We plot these fixed points for various values of α in Fig. 11. Also seen here is a case where the prediction switches from A_1 to A_2 when α = 0.2.
We ask, what happens to these fixed points in Fig. 11 as multifunctionality is achieved? Is it the case that the successful reconstruction of A_1 entirely takes the place of these fixed points in the prediction state space, or do these fixed points still exist and we are unable to see them? Furthermore, are there other attractors lurking in this prediction state space that we are not immediately aware of? We delve further into these questions in the following section.

FIG. 12: Time trace of the predicted x_3 variable, with the RC trained for a given α and ρ as indicated above each plot, starting from the random initial condition ξ_i for i = 1, 2, . . .

C. Detecting Untrained Attractors
To get a broader picture of the prediction state space we initialise the RC, trained for a given ρ and α, with many (1000) random initial conditions (RICs) and observe the resultant trajectories.We restrict our study to Case I.
In Fig. 12 we set ρ = 0.7 and train the RC for α = 0.45, 0.50, and 0.55. We plot x_3^{ξ_i}, the trajectory of the predicted x_3 variable starting from the i-th RIC.
For α = 0.5 we see that there exists a stable fixed point located in the middle of both reconstructed chaotic attractors. As we begin to apply more weight to the data from A_2 with α = 0.45, we see another stable fixed point appearing closer to the reconstructed A_2. Similarly, for α = 0.55 we find a stable fixed point closer to the reconstructed A_1. So, while the RC was successfully trained to reconstruct the climate of both A_1 and A_2, there are additional attractors populating the prediction state space that were not involved in the training; we call these the 'untrained attractors'.
The results in Fig. 12 show that small changes in α can give rise to a bistability of untrained attractors. This suggests that there are some 'behind-the-scenes' bifurcations taking place in the prediction state space. However, further discussion is needed to consider α as a bifurcation parameter of the RC.
While ρ is a parameter of the RC itself, α is a parameter that is strictly involved in the training procedure. Therefore, to consider α as a bifurcation parameter of the RC, we need to assess whether a small change in α produces a correspondingly smooth change in the elements of the W_out^α matrix generated from a specific combination of α and ρ. In other words, a continuous function which maps (α, ρ) → W_out^α needs to exist. To expand upon this notion, we consider some randomly chosen elements of W_out^α and observe their evolution with respect to α and ρ. The result is shown in Fig. 13 for four randomly chosen elements of W_out^α. The smoothness of these surface plots indicates that a relatively small change in α or ρ gives rise to a small change in the elements of W_out^α, which, we generalise, contributes to an overall small change in the dynamics of the RC. From this we now consider α as a bifurcation parameter of the RC.
Our study now moves towards tracking the evolution of these untrained attractors and identifying some of these 'behind-the-scenes' bifurcations.

D. Bifurcation Analysis of Untrained Attractors
In this section we track the evolution of these untrained attractors, firstly in the (α, x_3)-plane for a given ρ, and then in the (α, ρ)-plane.
We do this by initialising the state of the RC, trained for a certain α and ρ, with one of the corresponding fixed points shown in Fig. 12. We track the evolution of this fixed point with respect to α by repeating the process of incrementally changing α, retraining, and initialising the state of the RC with the fixed point corresponding to the previous α. The result of this for ρ = 0.7 is shown in Fig. 14a.
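This continuation-by-retraining loop can be sketched abstractly. The `retrain` and `settle` callables are hypothetical placeholders of our own for the blending-technique training at a given α and for running the predicting reservoir until it converges to a fixed point:

```python
def track_branch(retrain, settle, r_fp, alphas):
    # natural-parameter continuation: at each alpha, retrain the readout and
    # restart the predicting reservoir from the fixed point found previously
    branch = []
    r = r_fp
    for alpha in alphas:
        W_out = retrain(alpha)   # new readout matrix W_out^alpha
        r = settle(W_out, r)     # let the prediction converge to a fixed point
        branch.append((alpha, r))
    return branch
```

Only stable fixed points can be followed this way, which is why the unstable branches discussed below cannot be determined explicitly.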
Here we see the extent of the bistabilities found in Fig. 12. We also find branches of fixed points and bistabilities closer to the end points of α. The evolution with respect to α of the previously found fixed point located in the middle of the chaotic attractors is now labelled as the branch FP_1, and the evolutions of the fixed points closer to A_2 and A_1 are labelled as FP_2 and FP_3 respectively. We label the evolution of the fixed points closest to α = 0 as the branch FP_4 and those closest to α = 1 as FP_5.
These bistabilities give rise to hysteresis cycles. As indicated by the arrows in Fig. 14a, when moving along the FP_2 branch, as α is increased there is a point where the state of the RC jumps to the FP_1 branch. After this transition, if α were instead decreased, we would remain on and continue to track along the FP_1 branch to the point where the state of the RC returns to the FP_2 branch, and so on.
Fig. 14a also demonstrates that when the RC fails to reconstruct A_1, the predicted trajectory falls onto the FP_4 and FP_2 branches. The black points plotted in Fig. 14a are the fixed points that the predictions of A_1 decay towards in Fig. 11. As these black points line up directly with the branches FP_4 and FP_2, we conclude that when attractor reconstruction of A_1 is achieved these fixed points do not suddenly disappear, but rather that their existence is intrinsic to the dynamics of this RC setup. In modes of failure, these branches of fixed points provide routes to stability should the predicted trajectory fall off A_1. This also gives greater insight into the behaviour of the RC at the boundaries of multifunctionality. Furthermore, these untrained attractors have dynamics of their own and interact amongst themselves, giving rise to these 'behind-the-scenes' bifurcations.
These stable branches of equilibria would ordinarily be connected by branches of unstable equilibria, but given the nature of our approach these cannot be explicitly determined. However, as the end points of these branches are seemingly drawn together, i.e. the right and left end points of both FP 4 and FP 2 and likewise for FP 3 and FP 5, we infer that this is a signature of the existence of unstable branches of equilibria and evidence that saddle-node (SN) bifurcations occur at these end points. To strengthen this claim, we increase ρ to 0.77 and track the evolution of the fixed points as before. The result is shown in Fig. 14b, where the right and left end points of FP 4 and FP 2, as well as those of FP 3 and FP 5, have connected, resulting in two branches of fixed points which we call FP 2,4 and FP 3,5. This behaviour is indicative of two cusp bifurcations taking place at α ≈ 0.034 and 0.988 as ρ is increased from 0.7 to 0.77. In addition, we see a region of 'tristability' for 0.4867 ≤ α ≤ 0.5165 where all three branches overlap.
We continue exploring and characterising the behaviour of these untrained attractors by tracking their evolution in the (α, ρ)-plane. The result is depicted in Fig. 15.
Fig. 15 also reveals the existence of several limit cycles. As ρ is decreased from 0.27, the bistability between FP 3 and FP 1,2 is lost at (α, ρ) ≈ (0.5837, 0.2449). As these branches begin to drift apart, we show in Fig. 16 for ρ = 0.21 that a period-1 limit cycle, LC 1, is born in this gap. Plotted there is the evolution of the maximum and minimum values of LC 1 versus α. As ρ is decreased further, the bistability between the FP 3 and FP 1,2 branches resumes from (α, ρ) ≈ (0.577, 0.1965), resulting in the death of LC 1.
We also see a relatively small region of tristability between FP 1,2, FP 7 and FP 4 in Fig. 16. While the left end of FP 1,2 had at one time appeared to be growing towards, and potentially connecting to, FP 4 in another cusp bifurcation, it instead broke off due to a cusp bifurcation taking place on FP 1,2 at (α, ρ) ≈ (0.212, 0.087), resulting in a new branch of fixed points, FP 7, as seen in Fig. 16. The existence of FP 7 is relatively brief, as its own end points are quickly drawn together and disappear entirely at (α, ρ) ≈ (0.062, 0.179) in Fig. 15. This behaviour also occurs on FP 3, for which we later provide a more detailed picture and explanation. We also find the beginnings of a branch of stable fixed points emerging from the right-hand side of Fig. 16, which we label FP 6. Later we discuss some interesting properties emanating from this branch.
Given the abundant variety of behaviour exhibited by the untrained attractors for large α and small ρ, as seen in the inset plot of Fig. 15, we provide a more expansive picture in Fig. 17 depicting how certain bifurcations arise.
The cusp bifurcation taking place at (α, ρ) = (0.965, 0.185) gives rise to a relatively small tristable region. This cusp bifurcation originates from a buckling of the FP 3 branch where a new branch of stable fixed points emerges, labelled FP 8 in Fig. 17.
A limit cycle is born at the right end point of FP 3 for ρ ≈ 0.181. In Fig. 17 we plot the evolution of the maximum and minimum x 3 -values of this period-1 limit cycle, which we label LC 2. Close to this cusp bifurcation we see, for ρ = 0.18, that LC 2 is contained within a Hopf bubble as it exists between two supercritical Hopf bifurcations. However, as ρ is decreased further, the amplitude of oscillation of LC 2 increases and the bubble bursts. We also find that LC 2 undergoes period-doubling (PD) bifurcations, the first of which was found near (α, ρ) = (0.959, 0.166). These PD bifurcations account for the significant reduction in the range of α-values over which we are able to track LC 2. Also shown in Fig. 17 is a subsequent PD bifurcation giving rise to a period-4 limit cycle.
The existence of FP 8 is relatively short: as ρ is decreased its end points are drawn closer together until this branch becomes a single point-like attractor at (α, ρ) ≈ (0.98, 0.137), beyond which we are no longer able to track it. This event, and that mentioned earlier regarding FP 7, are evidence that further flavours of cusp-like bifurcations take place in the prediction state space. A codimension-3 bifurcation analysis (involving either σ, β or γ) could, for example, unveil a swallowtail bifurcation point (the point at which two cusp branches collide). For reading on more cusp-like bifurcations see Kuznetsov 27 or Guckenheimer and Holmes 28. Alternatively, if swallowtail bifurcation points were found to take place in lower dimensional representations of Eq. (6) (for N = 2, 3, ...), then their existence could be generalised to the higher dimensional picture. Reducing the dimension would also allow the use of numerical continuation software like AUTO 29 on Eq. (6). Moreover, this approach facilitates the study of unstable attractors and the identification of further dynamical features inherent to Eq. (6). We leave this for future work.
FIG. 17: Behaviour of the untrained attractors in the predicted x 3 direction for large α and small ρ.
As shown in Fig. 17, we find another branch of fixed points, which we label FP 9. For ρ = 0.135 we see that at the right end point of FP 9 a limit cycle, labelled LC 4, is born. Increasing ρ results in the loss of LC 4 as the end points of FP 9 are drawn closer together, until eventually at (α, ρ) = (0.989, 0.153) we are no longer able to track this branch.
FIG. 18: (ρ = 0.107): Evolution of FP 6 as it transitions to a limit cycle and then to a chaotic attractor through an infinite sequence of period-doubling bifurcations. Also shown are snapshots of this limit cycle along its route to chaos.
Throughout Fig. 17 we plot the evolution of the previously mentioned branch, FP 6. As ρ is decreased, a period-1 limit cycle, labelled LC 3, is born from the left end point of FP 6. The brown region in Fig. 15 depicts the coexistence of LC 3, LC 4, and FP 5. However, as we follow the evolution of the local maxima and minima of LC 3, there are certain points in the (α, ρ)-plane where it undergoes a PD bifurcation. Furthermore, there are times at which one PD bifurcation leads to another and in turn triggers a period-doubling cascade (PDC), ultimately resulting in chaotic behaviour. We find two relatively small distinct regions in Fig. 15 (coloured in black and cyan) where PD bifurcations of LC 3 lead to PDCs. An example of this sequence of PD bifurcations is shown in Fig. 18 for ρ = 0.107.
Starting from α = 1, we see in Fig. 18 how FP 6 evolves as α is decreased. We plot the local maxima and minima of LC 3 sprouting from the left end point of FP 6 for α = 0.9986 and continue to track them as α is reduced further. Here we see the first of these PD bifurcations occurring at α ≈ 0.989, where there are now two distinct local maxima and minima of LC 3. This behaviour is depicted below the bifurcation diagram in Fig. 18 with snapshots of LC 3 as it travels along its route to a PDC in the (x 1, x 3) prediction state space. Here, we see how LC 3 transitions from period-1 at α = 0.9895 to period-2 at α = 0.9885. Decreasing α further to 0.9875, we see that there has been an infinite number of PD bifurcations giving rise to a chaotic attractor. Periodic behaviour briefly resumes as α is decreased further before entering another PDC, with a second bout of chaos existing between α = 0.98618 and 0.98638.

Although multifunctionality is not achieved in the relatively small regions in which we find these PDCs, it is a remarkable result: it shows that when attempting to train the RC to promote a coexistence of two desired chaotic attractors, it is also possible for another chaotic attractor to exist in the background. Moreover, for a different choice of the other reservoir parameters, these PDCs could play a larger role in the prediction state space and further influence the RC's ability to reconstruct a given attractor.

Throughout our results we have kept the topology of the RC matrices M and W in fixed. Given a different initialisation of these matrices, we expect quantitative changes in these bifurcation figures but that the main characteristics would remain. Overall, these results indicate that regardless of the particular attractor one is attempting to reconstruct, there will always be some untrained attractor present in the prediction state space. The long-term characterisation of the RC's prediction of a given attractor in Figs. 9a-9b combined with
the classification of the untrained attractors in Fig. 15 provides us with a clearer picture of the prediction state space for a given α and ρ. Put together, we are able to see the regions in which multifunctionality is achieved, the manner in which the prediction fails, and the particular untrained attractor it can tend towards. Moreover, one could argue that attractor reconstruction may fail if the basin of attraction of the desired attractor interferes with one of these untrained attractors.
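The way a limit cycle's period is classified along a PDC, by counting the distinct post-transient values (the local maxima and minima plotted in Fig. 18), can be illustrated on the logistic map, a standard toy system with a period-doubling cascade; it stands in for the RC dynamics only for illustration:

```python
import numpy as np

def attractor_period(r, n_transient=2000, n_keep=256, tol=1e-6):
    """Estimate the period of the logistic-map attractor x -> r*x*(1-x)
    by counting distinct values in the post-transient orbit.
    A large count signals chaos rather than a finite period."""
    x = 0.5
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    distinct = []
    for v in orbit:
        if not any(abs(v - d) < tol for d in distinct):
            distinct.append(v)
    return len(distinct)
```

Sweeping r through the cascade, the count jumps 1 → 2 → 4 → ... before becoming large in the chaotic regime, mirroring the period-1 → period-2 → chaos sequence of LC 3 as α is decreased.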

V. CONCLUSION
In this paper we have demonstrated that a RC can be trained to exhibit multifunctionality, whereby, depending on the given initial condition, the climate of more than one chaotic attractor can be successfully reconstructed in its prediction state space.
In order to train a RC to express multifunctionality we introduce the 'blending technique' as a means to combine and weight data from two different sources. We test the flexibility of this technique by training the RC to reconstruct a coexistence of chaotic attractors from: a system which already exhibits multistability; two parameter settings of the same system; and two entirely different systems.
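To make the idea of combining and weighting two data sources concrete, the sketch below shows one plausible form of a blended linear readout, in which α weights the two training sets' contributions to the ridge-regression normal equations. This weighting form is an illustrative assumption, not necessarily the exact blending procedure used here:

```python
import numpy as np

def blended_readout(R1, U1, R2, U2, alpha, beta=1e-6):
    """Fit a single linear readout W_out to two tasks at once via ridge regression.

    R_k : reservoir states for task k, shape (N, T_k)   (hypothetical layout)
    U_k : target signals for task k, shape (d, T_k)
    alpha weights task 1 against task 2; beta is the ridge parameter.
    """
    A = alpha * (U1 @ R1.T) + (1.0 - alpha) * (U2 @ R2.T)
    B = alpha * (R1 @ R1.T) + (1.0 - alpha) * (R2 @ R2.T) + beta * np.eye(R1.shape[0])
    # W_out = A @ B^{-1}; B is symmetric, so solve B X = A^T and transpose.
    return np.linalg.solve(B, A.T).T
```

At α = 1 this reduces to ordinary ridge regression on the first task alone, and at α = 0 to the second; intermediate values trade off reconstruction error between the two attractors.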
However, in order to achieve the desired outcome, there is a crucial dependence on certain reservoir and training parameters, in particular the 'blending' parameter, α, and the spectral radius of the reservoir's internal connection matrix, ρ. When varying these parameters, we find an abundance of nontrivial transitions between multifunctionality and modes of failure, as there is a competition between attractors in the prediction state space of the RC for a given α and ρ. For example, when attempting to reconstruct the climate of a given chaotic attractor, there are times where the prediction switches to the other attractor; in this case, the RC is able to reconstruct the climate of only one chaotic attractor. This behaviour comes as a consequence of training a RC to reconstruct the climate of more than one attractor. Furthermore, we see that in the event of failure, the predicted trajectory on a given chaotic attractor can tend toward a limit cycle or fixed point.
On closer inspection, we find that even when the RC is successfully trained to express multifunctionality, there are additional attractors existing within the prediction state space that were not involved in the training; we call these 'untrained attractors'. Moreover, it is shown that these untrained attractors have dynamics of their own and play a significant role in the event that the desired attractor cannot be successfully reconstructed.
By tracking the evolution of these untrained attractors with respect to α and ρ we have identified a number of 'behind-the-scenes' bifurcations. In particular, when tracking the evolution of FP 6 we see in Fig. 18 how this branch of stable fixed points transitions to the limit cycle LC 3, which subsequently undergoes a series of period-doubling bifurcations to the point of provoking a period-doubling cascade, inevitably leading to the creation of a chaotic attractor. So while we train the RC to reconstruct a coexistence of two specific chaotic attractors in its prediction state space, another chaotic attractor is created in a sub rosa fashion.
The ability to combine attractors from different sources so that they coexist in the same state space broadens the current set of RC applications. It demonstrates that, instead of having to change parameters in a system for it to exhibit a different behaviour, the various desired modes of operation can coexist in the state space of a multifunctional RC, where an appropriate controller could in theory be designed to switch between attractors 26. This draws further parallels to biological neural networks, where such a control mechanism is comparable to neuromodulators 30, a hierarchical system of neurons which some believe to be involved in instigating the switching of activity patterns in multifunctional neural networks. Additionally, we have illustrated that these input sources are not limited to one system. We show this in Fig. 8b, where the RC was successfully trained to permit the coexistence of the Lorenz butterfly attractor, L, and the chaotic attractor A 2 generated from Eq. (8). Naturally the question emerges as to the number of attractors that can be successfully trained to coexist in the RC's prediction state space; we leave this for future work.
The results of this paper emanate from employing the dyadic 'two-way street' approach of framing neurological features in the context of dynamical systems. Moreover, this work indicates that if other hypotheses or known facets of the brain can be articulated in this manner, then there lies the potential to portray them artificially.

FIG. 9: (a)-(b): Long-term behaviour of the prediction in the (α, ρ)-plane for Case I. Each colour characterises the attractor the RC eventually settles to starting from a particular initial condition (IC). (c): Plotted in blue are the regions of multifunctionality.

FIG. 11: Case I (ρ = 0.7, IC: A 1): x 3(t) vs. t for A 1, coloured in black, and the predicted trajectories for various values of α as indicated in the plot legend.

FIG. 13: The evolution of four randomly chosen elements of the W α out matrix with respect to changes in α and ρ.

FIG. 16: Tracking untrained attractors for ρ = 0.21: maximum and minimum values of the limit cycle LC 1, born at the right and left end points of the FP 1,2 and FP 3 branches. Also shown are the FP 4, FP 5, FP 6 and FP 7 branches of stable fixed points.

FIG. 6: θ S(W S out) vs. ρ when using the task-specific matrices to reconstruct the attractor S from Cases I, II and III.