Voltage-controlled superparamagnetic ensembles for low-power reservoir computing

We propose thermally-driven, voltage-controlled superparamagnetic ensembles as low-energy platforms for hardware-based reservoir computing. In the proposed devices, thermal noise is used to drive the ensembles' magnetization dynamics, while control of their net magnetization states is provided by strain-mediated voltage inputs. Using an ensemble of CoFeB nanodots as an example, we use analytical models and micromagnetic simulations to demonstrate how such a device can function as a reservoir and perform two benchmark machine learning tasks (spoken digit recognition and chaotic time series prediction) with competitive performance. Our results indicate robust performance on timescales from microseconds to milliseconds, potentially allowing such a reservoir to be tuned to perform a wide range of real-time tasks, from decision making in driverless cars (fast) to speech recognition (slow). The low energy consumption expected for such a device makes it an ideal candidate for use in edge computing applications that require low latency and power.

Artificial intelligence, and in particular machine learning (ML) techniques, are widely used across a variety of sectors, but the associated energy cost of training highly complex models has become a significant challenge [1]. This has led to a desire to produce efficient, hardware-based neuromorphic platforms to replace the current state of the art, i.e., software-based models evaluated on traditional CMOS computing architectures [2].
Recurrent neural networks (RNNs) are inspired by the high interconnectivity of biological systems and can solve complex time-dependent tasks, such as speech recognition. However, their temporal interconnectivity requires complex training methods that are computationally expensive and difficult to implement in hardware [3]. The reservoir computing (RC) paradigm provides a solution to this by using an RNN with fixed synaptic weights (the reservoir) to transform inputs into a higher-dimensional representation before they are passed to a simple, feed-forward output layer with trainable weights [4]. Time multiplexing allows RC to be performed even if the reservoir consists of only a single dynamical node [5].
RC is also particularly well suited to hardware-based implementations, as the reservoir can be replaced with any suitable physical system providing it has the correct properties: (a) the ability to produce nonlinear transformations of input data and (b) fading memory of past inputs. Such implementations have the potential for lower energy costs than reservoirs simulated on conventional computers, as they can directly utilise the intrinsic functionalities of the physical systems. This has led to a wide range of systems being proposed as suitable reservoirs [6]. In particular, magnetic systems exhibit nonvolatility and nonlinearity, and are thus positioned as ideal candidates [7]. Previous investigations have demonstrated the suitability of a wide range of magnetic systems for use as reservoirs, including spin-torque nano-oscillators, dipole-coupled nanomagnets, and arrays of interconnected nanowires [8-14].
Here, we propose strain-mediated, voltage-controlled superparamagnetic ensembles as platforms for creating ultra-low-energy hardware-based reservoirs. Using a combination of micromagnetic simulations and analytical modelling, we will show that these ensembles possess the reproducible behaviour, nonlinearity and fading memory required for RC and demonstrate (using a model of the system) competitive performance on two benchmark machine learning tasks: spoken digit recognition and chaotic time series prediction.
The modelled systems consist of ensembles of circular CoFeB nanodots with diameters in the range 40 nm to 90 nm and thicknesses of 4 nm (Fig. 1(a)). The dots' diameter is fixed within a single ensemble, but different values allow access to different timescales of the ensembles' responses. We assume a large ensemble (over 1000 dots, or a 6.3 µm square array) with nanodots spaced by 160 nm to minimise dipolar coupling. The dots are assumed to have saturation magnetisation M_s = 1200 kA m⁻¹, exchange stiffness A_ex = 10 pJ m⁻¹ and a growth- or shape-induced uniaxial anisotropy [15,16]. A schematic diagram illustrating the geometry of an individual dot is shown in Fig. 1(b). A fixed magnetic field (40 % of the anisotropy field) is applied at 90° to the anisotropy axis. Magnetoelastic control of the arrays' magnetic states is provided by voltage-controlled uniaxial strain applied at 45° to the anisotropy axis (an ultra-low-power consumption technique [17]). We base our parameters for this on the artificial multiferroic heterostructure PMN-PT (011)/CoFeB, where strains of up to 0.175 % can be applied, leading to a strain anisotropy of comparable magnitude to typical growth-induced intrinsic anisotropies [18,19]. Input is via a voltage applied perpendicularly across the PMN-PT layer. An output derived from, for example, giant magnetoresistance would be proportional to the magnetisation of the array (see Fig. 1(a)).
To model strain-mediated voltage control of the CoFeB nanodots, we express their energy using the Stoner-Wohlfarth model [20,21]. The application of strain to isotropic, amorphous CoFeB results in the addition of a uniaxial magnetoelastic anisotropy term with magnitude K_σ = (3/2)λ_s E ε, where λ_s is the magnetostriction coefficient (31-75 ppm [19,22]), E the Young's modulus (216 GPa [23]), and ε the voltage-induced strain [24]. When combined with an intrinsic uniaxial anisotropy (K), the normalized energy per unit volume e(θ) = E(θ)/KV is then:

e(θ) = sin²(θ) + k_σ sin²(θ − φ) − 2h cos(θ − ρ),   (1)

where θ is the angle of the magnetization to the intrinsic anisotropy axis, k_σ = K_σ/K is the reduced strain anisotropy, h is the reduced fixed applied field, and φ and ρ are the angles of the strain anisotropy axis and the field direction to the intrinsic anisotropy, respectively. The two anisotropies can be combined into a single term [25] (see supplementary material), resulting in an anisotropy axis that is rotated by changing the magnitude of K_σ through application of strain (Fig. 1(b)). Fig. 1(c) shows how the energy barriers of the system evolve under tensile and compressive strains. Here, φ = 45°, ρ = 90°, h = 0.4 and k_σ = ±0.2. The strain changes the energy profile such that the energy barriers are no longer equal for the two states. The superparamagnet's magnetization exhibits stochastic behavior, switching between the minimum energy states with a characteristic dwell time controlled by these energy barriers. Using the Néel-Arrhenius law, the switching rate from state i (magnetization right) to j (magnetization left) is:

ω_ij = f_0^ij exp(−E_b^ij / k_B T) = f_0^ij exp(−β′ e_b^ij),   (2)

where f_0^ij is the attempt frequency (typically 10⁹-10¹¹ s⁻¹ [26]), k_B the Boltzmann constant, E_b^ij the energy barrier from state i to j, e_b^ij the reduced energy barrier, and β′ = KV/k_B T the scaled thermal energy. The thermal switching of superparamagnets has previously been used for random number generation [27,28], adiabatic computing algorithms [29], and encoding nonlinear functions [30].
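The energy landscape and switching rates above can be evaluated numerically. The following Python sketch computes Eq. 1 and the Néel-Arrhenius rates of Eq. 2 for the parameters used in Fig. 1(c) (φ = 45°, ρ = 90°, h = 0.4); the barrier search by dense angular sampling is our own construction for illustration, not part of the paper's method:

```python
import numpy as np

def reduced_energy(theta, k_sigma, h=0.4, phi=np.pi / 4, rho=np.pi / 2):
    """Eq. (1): normalized Stoner-Wohlfarth energy e(theta) = E(theta)/KV."""
    return (np.sin(theta) ** 2
            + k_sigma * np.sin(theta - phi) ** 2
            - 2.0 * h * np.cos(theta - rho))

def switching_rates(k_sigma, beta_prime=20.0, f0=0.5e9, n=20001):
    """Eq. (2): Neel-Arrhenius rates over the lowest saddle of the landscape.

    The two minima and saddles are located by dense sampling of theta; both
    escape paths cross the same lowest saddle, which dominates the rate.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    e = reduced_energy(theta, k_sigma)
    minima = np.flatnonzero((e < np.roll(e, 1)) & (e < np.roll(e, -1)))
    assert len(minima) == 2, "landscape should be bistable for these parameters"
    i, j = minima  # i: minimum nearer +x ("right"), j: "left"
    saddle = min(e[i:j + 1].max(), np.concatenate((e[j:], e[:i + 1])).max())
    w_12 = f0 * np.exp(-beta_prime * (saddle - e[i]))  # escape from "right"
    w_21 = f0 * np.exp(-beta_prime * (saddle - e[j]))  # escape from "left"
    return w_12, w_21
```

At zero strain the two barriers are equal (≈ 0.36 in reduced units for h = 0.4, consistent with the T_0 ∝ exp(0.36β′) scaling quoted later), so the two rates coincide; a finite k_σ makes the barriers unequal and biases the dwell times.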
This stochastic switching is illustrated in Fig. 1(d), which presents micromagnetic simulations (T = 300 K) of thermal switching between the two ground states in a 40 nm diameter CoFeB cylinder, performed using the MuMax3 software package [31,32]. Changing the applied strain manipulates the dwell time of the magnetisation in the two states: tensile (compressive) strain results in an increased dwell time in the right (left) magnetization state. Fig. 1(e) plots the probability of not having switched as a function of time in a state for the tensile strain case. The decay rate of these exponential distributions gives the dwell time and switching rate for the state. Fig. 1(f) compares rates extracted from the micromagnetic simulations (points) with those derived from energy barriers defined by the macrospin model (Eq. 1). A value of f_0 = 0.5 × 10⁹ Hz is selected such that the values of ω_ij at zero applied strain match between the two models. The macrospin model agrees reasonably well with the micromagnetic simulations; however, the data match less well at higher magnitudes of strain. The deviation occurs for values of k_σ (high) and β′ (low) where the assumption of Néel-Arrhenius-like switching between two states approaches the limit of its validity (E_b ∼ 3k_B T) [33]. We do not calculate micromagnetic rates for all values of β′, as the run time required for the simulations scales exponentially. However, we expect the agreement to be as good or better for larger β′, where the assumption E_b ≫ 3k_B T is always met. The macrospin model is, therefore, used to model the CoFeB dots throughout the remainder of this paper.
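The rate extraction of Fig. 1(e) can be mimicked with synthetic data: for Poissonian switching, the probability of not having switched after a time t in a state decays as exp(−ωt), so ω follows from the mean dwell time. A minimal sketch, using an illustrative made-up rate rather than a value from the simulations:

```python
import numpy as np

def rate_from_dwell_times(dwell_times):
    """Maximum-likelihood estimate of the switching rate w for an
    exponential survival probability P(t) = exp(-w * t)."""
    return 1.0 / np.mean(dwell_times)

# Synthetic telegraph-noise dwell times with an illustrative rate.
rng = np.random.default_rng(1)
w_true = 2.0e5  # Hz (made-up value for the demo)
dwells = rng.exponential(1.0 / w_true, size=20000)
w_est = rate_from_dwell_times(dwells)
```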
We now consider how the stochastic behaviour of single nanodots manifests in an extended ensemble as predictable collective behavior. The average magnetization obeys simple rules that, when subject to a temporally varying input in the form of a globally applied strain, produce a reproducible, complex, nonlinear response with fading memory, thus fulfilling the basic requirements of a hardware-based reservoir.
We model the extended ensemble as a two-state system governed by the master equation [33]:

dp_1/dt = ω_21 p_2 − ω_12 p_1,   (3)

where p_i is the normalized population in state i such that p_1 + p_2 = 1, and ω_ij is the transition rate as defined in Eq. 2. This model does not take into account the error introduced by a finite system size. However, it is appropriate for systems with a large number of nanodots (as we assume here), since the error scales as 1/√n. Integrating Eq. 3 gives the time-dependent probability for fixed transition rates:

p_1(t) = ω_21/ω + (p_1(0) − ω_21/ω) exp(−ωt),   (4)

where ω = ω_21 + ω_12. Given our system with a fixed applied field, the reduced magnetization (unit length) along the x direction (magnetization right) can be written:

m_x = p_1 cos δ_1 − p_2 cos δ_2,   (5)

where δ_i refers to the slight rotation of the energy minimum away from the x axis due to the fixed field. Fig. 2(a) plots the equilibrium (t → ∞) response of the reduced magnetization to an applied strain anisotropy (k_σ). While the exact form depends on the scaled thermal energy (β′), it maintains sigmoidal-like features that sharpen with increasing KV/k_B T. Importantly, the curve remains nonlinear, as required for reservoir computing. Fig. 2(b) illustrates the complex response of the reduced magnetisation to a temporal input sequence. Section A, where a single input is turned on and held, demonstrates excitation of the magnetisation away from zero net magnetisation. Section B, where no input is present, demonstrates decay back towards zero net magnetisation. In all other sections, applied strain inputs are held for a duration less than the zero-input base time (T_0) of the system, and thus the system response exhibits both excitation and decay terms. The zero-input timescale scales as T_0 ∝ exp(0.36β′), a variation from nanoseconds to milliseconds for β′ = 10 to 50, which allows a reservoir to be designed with a timescale to match a particular real-time task (see supplementary material for further details).
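The two-state dynamics above can be iterated directly for a piecewise-constant strain input, which also makes the fading memory explicit: the population at the end of one hold interval is the initial condition for the next. A minimal sketch, with illustrative placeholder rates rather than values derived from the device:

```python
import numpy as np

def p1_after(t, p1_0, w12, w21):
    """Analytic solution of the master equation dp1/dt = w21*p2 - w12*p1
    for fixed rates, starting from p1(0) = p1_0."""
    w = w12 + w21
    return w21 / w + (p1_0 - w21 / w) * np.exp(-w * t)

def magnetization(p1, d1=0.0, d2=0.0):
    """Reduced magnetization m_x = p1*cos(d1) - p2*cos(d2)."""
    return p1 * np.cos(d1) - (1.0 - p1) * np.cos(d2)

def respond(rate_schedule, dt, p1_0=0.5):
    """Iterate the analytic solution over successive hold intervals of
    length dt. Each interval starts from the previous end state, so the
    output depends on current and past inputs (fading memory)."""
    p1, trace = p1_0, []
    for w12, w21 in rate_schedule:
        p1 = p1_after(dt, p1, w12, w21)
        trace.append(magnetization(p1))
    return trace
```

For equal rates the magnetization relaxes to zero; a strain pulse (unequal rates) excites it away from zero, after which it decays back on the timescale 1/ω.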
The sharp transitions upon changes of input are due to the slight rotations of the minimum energy states within single dots (change in δ i ) and are expected to occur on a timescale much faster than the thermally driven response of the ensemble. Since p i (t) depends on p i (t − ∆t), the system exhibits a fading memory: the response of the reservoir depends on the current and previous inputs.
Together, the reproducible response, nonlinearity, and fading memory of the superparamagnetic ensembles suggest they have the correct properties to be used as hardware reservoirs. Furthermore, as discussed by Hu et al. [17], the estimated power consumption for magnetoelastically driven MRAM is 160 aJ per bit written (a rotation of 90°). Our system uses ∼1000 such bits, but less than 15 % of the strain required for a 90° rotation; this gives an approximate estimate of 24 fJ per input. This compares favorably to the write energy per bit of STT-RAM of 100 fJ [17]. While this figure is not precise, it indicates the potential for ultra-low-energy consumption. Furthermore, Safranski et al. [34] have verified that uncorrelated thermally driven switching can be observed in CoFeB MTJs for dwell times of 5 ns and above, which we can take as a floor for the operational speed of these nanodots. Longer dwell times can easily be induced by scaling up the size of the magnetic elements, thus increasing the energy barrier between states. Therefore, the timescale of the ensembles' responses could easily be engineered across many orders of magnitude, allowing them to perform a wide range of real-time tasks.

Having established that strain-mediated, voltage-controlled superparamagnetic ensembles possess the qualities required for RC, we now test our system's performance against two common ML tasks. We treat the nanodot array as a single dynamical node with complex internal behavior and thus adopt the approach described by Appeltant et al. [5] (Fig. 3(a)-(d)).
The RC process is as follows. A raw input signal with N_in dimensions is discretized such that S(nδt) = S_n, giving a 2D matrix of size N_samples by N_in. For the training part, each input sample has a corresponding desired output y_n of dimension N_out, forming a matrix of N_samples by N_out. A mask matrix is used to project a random combination of the input dimensions onto each of the N_v virtual neurons and is constant for all samples of the input. For a 1D input signal, such as the NARMA10 task (Fig. 3(b)), the random nature of the mask is designed to stimulate a range of dynamics. For a sample index n and virtual neuron i, the total input voltage is:

v_ni = ∆v (Σ_j M_ij S_nj + γ x_(n−1)i),   (6)

where M_ij is the mask matrix, ∆v represents a scaling factor to give an input voltage into the reservoir, γ is the feedback strength, and x_(n−1)i is the output value of the reservoir for virtual neuron i but for sample n − 1. This matrix of input voltages is serialized as a 1D time series, with each virtual node presented in order before the next input sample is applied. This is demonstrated in Fig. 3(c), which shows the masked input, the masked input with added feedback term, and finally the raw output of the reservoir at the transition point between samples 56 and 57. The masked input voltage for each virtual neuron is held constant for the neuron duration θ; thus the reservoir time between input samples is ζ = N_v θ. The raw output of the reservoir is measured at the end of each virtual neuron duration to give a transformed value, which for the nth input sample and virtual neuron i (i = 1, …, N_v) is:

x_ni = m_x(nζ + iθ).   (7)

The aim of RC is that transformed values from, e.g., different input classes become linearly separable by a hyperplane due to the higher-dimensional transformation. Thus, a set of linear output weights is used to classify the output. The predicted output of sample n and output dimension k is:

ỹ_nk = Σ_j W_kj x_nj,   (8)

where W_kj is the output weight matrix (Fig. 3(d)). A constant bias term is included as the j = 0 element of the weights, with an extra column in x such that x_n0 = 1 for all n. The output weights are found using ridge (or Tikhonov) regression [35]. This gives a direct solution for the weights:

W = Yᵀ X (Xᵀ X + λ I)⁻¹,   (9)
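The masking and serialization procedure described above can be sketched as follows. Here the physical reservoir is replaced by a toy leaky-tanh node purely so the sketch runs end-to-end; in the proposed device, the node response would instead be the ensemble magnetization:

```python
import numpy as np

def run_reservoir(S, n_virtual=50, dv=1.0, gamma=0.2, seed=0):
    """Time-multiplexed single-node reservoir (after Appeltant et al.).

    S: (N_samples, N_in) discretized input. A fixed random mask projects each
    sample onto N_v virtual neurons, delayed feedback of strength gamma is
    added, and the node response is sampled at the end of each virtual
    neuron duration theta.
    """
    n_samples, n_in = S.shape
    rng = np.random.default_rng(seed)
    M = rng.uniform(-1.0, 1.0, size=(n_virtual, n_in))  # fixed random mask
    x = np.zeros((n_samples, n_virtual))
    prev = np.zeros(n_virtual)  # feedback term x_{(n-1)i}
    node = 0.0
    for n in range(n_samples):
        v = dv * (M @ S[n] + gamma * prev)  # masked input voltage
        for i in range(n_virtual):
            # Toy leaky nonlinear node standing in for the magnetic ensemble.
            node = 0.7 * node + 0.3 * np.tanh(v[i])
            x[n, i] = node  # sampled at the end of the neuron duration
        prev = x[n]
    return x
```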

where λ is the L2-norm regularisation hyperparameter and I the identity matrix. λ is found through a grid search for the lowest mean-square error, sampled using 5-fold cross-validation of the training data set for each task to prevent overfitting. We use two common machine learning tasks as proof of principle of the reservoir's operation: spoken digit recognition (TI-46 dataset) and chaotic time series prediction (NARMA10). These require a reservoir with good nonlinearity and memory, respectively. Performance heat maps for the spoken digit task are shown in Fig. 3(e).
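The readout training described above amounts to a few lines of linear algebra. A minimal sketch of the ridge solution W = Yᵀ X (Xᵀ X + λ I)⁻¹, with the bias handled as an extra all-ones column:

```python
import numpy as np

def train_readout(X, Y, lam=1e-6):
    """Ridge (Tikhonov) regression W = Y^T X (X^T X + lam*I)^{-1}, solved
    without an explicit inverse. X: (N_samples, N_v) transformed values,
    Y: (N_samples, N_out) desired outputs."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column, x_n0 = 1
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y).T  # shape (N_out, N_v + 1)

def predict(W, X):
    """Linear readout: y~_nk = sum_j W_kj x_nj, with bias as j = 0."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return Xb @ W.T
```

In practice λ would be chosen by the grid search with 5-fold cross-validation described above.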
The task requires the recognition of the spoken digits 0-9 by five female speakers from the NIST TI-46 dataset [36]. Preprocessing separates the data into 13 frequency bands using an MFCC filter [37] before it is masked. Training and validation were performed on 100 utterances per speaker, and testing on 160. The input rate (θ/T_0) and the scaling of the input (maximum k_σ) are varied, and the error rate for the test dataset is plotted. Using 50 virtual nodes was found to give good performance. The left-hand heat map is for an ensemble with a base timescale T_0 = 1.4 µs (KV/k_B T = 20, nanodot diameter 57 nm), the right-hand heat map for T_0 = 66 ms (KV/k_B T = 50, nanodot diameter 90 nm). In both cases, an error rate of 5 % can be achieved, showing that this reservoir can perform well at this task across a range of timescales. To provide a baseline, the error rate when the reservoir is removed and the masked input is treated as the output is 23 %. The relatively high base level is due to the inherent nonlinearity in the preprocessing discussed by Abreu Araujo et al. [38]. However, the important demonstration here is the increase in performance with the reservoir, which is comparable to other hardware-based reservoirs [5,8,38]. The complexity of the maps, while not discussed in further detail here, may be due in part to coupling between input strength and internal timescale (see supplementary material).

The goal of the chaotic time series task (NARMA10) is to predict the response to white noise (random numbers drawn from [0, 0.5]) of a discrete-time tenth-order nonlinear auto-regressive moving average [39,40]:

y_n = y_(n−1) (0.3 + 0.05 Σ_{k=1}^{10} y_(n−k)) + 1.5 S_(n−1) S_(n−10) + 0.1.   (10)
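The NARMA10 target of Eq. 10 is straightforward to generate, which is how training and test series for this task are typically produced. A minimal sketch:

```python
import numpy as np

def narma10(n_steps, seed=0):
    """Generate input S ~ U[0, 0.5] and the NARMA10 target of Eq. (10):
    y_n = y_{n-1}(0.3 + 0.05*sum_{k=1..10} y_{n-k}) + 1.5*S_{n-1}*S_{n-10} + 0.1
    """
    rng = np.random.default_rng(seed)
    S = rng.uniform(0.0, 0.5, n_steps)
    y = np.zeros(n_steps)
    for n in range(10, n_steps):
        y[n] = (y[n - 1] * (0.3 + 0.05 * y[n - 10:n].sum())
                + 1.5 * S[n - 1] * S[n - 10] + 0.1)
    return S, y
```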
The task for the network is to predict the NARMA10 output y_n given the input S_n. We characterize the success of the network via the normalized root mean squared error (NRMSE):

NRMSE = √( Σ_n (ỹ_n − y_n)² / (N σ²(y)) ),   (11)

where y_n is the true output (Eq. 10), ỹ_n the predicted output from the network (Eq. 8), N the number of samples, and σ²(y) the variance of the true output. For this task, we make use of the delay-line feedback (Fig. 3(a)), similar to the Mackey-Glass reservoir discussed by Appeltant et al. [5]. This delay line (an addition of the output at time t − ζ to the input at time t) serves to boost the memory capacity of the system, an important requirement for the NARMA10 task. Fig. 3(d) demonstrates a typical predicted and true signal for the achieved level of NRMSE. As with the speech recognition task, we can compare to the value achieved when the reservoir is removed and the masked input is taken as the output to be trained: NRMSE ≈ 1. Using a Mackey-Glass reservoir (with optimisation parameters based on those in Appeltant et al. [5]), a value of 0.50 was achieved.
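The NRMSE metric as defined here (normalization by the variance of the true signal, so that predicting the mean gives a value of 1) is a one-liner:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """NRMSE = sqrt( mean((y~ - y)^2) / var(y) ). A value of 1 corresponds
    to a predictor no better than the mean of the target, which is why the
    reservoir-free baseline sits near NRMSE ~ 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2) / np.var(y_true)))
```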
It should be noted that this value differs from that given by Appeltant et al.; this is believed to be due to a differing definition of NRMSE. Having demonstrated the feasibility of this system for machine learning, we now address two of the key potential challenges in realising it as an experimental device. The first is the reproducibility of the magnetic behaviour. Whether due to deviations from the Stoner-Wohlfarth model, small perturbations from long-range magnetostatic coupling, or variation in the PMN-PT response from nanodot to nanodot, the magnetic response curve may differ from that presented in Fig. 2. However, one of the chief advantages of the envisaged system is that this behaviour does not have to be realised exactly in order to produce the behaviour required for machine learning, as long as the response of the system is nonlinear, reproducible and has fading memory. As the output is derived from an average across an ensemble, slight changes in behaviour from nanodot to nanodot are averaged out across the whole array. This should give the device a high tolerance to small deviations. This is in contrast to typical magnetic logic and memory devices, where slight device-to-device variations can cause significant problems with reproducibility. We have also demonstrated, in Fig. 3(e), that high performance can be achieved for systems with differing magnetization response curves. This leads to the second consideration: how reproducible the device will be run-to-run. The reproducibility of the signal will largely depend on the size of the array, with the error scaling as 1/√n. By increasing the size of the array, the error can be made arbitrarily small. It will be important to determine how large an error can be tolerated, and this will be an avenue of future research. We anticipate that the noise-tolerance threshold will be problem specific.
The tolerance of the system to changes in magnetic response and the increase in reproducibility due to the aggregate nature of the device should help to mitigate the key challenges of realizing a physical device.
In conclusion, we have proposed strain-mediated, voltage-controlled superparamagnetic ensembles as platforms for hardware-based reservoir computing. We have demonstrated competitive outcomes on two machine learning benchmark tasks (spoken digit recognition and chaotic time series prediction), as well as explaining the properties of the ensembles that give rise to their ability to act as a reservoir. By tuning internal parameters (input rate, input scaling, and feedback strength), this performance can be delivered across a range of timescales, from µs to ms. This would allow a physical realization of such a reservoir to be tuned to provide computation in real time for a wide range of possible physical inputs: from decision-making in driverless cars (fast) to speech recognition (slow). The simplicity of the system, coupled with the low energy consumption of such a voltage-controlled, thermally driven device, makes it an ideal candidate for use in edge computing applications where high performance is needed at low latency and power.

SUPPLEMENTARY MATERIAL
See the supplementary material for a discussion of the full form of the anisotropy and the tunable behaviour of the timescales.
PLEASE CITE THIS ARTICLE AS DOI: 10.1063/5.0048911