Comparing domain wall synapse with other Non Volatile Memory devices for on-chip learning in Analog Hardware Neural Network

Resistive Random Access Memory (RRAM) and Phase Change Memory (PCM) devices have been widely used as synapses in crossbar array based analog Neural Network (NN) circuits to achieve more energy and time efficient data classification than conventional computers. Here we demonstrate the advantages of the recently proposed spin orbit torque driven Domain Wall (DW) device as a synapse, compared to RRAM and PCM devices, with respect to on-chip learning (training in hardware) in such NNs. The synaptic characteristic of the DW synapse, which we obtain from micromagnetic modeling, turns out to be much more linear and symmetric (between positive and negative updates) than that of RRAM and PCM synapses. This makes the design of peripheral analog circuits for on-chip learning much easier in a DW synapse based NN than in RRAM and PCM synapse based NNs. We next incorporate the DW synapse as a Verilog-A model in the crossbar array based NN circuit that we design in a SPICE circuit simulator. Successful on-chip learning is demonstrated through SPICE simulations on the popular Fisher's Iris dataset. The time and energy required for learning turn out to be orders of magnitude lower for the DW synapse based NN circuit than for the RRAM and PCM synapse based NN circuits.


Introduction
Crossbar array based analog hardware Neural Networks (NNs) are considered to be extremely time and energy efficient in executing NN algorithms for data classification because they compute at the location of the memory itself, unlike CPUs, GPUs and even the recent digital neuromorphic chips, all of which keep memory and computing separate at their smallest cores [1,2,3,4,5,6]. Such a crossbar based NN needs an analog Non Volatile Memory (NVM) device, also known as a synapse, at each intersection point of the crossbar. Typically a Resistive Random Access Memory (RRAM) or a Phase Change Memory (PCM) device is used as the synapse [1,7,8,9,10]. Training the NN in hardware (on-chip learning) is achieved by modulating the conductances of the synapses, which correspond to the weights stored in the synapses, with electrical programming pulses at every iteration. Though the conductance of RRAM and PCM synapses changes by orders of magnitude under programming pulses, the conductance response characteristic is highly non-linear and asymmetric (between positive and negative conductance updates) [1,11,12,13]. This complicates the design of peripheral circuits for on-chip learning, and learning accuracy suffers. The time and energy consumed in the learning process are also very high [1,11,12,13,9].
A spin orbit torque driven Domain Wall (DW) device based on a heavy metal-ferromagnet hetero-structure has recently been proposed and experimentally demonstrated to exhibit synaptic behaviour [3,14,15,16,17,18,19,20]. In Section II of this paper, we simulate such a DW synapse using an experimentally calibrated micromagnetic model. We show that though the range of conductance variation is much smaller for the DW synapse than for RRAM and PCM synapses, the conductance response of the DW synapse to programming current pulses is linear and symmetric, unlike that of RRAM and PCM synapses. In Section III, we design a crossbar array of DW synapses in a SPICE circuit simulator, with the synapses being Verilog-A models developed from our micromagnetic simulation results. The Fully Connected Neural Network (FCNN) algorithm, with Stochastic Gradient Descent (SGD) based weight/conductance update, is used here for on-chip learning [17,21]. Unlike in Bhowmik et al. [17], the conductance of the DW synapse is quantized here to take the effect of DW pinning by defects into account [22,23,24]. Despite the quantization, high accuracy is obtained in our circuit simulations on a popular machine learning dataset, Fisher's Iris [25]. We next show that the time taken and energy consumed for on-chip learning in the DW synapse based NN circuit are orders of magnitude lower than in RRAM and PCM synapse based NN circuits. Section IV concludes the paper. To the best of our knowledge, this is the first comparison study between a spintronic synapse and RRAM/PCM synapses with respect to on-chip learning in NN hardware.

Device Level Comparison
A schematic of our heavy metal/ferromagnetic metal hetero-structure based domain wall synapse is shown in Fig. 1. The operating physics of the device has been discussed extensively in [3,14,15,17]. The core physics is that of spin orbit torque driven DW motion, which has been extensively studied through simulations and experiments in the past [26,27,28,29]. When in-plane current ("write" current) flows through the heavy metal layer ("write" path), a DW in the ferromagnetic layer above it experiences spin orbit torque. If the DW is of Neel type due to the presence of Dzyaloshinskii Moriya Interaction (DMI) [26,27,30,31] at the interface, it moves even in the absence of a magnetic field, as observed in several experiments [26,27,30] and also in our micromagnetic simulations (Fig. 2).
In this paper, we consider a device with lateral dimensions 1000 nm × 50 nm. The thickness of the heavy metal (Pt) layer is taken to be 10 nm, which is greater than the spin diffusion length. Hence, we can take the vertical spin current density injected by the heavy metal layer into the ferromagnetic layer above it ($J_s$) to be the in-plane charge current density ($J_c$) multiplied by the spin Hall angle (0.07 here, considering Pt) [37,38,39]. The thickness of the ferromagnetic layer above the heavy metal layer is taken to be 1 nm. The dynamics of the moments of this layer under the influence of this spin current is simulated using the micromagnetic simulation package "mumax3" [36] to model such spin current driven DW motion inside it. We choose the micromagnetic simulation parameters for the ferromagnetic layer based on those used for Pt (heavy metal)/CoFe (ferromagnet)/MgO devices in the simulation study of Emori et al. [30], which is based on experimentally observed spin orbit torque driven DW motion in the same devices. The parameters can also be found in the Supplementary Material (Section 1) accompanying this paper.
Since the DW is of Neel type (DMI = $1.2 \times 10^{-3}$ J/m$^2$), the average magnetization inside the wall ($M_{avg}$) and the direction of spin polarization of the electrons at the heavy metal/ferromagnet interface due to the current flowing through the heavy metal ($\sigma$) form a non-zero cross product (Fig. 2). The effective magnetic field experienced by the DW is proportional to that cross product [26,28,30]. As a result, the DW moves, as seen in our micromagnetic simulations (Fig. 2). Triangular notch regions with Perpendicular Magnetic Anisotropy (PMA) constant = $9 \times 10^{5}$ J/m$^3$ are present on the edges of the simulated ferromagnetic layer. The PMA constant in the rest of the layer is $8 \times 10^{5}$ J/m$^3$. These notch regions mimic defects, which pin the domain wall for in-plane charge current lower than a certain threshold value [15,22,23,24,31]. Hence, our micromagnetic simulation (Fig. 3(a)) shows that the velocity of the domain wall is linearly proportional to the current density only above a certain threshold value of current density (≈ $5 \times 10^{6}$ A/cm$^2$) [32]. Hence, in our device we only move the domain wall with a current pulse (3 ns long) of fixed magnitude (25 µA), corresponding to a current density of $5 \times 10^{6}$ A/cm$^2$ (Fig. 2), so that the domain wall is never pinned by defects [22,23,24]. Pinned ferromagnetic regions are present at each edge of the free layer to stabilize the DW at the edge and prevent it from getting destroyed [3,14,33].
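The relation between the 25 µA "write" pulse and the quoted current density can be checked with a short calculation. The sketch below assumes that the current flows uniformly through the cross-section of the Pt layer (50 nm wide, 10 nm thick); the function and variable names are illustrative and do not come from the simulation setup.

```python
# Geometry and pulse values quoted in the text.
WIDTH_M = 50e-9          # device width (50 nm)
HM_THICKNESS_M = 10e-9   # heavy metal (Pt) thickness (10 nm)
WRITE_CURRENT_A = 25e-6  # "write" current pulse magnitude (25 uA)
SPIN_HALL_ANGLE = 0.07   # value used in the text for Pt


def charge_current_density_a_per_cm2(current_a, width_m, thickness_m):
    """In-plane charge current density, assuming uniform flow in the Pt layer."""
    area_cm2 = (width_m * 100.0) * (thickness_m * 100.0)  # convert each side m -> cm
    return current_a / area_cm2


def spin_current_density(j_c, spin_hall_angle):
    """J_s = spin Hall angle x J_c, as stated in the text for thick Pt."""
    return spin_hall_angle * j_c


j_c = charge_current_density_a_per_cm2(WRITE_CURRENT_A, WIDTH_M, HM_THICKNESS_M)
j_s = spin_current_density(j_c, SPIN_HALL_ANGLE)
print(f"J_c = {j_c:.2e} A/cm^2, J_s = {j_s:.2e} A/cm^2")
```

Under this uniform-flow assumption the charge current density comes out at the 5 × 10^6 A/cm^2 quoted above, and the injected spin current density is 7 % of that value.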
Following [14], the conductance of the "read" path (vertical tunnel junction structure in Fig. 1) of the synapse is given by

$$G = \frac{G_{\max} + G_{\min}}{2} - \frac{G_{\max} - G_{\min}}{2}\,\langle m_z \rangle \qquad (1)$$

where $\langle m_z \rangle$ represents the average out of plane magnetization component of the free layer ($\langle m_z \rangle = 1$ corresponds to up and $\langle m_z \rangle = -1$ corresponds to down), $G_{\max}$ is the maximum conductance of the MTJ and $G_{\min}$ is the minimum conductance of the MTJ. Taking the Resistance-Area product of the MTJ to be $4.04 \times 10^{-12}$ Ω m$^2$ [34] and a TMR ratio of 120 % [35], $G_{\min} \approx 2.9 \times 10^{-3}$ mho and $G_{\max} \approx 6.1 \times 10^{-3}$ mho. The moment of the fixed layer is in the down direction (Fig. 1). As observed from our micromagnetic simulation, a "write" current pulse of magnitude 25 µA and positive polarity always moves the DW to the right by a fixed distance of ≈ 20 nm (Fig. 2). Hence $\langle m_z \rangle$ decreases and, following equation (1), the conductance increases by a step of $0.071 \times 10^{-3}$ mho (Fig. 3(b)). A current pulse of the same magnitude and negative polarity moves the DW to the left, $\langle m_z \rangle$ increases and the conductance decreases by the same step of $0.071 \times 10^{-3}$ mho (Fig. 3(b)). Hence the conductance response to a series of programming "write" current pulses of equal magnitude (25 µA) is linear and is also symmetric between positive and negative pulses. Also, the conductance of the DW synapse, and hence the corresponding weight of the synapse, only takes quantized values; this is how we take defect pinning into account. The energy consumed through Joule heating per programming pulse of 25 µA, for a conductance increase/decrease by a single step, is calculated to be 0.18 fJ (Table I).
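The linear, symmetric and quantized response described above can be captured in a small behavioural model. The sketch below is not the Verilog-A model used in the circuit simulations; it only uses the conductance limits and step size quoted in the text. The linear dependence of conductance on the average out of plane magnetization, and the clipping at the two conductance limits, are assumptions made for illustration.

```python
G_MIN = 2.9e-3     # mho, minimum MTJ conductance quoted in the text
G_MAX = 6.1e-3     # mho, maximum MTJ conductance quoted in the text
G_STEP = 0.071e-3  # mho, conductance change per 25 uA, 3 ns pulse (from the text)


def conductance_from_mz(m_z):
    """Linear interpolation between G_MAX (m_z = -1) and G_MIN (m_z = +1).

    Assumed form: the fixed layer points down, so a fully "down" free layer
    (m_z = -1) is the high-conductance state.
    """
    return 0.5 * (G_MAX + G_MIN) - 0.5 * (G_MAX - G_MIN) * m_z


class DomainWallSynapse:
    """Behavioural sketch: each pulse changes the conductance by one fixed step."""

    def __init__(self, conductance=G_MIN):
        self.conductance = conductance

    def apply_pulse(self, polarity):
        """polarity = +1 raises the conductance by one step, -1 lowers it."""
        if polarity not in (-1, 1):
            raise ValueError("polarity must be +1 or -1")
        new_g = self.conductance + polarity * G_STEP
        # Assumption: the wall stops at the pinned edge regions, so the
        # conductance cannot leave the interval [G_MIN, G_MAX].
        if G_MIN - 1e-12 <= new_g <= G_MAX + 1e-12:
            self.conductance = new_g
        return self.conductance


synapse = DomainWallSynapse()
up = [synapse.apply_pulse(+1) for _ in range(10)]    # ten positive pulses
down = [synapse.apply_pulse(-1) for _ in range(10)]  # ten negative pulses
n_states = int((G_MAX - G_MIN) / G_STEP) + 1         # distinct conductance levels
```

Ten positive pulses followed by ten negative pulses return the synapse to its starting conductance, which is the symmetry property that simplifies the peripheral circuit. With the quoted limits and step size, the model has 46 distinct conductance levels.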
Next we compare the conductance response of this DW synapse with that of typical RRAM and PCM synapses. The Verilog-A model provided by [40], experimentally benchmarked against [41], has been used for RRAM modeling. Following the 1T1M (one transistor, one memristor) configuration [42,43,44], we connect this RRAM device to a 65 nm technology node transistor (from the UMC library) in the Cadence Virtuoso circuit simulator (Fig. 4(a)). We observe that when gate voltage pulses of fixed magnitude and duration (200 ns) are applied at the gate of the transistor for conductance increase (voltage of the top electrode kept higher than that of the bottom electrode for that purpose), the conductance increases and then saturates to a fixed value (see Supplementary Material, Section 2). To achieve a linear increase in conductance, gate voltage pulses of increasing magnitude (SET pulses) need to be applied (Fig. 4(b)). This has been observed experimentally in the RRAM devices of [42,44,45]. Thus, though the conductance varies over a much wider range for the RRAM synapse than for the DW synapse (Table I), the conductance response is inherently non-linear. As a result, if a certain weight update is needed for a synapse at some iteration during on-chip learning, voltage pulses of different magnitude may need to be applied to bring about the same weight update, depending on the weight/conductance value of the RRAM synapse before that iteration. This makes designing the analog peripheral circuit for weight update very complicated. In fact, the demonstrations of on-chip learning in RRAM based crossbar NN arrays so far use a digital FPGA unit or an on-chip CMOS based digital processor, connected to the analog crossbar array, for weight update [44,46]. ADCs and DACs are needed as a result, which can potentially consume a lot of energy and slow down the circuit. The energy consumed in the 1T1M circuit of Fig. 4(a) ranges between 12 pJ (minimum gate voltage) and 51 pJ (maximum gate voltage), which is much larger than the energy consumed for a weight/conductance update by a single step in a domain wall synapse (Fig. 3(b)) (Table I).
Apart from non-linearity, another issue with the conductance response of the RRAM synapse is the asymmetry between positive and negative conductance updates. If we apply the same gate voltage pulses as in Fig. 4(b) in the reverse order to decrease the conductance of the synapse (bottom electrode at higher voltage than top electrode for that purpose), we see that the conductance hardly decreases (see Supplementary Material, Section 2). Rather, in order to decrease the conductance by a certain step, a long duration (6 µs) and high magnitude (2.5 V) voltage pulse (hence high energy consuming), known as a RESET pulse, needs to be applied at the gate of the transistor for an abrupt conductance decrease to the smallest value. It is followed by voltage pulses of gradually increasing magnitude (SET pulses) to then increase the conductance.
The conductance response characteristic of the PCM synapse we simulated, based on the model developed in Nandakumar et al. [48] (see Supplementary Material, Section 3 for more details), is more linear than that of RRAM, i.e. programming current pulses of fixed magnitude 90 µA and duration 50 ns increase the conductance of the PCM synapse fairly linearly for a larger number of pulses (≈ 12) (Fig. 4(c)). The energy associated with each such pulse is 5 pJ [48,2], still much higher than that for the domain wall synapse (Table I). Conductance decrease, on the other hand, is carried out by an abrupt RESET pulse that consumes 30 pJ [2], followed by a series of SET pulses, much like the RRAM synapse. Thus the conductance response characteristic of the PCM synapse is still asymmetric, like that of the RRAM synapse.
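To put the per-pulse energies quoted in this section side by side, the short script below expresses each of them as a multiple of the DW step energy and as a number of orders of magnitude. It only uses the values stated in the text; the dictionary labels are illustrative.

```python
import math

# Per-pulse programming energies quoted in the text, in joules.
ENERGY_J = {
    "DW step (either polarity)": 0.18e-15,
    "RRAM SET (min gate voltage)": 12e-12,
    "RRAM SET (max gate voltage)": 51e-12,
    "PCM SET": 5e-12,
    "PCM RESET": 30e-12,
}


def orders_of_magnitude(energy_j, reference_j):
    """Base-10 logarithm of the ratio between two energies."""
    return math.log10(energy_j / reference_j)


reference = ENERGY_J["DW step (either polarity)"]
ratios = {name: energy / reference for name, energy in ENERGY_J.items()}
for name, ratio in ratios.items():
    print(f"{name:30s} {ratio:10.3g} x DW  ({math.log10(ratio):.1f} orders)")
```

With these numbers, a single SET pulse in the RRAM or PCM synapse costs between roughly four and six orders of magnitude more energy than a single conductance step in the DW synapse.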

Network Level Comparison
Next we design a crossbar array based Fully Connected Neural Network (FCNN) with domain wall synapses [17] and compare its energy and speed performance for on-chip learning with that of an equivalent FCNN designed with RRAM and PCM synapses. It is to be noted that this NN is of the second generation, non-spiking type [49] and uses the standard Stochastic Gradient Descent (SGD) algorithm for weight update [21]. A Verilog-A model of the domain wall synapse is designed, based on its conductance response obtained from micromagnetic physics (Fig. 3(b)), and inserted in the crossbar schematic designed in the Cadence Virtuoso circuit simulator (Fig. 5).
Fisher's Iris dataset, a popular machine learning dataset, is used for the training [25]. Since the dataset is not completely linearly separable, in order to classify it accurately with the FCNN without a hidden layer that we design here, the 4 input features corresponding to each sample are first passed through some basic filters to convert them to 16 features [50]. Input voltages, proportional to these 16 input features, are applied on the crossbar as shown in Fig. 5. Read currents, each proportional to the product of the weight of a synapse and the corresponding input feature, add up following Kirchhoff's current law and enter the neuron/activation function circuit at each output node. Thus the input Vector-weight Matrix Multiplication (VMM) is carried out in the crossbar array [1,17]. A "tanh" neuron/activation function (f) acts on the read current at each output node. This function has been designed with transistors in differential amplifier configuration, as shown in Bhowmik et al. [17]. A weight update circuit follows, which calculates the common part of the weight update at each output node, using the same SGD method and circuit described in Bhowmik et al. [17]. The common part of the weight update computed at each output node is next multiplied with the inputs using the multiplier circuit (x), as shown in Fig. 5. In Bhowmik et al. [17], a write current proportional to the output of the multiplier (x) at each synapse acts on the DW synapse and updates its weight. However, since the conductance of the DW synapse here takes only quantized values and is updated only by write current pulses of fixed magnitude (25 µA) (Fig. 3(b)), an additional quantizer circuit (Q) is present after the multiplier circuit here, unlike in Bhowmik et al. [17]. The design and typical output of the quantizer circuit can be found in the Supplementary Material (Section 4) accompanying this paper.
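The signal flow described above (crossbar read, tanh activation, common part of the weight update, multiplication with the inputs, quantization to a fixed-size step) can be sketched in software as follows. This is a minimal illustration, not the circuit of Fig. 5: the squared-error form of the common part, the threshold, the step size, the weight limits and the toy data are all assumptions chosen for the example.

```python
import math


def forward(weights, inputs):
    """Crossbar read: each output sums weight x input, then applies tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def quantize(value, threshold):
    """Three-level quantizer: returns +1, 0 or -1 programming pulse."""
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


def train_step(weights, inputs, targets, step, threshold, w_min, w_max):
    """One SGD iteration in which every weight moves by at most one fixed step."""
    outputs = forward(weights, inputs)
    for j, (y, t) in enumerate(zip(outputs, targets)):
        # "Common part" of the update at output node j (squared-error loss assumed).
        common = (t - y) * (1.0 - y * y)
        for i, x in enumerate(inputs):
            pulse = quantize(common * x, threshold)  # multiplier followed by quantizer
            weights[j][i] = min(w_max, max(w_min, weights[j][i] + pulse * step))
    return outputs


# Toy example with hypothetical values: 3 inputs, 2 output nodes.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
inputs = [1.0, 0.5, -0.5]
targets = [1.0, -1.0]
for _ in range(50):
    train_step(weights, inputs, targets, step=0.05, threshold=0.01,
               w_min=-1.0, w_max=1.0)
final_outputs = forward(weights, inputs)
```

In this toy run the outputs move toward their targets even though each weight can only change by one fixed step per iteration. Updates stop once the product of the common part and the input falls below the quantizer threshold, which is the software analogue of the quantizer output being zero.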
Despite the fact that the conductance, and hence the weight, of each synapse takes only quantized values, on-chip learning is achieved with 89 % train and 92 % test accuracy on the Fisher's Iris dataset (Table I). Test accuracy turns out to be slightly higher than train accuracy because the number of samples available in the dataset is low (100 train, 50 test), so a correct or wrong result on just 1 or 2 samples changes the accuracy by a few percent. A similar crossbar based FCNN is designed next with RRAM and PCM synapses, with conductance response as shown in Fig. 4. Similar accuracy for on-chip learning is achieved on the Fisher's Iris dataset (Table I). However, the net energy consumed in the synapses for on-chip learning is several orders of magnitude higher for RRAM/PCM synapses than for DW synapses (Table I). This is expected because we already showed in Section II that the energy consumed by each programming pulse that increases the conductance by a step (SET pulse) is orders of magnitude higher for an RRAM/PCM synapse than for a DW synapse. In addition, high energy consuming RESET pulses are still needed, even though the need for decreasing the conductance of a synapse is reduced by using 2 RRAM or 2 PCM devices per synapse [9,44] (see Supplementary Material, Section 5). Training also takes much longer for the RRAM/PCM synapse based FCNN than for the DW synapse based FCNN because of the need for occasional long duration RESET pulses (microseconds). Since at each iteration (each sample in the training set) the weights of all synapses need to be updated simultaneously, even if only one synapse needs a RESET pulse of microsecond duration, the time needed to carry out that iteration is in microseconds. Since the DW synapse does not have this issue, the time taken for each iteration during learning is only 3 ns (duration of each programming pulse for DW synapse in Fig. 3
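The timing argument and the remark on accuracy granularity can both be checked with a few lines. The sketch below takes the pulse durations and sample counts from the text; the number of synapses (48, as for 16 inputs and 3 output nodes) is an assumption used only to build the example lists.

```python
DW_PULSE_S = 3e-9      # DW programming pulse duration (from the text)
SET_PULSE_S = 200e-9   # RRAM SET gate pulse duration (from the text)
RESET_PULSE_S = 6e-6   # RRAM RESET pulse duration (from the text)
N_SYNAPSES = 48        # hypothetical: 16 inputs x 3 output nodes


def iteration_time(pulse_durations_s):
    """All synapses are updated in parallel, so the slowest pulse sets the time."""
    return max(pulse_durations_s)


def accuracy_step_percent(n_samples):
    """Change in accuracy caused by a single sample flipping its outcome."""
    return 100.0 / n_samples


dw_time = iteration_time([DW_PULSE_S] * N_SYNAPSES)
# One synapse needs a RESET pulse, all others need a SET pulse.
rram_time = iteration_time([SET_PULSE_S] * (N_SYNAPSES - 1) + [RESET_PULSE_S])
slowdown = rram_time / dw_time
```

A single RESET pulse anywhere in the array makes that iteration about 2000 times longer than a DW iteration. On the accuracy side, one sample corresponds to 1 % of the 100 training samples and 2 % of the 50 test samples, which is why a difference of a few percent between train and test accuracy is within the resolution of the dataset.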

Conclusion
Thus, in this paper we have shown through device and network level simulations that on-chip learning in a DW synapse based NN circuit can consume much less time and energy than in RRAM and PCM synapse based NN circuits.

Simulation of domain wall synapse
The lateral dimensions of the device are taken to be 1000 nm × 50 nm. Thickness of the ferromagnetic layer above the heavy metal layer is 1 nm. Thickness of heavy metal layer is 10 nm.
Triangular notch regions with Perpendicular Magnetic Anisotropy (PMA) constant = $9 \times 10^{5}$ J/m$^3$ are present on the edges of the simulated ferromagnetic layer in order to mimic defects, which can pin the domain wall if the driving current pulse magnitude is below a certain threshold.

Simulation of RRAM synapse
The Verilog-A model provided by [40], experimentally benchmarked against [41], has been used for RRAM modeling. Following the 1T1M (one transistor, one memristor) configuration [42,43,44], we connect this RRAM device to a 65 nm technology node transistor (from the UMC library) in the Cadence Virtuoso circuit simulator.
Scheme for conductance increase:
1. Voltage at the Top Electrode is 2 V and at the Bottom Electrode is 0 V, to enable conductance increase. When gate voltage pulses of fixed magnitude and duration (200 ns) are applied at the gate of the transistor, the conductance increases and then saturates to a fixed value. When gate voltage pulses of fixed but larger magnitude are applied, the conductance saturates to a higher final value (Fig. 6(a) of Supplementary Material).
2. Voltage at the Top Electrode is again 2 V and at the Bottom Electrode is again 0 V, to enable conductance increase. If gate voltage pulses of fixed duration (200 ns) but increasing magnitude are applied at the gate of the transistor, the conductance goes up linearly (Fig. 4(b) of main manuscript).
Scheme for conductance decrease:
1. Voltage at the Top Electrode is 0 V and at the Bottom Electrode is 4.8 V, to enable conductance decrease. If we apply the same gate voltage pulses as in Fig. 4(b) of main manuscript in the reverse order to decrease the conductance of the synapse, we see that the conductance hardly decreases (Fig. 6(b) of Supplementary Material).
2. With the voltage at the Top Electrode being 0 V and at the Bottom Electrode being 3.5 V, when a long duration (6 µs) and high magnitude (2.5 V) voltage pulse, known as a RESET pulse, is applied at the gate of the transistor, the conductance abruptly decreases from the maximum to the minimum value.

Simulation of PCM synapse
To model the conductance response of the Phase Change Memory (PCM) synapse, we have used the model developed in [48], which is based on experiments conducted on Ge$_2$Sb$_2$Te$_5$ devices. By averaging the different conductance response curves generated by the model due to the stochasticity inherent in it, we have obtained the conductance response of the PCM synapse shown in Fig. 4(c) of main manuscript.
Scheme for conductance increase: The conductance increase characteristic is found to be more linear for the PCM synapse than for the RRAM synapse. For the RRAM synapse, applying programming pulses of the same magnitude leads to saturation of the conductance within the first ≈ 5 pulses (Fig. 6(a) of Supplementary Material). Hence programming pulses of linearly increasing magnitude are needed to increase the conductance linearly over a wide range of pulses and hence obtain many more conductance/weight states (Fig. 4(b) of main manuscript). However, as observed in Fig. 4(c) of main manuscript, programming current pulses (SET pulses) of magnitude 90 µA and duration 50 ns increase the conductance of the PCM synapse fairly linearly for a larger number of pulses (≈ 12) [48]. The energy associated with each such pulse is 5 pJ [48,2].
Scheme for conductance decrease: Conductance decrease is carried out by an abrupt RESET pulse that consumes 30 pJ [2], followed by a series of SET pulses (for conductance increase in small steps), much like the RRAM synapse.

Quantizer Circuit for Domain Wall based Spintronic NN
To limit our "write" current to a magnitude of 25 µA for either polarity, an additional "quantizer" circuit is added after the multiplier circuit, which multiplies the common part of the weight update with the input (Fig. 7 of Supplementary Material). The quantizer circuit consists of two op-amps working in "comparator" configuration, which compare the input voltage with the voltages at two different points of a potential divider circuit, followed by an op-amp in "summing amplifier" configuration, which adds the output voltages of the two comparator circuits (Fig. 7 of Supplementary Material). The output of the overall quantizer circuit is hence either ≈ $-2.5 \times 10^{-3}$ V, 0 or ≈ $2.5 \times 10^{-3}$ V. When this output voltage is applied on the "write" terminal/"write" path of the domain wall synapse, a "write" current of one of three possible values (≈ −25 µA, 0, ≈ 25 µA) flows through the domain wall synapse. As a result, the conductance of the synapse goes up or down by a fixed step (≈ $0.071 \times 10^{-3}$ mho) or stays unchanged (Fig. 8 of Supplementary Material).
In the scheme with 2 RRAM or 2 PCM devices per synapse [44,9] used here, in order to increase the net conductance of the synapse, the conductance of one device (positive synapse) is increased. In order to decrease the net conductance of the synapse, the conductance of the other device (negative synapse) is increased. But this way, during the course of the learning, the conductance of either or both devices reaches the maximum, and then RESET pulses are needed to lower the conductance to the minimum value. Frequent use of RESET pulses increases the overall energy consumption in the scheme.
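The way RESET pulses accumulate in the two-device scheme can be illustrated with a simple counting model. The sketch below is an assumption-laden illustration, not the circuit used in the simulations: conductances are tracked as integer step levels, 12 levels per device are assumed (the approximate linear range quoted for PCM), and the refresh policy (RESET both devices, then re-write the net difference with SET pulses) is a hypothetical choice.

```python
class DifferentialSynapse:
    """Net weight = level of positive device - level of negative device.

    Each device can only be stepped upward with SET pulses; going back to
    the minimum conductance requires a RESET pulse.
    """

    def __init__(self, n_levels):
        self.n_levels = n_levels  # SET steps between minimum and maximum conductance
        self.plus = 0             # level of the "positive" device
        self.minus = 0            # level of the "negative" device
        self.set_pulses = 0
        self.reset_pulses = 0

    def net_level(self):
        return self.plus - self.minus

    def _refresh(self):
        """Hypothetical policy: RESET both devices, then re-write the net value."""
        net = self.net_level()
        self.plus = max(net, 0)
        self.minus = max(-net, 0)
        self.reset_pulses += 2
        self.set_pulses += abs(net)

    def update(self, direction):
        """direction = +1 raises the net weight, -1 lowers it."""
        if direction not in (-1, 1):
            raise ValueError("direction must be +1 or -1")
        device = "plus" if direction == 1 else "minus"
        saturated = getattr(self, device) == self.n_levels
        if saturated and min(self.plus, self.minus) > 0:
            self._refresh()
        if getattr(self, device) < self.n_levels:
            setattr(self, device, getattr(self, device) + 1)
            self.set_pulses += 1


# 48 alternating up/down updates; per-pulse energies are the PCM and DW values from the text.
synapse = DifferentialSynapse(n_levels=12)
for k in range(48):
    synapse.update(+1 if k % 2 == 0 else -1)
pcm_energy_j = synapse.set_pulses * 5e-12 + synapse.reset_pulses * 30e-12
dw_energy_j = 48 * 0.18e-15  # the DW synapse steps both ways with the same pulse
```

In this example the net weight never moves by more than one step, yet both devices saturate after 24 updates and a refresh with two RESET pulses is triggered. The 48 updates cost about 300 pJ in the two-device PCM model, against roughly 8.6 fJ for the same number of single-step updates in a DW synapse.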