Superconducting optoelectronic loop neurons

Superconducting optoelectronic hardware has been proposed for large-scale neural computing. In this work, we expand upon the circuit and network designs previously introduced. We investigate circuits using superconducting single-photon detectors and Josephson junctions to perform signal reception, synaptic weighting, and integration. Designs are presented for synapses and neurons that perform integration of rate-coded signals as well as detect coincidence events for temporal coding. A neuron with a single integration loop can receive input from thousands of synaptic connections, and many such loops can be employed for dendritic processing. We show that a synaptic weight can be modified via a superconducting flux-storage loop inductively coupled to the current bias of the synapse. Synapses with hundreds of stable states are designed. Spike-timing-dependent plasticity can be implemented using two photons to strengthen and two photons to weaken the synaptic weight via Hebbian-type learning rules. In addition to the synaptic receiver and plasticity circuits, we describe an amplifier chain that converts the current pulse generated when a neuron reaches threshold to a voltage pulse sufficient to produce light from a semiconductor diode. This light is the signal used to communicate between neurons in the network. We analyze the performance of the elements in the amplifier chain to calculate the energy consumption per photon created. The speed of the amplification sequence allows neuronal firing up to at least 20 MHz, independent of connectivity. We consider these neurons in network configurations to investigate near-term technological potential and long-term physical limitations. By modeling the physical size of superconducting optoelectronic neurons, we calculate the area of these networks. A system with 8100 neurons and 330,430 total synapses will fit on a 1 cm × 1 cm die. Systems of millions of neurons with hundreds of millions of synapses will fit on a 300 mm wafer.
For multiwafer assemblies, communication at light speed enables a neuronal pool the size of a large data center (10⁵ m²) comprising trillions of neurons with coherent oscillations at 1 MHz. © 2019 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/1.5096403


I. INTRODUCTION
Complete understanding of the information processing underlying cognition remains a significant scientific challenge. Progress in neuroscience, computer science, psychology, and neural engineering makes this a fruitful time for elucidation of intelligence. Biological experiments and software simulations would be greatly augmented by artificial hardware with complexity comparable to systems we know to be conscious. Intelligent systems implemented with hardware optimized for neural computing may inform us regarding the limits of cognition imposed by the speed of light while providing technological opportunities sufficient to spawn a new domain in the computing industry.
Neural computing appears uniquely capable of the distributed, yet integrated, information processing that characterizes intelligent systems. Many approaches to neural computing are being developed, and the maturity of the semiconductor industry makes CMOS a wise initial platform. Yet, the central role of communication in neural computing indicates that hardware incorporating different physics may be advantageous for this application. In previous work, 1 we considered the potential for superconducting optoelectronic hardware to perform neural operations. The principal assumption guiding the design of the hardware platform is that photons are the entities best suited for communication in large-scale neural systems. The hardware platform leverages optical communication over short and long distances to enable dense local fan-out as well as distant communication with the shortest possible delay. In this work, we present details of the design of superconducting optoelectronic neurons and networks that appear capable of achieving the functions required for cognitive computing.
This work expands upon the initial concepts presented in Ref. 1 in several ways. Most importantly, we introduce circuits that utilize light for communication between neurons but set synaptic weights in the electronic domain, thereby efficiently utilizing photons and enabling complex, event-based plasticity mechanisms. In Ref. 1, the proposed neural circuits were based on arrays of single-photon detectors (SPDs) in series or in parallel, and it was assumed that synaptic weighting would be accomplished by varying the number of photons that reached the detector arrays. This approach to neuron design is inefficient because the production of photons dominates energy consumption in superconducting optoelectronic circuits, and more photons are generated than are used if the synaptic weight is established via variable attenuation in the photonic domain.
In the present work, we show that Josephson junctions (JJs) in conjunction with SPDs straightforwardly accomplish many neural operations and can be used to construct synapses that require a single photon for communication, with the synaptic weight set by the amount of electrical current generated during a photon-detection event. The simple SPD/JJ circuit investigated here transduces single-photon communication events to integrated supercurrent. We describe how to combine the signals of many such synapses in a shared integration loop using mutual inductors. We also discuss how to use these synapses to achieve inhibitory connections and dendritic processing. While the goal of photonic communication is to achieve high fan-out, in Ref. 1 fan-in was proposed to occur in the photonic domain. Here, we re-envision the dendritic tree and propose instead that a hierarchy of dendritic loops performs well for high fan-in. Regarding learning, we present techniques for controlling the synaptic weights, either through direct, electronic control for supervised training and machine learning or through correlated photon detection for spike-timing-dependent, short-term, homeostatic, and metaplasticity.
This work also goes beyond the initial concepts presented in Ref. 1 with the analysis of amplifier circuits capable of converting the current output from a JJ circuit to the voltage necessary to generate light from a semiconductor diode. Regarding scaling, this work builds on Ref. 1 with the consideration of networks characterized by graphs employed by cognitive systems. We analyze such networks to model spatial scaling and power consumption. We propose a routing architecture and consider the size of the computation and communication circuits in order to estimate network area and power density. We find that a network with 1 × 10⁶ neurons and 200 × 10⁶ synapses can fit on a 300 mm wafer and would dissipate less than 1 W of device power, illustrating the feasibility of cooling large-scale superconducting optoelectronic systems with conventional cryogenics. We conclude with the analysis of scaling limits due to the finite speed of light.
These refinements of superconducting optoelectronic network (SOEN) concepts have been summarized in a previous publication. 2 The present work is a more thorough treatment of the subject, exploring the conceptual motivation of this work as well as many details of circuit concepts and simulations that were not included in Ref. 2. We begin with a summary of the neuroscientific principles that guide the design of this hardware platform.

A. General principles of cognitive systems
The foundational assumption of this work is that light is the physical entity best suited to achieve communication in cognitive neural systems. To motivate why light is essential for large-scale neural systems, we must describe the systems we intend to pursue.
Broadly speaking, we wish to pursue devices and networks capable of acquiring and assimilating information across a wide range of spatial, temporal, and categorical scales. In a neural cognitive system, spatial location within the network may correspond to information specific to content area or sensory modality, and spatial integration across the network corresponds to the integration across informational subjects and types. Information processing must occur across many levels of hierarchy with effective communication across local, regional, and global spatial scales, as well as temporal scales. These systems must continually place new information in context. It is required that a cognitive system maintain a slowly varying background representation of the world while transitioning between dynamical states under the influence of stimulus. The objective of this paper is to design general cognitive circuits with structural and dynamical attributes informed by neuroscience, network theory, and dynamical systems. Stated generally, systems combining functional specialization with functional integration are likely to perform well for many cognitive tasks. 3,4 The theme of localized, differentiated processing combined with information integration [5][6][7][8][9][10][11] across space 12,13 and time [14][15][16][17][18] is central to the device and network designs we consider here. In the spatial domain, the demand for integration of information from many local areas requires not only dense local connectivity (as measured by a clustering coefficient [19][20][21] ) but also connections between these local areas, which serve to combine the local information in a larger context at higher cognitive levels 22 (as measured by a short average path length 23 ). High clustering combined with short average path length defines a small-world network. 24
For the highest performance, we expect this trend of integration of locally differentiated information to repeat across many scales in a nearly continuous manner 12,13 such that any node in the system is likely to be not only processing information with local neighbors but also receiving information from simpler, less-connected units, and transferring information to complex, highly connected units. Networks with this organization across scales are governed by power-law spatial scaling. 25 The patterns are related in the temporal domain where transient synchronized oscillations integrate information from various brain regions. [14][15][16] Information exchange can occur on very fast time scales, and results of these computations must be combined over longer times. The spatial structure of the network and its operation in the time domain are not independent. 3,4,26 Fast, local dynamics integrate information of closely related nodes through transient neuronal functional clusters, 27 while activity on slower scales can incorporate input from larger regions. 28 Networks with this organization in time are governed by a power-law frequency distribution, 17,18,29 characteristic of self-organized criticality. 30 Power-law spatial and temporal distributions underlie systems with fractal properties, 18,29 and self-similarity across space and time is advantageous for cognition. 18,[30][31][32][33][34] These conceptual arguments regarding information integration across spatial and temporal scales lead us to anticipate networks utilizing hierarchical configuration, with processing on various scales being integrated at high levels to form a coherent cognitive state. 22 The constitutive devices most capable of achieving these network functions are relaxation oscillators, 18,35 dynamical entities characterized by pulsing behavior 36 with resonant properties at many frequencies. 37,38
Neurons are a subset of relaxation oscillators with complex operations adapted for spike-based computation. 39 We return to relaxation oscillators shortly.
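The two small-world metrics invoked above (clustering coefficient and average path length) can be made concrete with a toy Watts–Strogatz-style construction. This is an illustrative sketch only; the graph size, neighbor count, and rewiring probability below are arbitrary choices, not parameters from this work:

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each connected to its k nearest neighbors per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, p, rng):
    """Watts-Strogatz-style rewiring: each edge is moved to a random new
    target with probability p, trading clustering for shorter paths."""
    n = len(adj)
    for i, j in [(a, b) for a in adj for b in adj[a] if a < b]:
        if j in adj[i] and rng.random() < p:
            candidates = [m for m in range(n) if m != i and m not in adj[i]]
            if candidates:
                new = rng.choice(candidates)
                adj[i].discard(j); adj[j].discard(i)
                adj[i].add(new); adj[new].add(i)
    return adj

def clustering(adj):
    """Average local clustering coefficient over all nodes."""
    total = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Mean shortest-path length over reachable node pairs (BFS per node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(100, 4)
small_world = rewire(ring_lattice(100, 4), 0.1, random.Random(7))
print("lattice:     C = %.2f, L = %.2f" % (clustering(lattice), avg_path_length(lattice)))
print("small-world: C = %.2f, L = %.2f" % (clustering(small_world), avg_path_length(small_world)))
```

Rewiring a small fraction of edges sharply reduces the average path length while leaving clustering largely intact, which is the combination the text identifies as small-world.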
To illustrate how differentiated processing and information integration are implemented by neurons for cognition, consider vision. 40 In early stages of visual processing, neurons located near each other in space will show similar tuning curves 41 in response to presented stimuli, thus forming locally coherent assemblies selecting for certain features of a visual scene. 16 These locally differentiated processing units are constructed from architectural motifs 42,43 and are manifest in biological hardware as minicolumns and columns, 44 which are dedicated to modeling a subset of sensory space. 45 To form a more complete representation of an object within a visual scene, or to make sense of a complex visual scene with many objects, the visual system must combine the information from many differentiated processors. This integration is accomplished with lateral connections between columns 3 as well as with feed-forward connections from earlier areas of visual cortex to later areas of visual cortex. 40 Such an architecture requires some of the neurons in any local region to have long-range projections, motivating the need for local connectivity for differentiated processing combined with distant connectivity for information integration across space.
To understand information integration in the time domain, consider synchronized oscillations at various frequencies in the context of the binding problem. 46,47 Stated as a question, the binding problem asks how the myriad stimuli presented to the brain can be quickly and continuously organized into a coherent cognitive moment. In the limited context of vision, we ask how a complex, dynamic visual scene can be structured into a discernible collection of objects that can be differentiated from each other and from an irrelevant background. 48 Many studies provide evidence that fast, local oscillations are modulated by slower oscillations encompassing more neurons across a larger portion of the network. [14][15][16][17][18]28,49,50 In the case of columns in visual cortex, local clusters tuned to specific stimuli will form assemblies with transient synchronization at high frequencies (γ band). The information from many of these differentiated processors is integrated at higher levels of processing by synchronizing larger regions of neurons at lower frequencies (α band, 1-5 Hz, and θ band, 4-10 Hz 17,28 ). The transient synchronization of neuronal assemblies is closely related to neuronal avalanches, 30,32 cascades of activity across many frequencies.
Neuronal avalanches are observed in networks balanced at the critical point between order and chaos. [30][31][32][33][34] Self-similarity in the temporal domain implies operation at this critical point, 30,31,34 and operating at this phase transition is necessary to maximize the dynamic range of the network. 32 Inhibition and activity-based plasticity are crucial for achieving this balance. 17,18,51 Networks of excitatory principal neurons interspersed with inhibitory interneurons 52 with small-world characteristics naturally synchronize at frequencies determined by the circuit and network properties. 18 Slower frequency collective oscillations of networks of inhibitory interneurons provide short windows when certain clusters of excitatory neurons are uninhibited and, therefore, susceptible to spiking. 53 This feedback through the inhibitory interneuron network provides a top-down means by which the dynamical state of the system can provide broad information to the local processing clusters. 16,50 Regions of cortex with higher information integration focus attention 15 on certain aspects of stimulus by opening receptive frequency windows at the resonant frequencies of relevant subprocessors, providing a mechanism by which binding occurs and background is ignored. 16,17,49,50 The result of this inhibitory structuring of time is a network with dynamic effective connectivity. 22,50 By constructing a network with a small-world, power-law architecture from highly tunable relaxation oscillators and employing feedback through inhibitory oscillations, we produce a system that can change its effective structural and resonant properties very rapidly based on information gleaned from prior experiences of a large region of the network. 17,50
This model of binding requires a means by which the resonant frequencies of neuronal assemblies can be associated with certain stimuli, and a means by which the inhibitory interneuron network can learn to associate different assemblies with different frequencies. Plastic synaptic weights make such adaptation possible. Synapses provide a means for the connectivity of the network to shape dynamics and functionality, and synapses adapt their states based on internal and external activities. As cortex evolves through dynamical states on various temporal and spatial scales, information stored in synapses is integrated across the network and used as feedback to distributed subprocessors. 16,50 For a cognitive system embedded in a dynamical environment to provide adaptive feedback as well as robust memory, the system must comprise a large number of synapses changing on different time scales due to different internal and external factors. 54 Synapses with many stable values of efficacy can significantly increase memory retention times, 55 and synapses that adapt not only their state of efficacy but also their probability of state transition are crucial for maximizing memory retention times. 54,56 Adaptation of probability of state transition is a mechanism of metaplasticity, 57 and many forms appear in biological systems. We expect a cognitive system to utilize differentiated regions of neurons, some with synapses changing readily between only two synaptic states, and other regions with synapses changing slowly between many distinguishable states. We further expect the network to update not only synaptic weights but also the probability of changing synaptic weights. The dynamical state of the system can then sample synaptic memory acquired at many times, in many contexts, while quickly adapting the dynamical trajectory as new stimulus is presented.
To summarize, cognition appears to require differentiated local processing combined with information integration across space, time, and experience. The structure of the network determines the dynamical state space, and the structure of the network adapts in response to stimulus and internal activity. We now ask the question: What physical systems are best equipped to perform these operations?

B. Physics and hardware for cognition
The aforementioned insights from neuroscience lead us to emphasize several features of neural systems in hardware for cognition. First, we must use a physical signaling mechanism capable of achieving communication across networks with dense local clustering, midrange connectivity, and large-scale integration. Second, the relaxation oscillators that constitute the computational primitives of the system must perform many dynamical functions with a wide variety of time constants to enable and maximally utilize information processing through transient synchronized assemblies. Third, a variety of synapses must be achievable, ranging from binary to multistable. The strength of these synapses must adjust due to network activity, as must the update frequency.

Optical communication
A principal challenge of differentiated computation with integrated information is communication. The core concept of the superconducting optoelectronic hardware platform is that light is excellent for this purpose. Light excels at communication for three reasons. First, light experiences no capacitance, resistance, or inductance, so dense, local clustering as well as long-range connections can be achieved without charge-based wiring parasitics. Second, it is possible to signal with single quanta of the electromagnetic field, thereby enabling the energy efficiency necessary for scaling. Third, light is the fastest entity in the universe. Short communication delays are ideal for maximizing the number of synchronized oscillations a neuron can participate in as well as the size of the neuronal pool participating in a synchronized oscillation. Light-speed communication, therefore, facilitates large networks with rich dynamics.
We have argued elsewhere 1 that parasitics of electronic interconnects limit communication in neural systems. These limitations are ultimately due to the charge of the electron and its mass. Signals in the brain are transmitted via ionic conduction. The operating voltage of biological neurons is near 70 mV, so the energy penalty of CV²/2 is significantly reduced relative to semiconducting technologies operating at 1 V, and direct connections are possible. Yet, the low mobility of ions results in slow signal velocities, severely limiting the total size of biological neural systems. 18 Uncharged massless particles are better suited for communication in cognitive neural systems. Light is the natural candidate for this operation. It is possible for a single optical source to fan its signals out to a very large number of recipients. This fan-out can be implemented in free space, over fiber optic networks, or in dielectric waveguides at the chip and wafer scales. For large neural systems, it will be advantageous to employ all these media for signal routing. The presence of excellent waveguiding materials and a variety of light sources inclines us to utilize optical signals with 1 μm ≤ λ ≤ 2 μm. Additionally, because the energy of a photon and its wavelength are inversely proportional, optoelectronic circuits face a power/area trade-off. Similar circuits to those presented here could be implemented with microwave circuits, but the system size would likely be cumbersome. Operation near telecommunication wavelengths appears to strike a suitable compromise.
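The power/area trade-off follows directly from the relation E = hc/λ. As a quick check of the energy scale across the wavelength band quoted above (fundamental constants only; no device parameters assumed):

```python
# Energy per photon, E = h*c/lambda, over the 1-2 um band discussed above.
h = 6.62607015e-34  # Planck constant (J s)
c = 2.99792458e8    # speed of light in vacuum (m/s)
e = 1.602176634e-19  # elementary charge (C), for conversion to eV

def photon_energy(wavelength_m):
    """Single-photon energy in joules at the given free-space wavelength."""
    return h * c / wavelength_m

for lam in (1.0e-6, 1.55e-6, 2.0e-6):
    E = photon_energy(lam)
    print("lambda = %.2f um: E = %.2e J (%.2f eV)" % (lam * 1e6, E, E / e))
```

At 1.55 μm the photon energy is roughly 0.8 eV, consistent with the statement that near-infrared emission requires on the order of a volt across a semiconductor diode, while halving the wavelength would double the energy cost per communication event.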

Superconducting electronics
The foundational conjecture of the proposed hardware platform is that light is optimal for communication in cognitive systems. The subsequent conjecture is that power consumption will be minimized if single photons of light can be sent and received as signals between neurons in the system. Superconducting single-photon detectors are the best candidate for receiving the photonic signals. In addition to selecting microscale light sources and dielectric waveguides, we choose to utilize superconducting-nanowire single-photon detectors [58][59][60][61] to receive photonic signals because of the speed, 62 efficiency, 61 and scalable fabrication 63 of these devices.
Utilizing superconducting single-photon detectors contributes to energy efficiency in three ways. First, because a single photon is a quantum of the electromagnetic field, it is not possible to signal with less energy at a given wavelength. Second, because the device is superconducting, it dissipates near zero power when it is not responding to a detection event. Third, the power dissipated by the detector during a detection event is on the order of the energy used to generate the photon being detected, so a significant additional energy penalty is not incurred.
The choice of employing superconductors has several important ramifications. It requires that we operate at temperatures that support a superconducting ground state (≈4 K), so cryogenic cooling must be implemented. While cooling is an inconvenience, employment of superconducting detectors brings the opportunity to utilize the entire suite of superconducting electronic devices, [64][65][66] including JJs and thin-film components such as current 67,68 and voltage 69 amplifiers. Semiconductor light sources also benefit from low temperature, 70 and silicon light sources become an option. 63 We have emphasized that the charge and mass of electrons are a hindrance for communication. Yet, the interactions between electrons due to their charge make them well suited to perform the computation and memory functions of synapses and neurons. In particular, superconducting devices and circuits are exceptionally capable of achieving the complex dynamical systems necessary for cognition. To elucidate the specific type of dynamical devices we intend to employ, we now elaborate upon the strengths of relaxation oscillators for cognitive systems.

Relaxation oscillators
A defining aspect of cognitive systems is the ability to differentiate locally to create many subprocessors, but also to integrate the information from many small regions into a cohesive system, and to repeat this architecture across many spatial and temporal scales. A network of many dynamical nodes, each with the capability of operating at many frequencies, gives rise to a vast state space. As computational primitives that can enable such a dynamical system, oscillators are ideal candidates. In particular, relaxation oscillators 18,[35][36][37][38]49,[71][72][73] with temporal dynamics on multiple time scales 37 have many attractive properties for neural computing, which is likely why the brain is constructed of such devices. 74 We define a relaxation oscillator as an element, circuit, or system that produces rapid surges of a physical quantity or signal as the result of a cycle of accumulation and discharge. Relaxation oscillators are energy efficient in that they generally experience a long quiescent period followed by a short burst of activity. Timing between these short pulses can be precisely defined and detected. 18 Relaxation oscillators can operate at many frequencies 38 and engage with myriad dynamical interactions. 49 Their response is tunable, 38 they are resilient to noise because their signals are effectively binary, 75 and they can encode information in their mean oscillation frequency as well as in higher-order timing correlations. 14,[76][77][78][79][80] The relaxation oscillators we intend to employ as the computational primitives of superconducting optoelectronic networks can be as simple as integrate-and-fire neurons 39,41 or more complex with the addition of features such as dendritic processing 14,[77][78][79][80] to inhibit specific sets of connections 17,18,52 or detect timing correlations and sequences of activity. 14,80
While our choice to use superconductors was motivated by the need to detect single photons, we find that superconducting circuits combining SPDs and JJs are well suited for the construction of relaxation oscillators with the properties required for neural circuits.

Neuron overview
We refer to relaxation oscillators sending few-photon signals that are received with superconducting detectors as superconducting optoelectronic neurons. In the specific neurons studied in this work, integration, synaptic plasticity, and dendritic processing are implemented with inductively coupled loops of supercurrent. We refer to devices of this type as loop neurons. The loop neuron presented in this paper is shown schematically in Fig. 1. A full circuit diagram is shown in Fig. 23. The remainder of this section is an overview of the circuits described in more detail in the rest of the paper. Operation of loop neurons is summarized as follows.
Photons from upstream neurons are received by superconducting SPDs at a neuron's synapses. Using Josephson circuits, these detection events are converted into an integrated supercurrent which is stored in a superconducting loop. The amount of current that gets added to the integration loop during a photon detection event is determined by the synaptic weight. The synaptic weight is dynamically adjusted by another circuit combining SPDs and JJs. When the integrated current of a given neuron reaches a (dynamically variable) threshold, an amplification cascade begins, resulting in the production of light from a waveguide-integrated semiconductor light emitter. The photons thus produced fan out through a network of dielectric waveguides and arrive at the synaptic terminals of other neurons where the process repeats.
In these loop neurons, a synapse consists of an SPD in parallel with a JJ (which together transduce photons to supercurrent), and a superconducting loop, which stores a current proportional to the number of detected photon arrival events. This loop is referred to as the synaptic integration (SI) loop. Within each neuron, the loops of many synapses are inductively coupled to a larger superconducting loop, referred to as the neuronal receiving (NR) loop, thereby inducing an integrated current proportional to the current in all the neuron's synapses. When the current in this NR loop reaches a threshold, the neuron produces a current pulse in the form of a flux quantum. This current is amplified and converted to voltage to produce photons from a semiconductor p–i–n junction.
The currents in the synaptic and neuronal loops are analogous to the membrane potential of biological neurons, 41 and the states of flux in these loops are the principal dynamical variables of the synapses and neurons in the system. Inhibitory synapses can be achieved through mutual inductors with the opposite sign of coupling. Dendritic processing can be implemented straightforwardly by adding intermediate mutually inductively coupled loops between the synaptic and neuronal loops. Synapses can be grouped on dendritic loops capable of local, nonlinear processing and inhibition. 14,18,52 Dendrites capable of detecting specific sequences of synaptic firing events 77,80 can also be achieved. Neurons with multiple levels of dendritic hierarchy can be implemented as multiple stages of integrating loops. Clustering synapses on multiple levels of hierarchy in this way enables information access at gradually larger length scales across the network through transient synchronization at gradually lower frequencies. 28 The temporal scales of the loops can be set with L/r time constants, so different components can operate on different temporal scales, enabling relaxation oscillators with rich temporal dynamics. These relaxation oscillators can be combined in networks with dynamic functional connectivity, reconfigurable through inhibition and synaptic plasticity. 50,52 These receiver circuits and integration loops are presented in Sec. II.
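The integrate-leak-fire behavior described above can be caricatured with a discrete-time model in which the loop current decays with time constant τ = L/r, synaptic events add weighted current, and a threshold crossing produces a firing event. This is a behavioral sketch only; the time constant, threshold, and synaptic weight below are invented illustrative values, not circuit design values from this work:

```python
import math

class LoopNeuron:
    """Toy discrete-time model of a loop neuron: weighted synaptic events
    add current to an integration loop that leaks with time constant
    tau = L/r, and the neuron fires when the loop current crosses threshold.
    All parameter values are illustrative placeholders."""

    def __init__(self, tau=50e-9, threshold=10e-6, dt=1e-9):
        self.tau = tau              # L/r leak time constant (s)
        self.threshold = threshold  # firing threshold current (A)
        self.dt = dt                # time step (s)
        self.i_loop = 0.0           # integrated loop current (A)

    def step(self, synaptic_events):
        """Advance one time step. synaptic_events is a list of synaptic
        weights (amps added per photon-detection event) arriving this step.
        Returns True if the neuron fires."""
        self.i_loop *= math.exp(-self.dt / self.tau)  # leak
        self.i_loop += sum(synaptic_events)           # integrate
        if self.i_loop >= self.threshold:
            self.i_loop = 0.0                         # reset after firing
            return True
        return False

neuron = LoopNeuron()
w = 1e-6  # synaptic weight: 1 uA per detected photon (illustrative)
fires = [t for t in range(100) if neuron.step([w, w])]  # two photons per step
print("firing steps:", fires)
```

With a steady drive the model fires periodically, and the leak term means the firing rate depends on both the synaptic weight and the L/r time constant, mirroring how different loop time constants yield components operating on different temporal scales.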
Synaptic memory is also implemented based on the stored flux in a loop, referred to as the synaptic storage (SS) loop. The state of flux in the SS loop determines the current bias to the synaptic receiver circuit discussed above. This current bias is the synaptic weight. If the SS loop is created with a superconducting wire of high inductance, the loop can hold many discrete states of flux and, therefore, can implement many synaptic weights. In Sec. III, we investigate synapses with a pseudocontinuum of hundreds of stable synaptic levels between minimal and maximal saturation values, and we show that transitions between these levels can be induced based on the relative arrival times of photons from the presynaptic and postsynaptic neurons, thereby establishing a means for spike-timing-dependent plasticity (STDP) with one photon required for each step of the memory-update process.
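The behavior of a synapse with a pseudocontinuum of stable states and timing-dependent updates can be sketched in a few lines. The state count, timing window, and one-state step size below are hypothetical placeholders for illustration, not the circuit values derived in Sec. III:

```python
class MultistableSynapse:
    """Toy model of a flux-storage (SS loop) synapse with many discrete
    stable states and a spike-timing-dependent update rule. All numbers
    here are illustrative placeholders, not circuit-derived values."""

    def __init__(self, n_states=200, state=100):
        self.n_states = n_states  # number of stable flux states in the loop
        self.state = state        # current state index (sets the weight bias)

    @property
    def weight(self):
        """Synaptic weight as a fraction of the maximum bias current."""
        return self.state / (self.n_states - 1)

    def stdp_update(self, t_pre, t_post, window=10e-9):
        """Hebbian-type rule: a pre-then-post photon pair within the timing
        window strengthens the synapse by one flux state; post-then-pre
        weakens it. Updates saturate at the minimum and maximum states."""
        dt = t_post - t_pre
        if 0 < dt <= window:
            self.state = min(self.n_states - 1, self.state + 1)
        elif -window <= dt < 0:
            self.state = max(0, self.state - 1)

syn = MultistableSynapse()
syn.stdp_update(t_pre=0.0, t_post=5e-9)  # causal pair -> potentiate
syn.stdp_update(t_pre=5e-9, t_post=0.0)  # anticausal pair -> depress
print("state:", syn.state, "weight: %.3f" % syn.weight)
```

The saturating minimum and maximum states correspond to the minimal and maximal saturation values of the pseudocontinuum described in the text, and each single-state step stands in for one photon-triggered flux-quantum update.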
[Fig. 1 caption fragment: "… and inhibitory (S_i) synapses, as well as synaptic weight update circuits (W). The wavy, colored arrows are photons, and the straight, black arrows are electrical signals. The synapses receive signals as faint as a single photon and add supercurrent to an integration loop. Upon reaching threshold, a signal is sent to the transmitter circuit (T), which produces a photon pulse. Some photons from the pulse are sent to downstream synaptic connections, while some are used locally to update synaptic weights."]

While synapses with many stable levels are advantageous for extending memory retention times, 55 it is also important to implement synapses that change not only their efficacy based on pre- and postsynaptic spike timing but also their probability of changing their efficacy. 54 Just as the synaptic weight is adjusted through a current bias on the receiver circuit, the probability of changing the synaptic weight can be adjusted through a current bias on the synaptic update circuit. As in the dendrites, we see a hierarchy can be achieved. In the case of synaptic memory, the synaptic weight and its rates of change are implemented in a loop hierarchy, and the state of flux in the loops can be dynamically modified based on photons generated by neural activity. Similar mechanisms can be utilized to adjust the synaptic weight based on short-term activity from the presynaptic neuron 81 or on a slowly varying temporal average of the postsynaptic activity. 82,83 The synaptic memory circuits we develop in Sec. III are logical extensions of binary memory cells utilized in superconducting digital electronics. 65,66 The aspect of superconducting optoelectronic neuron operation that is most difficult to achieve is the production of light.
The superconducting electronic circuits that perform the aforementioned synaptic and neuronal operations operate at millivolt levels, whereas the production of the near-infrared photons desirable for communication requires a volt across a semiconductor diode. When a neuron reaches threshold, an amplification sequence begins. Current amplification is first performed, and the resulting large supercurrent is used to induce a superconducting-to-normal phase transition in a length of wire. When the current-biased wire becomes resistive, a voltage is produced via Ohm's law. This device leverages the extreme nonlinearity of the quantum phase transition to quickly produce a large voltage and an optical pulse. These transmitter circuits are discussed in Sec. IV.
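The final step of this amplification sequence follows directly from Ohm's law once the wire is driven normal. A minimal sketch, with illustrative values (the bias current and normal-state resistance below are our assumptions, not parameters from this work):

```python
def hotspot_voltage(I_bias, R_normal):
    """Voltage across the formerly superconducting wire once it has been
    driven into the normal state: V = I * R (Ohm's law). Because the
    normal-state resistance is large, a modest supercurrent produces a
    voltage far above the millivolt scale of the JJ circuits."""
    return I_bias * R_normal

# Illustrative only: ~1 mA of amplified supercurrent through a wire with
# ~1 kOhm normal-state resistance yields the ~1 V a semiconductor diode needs.
print(hotspot_voltage(1e-3, 1e3))
```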
The photons of a neuronal pulse are distributed over a large axonal network of passive dielectric waveguides. These waveguides terminate at each of the downstream synaptic connections. A downstream synaptic firing event will occur with near-unity probability at any connection receiving one or more photons. Photons of multiple colors can be generated simultaneously or independently, and different colors can share routing waveguides, while being used for different functions on the receiving end, such as synaptic firing and synaptic update. The number of photons produced during a neuronal firing event determines the probability that distant connections will receive the signal, and this reach can be manipulated with the current bias to the light emitter. The network of waveguides that routes the communication events is discussed in Sec. V.
To make the analogy to biological neural hardware explicit, synapses are manifest as circuits comprising superconducting SPDs with JJs. These synapses transduce photonic communication signals to supercurrent for information processing, and this supercurrent plays the role of the membrane potential. The dendritic arbor is a spatial distribution of synapses interconnected with inductively coupled loops for intermediate integration and nonlinear processing. The integration function of the soma is also achieved with a superconducting loop, and the threshold is detected when a JJ in this loop is driven above its critical current. The firing function of the soma (or axon hillock) is carried out by a chain of superconducting current and voltage amplifiers that drive a semiconductor diode to produce light. The axonal arbor is manifest as dielectric waveguides that route photonic signals to downstream synaptic connections. Gap junctions may be realized with evanescent couplers between waveguides of the axonal arbor, but we do not consider gap junctions further in this paper.
Loop neurons combine several core devices: superconducting single-photon detectors, [58][59][60][61] Josephson junctions, 64-66 superconducting mutual inductors, 84 superconducting current 67,68 and voltage amplifiers, 69 semiconductor light sources, 1,63 and passive dielectric waveguide routing networks. [85][86][87] While all the components of these neurons have been demonstrated independently, their combined operation has not been shown. The experimental effort to achieve circuit integration is underway. The physical principles of their operation and the designs presented in this paper indicate the potential for loop neurons to achieve complex, large-scale systems. The straightforward implementation of inhibition; the realization of a variety of temporal scales through L/r time constants; single-photon-induced synaptic plasticity; and dynamically variable learning rate, threshold, and reach indicate these relaxation oscillators are promising as computational primitives. In conjunction with dense local and fast distant communication over passive waveguides, the system appears capable of the spatial and temporal information integration necessary for cognition and binding.
We do not propose superconducting optoelectronic networks (SOENs) as an alternative to established neural hardware, but rather as a symbiotic technology. The success of neural CMOS (including optical communication above a certain spatial scale) will contribute to the success of SOENs, as it will be advantageous for SOENs to interface with CMOS via photonic signaling over fiber optic links between cryogenic and ambient environments. SOEN hardware is particularly well suited to interfacing with other cryogenic technologies such as imaging systems with superconducting sensors, 88,89 as are commonly employed for medical diagnostics, 90 exoplanet search, 91-93 cosmology, 94 and particle detectors. 95 Another intriguing application is in conjunction with other advanced computing technologies such as flux-based logic [96][97][98] and quantum computers. [99][100][101][102][103] One can envision a hybrid computational platform 104,105 wherein a quantum module utilizes entanglement and superposition, while a neural module performs quantum-limited measurements and learns the behavior of the quantum system, and classical fluxon logic controls the operation of both. A superconducting optoelectronic hardware platform is likely to satisfy the computation and communication requirements of this hybrid technology.
At this point, we have described the motivations for loop neurons, and we have summarized their operation. The remainder of the document contains technical details of circuit operations (Secs. II-IV) and scaling analysis (Sec. V).

II. SYNAPTIC RECEIVER CIRCUITS
The focus of this section is on the conversion of photonic communication events on many synapses to an integrated total signal stored in the neuron. These optoelectronic devices must meet several criteria: (1) the neuron must be able to achieve leaky integrate-and-fire functionality 39,41 wherein activity on multiple synapses contributes to an integrated signal with a controllable leak rate; (2) single-photon detection events must contribute to the integrated signal, and the amount each detection event contributes to the integrated signal should depend on a dynamically reconfigurable synaptic weight; (3) neurons that are sensitive to the sum of spike events must be achievable in order to make use of rate-coded signals, 106 and neurons that are sensitive to the timing between afferent spikes must also be achievable in order to make use of temporal coding; 14,39,75,77 (4) the circuits must scale to thousands of synaptic connections to integrate information across moderately sized cognitive circuits; 180 (5) the dynamic range of the neuron and synapses should allow activity on a large fraction of the synapses to contribute to a neuronal firing event, yet repeated activity on a small fraction of the synapses should also be able to induce a neuronal firing event; (6) synapses with inhibitory as well as excitatory functionality must be achievable, and inhibition must work in conjunction with dendrites 17,18,80 to enable synchronization on multiple time scales. [14][15][16][17]52 This section explores circuit designs satisfying these criteria.

A. Circuit operation
The synaptic receiver is enclosed within the dashed boundary of Fig. 2(a). A simple instantiation of the synaptic circuit is shown in Fig. 2(b), and the operation of the synapse is illustrated schematically in Fig. 2(c). This receiver circuit in the context of the other components of the neural circuit is shown in Fig. 23 in Sec. IV. The synaptic receiver circuit comprises an SPD in parallel with a JJ embedded in a flux-storage loop. This design is a reasonable starting point for a single-photon-sensitive synapse because it achieves transduction of a photonic signal to a superconducting electronic signal with a simple circuit. The design is similar to other superconducting particle detector circuits, such as transition-edge bolometers. 65 The operation of the synaptic circuit proceeds as follows. The SPD, shown as a variable resistor in series with an inductor in Fig. 2(b), has zero resistance in the steady state, and it switches to a high-resistance state temporarily upon absorption of a photon. When a photon is detected, an electrical current is diverted from the SPD to a JJ, referred to as the synaptic firing junction (J sf ). The current diverted from the SPD [ Fig. 2(c), part (i)] causes the net current through J sf to exceed I c , generating a series of fluxons [ Fig. 2(c), part (ii)]. We refer to this detection of a photon by the SPD and subsequent generation of fluxons by the JJ as a synaptic firing event. The synaptic weight of the connection is implemented via the current bias across J sf , controlled by I sy . The effect of this synaptic weight is to change the duration the JJ bias current exceeds I c , and, therefore, the number of fluxons generated during a synaptic firing event. If the synaptic weight is weak, a small number of fluxons, and, therefore, a small total amount of supercurrent, will be generated. 
If the synaptic weight is strong, a large number of fluxons, and, therefore, a large amount of supercurrent, will be generated during the synaptic firing event. The SPD response is virtually identical whether the number of photons present is one or greater than one, and for energy efficiency, it is advantageous to send the fewest number of photons possible to each synaptic connection. The SPD response also does not depend strongly on the frequency of light across a bandwidth broad enough for multiplexing. 61 Implementing synaptic weight in the electronic domain in this manner makes use of both the speed and energy efficiency of JJs, while leveraging the strengths of light for communication.
The supercurrent generated during each synaptic firing event is added to a superconducting loop, called the synaptic integration (SI) loop, which integrates the total current from all synaptic firing events at that synapse [ Fig. 2(c), part (iii)]. Many synapses will be coupled to a larger neuronal receiving (NR) loop via mutual inductors. The NR loop combines the signals from all the synapses connected to the neuron. Ultimately, the current coupled to the NR loop is increased using a current transformer that induces current in a final loop, the neuronal thresholding (NT) loop. The NT loop is a superconducting loop that contains a JJ (J th ) that produces an output current pulse when its critical current (threshold) is reached. 108 This threshold can be dynamically varied with a current bias. The current pulse generated when the neuron reaches threshold is amplified and ultimately used to trigger a photon-generation event.
The number of flux quanta generated in a synaptic firing event depends on the relation between I c, I spd, and I sy, as well as the SPD time constant, L spd/r spd. If the bias current I is close to but greater than I c, the time-averaged voltage across the junction will be given by ⟨V⟩ ≈ R√(I² − I c²) (in the limit of an overdamped junction 66), where R is the junction resistance in the nonsuperconducting state. The rate of generation of flux quanta is given by r fq = ⟨V⟩/Φ 0. 66 This generated flux is trapped in the SI loop. The utilization of a JJ in this circuit is advantageous to decouple the amount of current added to the loop from the time it is stored in the loop. The current in each SI loop decays with the τ si = L si/r si time constant, which can be chosen over a broad range. By choosing τ si to be different for different synapses, one can diversify the temporal information provided to the neuron. 17,28,30,81 The circuit of Fig. 2(b) captures the concept of the receiver, but its performance is limited in this configuration because the SI loop saturates at a small current. A higher saturation current is achieved by separating the transduction operation from the SI loop with a Josephson transmission line (JTL), 65,66 as shown in Fig. 3(a). This form of the receiver circuit is the one used as a synapse in this work.
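The two relations above can be combined into a small numerical sketch of the fluxon production rate. The critical current and normal-state resistance below are illustrative assumptions chosen only to show the scaling, not circuit values from this work:

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum Phi_0 = h/2e, in Wb

def mean_junction_voltage(I, Ic, R):
    """Time-averaged voltage of an overdamped JJ biased above Ic:
    <V> ~ R * sqrt(I**2 - Ic**2); zero below Ic."""
    if I <= Ic:
        return 0.0
    return R * math.sqrt(I**2 - Ic**2)

def fluxon_rate(I, Ic, R):
    """Rate of flux-quantum generation, r_fq = <V> / Phi_0."""
    return mean_junction_voltage(I, Ic, R) / PHI0

# Illustrative parameters (our assumptions):
Ic, R = 10e-6, 4.0  # 10 uA critical current, 4 ohm normal-state resistance
for I in (9e-6, 11e-6, 15e-6):
    print(f"I = {I * 1e6:4.1f} uA -> r_fq = {fluxon_rate(I, Ic, R):.3e} fluxons/s")
```

The rate vanishes below I c and grows steeply just above it, which is the nonlinearity the synaptic bias exploits.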
In the configuration of Fig. 3(a), the fluxons produced by the switching of J sf during a synaptic firing event propagate down a JTL (a single JJ in this study) and drive the switching of a junction inside the SI loop. The fluxons from multiple synaptic firing events can be stored in the SI loop, and, therefore, we may wish to use a loop that can contain many fluxons. The current added to the loop by a single Wb is the magnetic flux quantum. The SI loop can maintain a linear response in the presence of many synaptic firing events if L si is chosen to be large, or the SI loop can saturate if L si is chosen to be small, thus providing one means of implementing short-term plasticity. 81 The SI loops are inductively coupled to the NR loop, [ Fig. 3(b)], which stores a current proportional to a weighted sum of the currents in all the SI loops. The use of mutual inductors allows many synapses to add current to an NR loop without introducing leakage current pathways. Finally, the NR loop couples to a third loop, the NT loop. The mutual inductor coupling the NR loop to the NT loop serves as a transformer to step up the current to be detected at threshold. The NT loop may not need to be a separate loop when the number of synapses, N sy , is small. The performance of the NT loop upon reaching current threshold is discussed in Sec. IV.
In Fig. 4, we simulate the operation of the synaptic receiver as it experiences a synaptic firing event. We use WRSpice 109 to model the circuit of Fig. 3(a). We treat the SPD as a current source with an exponential rise with a 100 ps time constant followed by an exponential decay with a 50 ns time constant. The amplitude of the SPD current pulse is 10 μA. All circuit parameters used in this work are given in Appendix A. Figures 4(a)-4(c) show the activity of a synaptic firing event for I sy = 800 nA, 1 μA, and 3 μA. With I sy = 800 nA, the junction is briefly driven above I c, and eight fluxons are transmitted to the SI loop. The synaptic firing event causes the current in the SI loop, I si, to increase by 1.7 nA. If we increase the bias current to I sy = 1 μA, the synaptic firing event produces 33 fluxons and adds 7 nA to the SI loop. Further increasing the synaptic bias to 3 μA gives the behavior shown in Fig. 4(c). In this case, 497 fluxons add 103 nA to the SI loop. The period of the voltage pulses is observed to decrease through the duration of the SPD pulse, demonstrating the decrease of r fq as current returns to the SPD and the net bias across the JJ decays, as discussed above. Details of synaptic firing are shown in Figs. 4(d)-4(g). The energy consumed by a synaptic firing event is discussed in Appendix A.
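The SPD drive used in these simulations is simple enough to reproduce analytically. A sketch of the current-source model described above (the product of a rising and a falling exponential is our choice of functional form; the WRSpice source may be constructed differently):

```python
import math

def spd_current(t, amp=10e-6, tau_rise=100e-12, tau_fall=50e-9):
    """Current diverted from the SPD after photon absorption at t = 0:
    exponential rise with a 100 ps time constant followed by exponential
    decay with a 50 ns time constant, 10 uA amplitude, as in the text."""
    if t < 0:
        return 0.0
    return amp * (1.0 - math.exp(-t / tau_rise)) * math.exp(-t / tau_fall)

# The pulse reaches ~10 uA within ~1 ns and recovers over ~50 ns.
print(spd_current(1e-9), spd_current(50e-9))
```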
The analysis of Fig. 4 gives the currents and voltages present during a synaptic firing event for three values of I sy. Systematic analysis of I si vs I sy finds a quadratic trend. A principal objective of this analysis is to determine the range of synaptic bias currents over which we would like to operate. Operating with a minimum synaptic bias of 1 μA enables us to work close to the energy-efficiency limit of the circuit, and we anticipate that the exact number of fluxons produced during a firing event will be noisy, much like the activity of a biological neuron. 75 The amount of current added to the SI loop during a synaptic firing event with strong synaptic weight should be significantly larger than the amount of current with a weak synaptic weight. We choose I sy = 3 μA to be the largest synaptic bias at which we would like to operate, and thus, a synaptic firing event with a strong synaptic bias adds 15 times as much current to the SI loop (and, therefore, the NR loop and the NT loop) as a firing event with a weak synaptic bias. This ratio is entirely tunable based on the needs of the system. Learning, whether supervised or unsupervised, should adjust the synaptic bias current over the range 1 μA ≤ I sy ≤ 3 μA for the parameters considered here. Circuits accomplishing this are discussed in Sec. III.
In the circuit of Fig. 3(a), synaptic activity can result in saturation of the SI loop, wherein additional synaptic firing activity adds no additional current to the loop. The value of I si at which saturation occurs can be controlled in circuit design, and in the present work, it is near 10 μA. The amount of current added to the loop by a single fluxon is given by ΔI si = Φ 0/L si, so the number of synaptic firing events that cause saturation can be adjusted across a broad range. With a small value of L si, a single synapse event can saturate the loop, and with a large value, the loop can maintain a linear response in the presence of many repeated synapse events. The SI loop with the design described above can receive over 1000 synaptic firing events when I sy = 1 μA, and 82 synaptic firing events when I sy = 3 μA, before saturation of the loop occurs (assuming L si = 10 μH and τ si → ∞). If the loop contains a resistance, the trapped flux will leak with the L/r time constant, leaving the synapse ready to receive further synaptic firing events.
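The saturation arithmetic above is easy to check. Using the per-event currents from the simulations of Fig. 4 (about 7 nA per event at I sy = 1 μA and 103 nA per event at I sy = 3 μA) and a 10 μA saturation current, a simple linear estimate reproduces the quoted numbers to within the sublinearity of the loop response near saturation (the linear estimate gives 97 events at strong bias versus the simulated 82):

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb

def current_per_fluxon(L_si):
    """Current one fluxon adds to the SI loop: dI_si = Phi_0 / L_si."""
    return PHI0 / L_si

def events_to_saturation(I_sat, dI_per_event):
    """Identical synaptic firing events before the SI loop saturates,
    assuming no leak between events (tau_si -> infinity) and a linear
    loop response (an overestimate near saturation)."""
    return int(I_sat // dI_per_event)

I_sat = 10e-6
print(events_to_saturation(I_sat, 7e-9))    # weak synapse (I sy = 1 uA)
print(events_to_saturation(I_sat, 103e-9))  # strong synapse (I sy = 3 uA)
print(current_per_fluxon(10e-6))            # ~0.21 nA per fluxon at L_si = 10 uH
```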

B. Multisynaptic neurons
In general, a neuron will combine signals from many synaptic connections and produce a pulse when this combined signal reaches a threshold. We would like to know how devices will perform when many synapses are integrated with a single NR loop. An important question is how many SI loops must receive activity to drive the NT loop to threshold. We have analyzed the current induced in the NT loop as a function of the number of synapses inductively coupled to the NR loop. We find that with symmetrical mutual inductors of 100 pH, if the NR loop receives 1000 inputs, a saturated SI loop will still induce nearly 500 nA in the NT loop. If the NT loop requires 5 μA to threshold (a number comfortably above the noise level), then 10 saturated synapses can cause the neuron to fire. Evidence from neuroscience indicates that a neuron with N synapses is commonly driven to fire by activity on √N synapses 110,111 (with significant diversity in responses observed), and this analysis finds that coupling through mutual inductors makes the required synaptic strength possible, even when 1000 connections are made to a single NR loop. Further improvement is likely possible with asymmetric mutual inductors. Perhaps more importantly, in mature network architectures, neurons receiving input from thousands of synapses are likely to employ a complex dendritic tree. It is unlikely that any one dendrite will receive activity from 1000 synapses, and the range of synaptic efficacy achievable with these circuits is well-matched to the threshold current levels that will be convenient in practice.
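The fan-in arithmetic in this paragraph can be summarized in a few lines. The 500 nA per saturated synapse and 5 μA threshold are the values quoted above; the √N comparison is the biological heuristic cited:

```python
import math

def synapses_to_fire(I_threshold, I_per_saturated_synapse):
    """Minimum number of simultaneously saturated synapses needed to
    drive the NT loop to threshold, assuming linear superposition of the
    inductively coupled contributions."""
    n = I_threshold / I_per_saturated_synapse
    return math.ceil(n - 1e-9)  # small tolerance guards float round-off

N_sy = 1000
print(synapses_to_fire(5e-6, 500e-9))  # 10 saturated synapses suffice
print(round(math.sqrt(N_sy)))          # sqrt(N) heuristic: ~32 of 1000
```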
We wish to know how inductively coupling multiple SI loops to a single NR loop affects the operation during synaptic firing events with regard to cross talk. Cross talk between synapses is small in this configuration, primarily because the self-inductance of each SI loop (L si) is much larger than the mutual inductance between each SI loop and the NR loop (M sy). For example, a typical value for L si is on the order of 100 nH, while a typical value for M sy is on the order of 100 pH. To arrive at analytical expressions for the cross talk between two SI loops coupled to the same NR loop, we assume that one synapse experiences a synaptic firing event, current is added to that SI loop, and we ask how much current is induced in the other SI loops due to their mutual coupling to the same NR loop. In the limit that L si ≫ M sy, this induced parasitic current scales as M sy/(N sy L si), where N sy is the total number of synapses (SI loops) coupled to the same NR loop. For typical values of M sy and L si, this quantity is on the order of 10⁻³ for N sy = 1 and decreases as synapses are added to the loop. Therefore, cross talk between SI loops coupled to the same NR loop is not problematic with this fan-in design. We may also ask about the ratio of the intended current induced in the NR loop to the parasitic current induced in adjacent SI loops. This quantity is independent of N sy, and in the same limit of L si ≫ M sy, we find that the ratio of the current induced in adjacent SI loops to the current induced in the NR loop is M sy/L si, which again is on the order of 10⁻³ for typical circuit parameters.

[FIG. 4 caption: Operation of the synaptic firing circuit during a synaptic firing event for three values of I sy. The three traces in each of these plots are normalized such that the maximum of each trace within the displayed time window is set to one and the minimum is set to zero. The colors of the traces are in reference to the current paths and voltage node labeled in Fig. 3.]
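Both cross-talk expressions are dimensionless ratios of inductances, so they can be evaluated directly from the typical values given above:

```python
def parasitic_fraction(M_sy, L_si, N_sy):
    """Parasitic current induced in a sibling SI loop, as a fraction of
    the firing synapse's current, in the limit L_si >> M_sy."""
    return M_sy / (N_sy * L_si)

def sibling_to_nr_ratio(M_sy, L_si):
    """Ratio of current induced in an adjacent SI loop to current induced
    in the NR loop; independent of N_sy in the same limit."""
    return M_sy / L_si

M_sy, L_si = 100e-12, 100e-9  # typical values from the text
print(parasitic_fraction(M_sy, L_si, 1))  # ~1e-3, smaller for larger N_sy
print(sibling_to_nr_ratio(M_sy, L_si))    # ~1e-3
```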
We have further investigated this cross talk numerically. WRSpice simulations show that a synaptic firing event of a neuron with a single synaptic connection and a synaptic firing event of a single synapse connected to an NR loop with 10 synaptic connections produce an identical number of fluxons with identical timing. The effect of a timing delay between two synaptic firing events on different synapses in a neuron with N sy = 10 has also been considered. The total current added to the NR loop is independent of the timing delay between the two synaptic firing events. These linearities with respect to N sy and pulse timing delay are attractive features of inductively coupled synapses. Contexts in which nonlinearity with respect to arrival time is desirable, such as for temporal coding 14,77 or dendritic processing, 79,80 are likely to employ two-photon receiver circuits or dendritic loops as discussed in Sec. II C.
It is important for a neuron to be able to receive excitatory and inhibitory connections. 41,52,110,112 Inhibitory connections keep the network from experiencing runaway activity and are crucial for temporal synchronization. [14][15][16][17][18] Inhibitory connections can be constructed with the same photon-to-fluxon transduction circuit presented thus far by changing the sign of M sy. We investigate a neuron with seven excitatory and three inhibitory connections in Fig. 5. The figure shows a time trace of I ni as three excitatory and two inhibitory synaptic firing events occur. An excitatory event and an inhibitory event occur in synapses with I sy = 1 μA, and the other events occur in synapses with I sy = 3 μA. This plot demonstrates the dynamic state of a multisynaptic neuron under the influence of excitation and inhibition.
The symmetry between inhibitory and excitatory synapses is broken by I b^th, the current bias across the thresholding junction. The circuit can be designed so the saturation of all inhibitory SI loops is insufficient to add enough counter current to the NT loop to overcome I b^th and reach threshold. Thus, repeated excitatory events can drive the neuron to spike, but repeated inhibitory events can only move the device further from threshold and cannot trigger a spike, much like the polarizing effects of inhibitory interneurons in biological neural systems.

C. Dendritic processing
In addition to neurons that integrate single-photon pulses, as described in Sec. II A, it is desirable to achieve neurons that detect coincident signals from two or more presynaptic neurons for detecting temporally coded information. 14,[77][78][79][80]113 The mutual information regarding a stimulus conveyed by two or more neurons can be approximated by a Volterra expansion 39 with the leading term corresponding to firing rate, and the second-order term representing correlations. 76 In biological neurons, temporal synaptic sequences can be detected using hardware nonlinearities present in dendrites, 79,80 which perform important cortical computations. Detection of timing correlations and sequences can be achieved in optoelectronic hardware using two (or more) SPDs in a similar circuit to the synaptic receiver of Fig. 3(a).
In Fig. 6, we analyze a two-photon symmetrical coincidence detection circuit. The circuit diagram is shown in Fig. 6(a). The two SPDs are biased symmetrically, and the circuit is designed such that if either SPD detects a photon in isolation, the current across J sf remains below I c, but if both detect a photon within a certain time window, the current across J sf can exceed I c, adding current to the SI loop. The amount of current added to the SI loop is plotted as a function of the difference in arrival times between two photons in Fig. 6(b). WRSpice was again used for these simulations, but in this case, the SPDs were modeled not as current sources but as resistors of 5 kΩ with 200 ps duration occurring at specified photon-arrival times. 62 The time scale over which correlated events are detected is set by the L spd/r spd time constant of the circuit. In the main panel, this time constant is 500 ns, and in the inset, it is 50 ns. Longer correlation windows can be straightforwardly achieved, and the shortest correlation window will be limited by the latching time of the SPDs.
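The qualitative behavior of the coincidence detector can be captured with a toy model: each detection diverts a current that recovers on the L spd/r spd timescale, and the junction can switch only when the combined diverted current exceeds I c. The numerical values below (5 μA diverted per detector, 8 μA critical current) are illustrative assumptions, not the simulated circuit parameters:

```python
import math

def diverted_current(t_since_photon, amp, tau):
    """Current still diverted from one SPD a time t after its detection,
    with exponential recovery on the L_spd/r_spd timescale (a sketch)."""
    if t_since_photon < 0:
        return 0.0
    return amp * math.exp(-t_since_photon / tau)

def coincidence(dt, amp=5e-6, tau=500e-9, I_c=8e-6):
    """True if two photons separated by dt push the combined diverted
    current above I_c at the arrival of the second photon. Neither
    detection alone suffices (5 uA < 8 uA), mimicking the symmetric bias."""
    combined = amp + diverted_current(dt, amp, tau)
    return combined > I_c

print(coincidence(100e-9))  # within the 500 ns window -> True
print(coincidence(2e-6))    # far outside the window -> False
```

Because the model depends only on |Δt|, it is insensitive to photon order, as is the symmetric circuit of Fig. 6.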
Due to the symmetric biasing of the two SPDs, the circuit of Fig. 6 is insensitive to order of photon arrival. By breaking this symmetry, similar receiver circuits that detect ordered correlations can be used for Hebbian learning. The two-SPD circuit of Fig. 6 can also be extended to detect other sequences of activity, including sequences with more photons.
Dendritic processing can also be used for intermediate nonlinear processing between synapses and the NR loop. 114 An example circuit is shown in Fig. 7. Here, multiple SI loops are inductively coupled to another loop, which contains a JJ. Only when the junction is driven above its critical current does an appreciable signal get added to the dendritic integration (DI) loop, which is an intermediate, nonlinear processor between the SI loop and the NR loop.

FIG. 5.
A neuron with seven excitatory and three inhibitory synaptic connections. The excitatory and inhibitory current inputs are shown as green and red traces and are referenced to the left y axis. The blue trace is I ni, referenced to the right y axis. At time t = 10 ns, a synaptic firing event occurs on an excitatory synapse with I sy = 1 μA. At time t = 50 ns, a synaptic firing event occurs on an excitatory synapse with I sy = 3 μA. At time t = 90 ns, a synaptic firing event occurs on an inhibitory synapse with I sy = 1 μA. At time t = 130 ns, a synaptic firing event occurs on an excitatory synapse with I sy = 3 μA. At time t = 160 ns, a synaptic firing event occurs on an inhibitory synapse with I sy = 3 μA. The colors in this plot are not in reference to Fig. 3.
In this case, the DI loops are analogous to dendrites. An important role of dendrites is in conjunction with inhibitory interneurons that can temporarily suppress the efficacy of an entire dendrite. 17,18,52 At the bottom of Fig. 7, we show how an inhibitory interneuron may be inductively coupled to a dendrite. When inhibition is applied to the loop, it may be impossible for the synaptic connections to drive the JJ above threshold and add flux to the DI loop. Many levels of loop hierarchies can be combined in this way to achieve various nonlinear functions as well as current amplification before the NT loop.
Dendritic processing in conjunction with inhibitory interneurons contributes to network synchronization on various temporal and spatial scales. [14][15][16]18,27,52 The approach to dendrites shown in Fig. 7 is one way inhibition could be used with the synapses presented here to achieve these functions thought to be necessary for cognition. 17,50 In this context, engineering synaptic and dendritic circuits with a variety of time constants (analogous to membrane time constants) is important, as these time constants affect synchronization frequency 49 and enable neurons with a greater diversity of synapses. 115 As discussed in Sec. I, power-law dynamics are necessary for information integration and self-organized criticality, and a power-law frequency distribution can be achieved through the superposition of exponential decay functions with a diversity of time constants. 30 To achieve this with the dendritic processors shown in Fig. 7, resistors are placed in each DI loop. The L/r time constant of each DI loop will set its temporal response, and in this way, different dendrites can be given different time constants. Similarly, a resistor can be placed in each SI loop so that each synaptic excitation has a characteristic time constant, as discussed previously. These resistors will also accomplish the task of purging flux from the SI and DI loops to avoid saturation. As indicated in Fig. 7, inhibition can be applied at various points in the loop hierarchy, including specific synaptic loops, dendritic loops, the neuronal loop, and even the current source to the light emitter. These different structural implementations of inhibition are analogous to the three main forms of inhibition observed in biological neurons, wherein interneurons target dendrites, the soma, and the axon initial segment. 52
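The claim that superposed exponential relaxations yield a power-law spectrum can be checked numerically. Each relaxation with time constant τ contributes a Lorentzian τ/(1 + (2πfτ)²) to the power spectrum; with time constants spread uniformly on a log scale (as a diversity of L/r values would provide), the sum falls off approximately as 1/f within the band. The four-decade spread below is an illustrative choice:

```python
import math

def summed_spectrum(f, taus):
    """Power spectrum of a superposition of independent exponential
    relaxations: each time constant contributes a Lorentzian."""
    return sum(tau / (1.0 + (2.0 * math.pi * f * tau) ** 2) for tau in taus)

# time constants log-spaced over four decades, 1 us to 10 ms
taus = [10 ** (-6 + 4 * k / 80) for k in range(81)]
ratio = summed_spectrum(1e3, taus) / summed_spectrum(1e4, taus)
print(f"S(1 kHz)/S(10 kHz) = {ratio:.1f}  (~10 for a 1/f spectrum)")
```

A decade in frequency costs close to a decade in spectral density, the signature of 1/f behavior, for frequencies well inside the band spanned by the time constants.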

D. Discussion regarding synaptic receiver circuits
The present section has investigated a superconducting optoelectronic neuron receiver circuit utilizing an analog photon-to-fluxon transducer, based on an SPD in parallel with a JJ, that couples flux to a storage loop. The synaptic weight can be enacted by changing the bias to the JJ. One thousand of these synapses can be inductively coupled to an integration loop and ultimately to a thresholding JJ. Designs for single-photon-sensitive receivers capable of operating on rate-coded signals as well as two-photon-sensitive receivers capable of operating on temporally coded signals have been discussed. Excitatory as well as inhibitory behavior has been simulated, and a hierarchy of loops for dendritic processing has been proposed.
An important area of future investigation regards the resilience of the proposed circuits to flux noise. There are several reasons why flux noise is less likely to be problematic in this context than in superconducting digital circuits or superconducting qubits. First, loop neurons are likely to utilize flux-storage loops with large inductance, and thermal current noise scales as 1/√L. Second, during a synaptic firing event, tens to hundreds of flux quanta are generated. Therefore, thermal fluctuations that result in the production of order one flux quantum at a synaptic firing junction are of little consequence. This form of noise will lead to a reduction in the resolution of synaptic weights (reduced synaptic bit depth), but will otherwise maintain viability. Third, this form of information processing is classical, so issues related to noise in the phase of the superconducting wave function that are relevant to superconducting qubits are not pertinent to loop neurons. Fourth, many of the loops used in these computational circuits have L/r leak rates by design. Flux that may be trapped in loops during initial cooling will be dissipated rapidly. The synaptic storage loops (to be discussed in Sec. III) are intended to store memories without leak, and, therefore, these loops may be susceptible to unwanted flux trapping. However, even this effect may not be problematic, resulting only in a statistical distribution of initial synaptic weights. The plasticity mechanisms to be discussed next will adapt the network's synapses to a functional operating point. Nevertheless, these qualitative arguments regarding resilience to noise require more rigorous theoretical and experimental investigation, which will be the subject of future work.
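The 1/√L scaling of thermal current noise follows from equipartition, (1/2) L ⟨δI²⟩ = (1/2) k_B T. A quick evaluation at an assumed 4.2 K operating temperature shows why large-inductance storage loops are comparatively quiet:

```python
import math

KB = 1.380649e-23  # Boltzmann constant, J/K

def thermal_current_noise(L, T=4.2):
    """RMS thermal current fluctuation of an inductor from equipartition:
    dI_rms = sqrt(kB * T / L), i.e., noise scales as 1/sqrt(L).
    T = 4.2 K is an assumed liquid-helium operating temperature."""
    return math.sqrt(KB * T / L)

for L in (100e-12, 100e-9, 10e-6):
    print(f"L = {L:.0e} H -> dI_rms = {thermal_current_noise(L) * 1e9:.1f} nA")
```

At 100 pH the fluctuation is hundreds of nanoamperes, while a 10 μH storage loop sees only a few nanoamperes, well below the signal currents discussed in this section.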
In Sec. I, we argue that cognitive systems benefit from information integration across spatial and temporal scales. Temporal integration is achieved with a power-law distribution of neural oscillation frequencies. The receiver circuits presented in this work enable this functionality in at least two ways. First, they are fast and can detect photon communication events at 20 MHz and possibly faster. The brain oscillates at frequencies from 0.05 Hz to 600 Hz. 17 We assume that loop neurons can oscillate from 1 Hz to 20 MHz, and the actual range may be larger. While the human brain oscillates at frequencies spanning 4 orders of magnitude, these receivers could contribute to oscillations across 7 orders of magnitude or more, indicating potential for information integration across very large networks 28 (see Sec. V C 3). The second manner in which these receivers are well suited to achieving a power-law frequency distribution is that their oscillatory response is tunable, so each neuron can participate in a broad range of oscillations. This tunable response is achievable by changing synaptic weights as well as the threshold of the JJ in the NT loop or DI loops via bias currents. Tunability also results from changing which synapses are effective at a given time using inhibition and dendritic processing. Such dynamic effects in synapses and neurons in the brain are crucial for maximally utilizing the time domain for information integration. 18 Finally, we point out that while the circuits presented here utilize photons for communication and to trigger synaptic firing events, similar functionality is achievable using only fluxons. The SPD in Fig. 2(b) can be replaced by a JJ. The same techniques of utilizing a hierarchy of integration loops, dendritic processing, and synaptic weighting can be used in those circuits as well. Achieving the communication necessary for large networks 107 will be cumbersome with purely electronic circuits.
Yet, such neurons may fire at rates beyond 10 GHz with very low power consumption when driving up to ≈20 synaptic connections. Networks combining electronic and optoelectronic neurons extend the power-law degree distribution to lower degree and the power-law frequency distribution to higher frequency. Purely electronic, JJ-based neurons and synapses have been proposed 108,116,117 and demonstrated. 118,119 We point out how the circuits presented here can be converted to purely electrical neurons to illustrate the continuity of electronic and photonic implementations and to show that networks with both electrical and optical neurons working in conjunction, based on the same neural principles and fabrication process, can be achieved.

III. SYNAPTIC PLASTICITY
The synaptic weights between nodes of a neural system are crucial memory elements that affect dynamics and computation. 18,51,80,81 In the loop neurons under consideration, the photon-to-fluxon transduction that occurs during a synaptic firing event is implemented with an SPD in parallel with a JJ, as described in Sec. II. To change the number of fluxons generated during the synaptic firing event, one can simply change the current bias across J_sf. The circuits presented in this section are designed to dynamically modify the current bias I_sy to J_sf (see Fig. 2). We refer to the circuits that modify I_sy as synaptic update circuits.
In general, there will be a chosen weakest and strongest synaptic strength at each synapse, and the weakest synaptic strength may be achieved with I_sy^min > 0. Thus, it is the goal of a synaptic update circuit to vary I_sy over the range I_sy^min ≤ I_sy ≤ I_sy^max. In certain contexts, it is sufficient for I_sy to take only two values, 120 while in other learning environments, it may be advantageous to access many values of I_sy between I_sy^min and I_sy^max. In Sec. II, we identified I_sy^min = 1 μA and I_sy^max = 3 μA. The circuits described in this section modify I_sy in either a supervised manner using JJs or an unsupervised manner using SPDs in conjunction with JJs. For supervised operation, controlled inputs are presented to the system, and the system provides an output. The output from the system is compared to a desired output, and an error is calculated based on a cost function. This error is then used to update the configuration of the system, often through backpropagation. 121 The objective of supervised learning is usually to train the hardware to perform a specific task. 122 For larger neural systems performing general cognitive functions, it is advantageous to operate in an unsupervised manner. Unsupervised learning often refers to the process of learning to categorize unlabeled data. Here, we use the term in a more general sense to refer to systems that learn without any supervisory control. In this modality, no outside entity modifies synaptic weights. Such an approach to learning is scalable in that a user is not required to calculate or adjust the network parameters, so systems with many degrees of freedom can be realized. Unsupervised learning requires that the internal activity of the network be capable of adjusting the degrees of freedom to form a useful representation of the information it is expected to process.
This operation is achieved through a variety of activity-based plasticity mechanisms, including spike-timing-dependent plasticity (STDP).
For either supervised or unsupervised learning, the memory update signals add or remove flux from a storage loop, which is inductively coupled to I_sy. This loop is referred to as the synaptic storage (SS) loop, and the flux stored in this loop functions as the memory for the synapse.
For large-scale cognitive computing, we are interested in systems that will interact continuously with their environment, be capable of immediately assimilating new information, and be capable of remembering events as long as possible. Such competing memory demands are sometimes referred to as the adaptability-precision trade-off, 56 and the best-performing synapses in this regard are complex 54 and may have many stable levels. 55 In human subjects, memories have been observed to fade with a power-law temporal dependence. 123,124 It is difficult to do better than power-law forgetting with plastic synapses that continually adapt, 54 and simple synapses lose their memory trace most quickly. 55 Here, we show synapses with a number of stable states ranging from two to hundreds. These synapses have dynamically variable memory update rates, making them suitable for power-law memory retention.
The circuits implemented to control I_sy must meet several criteria: (1) transition between the minimum and maximum values of I_sy should be possible with a specified number of increments to control the learning rate; (2) the circuit should not be able to set I_sy outside of this range so that simple update rules or training algorithms do not result in excessively large synaptic weights; (3) it should be possible to cycle the value of I_sy from minimum to maximum and back repeatedly without degradation; (4) in addition to a means by which the synaptic weights can be incremented by an external supervisor, there should be a means by which correlated photon signals from the two neurons associated with a synapse can strengthen or weaken the synaptic weight depending on the relative arrival times of the signals from the two neurons; (5) within this unsupervised mode of operation, synaptic update events should be induced by single-photon signals to fully exploit the energy efficiency of superconducting optoelectronic hardware; and (6) the transition probability between synaptic states should also be dynamically adjustable based on photonic signals to achieve metaplastic behavior. This section explores circuit designs satisfying these criteria.
A qualitative explanation of the memory update process is given in Fig. 8(a) for supervised learning and in Fig. 8(b) for unsupervised learning. For the simplest supervised binary synapse, a flux-quantum memory cell can be used to switch between the strong and weak synaptic states in 50 ps. This binary design can be extended to a multistable synapse that can modify the synaptic weight between the fully potentiated and fully depressed states with hundreds of stable intermediate levels, and implementations with higher or lower resolution are straightforward to achieve. For unsupervised learning, we consider a circuit that can implement a Hebbian learning rule that potentiates a synaptic connection using one photon from the presynaptic neuron and one photon from the postsynaptic neuron. We generalize this circuit to implement full STDP wherein a synaptic weight can be either potentiated or depressed based on timing correlations. This STDP circuit uses single-photon signals at four ports. Implementations of short-term plasticity, homeostatic plasticity, and metaplasticity are also possible. By combining these synaptic update circuits, it is possible to realize neurons with a distribution of synapses that update at different rates as well as ensembles of neurons wherein different neurons store information about different stimuli learned at different times, thus achieving a network with the rapid adaptability and long memory retention times necessary for cognition.

A. Supervised learning
At the present stage of development, it is not clear which application spaces will be best served by superconducting optoelectronic hardware. The main emphasis of this work is on general cognitive systems, yet we begin the exploration of synaptic plasticity with supervised synapses to explore the possibilities and because the unsupervised circuits are straightforward extensions of these supervised circuits. To implement supervised learning, we would like to control the flux stored in the SS loop using simple control signals, which we take to be square current pulses. For many applications in machine learning, neural networks, and neuroscience, synapses are treated as binary elements that can switch between strong (potentiated) and weak (depressed) states. 55,120,125 In Sec. II, it was shown that in a superconducting optoelectronic loop neuron, changing I_sy from 1 μA to 3 μA changes the contribution to the neuron's integrated signal by a factor of 15. The circuit for enacting a binary synapse is shown in Fig. 9. For systems with many neurons, each with many synapses, we would like to use a single current source to establish the baseline synaptic bias to all synapses (I_1 in Fig. 9), keeping in mind that we may need the baseline synaptic bias to be different for different synapses. This can be achieved by using a single current bias, I_1, and using mutual inductors to couple this current to each synapse. The synaptic firing circuit is thus biased by a superconducting loop, referred to as the synaptic bias (SB) loop, and the objective of the synaptic update circuit is to change the current in the SB loop, also through mutual inductors. This concept is shown in Fig. 9, where the SB loop is coupled to both the main bias, I_1, and the dynamic synaptic bias based on the flux trapped in the SS loop. All circuits presented in the remainder of this section provide a means to adjust the flux stored in the SS loop.
The two-JJ circuit of Fig. 9 is a standard flux-quantum memory cell 65,66 coupled to the SB loop via a mutual inductor. 84 When there are no fluxons in the SS loop, I_sy = 1 μA, the minimum value. In this state, the bias currents (I_ss^b1 and I_ss^b2) are chosen such that a weakening synaptic update signal (I_-) cannot add a fluxon to the loop, so the synaptic weight cannot be further depressed. A strengthening signal can, however, switch J_su^+ and add one fluxon to the loop. This transitions the circuit to the potentiated state, wherein I_sy = 3 μA. At this point, further potentiating signals cannot add additional flux to the loop. The loop can store only a single fluxon, as it is characterized by a screening parameter β_L/2π = L I_c/Φ_0 of order unity. 66 The junction and circuit parameters are given in Appendix A. These parameters are typical for superconducting electronic circuits and straightforward to realize in hardware. Figure 10 shows WRSpice simulations of the temporal behavior of the circuit as it switches between states. In Fig. 10(a), the circuit is initially in the depressed state. A pulse of 10 μA drives the circuit to the potentiated state. Repeated current pulses do not switch the state, and after the input pulses cease, the cell holds the value of I_sy. Upon the application of a single 10 μA pulse into the weakening port (I_ss^b2), the circuit switches back to the depressed state, and repeated applications of this signal do not further switch the circuit.
In Fig. 10(b), we show the synapse switching between the depressed and potentiated states every 50 ps. The time scale of Fig. 10 is extremely fast compared to biological neural circuits. The speed of these circuits offers intriguing possibilities. Figure 10(c) shows a temporal zoom of a full cycle of the binary synapse occurring within 50 ps. The operation considered in Fig. 10 is intended to show the operating range of the synapse, but in practice repeated switching in this manner is unlikely to be useful for neural operation.
For deep learning in neural networks, it can be advantageous to increment the synaptic weights in small steps. To achieve fine weight update, a superconducting loop capable of storing more than one flux quantum is utilized, as shown in Fig. 11. Flux quanta can be added one by one using DC-to-SFQ converters. 65,66 The binary synapse has been modified to include two DC-to-SFQ converters: one for potentiating and one for depressing. When a fluxon is produced by the potentiating DC-to-SFQ converter by the introduction of a current pulse, I_+, the fluxon is added to the SS loop. When a fluxon is produced by the depressing DC-to-SFQ converter by the introduction of a current pulse, I_-, the fluxon counter-propagates in the SS loop. The inductors of the SS loop, L_ss and M_ss, can be chosen over a broad range of values to determine the learning rate and range of synaptic weights achieved.
Controlled increase of synaptic bias current is again demonstrated using WRSpice. The results are shown in Fig. 12. In this calculation, a periodic square wave drives the DC-to-SFQ converter with 10 μA pulses of 1 ns duration and 2 ns period. Current is added to the SS loop in fluxon increments over many input cycles [Fig. 12(a)]. In this case, the value of I_sy before any flux has been added to the SS loop is 2 μA, chosen to be in the middle of the operational range identified in Sec. II. For this calculation, the inductance of the SS loop is 200 nH (β_L/2π = L I_c/Φ_0 = 3.8 × 10^3), leading to the addition of 2.5 nA to I_sy with the addition of each fluxon to the loop. This value of inductance (and, therefore, ΔI_sy) can be chosen over a broad range to set the synaptic update increment and number of synaptic levels. This value was chosen to create an SS loop that can store over 1000 fluxons between the minimum and maximum values of I_sy.
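The SS-loop numbers quoted above can be checked with a short back-of-the-envelope calculation. The sketch below assumes a junction critical current of 40 μA (inferred from the quoted β_L, not stated directly in the text) and uses the 2.5 nA-per-fluxon figure to estimate the effective coupling into the bias line and the number of update increments across the Sec. II operating range.

```python
# Worked check of the SS-loop parameters (a sketch; I_c = 40 uA is an
# assumption inferred from the quoted beta_L/2pi = 3.8e3 at L = 200 nH).
PHI_0 = 2.0678e-15   # magnetic flux quantum, Wb
L_ss = 200e-9        # SS-loop inductance, H
I_c = 40e-6          # assumed junction critical current, A

# Screening parameter beta_L/2pi = L * I_c / Phi_0
beta_L_over_2pi = L_ss * I_c / PHI_0
print(f"beta_L/2pi ~ {beta_L_over_2pi:.1e}")  # ~3.9e3, consistent with 3.8e3 quoted

# Circulating-current increment per stored fluxon
dI_loop = PHI_0 / L_ss
print(f"loop current per fluxon ~ {dI_loop * 1e9:.1f} nA")  # ~10.3 nA

# The text quotes 2.5 nA added to I_sy per fluxon; the ratio gives the
# effective coupling of the mutual inductor into the bias line.
dI_sy = 2.5e-9
print(f"implied coupling ~ {dI_sy / dI_loop:.2f}")

# Number of update increments across the 1 uA -> 3 uA range of Sec. II
levels = (3e-6 - 1e-6) / dI_sy
print(f"increments across Sec. II range ~ {levels:.0f}")
```

The full saturation range seen in the simulations (roughly 0.8 μA to 3.2 μA) gives closer to the ~1000 fluxons quoted.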
The inset of Fig. 12(a) shows the behavior of I_sy as a function of time as it is potentiated to saturation. A fluxon is added to the loop every 2 ns. After approximately 500 fluxons have been added to the loop, the value of I_sy saturates just above 3 μA. This saturation behavior is advantageous so that a learning algorithm cannot cause a synaptic weight to grow without bound. Figure 12(b) shows I_sy as a function of time as the potentiating and depressing DC-to-SFQ converters are alternately employed, analogous to the two drives of the binary synapse in Fig. 9. For these calculations, an SS loop with 20 nH inductance was considered to reduce the time required to achieve saturation. Initially, I_sy = 2 μA. Fluxons are added to the SS loop for 200 ns, and I_sy reaches its maximum value of 3.2 μA. Once the SS loop reaches saturation, the value of I_sy cannot be increased. The figure further shows that after the synaptic strengthening drive is turned off, I_sy maintains its value (i.e., during the time from 200 ns to 250 ns), a feature enabled by the use of dissipationless superconducting circuits. After 250 ns, fluxons of the opposite sign begin to be added to the SS loop via the synaptic weakening drive (I_-), and I_sy can be driven down to the minimum value (800 nA in this case). Cycling these drives results in the periodic behavior seen in Fig. 12(b). During each strengthening and weakening cycle, I_sy vs time has two regions with different slopes. This is because when the current in the SS loop is outside a certain range, the DC-to-SFQ converter releases two fluxons per drive cycle. This characteristic is likely of little consequence and may be eliminated with improved circuit design, possibly by separating the DC-to-SFQ converter from the SS loop with a JTL. The numerical simulation of the circuits through periodic cycling (Figs. 10 and 12) is intended to demonstrate the range of synaptic states, the transitions between them, and saturation behavior.
In practice, the synapses are unlikely to be cycled in this manner and will instead be updated as needed by the learning algorithm.
The circuits of Figs. 9 and 11 have several strengths when used to establish the synaptic weight of a superconducting optoelectronic neuron. The nature of the flux-storage Josephson circuits enables modifying the synaptic weights as many times as necessary without material degradation. The maximum and minimum values of I_sy can be designed to achieve a broad range of operating conditions. Upon reaching the maximum and minimum values, the device saturates, eliminating the possibility of runaway values of synaptic weight. Synaptic update can be carried out in a specified number of increments based on the choice of inductance of the SS loop. The size of these increments will contribute to the learning/forgetting rate of the synapse.
While these characteristics of the circuits are conducive to implementing a variety of training algorithms based on backpropagation, 121 we would also like to enable systems that learn using only activity within the network. We next consider a Hebbian learning circuit, which strengthens the synaptic weight between two neurons that fire in succession. This will lead to the discussion of a circuit achieving STDP based on the timing between pre- and postsynaptic activity.

B. Hebbian update
The Hebbian update circuit under consideration is shown in Fig. 13(a). The operation of this circuit is based on a similar principle to the supervised learning circuits discussed in Sec. III A in that the synaptic bias current I_sy is adjusted based on the flux stored in the SS loop. The DC-to-SFQ converter of Fig. 11 can be replaced by SPDs to enable flux to be added to the SS loop based on temporally correlated photonic activity within the network. The circuit of Fig. 13(a) implements a Hebbian update rule that potentiates a synaptic connection between pre- and postsynaptic neurons when the presynaptic neuron contributes to the firing of the postsynaptic neuron. 41 The Hebbian rule requires a two-photon temporal-correlation circuit, like the temporal-code receiver of Sec. II, except the asymmetry of Hebbian update requires an asymmetrical initial bias to the two SPDs. Operation of the Hebbian update circuit can be described qualitatively as follows. When no photons have been detected, the bias I_spd is directed through SPD_1. The resistor r_1 ensures that SPD_2 is unbiased until SPD_1 receives a photon, and, therefore, photons incident on SPD_2 have no effect on the circuit unless they are incident during a time window following a detection event by SPD_1. Once a photon has been detected by SPD_1, I_spd is redirected to I_2 and I_3. The current returns to I_1 with a time constant of τ_1 = L_1/r_1. If a photon is detected by SPD_2 during τ_1, I_spd is predominantly redirected to I_3, which can be sufficient to switch J_su, the synaptic update JJ, perhaps many times depending on the bias currents, I_spd and I_su, and the difference in arrival times between the two photons, Δt. Details of circuit design are included in Appendix A.
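The coincidence logic just described can be captured in a minimal phenomenological sketch: a photon on SPD_2 is only effective within a window after SPD_1 fires, with an update strength that decays as the diverted bias returns to SPD_1 with time constant τ_1. The 50 ns value of τ_1 is taken from the analysis of Fig. 13(b); the switching-threshold fraction is an illustrative assumption, not a circuit parameter from the text.

```python
import math

# Phenomenological model of the two-photon Hebbian window (a sketch, not
# a WRSpice simulation): the update strength decays as exp(-dt/tau_1)
# and vanishes for photons arriving in the wrong order.
TAU_1 = 50e-9   # s, L_1/r_1 recovery time constant (from the text)
THRESH = 0.05   # assumed: below this fraction of diverted bias, J_su cannot switch

def hebbian_update(t_pre, t_post):
    """Relative update strength for a photon on SPD_1 at t_pre (presynaptic)
    followed by a photon on SPD_2 at t_post (postsynaptic)."""
    dt = t_post - t_pre
    if dt < 0:
        return 0.0                # SPD_2 unbiased before SPD_1 fires: no effect
    w = math.exp(-dt / TAU_1)     # fraction of bias still diverted toward J_su
    return w if w > THRESH else 0.0

print(hebbian_update(0.0, 10e-9))    # ~0.82: strong potentiation
print(hebbian_update(0.0, 200e-9))   # 0.0: outside the ~tau_1 window
print(hebbian_update(10e-9, 0.0))    # 0.0: wrong temporal order
```

The actual dependence in Fig. 13(b) differs from a pure exponential because of Josephson nonlinearities, but the qualitative window behavior is the same.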
In Fig. 13(b), we analyze the current added to the SS loop as a function of the delay, Δt, for four values of I_su with I_spd fixed at 10 μA. We plot the change in current in the SS loop (ΔI_ss) as a percentage of the SS loop saturation current (I_ss^sat) during Hebbian update events characterized by delay Δt. The amount of synaptic weight modification depends strongly on the temporal delay, dropping to zero after roughly τ_1 (50 ns in this case). We also see that the effect depends on I_su, enabling the memory update rate to be dynamically adjusted during operation via a DC bias current. This dependence on I_su provides a means to implement metaplasticity. The quantity ΔI_ss/I_ss^sat represents the fraction of the synapse dynamic range that is acquired in an update event. Although the current in the SS loop (and, therefore, I_sy) can only change by an integer number of flux quanta, the use of high-kinetic-inductance flux-storage loops wherein thousands of flux quanta can be stored makes this effectively an analog circuit. For the SS loop investigated in Fig. 13(a), β_L/2π = 1.9 × 10^4.
During circuit operation, we assume that when the presynaptic neuron fires a photonic pulse, one or more photons will reach a synaptic firing circuit of the postsynaptic neuron and bring the neuron closer to its threshold, as discussed in Sec. II. We also assume that additional photons have a probability of reaching SPD_1 of the synaptic update circuit shown in Fig. 13(a) to perform the first step in implementing the Hebbian rule. This photon is labeled "1" in Fig. 13(a). The probability of reaching SPD_1 may be controlled to modify the learning rate. Similarly, it is assumed that during a neuronal firing event, the local neuron will send photons not only to its downstream connections but also to its local synaptic update circuits to activate learning by striking SPD_2. This photon is labeled "2" in Fig. 13(a). This self-feedback is also illustrated in Fig. 1.
Hebbian learning rules may be based on average firing rates of pre-and postsynaptic neurons or on timing between individual spikes from these neurons. 39 Here, we consider the latter. A timing-dependent learning rule often takes the form of exponential decay as a function of the difference in arrival times of pre-and postsynaptic signals. The form shown in Fig. 13(b) is slightly different due to Josephson nonlinearities. This modified temporal dependence is likely of little consequence as it maintains the principal function of timing-dependent plasticity, which is to modify the synaptic weight based on temporal correlation within a specified time window surrounding a neuronal firing event.
While the quantity ΔI_ss represents the change in synaptic weight due to one Hebbian update event, the area under the curves in Fig. 13(b) will be related to the learning rate when averaged over many events, because the delay between the two photons, Δt, will vary across events. The integral of the curve with I_su = 35 μA is 3.6% of the integral of the curve with I_su = 38 μA. For I_su = 36 μA, the value is 18%, and for I_su = 37 μA, the value is 48%. The learning rate can be dynamically adjusted across a broad range via I_su.
C. Spike-timing-dependent plasticity

Learning rules that can both strengthen and weaken the synaptic connection are required for neural computing. Spike-timing-dependent plasticity requires the Hebbian potentiating operation described in Sec. III B and also a depressing operation wherein a neuronal firing event at the postsynaptic neuron followed closely by a neuronal firing event at a presynaptic neuron depresses the synaptic weight between the two neurons. A circuit capable of producing this STDP is depicted in Fig. 14(a). The full circuit with the STDP module delivering I_sy to J_sf is shown in Fig. 23. Much as strengthening and weakening were accomplished in Sec. III A by adding a mirror image of the strengthening circuit to the SS loop, here we duplicate the Hebbian circuit of Sec. III B to achieve STDP. The similarity of the SPD circuit of Fig. 14(a) and the JJ circuit of Fig. 11 is apparent. In Fig. 14(a), each SPD is assumed to be connected to a different photonic port. Two of the ports receive photons from the presynaptic neuron, and two of the ports receive photons from the postsynaptic neuron. In this circuit, photons coming from the presynaptic neuron are drawn entering from the left, and photons from the postsynaptic neuron are drawn entering from the right.
The symmetry between the strengthening and weakening receiver circuits in the STDP circuit of Fig. 14(a) is broken based on whether the SPD that is biased in the steady state receives photons from the presynaptic or postsynaptic neuron. In the synaptic-weakening receiver circuit, a postsynaptic photon detected by SPD_3 followed by a photon from a presynaptic neuron detected by SPD_4 introduces counter-circulating flux to the SS loop. The time constants and biases of the strengthening and weakening receivers can be adjusted independently.
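Combining the potentiating and depressing windows yields the familiar antisymmetric STDP curve. The sketch below uses the idealized exponential form common in the computational-neuroscience literature, not the exact Josephson response of these circuits; the amplitudes and time constants are illustrative assumptions.

```python
import math

# Idealized exponential STDP window (standard literature form; the
# Josephson circuits above realize a qualitatively similar but not
# identical shape). Amplitudes and time constants are illustrative.
A_PLUS, A_MINUS = 1.0, 0.8      # potentiation / depression amplitudes
TAU_PLUS = TAU_MINUS = 50e-9    # decay time constants, s

def stdp_dw(dt):
    """Relative weight change vs dt = t_post - t_pre."""
    if dt >= 0:   # presynaptic spike precedes postsynaptic: potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    else:         # postsynaptic spike precedes presynaptic: depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)

for dt in (-100e-9, -20e-9, 20e-9, 100e-9):
    print(f"dt = {dt * 1e9:+.0f} ns -> dw = {stdp_dw(dt):+.3f}")
```

Independently adjustable time constants and biases in the strengthening and weakening receivers correspond to independently tunable A and τ on the two sides of the window.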
To demonstrate the feasibility of implementing STDP, Fig. 14(b) shows the change in synaptic weight as a function of the delay between photon arrivals for both the potentiating and depressing circuits. In the depressing case, the postsynaptic photon arrives before that from the presynaptic neuron, and in this event, the synaptic weight is weakened. In both the potentiating and depressing cases, this model indicates the synaptic update abruptly drops to zero when the photon arrival times are in the reverse order. In practice, this effect would be smeared out by the timing jitter of the light sources and detectors, and the synaptic update represented in Fig. 14(b) would be convolved with the source and detector jitter when averaged over many firing events.
To gain intuition regarding the dynamical operation of the STDP circuit, Figs. 14(c) and 14(d) illustrate the circuit in operation, as simulated with WRSpice. Two synaptic-strengthening events and two synaptic-weakening events occur. The currents associated with synaptic strengthening and weakening, I_+ and I_-, are shown in Fig. 14(c) with colors related to the currents labeled in Fig. 14(a). I_sy delivered to J_sf is shown in Fig. 14(d). A synaptic strengthening event occurs with Δt = 20 ns, followed by a weakening event with Δt = 10 ns and another with Δt = 25 ns. A final strengthening event occurs with Δt = 5 ns. The synaptic bias current, I_sy, is observed to respond as expected based on the Hebbian analysis in Sec. III B. In this calculation, L_ss = 20 nH, and we mention again that the amount of current added to I_ss and, therefore, I_sy during a synaptic update event can be scaled with L_ss in hardware and with I_su dynamically. The memory update rate of the STDP synapse can be controlled by adjusting the frequency of photon absorption events.
While crucial to learning and the interplay between the structure and function of neural systems, STDP is only one of many synaptic plasticity mechanisms. Despite their significance, we do not discuss short-term plasticity, homeostatic plasticity, or metaplasticity here. Further investigation of these circuits will be the subject of future work.

D. Discussion regarding synaptic plasticity
This section has explored synaptic update circuits capable of delivering a variable synaptic bias current to the synaptic firing circuits presented in Sec. II. Manipulation of the synaptic weight through external input of square wave pulses and via photon-detection events has been simulated.
For STDP, the synaptic update circuits described here provide ports for four photons: one strengthening photon from each of the presynaptic and postsynaptic neurons, and one weakening photon from each. For a single synaptic strengthening or weakening event, two of these photons must be present. When optically implementing a synaptic update rule based on timing correlation, it is difficult to achieve a circuit requiring fewer than two photons.
Other forms of photonic synapses have recently been developed and offer utility in multiple neural contexts. [126][127][128][129][130][131] One can leverage phase shifts in microrings 127,128 or Mach-Zehnder interferometers (MZIs) 129 to adjust synaptic weight. Thermal tuning is often employed to implement the phase shifts. Thermal tuning requires more power than is suitable for this hardware platform. Phase shifters may also be large if MZIs are used, and phase shifters may require exotic materials that limit scaling if electro-optic effects are leveraged. If different synapses are addressed with different frequencies of light, the out-degree of a node in the network is limited by the multiplexed channel spacing. Approaches using MZIs for weighting and routing have the disadvantage that STDP cannot be implemented, because modifying a single phase shifter in the network affects many synaptic weights. One approach to synaptic weighting in the photonic domain utilizes a variable optical attenuator at each synaptic connection. Phase-change materials have been employed as such variable attenuators, 130 and the absorption of phase-change materials can be affected with pulses of light, thus introducing a Hebbian-type synaptic weight update process. While such an approach may be useful for certain types of neural circuits, update of these synapses requires too many photons to be useful for the neural computing scheme developed here (billions of photons per update operation for phase change vs single photons for superconducting optoelectronics). Further, making use of the arrival times is difficult with phase-change materials. It remains to be seen if other synaptic operations such as short-term plasticity, homeostatic plasticity, and metaplasticity can be achieved with phase-change materials.
In general, synaptic weights that attenuate a signal in the optical domain require more light from neuronal firing events, and many photons are simply absorbed at weak synapses. By contrast, using photons for communication but weighting in the superconducting domain uses fluxons to change the synaptic weight, and they can be generated with orders of magnitude less energy than photons. While all of these approaches to synaptic weighting may be useful in different contexts, we have developed the synapses presented in this work based on simultaneous considerations of power, complexity, scalability, speed, and size in the context of the superconducting optoelectronic hardware platform.
An important weakness of the synapses presented here is that they lose all memory when superconductivity is broken. The neuromorphic system must remain below T_c to preserve what has been learned. This class of Frosty the Snowman memory may be augmented by devices that can be heated, such as magnetic Josephson junctions. 117,119,132 It would be appealing if the state of memory in the plastic synapses described here could be transferred to long-term magnetic memory, perhaps during a sleep phase.
Another potential challenge for this type of loop neuron memory is flux trapping. The SI loops discussed in Sec. II are likely to include resistors to give a leak rate. Trapped flux in those loops will be less problematic. The SS loops that set the synaptic weights are intended to store flux for a long time to maintain memory, so they will not include resistors. Trapped flux will produce variations in the initial synaptic weights across an ensemble. For binary synapses, this will result in some synapses being initialized with strong synaptic weight and some with weak. For SS loops with high inductance, stray flux will induce a small current, so the perturbation may be small relative to the dynamic range of the synapse. For large ensembles of synapses, the statistical variation may be tolerable or even advantageous. If trapped flux proves problematic, techniques used to shield superconducting qubits can be employed. 133

In Sec. I, we argue that a dynamical system capable of differentiated processing and information integration across spatial and temporal scales underlies cognition. In Sec. II, we introduced the relaxation oscillators and dendritic processing loops capable of implementing the temporal synchronization operations necessary for integrating information in time. Network synchronization and synaptic plasticity are mutually constructive phenomena, in that synaptic strengthening through spike timing is more likely to occur when the firing of two neurons is correlated, and the strengthened synapses, in turn, make the correlated neurons more likely to synchronize. Networks with small-world structure 13,24 and dynamics characterized by self-organized criticality are crucial to achieving information integration. Hebbian learning rules and STDP have been shown to convert random networks into small-world networks and to give rise to self-organized criticality. 34,51 Creation of hardware capable of supporting complex networks and synaptic learning mechanisms will provide a powerful tool for the investigation of the relation between critical network dynamics and cognitive function. We have shown some of the complex synaptic behaviors necessary for rapid adaptation, long-term memory retention, and synaptic update based on network activity. Networks of neurons connected by these synapses will be capable of integrating information learned at many times in many contexts in a single dynamical state.

IV. TRANSMITTER CIRCUITS
Synaptic receivers must partner with a light source to generate a neuronal firing event when the integrated current reaches a threshold. In Sec. II, we discuss how a JJ could be used as a thresholding element. In this section, we show that the flux quantum generated by a thresholding JJ can trigger an amplification sequence resulting in an optical pulse containing 1-100 000 photons. These superconductor/semiconductor hybrid circuits achieve electrical-to-optical transduction and facilitate communication with high fan-out and light-speed delays in complex networks of superconducting optoelectronic neurons. We refer to these amplifier circuits as the transmitter of the neuron.
A central technical challenge in designing superconducting optoelectronic hardware is to produce optical signals at telecommunication wavelengths with superconducting electronic circuits. The superconducting energy gap 64 is in the millivolt range, making it difficult for superconducting circuits to produce the one volt needed to appreciably alter the carrier concentrations in semiconductor electronic and optoelectronic devices. This voltage mismatch makes it difficult for superconducting electronic circuits to interface with CMOS logic 66 and memory. 134 A common approach to increase voltage in superconducting circuits is to place JJs in series. 135 The achievable voltage scales as the superconducting energy gap multiplied by the number of junctions. On the order of one thousand junctions must be utilized to drive the light sources we intend to employ. 1,63 While superconducting circuits operate at low voltage, they can sustain large current. Supercurrent can be converted to voltage with a resistor. A microamp across a megaohm produces a volt. A small meandering wire of a superconducting thin film can easily produce a megaohm resistance in the normal-metal state. Thus, we can convert a current-biased superconducting wire into a voltage source by switching the wire between the superconducting and normal states.
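The two voltage-generation routes described above can be compared with a quick order-of-magnitude calculation; the ~1 mV-per-junction scale is an illustrative round number, not a precise device parameter:

```python
# Route 1: series JJ array. Each junction contributes a voltage set by the
# superconducting gap (~millivolt scale), so ~1000 junctions reach 1 V.
v_per_jj = 1e-3                 # volts per junction (illustrative)
n_junctions = 1.0 / v_per_jj    # junctions needed for 1 V

# Route 2: a current-biased superconducting wire switched to the normal state.
v_resistive = 1e-6 * 1e6        # 1 uA through 1 MOhm, from the text

print(n_junctions)   # ~1000 series junctions
print(v_resistive)   # 1 V from a single switched wire
```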
Such a voltage source is not commonly used in superconducting electronics because it tends to be slow and consume more energy per operation than a JJ. The advantages of superconducting electronics for digital computing are largely speed and efficiency, while the advantages of superconducting optoelectronic networks are largely communication and efficiency. While superconducting electronics aspire to operate above 100 GHz, superconducting optoelectronic networks comprising neurons firing up to 20 MHz would be a radical increase relative to the maximum frequency of 600 Hz in biological neural systems. 17,18,28 The goal of neural rather than digital computing liberates us to use superconducting devices slower than JJs for events that only occur once per neuronal firing. The speed and efficiency of JJs are still leveraged by loop neurons to distinguish synaptic efficacy during each synaptic firing event. Thus, we can use a switching element leveraging the superconductor-to-normal phase transition in a neural context, provided it operates every neuronal firing event and not every synaptic firing event.
An effective means of breaking the superconducting phase is to heat the wire locally. Thermal devices are generally slow and power hungry, but the small volume, small temperature change, and small heat capacity at low temperature 136,137 enable switching times on the order of a nanosecond with femtojoules of energy. While such an amplifier is not suitable for the high speed and low power of flux-quantum logic, 96-98,138 a single firing of the switch can produce thousands of photons, making it very useful and efficient in this neural context where each synapse can be activated by a single photon.
This high-impedance, phase-change switch is called an hTron. 69 The heating operation required to switch the hTron from the superconducting to the normal state can be achieved through dissipation in a Joule heater. A fairly large current is required to provide sufficient power. While the hTron is equipped to deliver voltage to drive the semiconductor light source, the thresholding JJ will provide only fluxons when the neuron's threshold is reached. These fluxons are insufficient to thermally switch the state of the hTron. An intermediate current amplifier is required to convert fluxons from the thresholding JJ into current across the Joule heating element of the hTron. For this step of the amplification process, an nTron 67 will suffice. The nTron is a three-terminal thin-film superconducting current amplifier. A small current in the constricted gate can exceed the local critical current and drive the path from source to drain normal. An impedance on the order of a kilohm can be produced quickly, and the current from the source to drain will be diverted to a load. In the present case, that load is the ≈10 Ω resistive element of the hTron.
A summary of the circuit operation is shown in Fig. 15. The schematic and circuit diagram are shown in Fig. 15(a), and the sequence of events in the operation of the transmitter is shown in Fig. 15(b). The NT loop described in Sec. II reaches threshold due to integrated current from synaptic firing events, and the thresholding JJ produces one or more fluxons. These fluxons enter the gate of the nTron, exceeding the gate critical current. The nTron source/drain current is then diverted to the hTron gate. Joule heating in the hTron gate produces a temperature shift of several kelvins, switching the hTron to the high-impedance state. The hTron source-drain current suddenly experiences ≈1 MΩ and is rapidly diverted, first charging the LED capacitor and then, once sufficient voltage is achieved across the capacitor, driving current through the diode. This current generates photons through electron-hole recombination. These photons are the optical source for neural communication.
This amplifier circuit is relatively complex and uses the most power of any part of the neural circuit. A more elegant solution may be possible. Any solution must meet several operating criteria: (1) the transmitter must threshold on a low-current signal from the NT loop; (2) the amplifier chain must convert this signal to the voltage necessary to produce light from a semiconductor diode; (3) the operation should happen at least as fast as the recovery of the single-photon detectors in the loop neuron synapses (a few tens of nanoseconds); (4) a number of photons appropriate to communicate with the neuron's synaptic connections must be produced; (5) the number of photons created must be dynamically variable with a bias current; (6) the energy of a firing event must be low enough that an ensemble of neurons in realistic network operation have power density low enough for heat to be removed by liquid helium; and (7) total power consumption must be low enough to enable scaling to neural systems with billions of interacting nodes. The transmitter circuits presented here satisfy these criteria. We begin with the design of the hTron driving the LED and work backward through the circuit of Fig. 15(a).

A. Driving the light-emitting diode
We must ensure that the voltage required to drive the LED can be sustained for a duration commensurate with the number of photons required. This duration will depend on the drive current, the capacitance of the LED, the efficiency of the LED, and the number of synaptic connections served by the neuron. Once the drive requirements of the LED are understood, we can proceed to design an hTron that can meet these drive requirements.
The LED circuit under consideration is shown in Fig. 16. The LED is modeled as a variable resistor in parallel with a capacitor. The variable resistor is modeled with the DC I–V characteristic of a p–n junction, as described in Appendix A. The hTron is modeled as a variable resistor in series with an inductor. This variable resistor is treated as a step function with zero resistance abruptly switching to 800 kΩ. This model is intended to capture the behavior of the hTron driving the LED as the superconducting channel of the hTron is driven above its transition temperature. A thermal model of the hTron is presented in Sec. IV B.

FIG. 15. (b) Sequence of events during a neuronal firing event. (i) Current threshold is reached in the neuronal thresholding loop, causing the relaxation oscillation junction to produce transient current to the nTron gate. (ii) Current from the relaxation oscillator junction drives the gate of the nTron normal, causing the nTron channel current to be diverted to the gate of the hTron. (iii) The current from the nTron through the gate of the hTron drives the channel of the hTron normal, resulting in a voltage pulse across the hTron. (iv) The LED is in parallel with the hTron, so the voltage across the hTron results in a voltage across the LED. This voltage is sufficient to forward-bias the p–n junction to produce light.
The equations of motion for the circuit of Fig. 16 are given in Appendix A. Solving these equations, we obtain the circuit currents and voltages as a function of time. From these quantities, we can determine the energy dissipated during a firing event as well as the number of photons produced. The number of photons produced is calculated as

N ph = (η qe /e) ∫ i pn (t) dt, (1)

where η qe is the quantum efficiency of the diode, e is the electron charge, and i pn (t) is the current through the p–n junction. The efficiency of the circuit of Fig. 16 is calculated as

η RC = N ph hν/E RC,

where h is Planck's constant, ν is the frequency of a photon (taken to be c/1.22 μm 63), and E RC is the total energy dissipated by the RC circuit of Fig. 16 during a firing event as calculated with the model given in Appendix A. In calculating η RC, we use Eq. (1) with η qe = 1.
At least two other factors will contribute to the efficiency of the circuit in Fig. 16. First, carriers injected into the p–n junction may recombine nonradiatively. This loss mechanism is captured in the internal quantum efficiency, η qe. Second, light generated by electron-hole recombination events may not couple to the guided mode of the axonal waveguide and will, therefore, not couple to the synaptic terminals. This loss mechanism is captured in the waveguide coupling efficiency, η wg. The total LED efficiency is given by

η LED = η qe η wg.

Circuit performance is shown in Fig. 17. Voltage transients across the p–n junction are shown in Fig. 17(a), where two values of capacitance and two values of quantum efficiency are considered. For each of the four traces, the hTron pulse duration was chosen to produce 10 000 photons. We assume that 10 photons are generated per out-directed synaptic connection to compensate for loss, reduce noise, and implement learning functions. Thus, these voltage pulses are appropriate to neurons with out-degree of 1000. If the LED capacitance can be made as low as 10 fF, and η qe can be made as high as 0.1, the hTron gate must only be normal for 2.9 ns. Under these operating conditions, 25 fJ of energy is dissipated, of which 64% is dissipated as current through the p–n junction (η RC = 0.64). If the LED performance is worse, with C = 100 fF and η qe = 0.01, the drive must be applied for 29 ns. This operation dissipates 251 fJ of energy, also with 64% through the junction.
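The 2.9 ns operating point above can be checked with a minimal Euler integration of a simplified drive model: an ideal current source charging the LED capacitance in parallel with a piecewise-linear diode that conducts above ~1 V. The hTron series inductor is neglected, and the 10 kΩ diode on-resistance is an assumed illustrative value, not a parameter from the text:

```python
I_LED = 10e-6      # A, hTron channel bias from the text
C = 10e-15         # F, optimistic LED capacitance
eta_qe = 0.1       # internal quantum efficiency
V_on, r_d = 1.0, 1e4   # diode turn-on ~1 V; 10 kOhm on-resistance (assumed)
e = 1.602176634e-19    # C, elementary charge

dt, t_pulse = 1e-12, 2.9e-9   # 2.9 ns hTron-normal duration from the text
v, q_diode = 0.0, 0.0
for _ in range(int(t_pulse / dt)):
    i_d = (v - V_on) / r_d if v > V_on else 0.0   # piecewise-linear diode
    v += dt * (I_LED - i_d) / C                   # capacitor charging
    q_diode += i_d * dt                           # charge through the junction

n_ph = eta_qe * q_diode / e   # photon count, following Eq. (1)
print(round(n_ph))            # on the order of 10^4 photons
```

The capacitor absorbs roughly C × 1 V of the delivered charge before the diode turns on, which is why low capacitance shortens the required pulse.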
General trends for the number of photons produced by the circuit as a function of the time the hTron gate is above T c are shown in Fig. 17(b) for several values of capacitance and quantum efficiency. In these calculations, the bias current (I LED) is fixed at 10 μA. The capacitance determines the minimum hTron pulse duration necessary to achieve the voltage for photon production, and the quantum efficiency determines the vertical offset (the y-intercept of this log-log plot) for longer hTron pulses.
As indicated by the reach current bias depicted in Fig. 1, we would like a means to dynamically control the number of photons produced in a firing event. We can achieve this by varying the bias current diverted to the LED when the channel of the hTron becomes resistive, I LED. In Fig. 17(c), we show N ph as a function of I LED for various values of the LED capacitance in a case where the hTron channel is driven normal for 10 ns and the LED quantum efficiency is assumed to be 0.01. 70 This current bias provides a means to change the strength of neuronal activity. With fewer photons produced, the probability of reaching distant connections diminishes. With more photons produced, the neuron makes a stronger contribution to network activity. The synapses considered in this work do not decode the amplitude of the photonic signal, so the objective of changing the number of photons produced in a neuronal firing event is not to encode information, but rather to adjust how many synaptic connections are reached. The routing architecture can be designed so that, with a given value of I LED, some synapses receive 10 photons on average from a neuronal firing event, while others receive only one. By decreasing I LED, the strongest synapses will still receive more than one photon, while the weakest will receive less than one and are effectively eliminated. Thus, I LED provides an additional means to adapt the structural network into multiple functional networks at different times. This current bias may be modified by a supervisory user or by internal activity within the network.
The efficiency of the light-production circuit is considered with the rest of the transmitter circuit in Sec. IV D. Based on the calculations of this section, we know the hTron drive requirements for light production across a range of capacitance and quantum efficiency values. If a neuron needs to produce 10 000 photons to communicate to 1000 synaptic connections, an hTron biased with a channel current of 10 μA must produce a resistance of 800 kΩ for 1 ns-100 ns, depending on the achievable LED performance. We now proceed to consider the operation of an hTron when driven by the current from an nTron.

FIG. 16. Circuit diagram of the LED driven by the hTron. For this analysis, the hTron channel is modeled as a variable resistor (r hT) in series with an inductor (L hT). The LED is modeled as a capacitor, C, in parallel with a variable resistor (labeled p–n). The hTron variable resistor switches under the influence of heat produced by Joule heating in the gate resistive element, discussed in Sec. IV B (see also Appendix A). The LED variable resistor models the DC current-voltage characteristic of the p–n junction (see Appendix A). The hTron channel is biased with I LED. When the channel is driven normal, r hT switches from 0 Ω to 800 kΩ. The current charges up the capacitor until the voltage is ≈1 V, at which point current begins to flow through the p–n junction, producing light.

B. The hTron voltage amplifier driven by the nTron current amplifier
To produce the voltage across the LED required to generate light, a resistance of 800 kΩ must be established rapidly and sustained for 1 ns-100 ns. One way to achieve this is to switch a length of superconducting wire to the normal-metal state. This can be straightforwardly accomplished by raising the temperature of a superconducting wire above T c. An hTron is a device that switches a channel resistance from zero ohms to a large value when current is driven through the gate. 69 The current through the gate dissipates power in a resistive element through Joule heating, and this power locally raises the channel above T c. We show a schematic in Fig. 18(a), wherein a resistive layer is separated from a nanowire meander by a thin insulator.
While thermal devices can be too slow and consume too much power for some applications, there are several reasons why the hTron is suitable for the present purpose. First, the device has a compact footprint of 5 × 5 μm², so a very small mass must be heated. Second, the specific heat of all materials involved falls as T³, so at the desired operating temperature of 4.2 K, the specific heat is orders of magnitude smaller than at room temperature. Third, the required temperature swing is small (≈2 K). These factors taken together make the hTron a suitable device to achieve the voltage necessary to produce light from a semiconductor diode with the power and speed required for the neural application under consideration.
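These scaling arguments can be made concrete with a rough Debye-model estimate of the switching energy. The silicon-like atomic density and Debye temperature, and the 200 nm stack thickness, are illustrative assumptions rather than values from the text:

```python
import math

kB = 1.380649e-23                  # J/K, Boltzmann constant
n = 5.0e28                         # atoms per m^3 (silicon-like, assumed)
theta_D = 645.0                    # K, Debye temperature (silicon, assumed)
# Low-temperature Debye heat capacity per unit volume: c(T) = A * T^3
A = (12 * math.pi**4 / 5) * n * kB / theta_D**3

volume = 5e-6 * 5e-6 * 200e-9      # 5 x 5 um^2 footprint, assumed 200 nm stack
T1, T2 = 4.2, 6.2                  # ~2 K swing, as in the text

# Energy to raise the stack from T1 to T2: integral of c(T) dT
E_switch = volume * A * (T2**4 - T1**4) / 4
print(E_switch)                    # ~1e-15 J: femtojoule scale, as quoted
```

Even with generous assumptions, the T³ suppression of the heat capacity at 4.2 K keeps the switching energy at the femtojoule scale quoted earlier.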
To quantify the performance of the hTron when driving the LED of Sec. IV A, we consider the transient dynamics of the thermal circuit shown in Fig. 18(b). A heat source Q is delivered to the stack of materials shown in Fig. 18(c). The equations of motion and material parameters used in these calculations are given in Appendix A. Figure 19(a) shows temperature transients of the superconducting layer when power is delivered to the hTron gate long enough to drive the hTron channel normal for 1 ns and for 10 ns. In this plot, the hTron gate is driven with square current pulses to illustrate the temporal dynamics of the thermal components. In this model, the hTron can switch to the resistive state in roughly 1 ns. While this time scale is not suitable for many operations in superconducting digital electronics, it is more than fast enough for a neuronal firing event in the system under consideration.
While a square pulse is most efficient for driving the hTron, exponentially rising and decaying pulses will be generated by the nTron. 67 The nTron is a three-terminal, thin-film device. In the off state, a supercurrent flows from the source to drain, and the gate is in the superconducting state. The gate comprises a small constriction, and when the current delivered to the gate exceeds the critical current, it is driven locally to the normal state. Joule heating spreads the normal domain and quickly causes the channel between the source and drain to be driven normal across the entire wire width. This normal domain provides a few squares of resistance (roughly 1 kΩ), at which point the source-drain current is diverted to a load. In the present case, the load is the gate of the hTron [10 Ω in this study, the resistance of the upper layer in Fig. 18]. When the gate current to the nTron ceases and the channel returns to the superconducting state, the channel current returns with the L/r time constant of the system. Thus, the current pulse from the nTron to the hTron will have an exponential time dependence.
In Fig. 19(b), we show the temporal response of the hTron channel temperature when the gate is driven by an exponential current pulse of 1.2 mA amplitude from an nTron. In this case, the rise time of the pulse is 300 ps, and the fall time is 30 ns. These time constants are controlled by the L/r time constants of the circuit. The fall time is set by L nT /r nT, where L nT is the inductance of the nTron channel and r nT is the load of the nTron, which in this case is the gate resistance of the hTron. To achieve the power necessary to switch the hTron, 1.2 mA from the nTron is required across the 10 Ω of the hTron gate.
Driving the hTron with the exponential pulses of the nTron is far less efficient than driving with square pulses, because power continues to be dissipated in the exponential tail of the current pulse long after the temperature of the hTron channel drops back below T c. In the case considered in Fig. 19(b), the L nT /r nT time constant is 30 ns, and the hTron channel is held above T c for 4.7 ns. A square pulse of 6 ns would achieve the same duration of the hTron in the resistive state. Figure 20(a) illustrates this inefficiency. The required L nT /r nT time constant is plotted as a function of the time the hTron channel must be held above T c, referred to as t hot. The time t hot depends on the LED capacitance, efficiency, and the number of photons produced during the pulse, which is determined by the number of synapses formed by the neuron. The L nT /r nT time constant must be nearly ten times t hot. Improved drive circuit designs are likely possible, but we proceed with the single-nTron example to illustrate that even with first-generation circuit designs, sufficient efficiency can be achieved.
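The cost of the exponential tail can be quantified with the numbers above. This sketch integrates the Joule power in the 10 Ω hTron gate for both drive shapes, neglecting the 300 ps rise:

```python
I0 = 1.2e-3     # A, nTron current amplitude from the text
R = 10.0        # ohms, hTron gate resistance
tau = 30e-9     # s, L_nT/r_nT fall time from the text

# Exponential drive: energy = integral of I0^2 exp(-2t/tau) R dt = I0^2 R tau/2
E_exp = I0**2 * R * tau / 2

# Square drive: 6 ns pulse achieving the same time above T_c, per the text
E_square = I0**2 * R * 6e-9

print(E_exp, E_square)   # ~216 fJ vs ~86 fJ
```

Most of the exponential-drive energy is dissipated after the channel has already cooled below T c, which is why pulse shaping or cascaded hTrons could improve efficiency.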
We wish to connect the number of photons produced in a neuronal firing event to the τ nT time constant that must be implemented in hardware. In Sec. V, we study networks with neurons making 20-1000 synaptic connections. We would like neuronal firing events to produce ten times as many photons as the neuron's out-degree to compensate for loss, reduce noise, and perform memory update operations. We are, therefore, interested primarily in neuronal firing events producing 200-10 000 photons, and large hub neurons may need to produce 100 000 photons or more. Such photon numbers will require the hTron channel to be driven normal for some duration, which necessitates current drive from the nTron to the hTron gate for some proportional duration [Fig. 20(a)]. We, therefore, must calculate the number of photons produced by the LED as a function of the τ nT = L nT /r nT time constant, taking the LED capacitance and efficiency into account. The results of these calculations are shown in Fig. 20(b). To produce 200 photons from an LED with 10 fF capacitance and 1% efficiency, τ nT must be 8 ns, and to produce 10 000 photons, τ nT must be 100 ns. An nTron channel inductor achieving this time constant and carrying the requisite 1.2 mA necessary to drive the hTron gate can be achieved in an area commensurate with the area required for the synaptic wiring and routing waveguides comprising the dendritic and axonal arbors of the loop neurons. The next task is to show a JJ in the NT loop delivering the current to switch the gate of the nTron.

C. Detecting neuronal threshold
The circuit considered for detecting threshold in the NT loop is shown in Fig. 15(a). The parameters are given in Appendix A. When the current in the NT loop reaches the switching current of the thresholding junction, J th , that junction will transmit a fluxon to the relaxation oscillator junction, J ro . A relaxation oscillator junction has the property that, upon switching, it temporarily enters a latched state. In the latched state, the junction is resistive, and the bias current is diverted to a load. A relaxation oscillator junction can be physically implemented by utilizing only the internal shunting of a superconductor-insulator-superconductor junction, resulting in a hysteretic current-voltage relationship. 65 The relaxation oscillator junction utilized here is designed to enter a transient resistive period upon arrival of a fluxon from J th . The current from J ro switches the gate of the nTron, diverting the nTron channel current to the gate of the hTron.
SPICE simulations of the thresholding event are shown in Fig. 21. For these simulations, Cadence was used. In Fig. 21(a), the thresholding junction switches, producing a current pulse from the relaxation oscillator junction. In Fig. 21(b), the gate of the nTron switches, and the nTron channel current is diverted to the gate of the hTron, returning with the τ nT time constant.
The series of events shown in Fig. 21 comprises the detection of neuronal threshold (which occurs when J th switches) and the subsequent current amplification, beginning with the switching of J ro and leading to the switching of the nTron. The 1.2 mA current pulse coming from the nTron and driving the gate of the hTron is shown in Sec. IV B to be sufficient to switch the hTron, resulting in a voltage pulse and light generation from the LED. The L/r time constant in this simulation was 50 ns, corresponding to 500 nH in series with the 10 Ω of the hTron gate. Figure 20(b) shows this nTron time constant is sufficient to produce more than 3000 photons if the LED capacitance is 10 fF and the LED efficiency is 1%. Increasing the nTron inductance can extend the pulse duration and thereby produce more photons. With the material and device parameters used in this model of the nTron, 50 ns recovery was sufficient to keep the nTron from latching, and shorter recovery times may be possible, depending on electrothermal device engineering. As discussed in Sec. IV B, a square rather than exponential current pulse would have a significant impact on the light-production efficiency. Such operation may be achieved with cascaded hTrons in place of the nTron.
Having demonstrated threshold detection, current amplification, and voltage amplification, we have completed the description of the amplification chain of the neuronal transmitter circuit. We next discuss the efficiency of the amplifier chain.

D. Photon production efficiency
In Secs. IV A-IV C, we describe circuits capable of detecting threshold in the NT loop, amplifying the signal, and producing the voltage necessary to generate light from a semiconductor diode. We have calculated the energy consumed by each element of the amplifier chain when generating the number of photons necessary for neuronal communication in realistic networks. These networks will comprise a variety of neurons with a range of numbers of synapses. The number of out-directed synapses made by a neuron is referred to as the out-degree, k out . The number of photons that must be produced in a neuronal firing event depends on k out , and, therefore, so does the energy of a neuronal firing event, which we denote by E out .
In an ideal case, a neuron could produce one photon per synaptic connection with unity production efficiency, the photons would reach their destination without loss, and they would be detected at the synapse with unity detection efficiency. In this case, we would have E out = hν k out. In an actual network implemented in hardware, waveguides will have propagation loss, and detectors will have efficiency less than unity. To account for these loss mechanisms, a neuron will produce a number of photons greater than one per synaptic connection. We refer to the number of photons produced per synaptic connection as ζ. If ζ is too large, neurons are wasting power. If ζ is too small, communication will be unreliable. The synapses of Sec. II have the same response if they receive one or more photons. Therefore, the noise is not shot noise, as it would be if the synapse were attempting to detect the precise number of incident photons.
The communication error can be calculated from the Poisson distribution. Given an average number of incident photons on a synapse due to an upstream neuronal firing event, we use the Poisson distribution to calculate the probability that a synapse will receive zero photons due to a neuronal firing event. If the average number of incident photons is five, the probability that the synapse will receive zero photons is less than 1%. A neural system is likely to be able to tolerate this level of error. 75 In the case of lossy waveguides with low-efficiency detectors, 3 dB loss may be incurred between a neuron and a synaptic target. We thus consider ζ ¼ 10 to be a representative number to use in calculations of network power consumption.
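The miss probability quoted above follows directly from the Poisson distribution; a minimal sketch:

```python
import math

def p_miss(mean_photons):
    # Poisson probability that a synapse receives zero photons when the
    # upstream neuron fires: P(0) = exp(-mean)
    return math.exp(-mean_photons)

# zeta = 10 photons produced per connection with ~3 dB (factor of two) path
# loss gives a mean of 5 photons at the detector, as in the text.
print(p_miss(10 * 0.5))   # ~0.0067, below the 1% level quoted above
```

Because the synapse responds identically to one or more photons, only the zero-photon outcome constitutes a communication error.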
In addition to propagation loss and detector inefficiency, the circuits described in this work are not entirely efficient at producing photons. The total energy consumed by the amplifier chain during a neuronal firing event is given by

E out = ζ k out hν/η amp. (2)

Figure 22(c) shows the total amplifier efficiency, η amp. We see that driving the hTron with the nTron [Fig. 22(b)] dominates power consumption, yet the amount of time the hTron must be on is determined by the emitter capacitance and efficiency, and, therefore, emitter improvement is necessary for system improvement.
The circuits can also be made more efficient with improved thermal design of the hTron (through phonon localization) and current pulse shaping into the hTron gate. In practice, achieving small, waveguide-integrated LEDs with low capacitance should be possible. A simple parallel-plate model indicates 1 fF should be achievable, and 10 fF should not be particularly challenging, even with wiring parasitics. In Fig. 22, we consider values as poor as 100 fF. The quantum efficiency of the device is harder to predict. Waveguide-integrated light-emitting diodes with efficiency near 0.01 have been demonstrated, 70 and low-temperature operation helps significantly in this regard. We expect quantum efficiency of 0.01 to be achievable at large scale, but in Fig. 22, we consider values as poor as 10⁻³. Considering an LED with 10 fF capacitance or less, if the LED efficiency is as poor as 10⁻³, the total photon production efficiency of the amplifier chain is η amp ≈ 10⁻⁴ when more than 200 photons are produced. For calculating the power consumption of the networks described in Sec. V, we use Eq. (2) with ζ = 10 and η amp = 10⁻⁴.
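The adopted values can be turned into a per-neuron energy estimate, assuming the firing-energy relation E out = ζ k out hν/η amp implied by the ideal case E out = hν k out:

```python
h, c = 6.62607015e-34, 2.99792458e8   # Planck constant, speed of light (SI)
hv = h * c / 1.22e-6                  # photon energy at lambda = 1.22 um
zeta, eta_amp = 10, 1e-4              # values adopted in the text

def firing_energy(k_out):
    # zeta photons per out-directed connection, divided by amplifier efficiency
    return zeta * k_out * hv / eta_amp

print(firing_energy(1000))            # ~1.6e-11 J per firing at out-degree 1000
print(firing_energy(1000) * 20e6)     # ~3e-4 W if firing continuously at 20 MHz
```

Actual network power is far lower than the continuous-firing bound, since average firing rates in realistic operation are well below the 20 MHz maximum.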

E. Discussion regarding transmitter circuits
With the design of the transmitter portion of the circuit, the neuron model presented in this paper is complete. The full circuit is shown in Fig. 23. The analysis presented here points to two conclusions. The first conclusion is that to produce photons with short pulses from the hTron, it is necessary to fabricate LEDs with low capacitance. Other integrated photonic applications find that capacitance dictates the scale at which light becomes advantageous. 139 The second conclusion is that the overall photon production efficiency is limited by the internal quantum efficiency of the LED when producing large numbers of photons. As argued in Sec. I, cognitive neural systems are likely to utilize some neurons with relatively small out-degree making only local connections as well as other neurons with high degree making local and longrange connections. The value of capacitance achievable by the LED will determine the lowest degree practical to implement with photonic connectivity.
While capacitance limits utility for low-degree operation, internal quantum efficiency determines system power consumption dominated by neurons with high degree. The maximum achievable quantum efficiency will depend on emitter design, materials employed, and fabrication optimization. The requirements for this approach to neural computing differ from the requirements of other integrated photonics applications. Emitters with carrier recombination times as long as 50 ns, the same as the SPD recovery time, can be tolerated. For many integrated photonics applications, such a light source would be too slow. The ability to produce light at multiple frequencies may be advantageous, but spectral coherence is not required. Compound semiconductors have these spectral and temporal properties, and they can be integrated with silicon waveguides 70 with high efficiency, particularly at cryogenic temperature. Yet, fabrication with compound semiconductors is more expensive and more difficult to scale than silicon. Cryogenic operation enables several types of silicon light sources, 63,140,141 which bring the advantage of simpler process integration. Whatever the material, sources providing incoherent pulses with 10 000 photons produced with an internal quantum efficiency of 10⁻³ operating at 20 MHz at 4.2 K are sufficient to enable a massively scalable neural computing platform with connectivity comparable to the brain and 30 000 times faster speed.
It may not be necessary to incorporate a light source at each neuron. Perhaps, an architecture can be utilized with circulating light in waveguides that are tapped by a modulator during each neuronal firing event. Such a system would have the benefit of keeping light sources out of the cryostat and separate from superconducting circuits. In the case of shared off-chip light sources, the new challenges become to develop very low loss waveguides; compact, efficient modulators with low insertion loss that do not need to be tuned; and extensive fiber-to-chip coupling. Initial calculations indicate that such an approach is less efficient unless waveguides can be made with very low propagation loss, and modulators can be made with very low insertion loss. Yet, device and system limits are far from understood. Amplifier circuits similar to those presented here may be useful when driving the modulators of such a system.
The synaptic receiver and synaptic weight update circuits in the forms presented in Secs. II and III utilize separate photons for synaptic firing and synaptic update. In this work, circuit operation culminates by producing light from a single LED, yet it may be desirable to implement transmitters that fire different LEDs for the different operations. This is one way that the power used for synaptic firing events can be decoupled from the power used for synaptic update events. If different colors are used for each of these operations, the same waveguide routing network can be employed for the three signals (synaptic firing, synaptic strengthening, and synaptic weakening), and the demultiplexers located at each downstream neuron can separate the signals and route them locally to the three synaptic ports. Similarly, the spike-timing-dependent plasticity circuit of Sec. III requires not only photons from the presynaptic neurons, but also photons from the local, postsynaptic neuron. Two additional light sources may be useful at each neuron to be utilized locally for synaptic update. The schematic diagram in Fig. 1 shows these five light sources. While producing five light sources instead of just one adds hardware overhead, the light sources are extremely compact compared to the routing waveguides and inductors associated with the synapses. By utilizing independent light sources, it may be possible to reduce network area as well as power consumption.
The ability of a neuron to hold a refractory period is important for spiking operation. In the neuron under consideration, one way to achieve this is by cutting the bias to J_th upon neuronal firing. This makes it impossible for J_th to be driven above I_c and can be straightforwardly implemented with feedback from the output of the nTron or hTron.
In the schematic of Fig. 1, we emphasize three main current biases that affect the operation of each neuron. A current bias into the synaptic update circuit of Sec. III affects the learning rate. A current bias into the neuronal threshold loop affects the neuron threshold. A current bias into the light emitter affects the number of photons produced in a neuronal firing event, which we refer to as the reach of the neuron. With each of these currents fixed, the neurons and the network will have rich spatiotemporal dynamics. Yet, a network with the ability to dynamically vary these currents will be capable of achieving further complexity over longer time periods. For example, with the learning rate current fixed, spike-timing-dependent plasticity occurs at a fixed rate. By changing this current, the network can make certain regions more adaptive at certain times, and it can make those regions maintain synaptic weights at other times. Similarly, with a fixed reach current into the transmitter, the neuron will address each of its downstream connections with a given probability. By changing the reach current, the number of photons produced in a neuronal firing event can be adjusted, and, therefore, the probability of reaching downstream connections can be tuned. Changing these bias currents is analogous to changing various neuromodulators in biological systems. The values of each of these neuromodulatory control currents can be modified with photonic or electronic signals set externally or based on the internal network activity. These neuromodulatory control currents provide powerful means to dynamically transform a given structural network into myriad functional networks.
As we have seen in this section, production of light during each neuronal firing event requires an amplifier chain that consumes considerable energy relative to the Josephson circuits comprising the synapses and dendrites of the neuron. The choice to communicate with light brings this energy cost and necessitates the integration of superconductors with semiconductors. These costs are important to consider as one weighs options for advanced cognitive hardware. It would be simpler if all computation and communication could be achieved with electronic circuits, but an all-electronic communication infrastructure that can feasibly interconnect large numbers of neurons at the scale necessary for cognition without traffic-based delays has yet to be proposed. While photonic communication adds to the energy per neuronal firing event as well as to hardware requirements, the analysis of this section indicates that the total energy of a neuronal firing event is still quite reasonable, and if silicon light sources can be demonstrated to meet the specifications of these circuits, hardware integration will be tractable. Large-scale neural systems will employ a wide variety of neurons with myriad structural and dynamical properties. The full potential of neural circuits based on superconducting optoelectronic hardware will only be realized with a long-term experimental and theoretical effort of a broad research community.

V. NETWORKS OF LOOP NEURONS
Neural computation depends critically on the structure of the network. 3 Whether the application of the network is for control, sensory processing, or cognition, efficient information processing requires the architecture to possess certain features. For example, in visual processing, local clusters of neurons must be differentiated to independently code for certain stimuli, yet the information from many such clusters must be combined at a larger scale to identify groupings of features and trends across a visual field. 45 Such differentiated local processing combined with broad integration repeats at multiple levels of hierarchy. 17,28 This hierarchical architecture leads to systems with fractal properties in space and time. 12,18 Such an architecture balances differentiated, local information processing with efficient integration of information across the system. 7 In this section, we begin to explore networks in which superconducting optoelectronic neurons can be connected to achieve the desired fractal network architectures.
We would like superconducting optoelectronic networks (SOENs) to meet several criteria: (1) physical instantiations must accomplish the networks in a manner that can be straightforwardly fabricated with conventional lithographic techniques; (2) the networks must achieve a hierarchical architecture that can be manufactured from the scale of a single die up to a 300 mm wafer; (3) for efficient information integration, systems at the die scale must contain hub nodes with thousands of synaptic connections, and at the scale of a wafer, which we would like to serve as a column in cortex, high-degree nodes with tens of thousands of edges must be possible; 107 (4) considering these systems as modules, we extrapolate to neural systems at very large scales, where it must be possible to connect modules with dense local clustering into systems of billions of neurons for human-scale cognition; and (5) the power density of these networks must be low enough for cooling with liquid 4He at 4.2 K to be utilized. In this section, we present SOEN designs satisfying these criteria.

A. Optical communication and network metrics
Optical communication between neurons provides three major strengths. First, because photons are uncharged and massless, optical interconnects have no capacitance, resistance, or inductance, enabling massive fan-out. Nodes with very high degree can be achieved without the need for multiplexed communication lines and signal arbitration. 120 Second, it is possible to send and receive single-photon signals, leading to communication with high energy efficiency. 61 Third, because light travels at the highest velocity in the universe, systems signaling with light can integrate the largest area of neurons with coherent oscillations. Yet, devices based on optical signals have an important disadvantage: the device size is difficult to shrink below the wavelength of light. While superconducting optoelectronic neurons can make many connections, operate with high energy efficiency, and integrate a large area with coherent oscillations, the total number of neurons that can cooperate coherently depends on the total area of the system (limited by the speed of light) divided by the area of the individual neurons (limited by the wavelength of light). To begin to assess the potential of SOENs for near-term technological applications, we must analyze the types of systems that can be achieved on a single die, such as 1 × 1 cm^2. To assess the ultimate scaling potential of these networks when limited by light-speed communication, we must analyze what can be achieved within an area limited by the distance light can travel in the period of a network oscillation. We consider both these scales here.
Whether for near-term technological applications or long-term cognitive systems, the networks we wish to employ are likely to share several characteristics. 12 For cognitive systems large and small, differentiated processing balanced with information integration across spatial and temporal scales is crucial for performance. 6,7,11,13,18,45,142 Several network theory metrics can be employed to assess the fitness of a network for neural computing. We focus on three metrics relevant to differentiated processing and information integration.
The first metric is clustering, which quantifies the prevalence of triangles in the network. Triangles refer to groups of three nodes connected by edges. High clustering enables information to be shared locally. To achieve differentiated information processing, we seek networks with a high degree of clustering compared to a random network with the same number of edges on the same vertex set.
The second network metric is average path length. For a given network, we can calculate the shortest distance from every node to every other node. Averaging this distance over all pairs of nodes yields the average path length. To achieve information integration across spatial scales, the average path length should be nearly as small as that of the corresponding random network. High clustering combined with short path length characterizes a small-world network. 24

The third network metric is the degree distribution. The degree of a node refers to the number of edges it forms with other nodes. In the present work, we consider two degree distributions: power-law and delta-function. For information integration across spatial scales with efficient wiring, 143 the degree distribution of the nodes in the network may follow a power law, thereby achieving a scale-free network. 25 Small-world, power-law networks have efficient communication, a balance of differentiation and integration, and fractal properties of self-similarity across spatial scales, and they are observed in many natural settings, including the networks of the brain. 144 We also consider structures closer to random networks, characterized by a Gaussian degree distribution, which are optimized for short path lengths. Highly connected random networks with short path lengths are ideal for associative memories, such as the hippocampus. We approximate the narrow Gaussian with a delta function, meaning that all nodes of the network have the same degree.
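Both metrics can be computed directly from an adjacency structure. The following is an illustrative sketch, not taken from the original work: the function names and the toy six-node graph (two triangles joined by one edge, giving high clustering and short paths) are hypothetical.

```python
from collections import deque

def clustering(adj):
    """Average local clustering coefficient: for each node, the fraction
    of its neighbor pairs that are themselves connected (triangles)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:]
                    if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Average shortest-path length over all ordered node pairs,
    via breadth-first search from every node."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(clustering(adj), avg_path_length(adj))
```

For a sector-sized network, the same quantities would be compared against a degree-matched random graph, as described above.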
To achieve information integration across time, concepts of self-organized criticality 29,145 are pertinent. In the temporal domain, differentiated local processing combined with large-scale information integration results in a power-law frequency distribution of transient synchronized oscillations. 17,18 This 1/f behavior leads to fractal use of time as well as space, and gives rise to neuronal avalanches 30 and the criticality thought to be necessary for information integration and cognition. 30-32,34 In the human brain, 1/f behavior is observed from 0.05 Hz to 600 Hz. The hardware described here can operate from low frequencies up to at least 20 MHz, potentially spanning many more orders of magnitude than the brain.
Here, we consider specific networks with high clustering, short average path length, and power-law degree distribution that are suitable for differentiated processing combined with hierarchical information integration. We approximate the area and power consumption of such networks when implemented with superconducting optoelectronic hardware, and we anticipate the scaling of these networks when limited by light-speed communication. We focus here on microarchitecture at the level of neuron connectivity with the goal of identifying scaling laws for systems achievable in the near term. To develop a correspondence between the adjacency matrix of the network 19 and the physical hardware that will perform the neural operations, we model the spatial extent of neurons and construct a routing scheme to calculate the area of waveguide interconnects (analogous to white matter in the brain).

B. Network construction
To achieve high clustering, short average path, and power-law degree distribution while setting ourselves up for straightforward, modular fabrication, we construct our adjacency matrix in a hierarchical manner, as shown schematically in Fig. 24. At the smallest network scale, neurons are tiled in a grid to form a local sector [Fig. 24(b)]. Sectors can then be tiled to form regions. Regions can then be tiled to form modules. In the present work, we consider these three levels of hierarchy, but the algorithm for creating the adjacency matrix can be repeated indefinitely. Networks constructed in accordance with the schematic of Fig. 24(c), with dense intrasector connectivity, few connections between sectors within a region, and sparse connectivity across a module, are modeled after the horizontal connections in minicolumns and columns within cortex. 3,22,44,45 Feed-forward, vertical connections are likely to be achieved by stacking such networks, either through sequential processing or multidie assemblies.
Networks with high clustering and average path length close to that of a random network can be constructed in many ways. Here, we use a network growth algorithm. The concept is that the adjacency matrix of a local sector of neurons is formed by adding neurons one by one to the pool, establishing edges based on spatial location as well as the degree of the existing nodes. Such an algorithm introduces winner-take-more development, leading to a few nodes with very high degree, thereby extending the degree distribution to larger values. The next level of hierarchy can be generated by tiling the adjacency matrix of the sector along the diagonal of a larger adjacency matrix representing the region, and forming intersector connections in a manner that again depends on space and node degree. Networks of increasing scale can be achieved by repeating this procedure, thereby achieving the fractal, power-law degree distribution conducive to efficient information integration across the physical substrate of the network. The spatial dependence gives high clustering, which leads to functional differentiation. Reciprocal connection rules contribute to clustering as well as re-entrant connections, 3 which give rise to transient temporal correlations that are necessary for information integration across time. The degree dependence of the growth algorithm gives rise to the long tail of the degree distribution, including nodes with high degree, which are advantageous for information integration across space. We have used this growth algorithm to generate a specific network adjacency matrix with 8100 nodes and 330 430 edges in the hierarchical configuration of Fig. 24(c).
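The growth algorithm is described here only in outline; the following sketch illustrates the general idea under assumed rules. Each new node, placed at a random position, attaches to m existing nodes with probability proportional to (degree + 1)/distance, so attachment favors nearby nodes (clustering) and high-degree nodes (winner-take-more). All parameter values and function names are illustrative, not the specific algorithm of the original work.

```python
import math
import random

def grow_sector(n_nodes, m=3, seed=0):
    """Grow a sector: each new node, placed at a random position in the
    unit square, forms up to m edges to existing nodes chosen with
    probability proportional to (degree + 1) / distance."""
    rng = random.Random(seed)
    pos = {0: (rng.random(), rng.random())}
    degree = {0: 0}
    edges = set()
    for new in range(1, n_nodes):
        pos[new] = (rng.random(), rng.random())
        # Attachment weight: favor high-degree (winner-take-more)
        # and nearby (clustering) nodes.
        weights = [(degree[v] + 1) / (math.dist(pos[new], pos[v]) + 1e-6)
                   for v in range(new)]
        total = sum(weights)
        targets = set()
        while len(targets) < min(m, new):
            r = rng.random() * total
            acc = 0.0
            for v, wt in enumerate(weights):
                acc += wt
                if acc >= r:
                    targets.add(v)
                    break
        degree[new] = 0
        for v in targets:
            edges.add((v, new))
            degree[v] += 1
            degree[new] += 1
    return edges, degree

edges, degree = grow_sector(200)
print(len(edges), max(degree.values()))
```

Tiling the resulting sector adjacency matrix along the diagonal of a larger matrix, with degree- and distance-dependent intersector edges, repeats the construction at the next level of hierarchy.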
It is important to disambiguate two senses in which a network may have fractal spatial properties. We have been discussing the power-law degree distribution, which relates to a network that is scale-free in terms of connectivity. We can also consider the spatial scales across which connections are made. Rentian scaling quantifies fractal properties related to the number of connections made through various partitions of the network. This scaling relates the number of nodes within a topological partition, n, to the number of edges crossing the boundary of that partition, e. 146 If the relation between e and n follows the form e ∝ n^(p_T), the network shows fractal topology, and the Rent exponent, p_T, is given by p_T = log(e)/log(n). The Rent exponent is related to the topological dimension, D_T, by p_T ≥ 1 − 1/D_T. Consideration of Rentian scaling and topological dimension provides a means to assess a network's connectivity across various levels of hierarchy.
In the present context, Rentian analysis provides one way to assess the potential for information integration across the physical space of the network. The larger the topological dimension, the more access nodes will have to distant members of the network. For the hierarchical network designed with the growth algorithm, we can approximate p_T by considering the number of nodes and edges at each of the three levels of hierarchy. For the specific network under consideration, averaging over the neurons in a local sector, we find there are 17.1 edges into each neuron from other neurons within the sector. Considering a region, there are 17.8 edges into each neuron from other neurons within the region but not in the same sector. And at the scale of the module, there are 17.6 edges into each neuron from neurons in other regions. Based on this analysis, we calculate a Rentian exponent very close to unity. This gives a topological dimension of D_T → ∞. This implies the network can integrate information up to infinite levels of hierarchy, which of course is not possible, as the network only comprises three levels of hierarchy. Networks with large topological dimension support self-organized criticality, 30,34 which is advantageous for information integration in neural systems. 30-34 In networks from Caenorhabditis elegans to the human brain as well as in VLSI wiring, the Rentian exponent is closer to 0.75. 146 This analysis indicates that the network considered here has slightly more long-range connectivity than is necessary. The Rentian exponent can be adjusted within the growth algorithm by adjusting the number of connections made to higher levels of hierarchy.
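The Rent exponent can be estimated by fitting the slope of log e versus log n across hierarchy levels, consistent with e ∝ n^(p_T). The partition counts below are illustrative, chosen so that each neuron contributes roughly 17.7 boundary-crossing edges at every level, as in the numbers quoted above; the sector/region/module sizes are assumed for the sake of the sketch.

```python
import math

def rent_exponent(partitions):
    """Least-squares slope of log(e) vs log(n) over (n, e) pairs,
    where n is nodes in a partition and e is edges crossing its
    boundary: e ~ n^p_T."""
    xs = [math.log(n) for n, e in partitions]
    ys = [math.log(e) for n, e in partitions]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical partitions: ~17.7 boundary edges per node at each of
# three hierarchy levels gives an exponent very close to unity.
partitions = [(100, 100 * 17.7), (900, 900 * 17.7), (8100, 8100 * 17.7)]
p_T = rent_exponent(partitions)
D_T = 1.0 / (1.0 - p_T) if p_T < 1 else float("inf")
print(round(p_T, 3), D_T)
```

A smaller number of boundary edges per node at the higher levels would pull p_T toward the 0.75 value observed in biological networks and VLSI wiring.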

C. Physical instantiation
We seek estimates for physical aspects of SOENs, including size and power dissipation. We first introduce a plausible routing scenario for connecting the neurons in the network described in Sec. V B.

Passive waveguide routing
To achieve the massive connectivity required of a neural system, each neuron must send its signals to many different destinations. To achieve this with optical signals in a controlled manner, multiple planes of dielectric waveguides 85-87 must be employed to keep waveguide crossing losses as low as possible and to reduce the area occupied by passive routing. We anticipate employing waveguide planes in pairs. Within each pair, one plane runs predominantly north-south, the other predominantly east-west. The vertical gap between waveguides in a pair is smaller than between the top waveguide in one pair and the bottom waveguide of the subsequent pair. The lowest pairs of waveguides will be used for the most local connections, with higher pairs for successively longer distances. Materials with lower index contrast may be used on higher planes to minimize loss in long-distance connections. In the context of the network discussed in Sec. V B, it is natural to use one pair of waveguide planes for intrasector connections, a second pair for intraregional connections, and a third pair for intramodular connections.
In Fig. 25, we show a row-column routing scheme that connects all neurons in a sector to all other neurons in that sector. This approach to routing results in rows and columns densely packed with communication lines, analogous to the white matter of axons in biological neural systems. The connections from a single neuron are highlighted, and interplanar couplers are drawn as pairs of triangles. We utilize row-column routing to minimize the number of interplanar couplers required. The rules governing the routing are described in Appendix C, which also discusses routing in feed-forward networks that are used in many machine-learning applications.
While we have shown Fig. 25 to depict neurons connected in a sector, the diagram can also be interpreted as showing sectors connected in a region or regions connected in a module. This fractal spatial property is a result of the hierarchical architecture we intend to employ for cognitive computing, and it is typical of networks characterized by high topological dimension. So far, we have considered networks with three levels of hierarchy, but this fractal pattern can extend indefinitely. In the present context, we envision the row-column routing architecture extending beyond even the chip or wafer scale into multichip and multiwafer assemblies with fiber optics forming dense white-matter connections between modules in three dimensions.

Size of neurons
The size of the neurons described in Secs. II-IV, when arranged in the networks described in Sec. V B, is calculated in Appendix C. We use a spatial model of a superconducting optoelectronic neuron and routing waveguides shown in Fig. 26(a). A schematic of how fabricated layers may be stacked is shown in Fig. 26(b). Applying the analysis of Appendix C to the network of 8100 neurons discussed in Sec. V B, we calculate that the network will fit on a die 1 × 1 cm^2 using a pair of waveguide planes for intrasector routing, another pair for intersector routing within each region, and a third pair for interregional connectivity.
We would like to approximate the size of networks with more neurons. The model informs us that the area of a neuron will follow a power law as a function of node degree. We show the area of a network as a function of the total number of neurons in the network in Fig. 27(a) for the cases of three pairs of waveguide planes and nine pairs of waveguide planes. In both cases, the degree distribution of the nodes takes the form p(k) = B k^(−γ) with γ = 1.6, where p(k)dk is the probability of a node having degree between k and k + dk. We find that by utilizing nine pairs of waveguide planes, we can accommodate an integrated system of 1 × 10^6 neurons on a 300 mm wafer. This network would comprise over 200 × 10^6 complex synapses, as described in Sec. III. Considering instead a network with a delta-function degree distribution [Fig. 27(b)], we find a network of 3000 neurons, each with 300 synapses, will occupy a 1 × 1 cm^2 die. A 300 mm wafer will support 40 000 neurons with 4000 connections per neuron. The number of synapses within a given area is similar for the power-law degree distribution and the delta-function degree distribution. The main difference is that the power-law degree distribution contains more neurons with fewer synapses and a few neurons with many synapses. Both network architectures are likely to be useful for certain types of information processing.
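As a sketch of how the synapse count follows from this distribution, the mean degree of p(k) = B k^(−γ) can be computed by direct normalization over a degree range. The range [20, 5000] below is a hypothetical choice for illustration, not a figure from the text.

```python
def powerlaw_mean_degree(gamma, k_min, k_max):
    """Mean of p(k) = B k^-gamma over integer degrees k_min..k_max,
    with the normalization constant B fixed by summing p(k) to 1."""
    norm = sum(k ** -gamma for k in range(k_min, k_max + 1))
    # k * k^-gamma = k^(1 - gamma)
    return sum(k ** (1 - gamma) for k in range(k_min, k_max + 1)) / norm

kbar = powerlaw_mean_degree(1.6, 20, 5000)   # gamma = 1.6 from the text
print(f"mean degree ~{kbar:.0f}; 1e6 neurons -> ~{1e6 * kbar:.1e} synapses")
```

Because γ < 2, the mean is dominated by the heavy tail, so total synapse count, and hence wiring area, is set largely by the few high-degree hub nodes.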
Appendix C describes the calculation of the area of a neuron based on consideration of optical waveguides and single-photon detectors. From this area estimate, we can calculate the area available for the superconducting electronic components that contribute to the neuron, assuming they are patterned above the waveguides in repeating layer modules, as shown in Fig. 26(b). The area available for superconducting inductors, mutual inductors, and Josephson junctions is given by the area of the node divided by the in-degree. This area is multiplied by the number of repeating layers utilized in the fabrication process. We find the area available for superconducting electronic components is at least 30 × 30 μm^2. A 1 μH inductor is on the larger end of what is likely to be utilized by the SI loop or SS loop. When constructed of a material such as MoSi with high kinetic inductance (180 pH per square) and patterned with 50 nm lines, such an inductor occupies an area of 5 × 5 μm^2. This leaves plenty of area for multiple inductors and wiring. The mutual inductors required to couple the SI loop to the NR loop will be patterned with a different superconducting material separated from the high-kinetic-inductance layer by an insulator. A mutual inductance of 100 pH is desirable, and it is quite possible to fabricate a 100 pH mutual inductor from two superconducting coils within this area. Each synapse also requires on the order of 10 JJs with I_c near 40 μA. These devices are approximately 1 × 1 μm^2 and will easily fit in the available area. Finally, the largest components of the neurons are likely to be the amplifiers that drive the light emitters and the mutual inductor coupling the neuronal receiving loop to the neuronal thresholding loop. These devices are not required at every synapse, but rather only at each neuron. Therefore, area is not likely to be a problem. Passive photonic waveguides are likely to dictate the area, and they are limited by the wavelength of light.
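The 5 × 5 μm^2 inductor estimate can be checked with a short meander-geometry calculation. This sketch assumes equal 50 nm line width and gap and ignores the meander turnarounds; the function name is hypothetical.

```python
def meander_inductance(area_side, line_w, gap, l_per_square):
    """Kinetic inductance of a meander filling a square of side
    area_side with lines of width line_w separated by gap.
    L = (inductance per square) x (number of squares)."""
    n_lines = round(area_side / (line_w + gap))  # parallel lines that fit
    total_length = n_lines * area_side           # turnarounds neglected
    n_squares = total_length / line_w
    return l_per_square * n_squares

# 5 x 5 um^2 area, 50 nm lines and gaps, 180 pH/square MoSi.
L = meander_inductance(5e-6, 50e-9, 50e-9, 180e-12)
print(f"{L * 1e6:.2f} uH")
```

The result, roughly 0.9 μH, confirms that a 1 μH SI-loop or SS-loop inductor fits comfortably within the available 30 × 30 μm^2 footprint.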
We have mentioned in this paper that neurons based on similar JJ circuits, but using no optical components, could be created with much the same design as the neurons in Sec. II and synapses in Sec. III. Purely electrical neurons would have difficulty achieving the fan-out necessary for the networks discussed here. Nevertheless, electrical neurons could potentially achieve k ≈ 20 with a more compact footprint, lower power dissipation, and higher speed. It may be advantageous to construct a hybrid network with nodes from k = 2 to k = 20 being purely electrical neurons with flux quanta playing the role of photons. We consider a network with power-law degree distribution wherein nodes with k = 2 to k = 20 are assumed to be electrical neurons, and nodes with k > 20 are assumed to be optical neurons. We perform an area calculation using modified forms of Eqs. (C2) and (C3) with physical dimensions appropriate to superconducting wiring rather than photonic waveguides. Whereas a network of just over 1 × 10^6 optoelectronic neurons fits on a 300 mm wafer, a network of 4 × 10^6 total neurons can be achieved if we combine low-degree superconducting neurons with high-degree photonic neurons. This network would comprise over 1 × 10^9 synaptic connections. Such an architecture would extend the spatial power-law distribution to lower degree while extending the temporal power-law distribution to higher frequency.

The neuronal pool
In addition to comparing to the size of a die or a 300 mm wafer, we must also consider the largest SOEN that can be established. This limit is not technological or economic in nature, but rather a physical limitation set by the velocity of communication. Consider two neurons separated by a distance d. At time t_1, neuron one produces a pulse. At time t_2, neuron two produces a pulse. We ask whether the pulse generated by neuron one may have contributed to the pulse by neuron two. In particular, we are interested in whether neuron one may drive neuron two to synchronize. For this consideration, the relevant time scale is the inverse of the frequency of oscillation: if d/v ≤ 1/f [Eq. (3)], neuron one can induce neuron two to synchronize with oscillations at frequency f. In Eq. (3), v is the velocity of signal propagation. For the present discussion, we take the value of d that saturates the inequality of Eq. (3) to define the diameter of the neuronal pool. This result of the size of the coherent neuronal pool decreasing as the inverse of the frequency of synchronized oscillations 49 is consistent with observations from neuroscience 147 where the slow conduction velocity of axons is a severe limitation to the size of a synchronized network. 17,18,27 Based on Eq. (3), we see the area of the neuronal pool scales as A_p ∝ (v/f)^2. The number of neurons in the pool depends also on the size of the neurons. We consider spatial scaling in terms of the size of a synapse, because neurons may have any number of synapses and, therefore, may span a wide range of sizes, whereas the size of a synapse depends less on the degree of the neuron to which it is connected or the total size of the system in which it operates. We suppose that the synapses in a network can be characterized by an average width, w.
If we assume that a network has connectivity predominantly in a plane (as is the case for mammalian cortex and as we expect from lithography on wafers), the area of a single synapse scales as A_n ∝ w^2. For a given hardware platform sending signals at velocity v to synapses of width w, the number of synapses within a neuronal pool with synchronized oscillations at frequency f scales as N_p ∝ A_p/A_n ∝ [v/(f w)]^2.
We wish to compare the number of synapses in the pool for two different hardware platforms oscillating at the same frequency, f. In particular, we wish to compare SOENs to the human brain. Denoting the two hardware platforms with superscripts, we find N_p^(1)/N_p^(2) = [v^(1) w^(2)/(v^(2) w^(1))]^2. A 300 mm wafer can support roughly 2 × 10^8 superconducting optoelectronic synapses, giving w^(soen) = 1.9 × 10^−5 m. The area of the human cerebral cortex is 0.095 m^2, 148 and it contains 1.6 × 10^10 neurons. 149,150 If we assume that each neuron has, on average, 10^4 synapses, 151 this gives w^(brain) = 2.4 × 10^−8 m. A biological synapse is 1000 times smaller than a superconducting optoelectronic synapse in width, and 1 × 10^6 times smaller in area. The speed of signals in cortex is roughly 2 m/s. 152 Axons with larger diameter can propagate signals above 100 m/s, but the dense connectivity of cortex cannot support such large fibers. The speed of light is 3 × 10^8 m/s. 153 Thus, comparing SOENs to biological networks, we find N_p^(soen)/N_p^(brain) ≈ 10^10. The neuronal pool enabled by light-speed communication can contain 10^10 times the number of synapses as the pool enabled by ionic signal propagation along biological axons. Signaling at the speed of light brings a tremendous advantage in this regard, an advantage made more significant in networks spanning a volume rather than an area. This simple scaling analysis does not take into account factors that may be significant for very large systems, such as the volume required for liquid-helium flow for cooling or the volume of white matter occupied by optical fibers carrying signals between large modules. Suppose these factors introduce a quadratic error, so the correct scaling is the square root of this estimate. We would still be considering a system with 100 000 times the number of synapses as the human brain. Despite the coarse nature of this comparison, we include it here to emphasize a crucial point.
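A minimal numeric check of this ratio, using the synapse widths and signal velocities quoted above. The common oscillation frequency cancels in the ratio but is included for clarity; the proportionality constant in N_p is dropped, so only the ratio is meaningful.

```python
def pool_synapses(v, f, w):
    """Synapses in a coherent pool, up to a constant:
    N_p ~ (v / (f * w))^2."""
    return (v / (f * w)) ** 2

v_soen, w_soen = 3.0e8, 1.9e-5    # light speed; SOEN synapse width (m)
v_brain, w_brain = 2.0, 2.4e-8    # cortical conduction speed; synapse width
f = 10.0                          # same oscillation frequency for both

ratio = pool_synapses(v_soen, f, w_soen) / pool_synapses(v_brain, f, w_brain)
print(f"{ratio:.1e}")  # of order 1e10, as stated in the text
```

The computed value, a few times 10^10, agrees with the order-of-magnitude claim in the text.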
Communication at the speed of light may enable neural systems with many more neurons and synapses than the human brain even though photonic neurons and synapses are significantly larger than their biological counterparts.
It has been hypothesized that the conduction velocity of axons limits the size of biological neural systems and that scaling of a single cognitive system much beyond the human cortex may not be possible with biological axon conduction velocities. 18,154 Based on these scaling arguments, we expect the largest cognitive systems in the universe to communicate at the speed of light, enabling full utilization of each neuron's light cone. While other approaches to neural computing may utilize light for communication, it is unlikely that other photonic neural devices will be significantly smaller than those presented here, as the optical wavelength imposes a practical spatial limit. In Fig. 27(c), we show the area of the neuronal pool as a function of oscillation frequency f for a general neural system communicating at the speed of light. When oscillating at 1 MHz, an area the size of a large data center (10^5 m^2) can be coherently synchronized. When oscillating at 10 Hz (near the θ and β bands of the human cortex), an area the size of the surface of the earth can be coherently synchronized. Cognitive systems far beyond our comprehension may be possible with light-speed communication and single-photon efficiency.
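Both figures follow from taking the pool diameter to be the distance light travels in one oscillation period, d = v/f. A quick check:

```python
import math

def pool_area(f, v=3.0e8):
    """Area of the coherent neuronal pool at oscillation frequency f:
    diameter d = v / f, area = pi * (d / 2)^2."""
    d = v / f
    return math.pi * (d / 2) ** 2

print(f"{pool_area(1e6):.1e} m^2")  # 1 MHz: data-center scale (~1e5 m^2)
print(f"{pool_area(10):.1e} m^2")   # 10 Hz: comparable to Earth's surface
```

At 1 MHz, the pool is about 7 × 10^4 m^2; at 10 Hz, about 7 × 10^14 m^2, versus roughly 5.1 × 10^14 m^2 for the surface of the earth.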

Power consumption
In addition to the size of a SOEN, we must also consider its power consumption, which depends on the energy per synapse event, the degree distribution of the neurons, and the firing-frequency distribution of the neurons. Analysis of power consumption in the networks under consideration is detailed in Appendix D. From that analysis, we find that if we assume a total photon-production efficiency of η_tot = 10^−4, the network of 8100 neurons will dissipate 1 mW of device power, a trivial load for a modern cryogenic system. Including the power dissipated to establish bias currents, total on-chip power will be less than 10 mW. The dominant power draw is the cryocooler, which for this system would require a few hundred watts. For the system of 1 × 10^6 neurons and 200 × 10^6 synapses on a 300 mm wafer, the total device power dissipation would be approximately 1 W. This is the cooling power of a common Gifford-McMahon cryocooler with a base temperature of 4.2 K, drawing roughly 1 kW.
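The 1 mW figure can be roughly reproduced from photon energetics. In this sketch, the 2 MHz average synapse-event rate and one photon per synapse event are assumed, illustrative values (the full analysis is in Appendix D).

```python
H_PLANCK = 6.626e-34   # J s
C_LIGHT = 3.0e8        # m/s

def device_power(n_synapses, f_avg, eta_tot, wavelength=1.55e-6,
                 photons_per_event=1):
    """Device power = (synapse events per second) x (photon energy)
    / (total light-production efficiency eta_tot)."""
    e_photon = H_PLANCK * C_LIGHT / wavelength
    return n_synapses * f_avg * photons_per_event * e_photon / eta_tot

# 330,430 synapses (the Sec. V B network), eta_tot = 1e-4 from the text;
# the 2 MHz average event rate is an assumed value for illustration.
print(f"{device_power(330430, 2e6, 1e-4):.1e} W")
```

With these inputs the estimate comes out near 0.8 mW, consistent with the ~1 mW device power quoted above.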
In addition to low total power, the power density of these systems is also quite small. For the case of the network on a wafer, the power density is 10 W/m^2 if a poor total light-generation efficiency of η_tot = 10^−4 is assumed. The heat produced can be straightforwardly removed, and immersion in a liquid helium bath will be excellent for this purpose. 155 This power density is far below the threshold for boiling liquid helium. Liquid helium is transparent at telecommunication wavelengths with an index of refraction very close to one. Thus, optical signals can be sent between neural modules in this environment without attenuation at the maximum speed allowed in the universe. Because the area of devices grows as a faster function of node degree than the power dissipation, cooling SOENs is possible in principle up to systems with size limited by the speed of light.

D. Discussion regarding networks of loop neurons
We have argued that neural systems benefit from at least three network structural characteristics: clustering, short paths, and power-law scaling. We have utilized an algorithm to generate adjacency matrices with these characteristics that correspond to networks hierarchically arranged across spatial scales, and we have proposed that such networks could be implemented with superconducting optoelectronic hardware using 300 mm lithographic technology. Networks with millions of neurons and hundreds of millions of synapses can be fabricated on a 300 mm wafer. While Moore's law has been sustained through feature-size reduction, scaling of SOENs will be enabled by scaling the process up in terms of the number of processed layers, lithographic field area, wafer size, and multiwafer assemblies. Whereas economics has driven CMOS to bigger wafers, network scaling will drive SOENs to bigger wafers. Fortunately, feature sizes in all devices of this system are easily achievable with 193-nm lithography.
As described in Sec. I A, this work is motivated by the intention to develop artificial cognitive systems. We have investigated the utility of superconducting optoelectronics to achieve multiple neuronal functions with adaptation across a wide range of time scales, meeting many of the needs for relaxation oscillators in complex cognitive circuits. We have investigated the communication networks necessary to enable dense local clustering as well as long-range connectivity for efficient communication across temporal and spatial scales. We have considered area and power consumption, and we find compelling evidence that this hardware platform has extraordinary potential to perform the functions underlying cognition. We think this work makes a strong case that a significant experimental effort to develop superconducting optoelectronic hardware is justified.
We are interested in systems with near-term feasibility. Considering the aforementioned network of 8100 neurons and 330 430 synapses, the network occupies 1 × 1 cm^2 when processed with three pairs of waveguiding planes. Operating at 20 MHz and possibly far beyond, the dynamical state space of this system would be extraordinarily complex and could provide a fascinating experimental setting for many subjects, including self-organized criticality 30-34 in an artificial network approximating cortex. Such a neural module is also likely to offer significant technological opportunities in applications such as visual and auditory sensory processing, language recognition, and mechanical control.
When combined with an artificial retina based on superconducting detectors, a powerful artificial vision system could result. A three-chip stack could accomplish this. One chip would serve as a retina, and the other two chips would serve as layers of visual cortex, arranged in a manner analogous to the columnar organization of the mammalian visual cortex. 44 Feed-forward communication is likely to make use of vertical optical links, with light in a lower layer being sent vertically with a grating coupler and received by a single-photon detector at a neuron on a higher layer. Such a model, capturing the columnar, vertical flow of information between layers as well as the laminar, horizontal flow within layers, 3,22,49 has been shown to function well for object recognition in computer simulations. 45 Using visual systems as a testbed of SOEN hardware is beneficial because photon detectors are native to the hardware and because the visual system is the most studied system of the mammalian cortex. Investigation of binding in vision 15,16,45-47,156 can inform other neural computing systems, including motor control systems, language processing, and other cognitive architectures.
Similar modules are also likely to tile well into larger networks. With a 10 × 10 grid of interconnected die, the system would contain as many neurons as a bumble bee, a creature observed to have advanced navigational skills; 157 multimodal communication and learning; 158 cognitive flexibility; 159 and emotions. 160 A 300 mm wafer would yield enough die for six of these small brains. Each system would comprise nearly 1 × 10^6 neurons and over 33 × 10^6 synapses with complex plasticity and metaplasticity. These neurons and synapses would operate up to 30 000 times faster than those of any known living creature. Such a system would be readily accommodated by existing cryostats.
If one chose to utilize an entire 300 mm wafer for a single module, neurons with as many as 10 000 synapses could integrate the activity of networks with 1 × 10^6 neurons. Such a structure would be analogous to a module in cortex, except for the significant increase in speed. Considering such modules as the building blocks of larger systems, stacks of wafers could be assembled with free-space and fiber coupling between them. We can consider an interconnected stack of wafers as a column. An arrangement of columns could be interconnected to contain tens of billions of neurons, matching the number in the human neocortex. 150,151 A SOEN with the same number of neurons as the human cortex would occupy a volume less than 2 m on a side.
While it may be difficult to build systems larger than 10 × 10^6 neurons in the near term, such a system is not physically limited. As in the brain, size limits will be incurred due to the velocity of signal propagation. From Fig. 27(c), we know that networks as large as data centers can sustain coherent oscillations at 1 MHz. Such a facility would house 10^8 300 mm wafers if they were stacked 100 deep. This would result in 100 × 10^12 neurons per data center across modules interconnected with another power-law distribution.
It is conceivable that networks could continue to scale beyond the size of data centers, and information integration through slow-wave, system-wide activity could occur at 1 kHz, faster than any oscillation of the human brain. In this case, the neuronal pool could occupy a significant fraction of the earth's surface and employ quintillions of neurons. We do not wish to cover the earth in such devices, but asteroids provide ample, uncontroversial real estate. The materials for this hardware are abundant on M-type and S-type asteroids. 161-165 It appears possible for an asteroid belt to form the nodes and light to form the edges of a solar-system-scale intelligent network. Asteroids can be separated by billions of meters, so light-speed communication delays may be several seconds or longer. For cognitive systems oscillating up to 20 MHz, such delays would cause individual modules to operate as separate cognitive systems, much like a society of humans.

ACKNOWLEDGMENTS
We thank Dr. Alexandra Artusio-Glimpse, Bryce Primavera, and Dr. Alexander Tait for helpful conversations and insights. This is a contribution of the National Institute of Standards and Technology (NIST), an agency of the U.S. government, and is not subject to copyright.

Receiver circuits
Unless otherwise specified, we take I_spd = 10 μA, comparable to the switching current of MoSi SPDs. 166 Designs with lower I_spd, as would be present in WSi nanowires, 61 or higher I_spd, as would be present in NbN 58 or NbTiN 167 nanowires, are also straightforward to achieve. The variable resistor of the SPD has zero resistance in the steady state, and it switches to a high-resistance state (≈5 kΩ) temporarily (≈200 ps) upon absorption of a photon. 62 Typical values for the parameters in Fig. 3(a) are L_spd = 100 nH, I_spd = 10 μA, r_spd = 2 Ω, I_sy = 800 nA to 4 μA, L_sf = 200 pH, I_b = 7 μA to 9 μA, L_si = 100 nH to 10 μH, and M_sy = 1 nH. The I_c of the JJs is chosen to be 10 μA in Sec. II to improve energy efficiency. In Appendix D, we argue this is not necessary, and implementation with junctions of I_c = 40 μA or higher is probably a better design choice. The JJs used in simulations in this work have β_c = 0.95, where β_c = 2eI_cCR^2/ħ, with C being the junction capacitance and R being the junction resistance in the RCSJ model. 65,66 The parameter β_c corresponds to the junction damping (with β_c = 1 corresponding to critical damping), and for this study, we consider slightly overdamped junctions. Typical values for the transformer inductors in Fig. 3(b) are L_t1 = 1 μH, L_t2 = 100 pH, and L_t3 = 10 pH.
In all flux-storage loops, there is a trade-off between inductance and area. High-kinetic-inductance materials such as WSi have an inductance per square as large as 250 pH. By patterning a nanowire of such a material in a meander geometry, we can produce an inductor with 10 μH in an area of 35 × 35 μm^2 with a minimum feature size of 50 nm. We argue in Sec. V that these relatively large inductors are still compatible with scaling to neurons with 1000 synapses because the area of photonic routing is generally the limiting factor.
A JJ coupled to an inductive loop is often characterized by the parameter β_L = 2πLI_c/Φ_0, which quantifies the amount of phase a loop can store; β_L/2π quantifies the number of flux quanta that can be stored. For the receiver circuits of Sec. II, β_L/2π = 5 × 10^2 to 5 × 10^4. For digital computing applications, β_L/2π = 1.6 is typical. There is also an area/inductance trade-off for the mutual inductor coupling each SI loop to the NR loop, and in the present work, we choose 1 nH for this mutual inductor.
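The quoted flux-storage range follows directly from the definition of β_L; a minimal numerical check (the flux-quantum value is a standard constant, not from the text):

```python
# Flux-storage capacity beta_L/2pi = L*I_c/Phi_0 for the receiver-circuit
# inductance range quoted above (L_si = 100 nH to 10 uH with I_c = 10 uA).
Phi_0 = 2.068e-15   # Wb, magnetic flux quantum
I_c = 10e-6         # A, junction critical current

beta_low = 100e-9 * I_c / Phi_0    # ~4.8e2 flux quanta
beta_high = 10e-6 * I_c / Phi_0    # ~4.8e4 flux quanta
print(f"beta_L/2pi ranges from {beta_low:.1e} to {beta_high:.1e}")
```

Both endpoints round to the 5 × 10^2 to 5 × 10^4 range quoted in the text.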
When the value of I_b is 9 μA [Fig. 3(a)], the current biases across J_sf, J_jtl, and J_si are 2.2 μA, 8.1 μA, and 8.8 μA, respectively, when the SPD is not firing and I_sy = 1 μA. These numbers are 3.9 μA, 8.3 μA, and 8.8 μA when I_sy = 3 μA. Of these three junctions, only J_jtl is not embedded in a high-inductance loop, making it the most susceptible to noise. This value of I_b has been chosen as a compromise between the flux-storage capacity of the SI loop and imperviousness to noise. Based on the analysis of Ref. 168, we calculate the effective temperature, T̃ = 2πk_BT/Φ_0I_c, and inductance parameter, λ = Φ_0/2πLI_c, where L is the total inductance of the loop. For the junctions under consideration with I_c = 10 μA at 4.2 K, T̃ = 0.0176. The Josephson inductance of the junctions at zero bias is 33 pH, giving a total loop inductance of 266 pH and an inductance parameter of λ = 0.124. With these values of T̃ and λ, the analysis of Ref. 168 informs us that biasing J_jtl with 8.1 μA is below the switching current of 9 μA at 4.2 K. If an application requires further noise reduction, very similar circuit operation can be achieved by applying I_b = 7 μA, provided the range of synaptic biases is shifted to 2 μA < I_sy < 4 μA. With I_b = 7 μA, the bias across J_jtl is 6.6 μA when I_sy = 4 μA.
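The quoted noise parameters can be reproduced directly from the definitions above; a short sketch (taking the 200 pH loop contribution to be L_sf from the parameter list, with two junctions completing the loop, which is our reading of the circuit rather than an explicit statement in the text):

```python
# Noise parameters of Ref. 168 for the receiver junctions:
# T_tilde = 2*pi*k_B*T / (Phi_0*I_c) and lambda = Phi_0 / (2*pi*L*I_c).
import math

k_B = 1.381e-23     # J/K, Boltzmann constant
Phi_0 = 2.068e-15   # Wb, magnetic flux quantum
T = 4.2             # K, operating temperature
I_c = 10e-6         # A, junction critical current

T_tilde = 2 * math.pi * k_B * T / (Phi_0 * I_c)   # ~0.0176
L_J = Phi_0 / (2 * math.pi * I_c)                 # Josephson inductance, ~33 pH
L_loop = 200e-12 + 2 * L_J                        # total loop inductance, ~266 pH
lam = Phi_0 / (2 * math.pi * L_loop * I_c)        # ~0.124
print(f"T_tilde = {T_tilde:.4f}, L_J = {L_J*1e12:.0f} pH, lambda = {lam:.3f}")
```

All three values match the numbers quoted in the text.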
It may be the case that for different applications or during different periods of learning and operation, different amounts of noise are tolerable or even desirable. Changing between I_b = 7 μA and I_b = 9 μA can be done dynamically during operation to modify the stochasticity of the synaptic transducer with no required hardware compensation. Thus, one can utilize or suppress noise at will depending on the context. 75,110,111 Further, because synaptic firing events will produce tens to hundreds of fluxons, thermal switching events resulting in the addition of a single fluxon to the SI loop may be inconsequential. The complexity of the effects of noise on the operation of this circuit merits further investigation.
Regarding the two-photon coincidence detector of Fig. 6, to emulate the physical rebiasing behavior and critical current of the SPDs, JJs with I_c = 11 μA were placed in series with each SPD. The main panel shows simulations of a circuit with L_spd/r_spd = 1 μH/2 Ω = 500 ns, and the inset shows simulations of a circuit with L_spd/r_spd = 1 μH/20 Ω = 50 ns. The circuit of Fig. 6 has been designed with JJs of I_c = 40 μA. Similar performance can be achieved with the 10 μA junctions used in the circuits of Figs. 2(b) and 3. Here, I_b = 38 μA.

Synaptic plasticity circuits
The memory cell of Fig. 9 has been designed with the following circuit parameters: I_b1^ss = 38 μA, I_b2^ss = 20 μA, and L_ss = 90 pH. The four inductors comprising the two mutual inductors are labeled L_1 through L_4 from left to right. Their values are L_1 = L_2 = 45 pH and L_3 = L_4 = 18 pH, the critical current of J_su is 40 μA, and I_1 = 12 μA. The junction J_sf with I_c of 10 μA is embedded in the SB loop.
The memory cell of Fig. 11 has been designed with the following circuit parameters. The inductors comprising the DC-to-SFQ converter are, from left to right, L_1 = 80 pH, L_2 = 60 pH, and L_3 = 300 pH. The bias to the DC-to-SFQ converter is I_DC = 73 μA. The drive current pulses are I_+ = 10 μA with 100 ps rise and fall times and 1 ns duration. The two biases to the SS loop are each I_b^ss = 34 μA. The mutual inductor parameters between the SS loop and the SB loop and from the SB loop to I_1 are, from left to right, L_1 = 18 pH, L_2 = 190 pH, L_3 = 18 pH, and L_4 = 18 pH, with I_1 = 27 μA. With L_ss = 20 nH, ΔI_ss = 103 nA per pulse, and ΔI_sy = 25 nA per pulse. The SS loop can store −4.94 μA < I_ss < 4.96 μA.
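Dividing the storage range by the per-pulse increment gives the number of addressable synaptic states, a quick sketch consistent with the roughly one hundred stable states these parameters provide:

```python
# Number of distinguishable states in the SS loop of Fig. 11: each drive
# pulse changes the stored current by Delta_I_ss = 103 nA, and the loop
# holds currents between -4.94 uA and +4.96 uA (values from the text).
dI_ss = 103e-9                      # A per pulse
I_min, I_max = -4.94e-6, 4.96e-6    # A, storage range

n_states = int((I_max - I_min) / dI_ss) + 1
print(f"{n_states} stable states")
```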
All JJs in the synaptic plasticity circuits have I_c = 40 μA. In contrast to the synaptic receiver circuits of Sec. II, where JJs with I_c = 10 μA were used, these JJs do not switch with every synaptic firing event, and consequently, using lower I_c for power minimization is less important. Using I_c = 40 μA leads to circuits with wider operating margins and ease of fabrication. We argue in Appendix D that using JJs with I_c = 40 μA for the circuits of Sec. II would also be satisfactory.
To achieve the desired Hebbian operation with the circuit of Fig. 13(a), several considerations are pertinent. When SPD 1 detects a photon, it needs to direct current predominantly to I_2 and not to I_3. When SPD 2 detects a photon, it needs to direct current predominantly to I_3 and not to I_1. These considerations inform us that we should choose L_2 ≪ L_3 and L_3 ≪ L_1. We choose L_2 = 12.5 nH for this study. Such a small SPD may have reduced detection efficiency, but the inefficiency is tolerable for this purpose because synaptic updates will occur only rarely to optimize memory retention. 55,120 We then choose L_3 = 125 nH and L_1 = 1.25 μH. The choices for r_1 and r_2 are made to achieve the desired temporal behavior. The L/r time constants must be long enough to ensure that the SPDs do not latch. Beyond this, they can be chosen to achieve the desired learning performance. We choose τ_1 = 50 ns and τ_2 = 5 ns to facilitate WRSpice analysis, but longer time constants may be necessary in practice. In Fig. 13(a), I_spd = 10 μA. The bias to the synaptic storage junction is 38 μA. All corresponding circuit parameters are the same for Fig. 14(a).

LED driven by hTron
The equations of motion for the circuit of Fig. 16 include the term I_pn, an analytical function that describes the LED's DC I-V characteristic, given in Ref. 169. The parameters that enter this equation are as follows: the acceptor density, N_a = 5 × 10^19 cm^-3; the donor density, N_d = 5 × 10^19 cm^-3; the intrinsic carrier density, n_i = 1.5 × 10^10 cm^-3; the hole density on the n side of the junction, p_n = n_i^2/N_d; the electron density on the p side of the junction, n_p = n_i^2/N_a; the electron minority carrier lifetime, τ_np = 40 ns; the hole minority carrier lifetime, τ_pn = 40 ns; the hole mobility on the p side, μ_pp = 10^-2 m^2/V s; the hole mobility on the n side, μ_pn = 10^-2 m^2/V s; the electron mobility on the n side, μ_nn = 2.5 × 10^-2 m^2/V s; the electron mobility on the p side, μ_np = 2.5 × 10^-2 m^2/V s; the hole diffusion coefficient, D_p = (k_BT/e)μ_pn; the electron diffusion coefficient, D_n = (k_BT/e)μ_np; the hole diffusion length, L_p = √(D_pτ_pn); the electron diffusion length, L_n = √(D_nτ_np); and the temperature of operation, T = 300 K. While we plan to operate these circuits at 4.2 K, our measurements indicate that the model captures the operation well at low temperature, and the model is more stable with this room-temperature value. To approximate the capacitance, we have assumed the length of the junction to be 5 μm, the height of the junction to be 200 nm, the intrinsic region to be 100 nm, and ε = 12 for the semiconductor. This gives C = 1 fF, which is the smallest value we have considered. Fringe fields and wiring parasitics will likely increase this value. We have considered capacitances as large as 100 fF to ensure that our estimates are conservative.
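As an illustration of how these parameters combine, a short sketch computing the diffusion lengths and the parallel-plate capacitance estimate from the values listed above (the physical constants are standard textbook values, not from the text):

```python
# Derived quantities of the LED model: diffusion lengths at T = 300 K
# and the parallel-plate estimate of the junction capacitance.
import math

k_B = 1.381e-23      # J/K
e = 1.602e-19        # C
eps0 = 8.854e-12     # F/m
T = 300.0            # K, temperature used in the model
mu_pn = 1e-2         # m^2/Vs, hole mobility on the n side
mu_np = 2.5e-2       # m^2/Vs, electron mobility on the p side
tau_pn = tau_np = 40e-9   # s, minority carrier lifetimes

V_T = k_B * T / e             # thermal voltage, ~25.9 mV
D_p = V_T * mu_pn             # hole diffusion coefficient
D_n = V_T * mu_np             # electron diffusion coefficient
L_p = math.sqrt(D_p * tau_pn) # hole diffusion length, ~3.2 um
L_n = math.sqrt(D_n * tau_np) # electron diffusion length, ~5.1 um

# Capacitance: 5 um x 200 nm junction area, 100 nm intrinsic region, eps_r = 12.
A = 5e-6 * 200e-9
C = 12 * eps0 * A / 100e-9    # ~1 fF
print(f"L_p = {L_p*1e6:.1f} um, L_n = {L_n*1e6:.1f} um, C = {C*1e15:.1f} fF")
```

The capacitance estimate reproduces the ~1 fF value quoted in the text.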

hTron circuit model
The equations of motion for the circuit in Fig. 18(b) are given by Eqs. (A4)-(A7). In the calculations presented in Figs. 19 and 20, Q = I^2r, where I = 1.2 mA is the current through the gate of the hTron and r = 10 Ω is the resistance of the hTron gate. Thus, Q = 14.4 μW.
To produce the plots shown in Figs. 19 and 20, we model the heater layer as a 10 nm-thick film of Al. The upper spacer is modeled as a 10 nm-thick film of a-Si, and the lower spacer is modeled as a 50 nm-thick film of SiO2. For the material parameters, the following values are used. The density of Al was taken to be 2700 kg/m^3, and the thermal conductivity was taken to be 30 W/m K. The density of a-Si was taken to be 2285 kg/m^3, and the thermal conductivity was taken to be 0.01 W/m K. 170 The density of SiO2 was taken to be 2650 kg/m^3, and the thermal conductivity was taken to be 0.01 W/m K. The temperature-dependent specific heat of Al and SiO2 was taken from Ref. 137.
Regarding the superconductor, we anticipate using MoSi for this purpose due to its resistivity in the normal state, its critical current density, and its critical temperature. In a film of 8 nm thickness, the sheet resistance in the normal state is 300 Ω per square. At this thickness, T_c = 6.2 K. A device held at 4.2 K must only be raised by 2 K to switch to the normal state. The temperature-dependent specific heat of MoSi was taken from Ref. 136, and the thermal conductivity was taken to be 1 W/m K.
Of the material parameters, the values of thermal conductivity are the most uncertain. From Ref. 171, we know the thermal conductivities of quartz and silica differ from one another by orders of magnitude at low temperature, and we expect the value of thermal conductivity to depend significantly on the specific material and film deposition method used in fabrication. For functional hTrons fabricated with SiO2 spacer layers and operating between 1 and 5 K, it has been observed 172 that 18 nW/μm^2 is required to switch the hTron. When using the value of 0.01 W/m K for the thermal conductivity of the spacer layers, we arrive at a power density of 494 nW/μm^2 needed to elevate the structure above T_c, a factor of 27 higher than the measured value. We, therefore, suspect hTrons can be more efficient than the present model would predict, but further empirical investigation is required to determine the limits of hTron efficiency. The model of Eqs. (A4)-(A7) neglects phonon reflection at material interfaces. This effect may be responsible for the discrepancy between the model and measurements. Utilizing phonon reflection or phononic crystals in design may further increase the efficiency of the device.
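A back-of-the-envelope version of the modeled power density, treating the lower spacer as a one-dimensional conducting slab between the superconductor and the bath; this is our simplification for illustration, not the full thermal model of Eqs. (A4)-(A7):

```python
# Order-of-magnitude cross-check of the switching power density:
# steady-state conduction through a slab gives P/A ~ kappa * dT / d.
kappa = 0.01    # W/(m K), assumed spacer thermal conductivity (from the text)
dT = 2.0        # K, rise from the 4.2 K bath to T_c = 6.2 K
d = 50e-9       # m, lower-spacer (SiO2) thickness

P_per_m2 = kappa * dT / d          # W/m^2
P_per_um2 = P_per_m2 * 1e-3        # convert W/m^2 to nW/um^2
print(f"{P_per_um2:.0f} nW/um^2")
```

The slab estimate lands at the same order as the 494 nW/μm^2 obtained from the full model, and well above the 18 nW/μm^2 measured value, consistent with the discussion above.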

Parameters used in threshold circuit
The I_c of the JJs in the JTL in Fig. 15(a) is 250 μA, and the inductors are 2.1 pH. The I_c of the JJ just before the relaxation-oscillator junction is 280 μA, and the I_c of the relaxation-oscillator junction is the same. The relaxation oscillator is biased with 140 μA. The value of L_1 is 200 pH, the value of r_1 is 2.76 Ω, and the value of r_2 is 3 Ω.

APPENDIX B: ENERGY OF A SYNAPTIC FIRING EVENT
In the receiver circuit explored in this work, each fluxon produced by a synaptic firing event switches three JJs. When combined with the synaptic update circuit, a fourth JJ is necessary to act as a buffer between the synaptic update circuit and the synaptic firing circuit. Each switching operation dissipates energy E_J = I_cΦ_0. The total energy dissipated during a synaptic firing event can be minimized by reducing the number of junctions and by reducing their I_c, but power dissipation will likely be dominated by light production (unless the light emitters can be made extremely efficient), so reducing the number of junctions or their critical currents is not likely to be necessary from an energy perspective.
The minimum I_c that can be used is determined by noise considerations. For this analysis, we compare the Josephson coupling energy, I_cΦ_0/2π, to k_BT with T = 4.2 K. 173 This informs us that the thermal-equivalent junction critical current is 176 nA. For digital electronics wherein errors are intolerable, most systems employ JJs with I_c ≳ 40 μA. Neuromorphic circuits are resilient to errors and leverage noise. 75,110,111 These circuits can, therefore, operate with lower I_c. In the present work, the circuit of Fig. 3(a) is designed with all three JJs having I_c = 10 μA (I_cΦ_0/2π > 50k_BT), giving a switching energy (I_cΦ_0) of E_J = 21 zJ.
The energy of a synaptic firing event is due to rebiasing of the SPD, E_spd = L_spdI_spd^2/2, as well as the energy dissipated by the junctions, E_JJ = 4I_cΦ_0 per fluxon. The factor of four comes from the fact that the synaptic firing circuit has four JJs. If we assume that the SPD is an out-and-back nanowire integrated with a waveguide 174-177 with 60 μm total length, 150 nm width, and 180 pH per square, E_spd = 4 aJ. For a synaptic firing event with weak synaptic efficacy (33 fluxons), E_JJ = 2.7 aJ. For a synaptic firing event with strong synaptic efficacy (497 fluxons), E_JJ = 41 aJ. Operation with a weak synaptic weight could be engineered to produce zero fluxons, and a strong synaptic weight to produce one fluxon, thereby reducing the analog transducer to a binary element and achieving the limit of energy efficiency of this part of the neural system. This is likely not necessary, as energy consumption is dominated by the production of photons by the transmitter circuit.
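These energy figures follow directly from the quoted nanowire geometry and junction parameters; a brief sketch (the flux-quantum value is a standard constant):

```python
# Energy budget of a synaptic firing event, using values from the text.
Phi_0 = 2.068e-15   # Wb, magnetic flux quantum
I_c = 10e-6         # A, junction critical current

# SPD rebias energy: out-and-back nanowire, 60 um long, 150 nm wide,
# 180 pH per square of kinetic inductance, biased at I_spd = 10 uA.
n_squares = 60e-6 / 150e-9          # 400 squares
L_spd = n_squares * 180e-12         # ~72 nH
I_spd = 10e-6
E_spd = 0.5 * L_spd * I_spd**2      # ~3.6 aJ (quoted as 4 aJ)

# Junction energy: four JJs switch per fluxon, E_J = I_c * Phi_0 each.
E_JJ_weak = 4 * 33 * I_c * Phi_0    # 33 fluxons  -> ~2.7 aJ
E_JJ_strong = 4 * 497 * I_c * Phi_0 # 497 fluxons -> ~41 aJ
print(f"E_spd = {E_spd*1e18:.1f} aJ, weak E_JJ = {E_JJ_weak*1e18:.1f} aJ, "
      f"strong E_JJ = {E_JJ_strong*1e18:.0f} aJ")
```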

APPENDIX C: WAVEGUIDE ROUTING
Minimization of wiring is an NP-complete problem, and we make no attempt to solve it in the present context with the constraints particular to SOEN hardware. Instead, we develop a practical all-to-all routing scheme and arrive at wiring diagrams by pruning the complete set of connections down to those corresponding to the adjacency matrix of the network under consideration. This solution is likely not optimal, yet it accomplishes the goal of achieving integrated photonic connections between neurons represented by the relevant adjacency matrix. The routing scheme under consideration is shown schematically in Figs. 25 and 26. To route the waveguides connecting each neuron to all other neurons within a local sector, we begin with the center node. Light is assumed to exit the node to the east, and it branches into northrunning and south-running waveguides on the upper waveguide plane. Each of these waveguides splits east and west at each row to send light to the neurons in that row. Interplanar couplers 85 are shown as pairs of triangles in Fig. 25. Using these couplers, light is routed to the lower plane for east-west routing. It is routed back to the upper plane before entering the SPDs at each neuron. Interplanar transitions are required to mitigate losses due to waveguide crossings. Although intraplanar crossings can be used sparingly, their loss is high enough that they cannot be the exclusive means of crossing waveguides. 178 Connections from the central node to all other nodes are established in this manner. Next, a similar approach is taken for the node just north of the central node. Within the central routing column, each subsequent node has two waveguides running north-south. One is to the east of the center node's main north-south waveguide, and the other is to the west. Such an approach adds a factor of 2 to the width of columns, but avoids intraplane crossings. A better solution is likely possible. 
At each point where a waveguide must branch, a power tap evanescently couples the appropriate fraction of light to achieve the desired optical signal at all destinations. 87 The objective is to couple as close to one photon as possible to each neuron while avoiding intolerable communication errors. In the power calculation presented in Sec. V C, we assume that ten photons are produced for each synaptic connection, and reception of one or more photons will trigger a synaptic firing event.
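If photon arrivals at each synapse obey Poisson statistics, an assumption we introduce here for illustration rather than one stated in the text, the ten-photon design point implies a very low probability of a missed synaptic firing event:

```python
# Probability that zero photons arrive at a synapse -- a missed synaptic
# firing event -- if the photon number is Poisson-distributed with mean 10.
import math

mean_photons = 10.0
p_miss = math.exp(-mean_photons)   # Poisson P(n=0) = e^(-mean)
print(f"P(no photon) = {p_miss:.2e}")
```

Under this assumption, fewer than one connection in 20 000 would fail per neuronal firing event.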
Routing proceeds by establishing the connections from each node up the center column followed by connections from each node below the center node down to the bottom of the center column. As each new row or column waveguide is added, it is placed outside the existing waveguides, leading to dense ribbons of waveguides in the rows and columns, as shown in Fig. 25.
To calculate the width of a column, we define the term

a = w_wg + g_tap + h_sine + (w_wg + w_IPC)/2. (C1)

Within this model, we calculate the width of a routing column as

w_col = 2n_row(a + g_wg) + 2r_bend (C2)

and the height of a routing row as

h_row = n_N[a + n_spd(w_wg + g_wg)] + 2r_bend. (C3)

In these expressions, n_row is the number of rows in a sector, g_tap is the gap of an evanescent tap, h_sine is the height of the sine bend for a power tap, w_wg is the width of a waveguide, w_IPC is the width of an interplanar coupler, g_wg is the gap between two waveguides sufficient to avoid cross talk, r_bend is the radius of a routing bend, n_N is the number of neurons in a sector, and n_spd is the number of SPDs associated with each synapse. We take n_spd = 3 in this analysis to account for synaptic firing and synaptic update signals from presynaptic neurons. Two SPDs receiving photons from the local, postsynaptic neuron are also required, but they do not need to be accessed via the same routing waveguides, and it is assumed these detectors can be patterned either below or above the routing infrastructure. The expressions in Eqs. (C2) and (C3) approximate the dimensions of routing channels for sectors with all-to-all connectivity, but in practice, connectivity will be much more sparse. To account for the sparsity, we multiply the width of Eq. (C2) by k_out/n_N and the height of Eq. (C3) by a factor k_in/n_N when calculating the contribution to the sector area by a neuron with out-degree k_out and in-degree k_in. These in- and out-degrees refer to the degree calculated using only the adjacency matrix of the sector. That is, only connections that originate and terminate within a sector are accounted for in this part of the analysis. Distant connections are accounted for below.
The expressions in Eqs. (C2) and (C3) account for the area consumed by routing waveguides, but the contribution to total sector area due to the area of the neurons must also be included. We approximate the size of a neuron with

h_neuron = 2L_spd (C4)

and

w_neuron = k_in(w_wg + g_wg) + (3/2)(L_tap + L_sine + L_DeMux + L_IPC + r_bend). (C5)

In these expressions, k_in is the in-degree of the neuron, L_tap is the length of an average evanescent power tap, L_sine is the length of a sine bend in a power tap, and L_DeMux is the length of a demultiplexer used to separate the photons for synaptic firing and Hebbian update. While it is not necessary to use different colors for these different operations, it may be advantageous so that the different operations can be turned on and off independently. If multiple colors are used, the demultiplexer is likely to be a frequency-dependent evanescent coupler. L_IPC is the length of an interplanar coupler.
To calculate the area of a sector, the areas occupied by routing and by neurons are summed, and the result is divided by the number of pairs of waveguide planes used in the fabrication process. Once the area of a sector has been calculated, we need to consider the area consumed by routing between sectors to form regions and between regions to form a module. For these calculations, the expressions of Eqs. (C2) and (C3) are used, except that n_row refers to the number of rows of neurons in the region (which is the number in a sector times the number of rows of sectors in the region), and n_N refers to the number of neurons in the region. The in- and out-degrees used now refer to connections originating in one sector of the region and terminating in a different sector of the same region. When calculating the area of a region, the area of the neurons is not added, as all intersector synapses are assumed to be above the intrasector synapses on higher waveguide planes. This simplification can be made if the number of synapses resulting from intrasector connections is equal to or less than the number of intraregional and intramodular connections. This corresponds to a Rentian exponent less than one. For the case of the growth-model network with 8100 neurons, the average number of intrasector connections is 17.1. The average number of intraregional connections is 17.8. The average number of intramodular connections is 17.6. For the random network, these numbers are 0.4, 9.8, and 91.9. Routing and area analysis of the random network requires a different approach.
For the network of 8100 neurons considered in Sec. V B, there are three levels of hierarchy: sectors, regions, and the total module. The analysis of the area of routing waveguides in the module is nearly identical to the analysis of the regions with the appropriate substitutions of n row , n N , k in , and k out .
The size of the neurons and routing depends on the number of connections made by each neuron. Using the above expressions for the contribution of a neuron to the total area, we can calculate A_n(k), the area of a neuron with degree k. Here, we assume k_in = k_out = k. For the specific case of the network of 8100 neurons, we can explicitly calculate this area for each neuron because the adjacency matrix is known. In the general case, we can calculate the total area of a network if the number of neurons, N, and their degree distribution, p(k), are specified. The total area of the network is given by

A_tot = N Σ_k p(k)A_n(k).

For the analysis presented in Fig. 27(a), we use this expression to calculate the total network area, and we obtain A_n(k) by fitting the data from the network of 8100 neurons to a power law. From this fit, we find A_n(k) ∝ k^1.4. To generate Fig. 27(a), we assume that the in-degree and out-degree are the same. The following parameters have been used in these calculations: the width of a waveguide, w_wg = 500 nm; the gap between waveguides, g_wg = 1 μm; the height of a sine bend, h_sine = 1 μm; the length of a sine bend is the same; the gap of an evanescent power tap, g_tap = 500 nm; the length of a power tap, L_tap = 5 μm; the length of an interplanar coupler, L_IPC = 36 μm; the width of an interplanar coupler, w_IPC = 4 μm; the length of an SPD, L_SPD = 10 μm; the radius of a routing bend, r_bend = 2 μm; and the length of a demultiplexer, L_DeMux = 5 μm. These parameters are intended to model silicon passive photonic components. For longer-distance connections, silicon nitride waveguides may be preferable due to the lower index contrast and, therefore, lower loss. The area of silicon nitride passive waveguides is roughly twice that of silicon.
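To make the geometry expressions concrete, a small sketch evaluating Eqs. (C1) and (C2) with the silicon parameter values listed above; the sector size n_row = 30 is a hypothetical choice for illustration, not a value from the text:

```python
# Illustrative evaluation of Eqs. (C1) and (C2) with the silicon
# passive-photonics parameters given in the text.
w_wg = 0.5e-6    # m, waveguide width
g_tap = 0.5e-6   # m, evanescent-tap gap
h_sine = 1e-6    # m, sine-bend height
w_IPC = 4e-6     # m, interplanar-coupler width
g_wg = 1e-6      # m, waveguide-to-waveguide gap
r_bend = 2e-6    # m, routing-bend radius

a = w_wg + g_tap + h_sine + (w_wg + w_IPC) / 2   # Eq. (C1), ~4.25 um

n_row = 30                                       # hypothetical rows per sector
w_col = 2 * n_row * (a + g_wg) + 2 * r_bend      # Eq. (C2)
print(f"a = {a*1e6:.2f} um, w_col = {w_col*1e6:.0f} um")
```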
While the sector comprising a square array of nodes shown in Fig. 25 works well for the growth algorithm leading to a power-law degree distribution, many applications leverage feed-forward networks that are very wide with only a few layers but all-to-all connectivity from one layer to the next. A routing diagram for such a network is shown in Fig. 28. Here, we again use pairs of waveguide planes for north-south and east-west routing. A spatial analysis similar to that represented by Eqs. (C1)-(C5) has been conducted. Figure 28(b) shows the spatial scaling and loss as a function of the number of neurons in a network layer for a process employing 12 waveguide planes to form two layers of a feed-forward network. The maximum distance refers to the row-column propagation distance from a neuron at the bottom of one network layer to the top of an adjacent layer. It is assumed waveguides have 0.2 dB/cm propagation loss. 179 For 1000 neurons per layer, the system would fit on a chip 2.5 cm × 2.5 m.

APPENDIX D: NETWORK POWER CONSUMPTION
For this analysis, we model the energy to produce a photon as E = hν/η_tot, where h is Planck's constant, ν is the frequency of light (250 THz if the emitters of Ref. 63 are employed), and η_tot is an efficiency factor that accounts for all energy used by the thresholding junction, the superconducting amplifiers, and the light emitter. From Sec. IV, we take η_amp = 10^−4. Because power consumption is dominated by the amplifiers, we let η_amp → η_tot → η. The energy of a neuronal firing event is given by E_out = ζhνk_out/η, where ζ is the number of photons produced for each synaptic connection. In principle, ζ can be as low as one, but such low photon numbers would make communication noisy and unreliable; ζ = 10 is likely to be a safe operating point, as described in Sec. IV D. To calculate the total energy per neuronal firing event, we must also sum the energy of all the synaptic firing events that led to the neuronal firing event. This energy is given by E_in = χ n_fq I_c Φ_0 k_in, where n_fq is the number of fluxons generated in a synaptic firing event (taken to be 245), and χ is the fraction of synapses that must fire to drive the neuron to threshold. We assume χ = 1/3 as an overestimate, given that N_sy^1/2 scaling is observed in biological systems (Refs. 110 and 111). To simplify the analysis, let us take k_in = k_out = k. The total energy of a neuronal firing event is thus

E(k) = (ζhν/η + χ n_fq I_c Φ_0) k ≡ E_0 k.    (D1)

The two contributions to Eq. (D1) are due to the generation of photons during a neuronal firing event (ζhν/η) and the generation of fluxons during a synaptic firing event (χ n_fq I_c Φ_0). These two terms are plotted independently in Fig. 29. We see that the photon production efficiency, η, must be very high (η ≳ 0.2) for the photonic contribution to be less than the fluxonic contribution. Based on this analysis, we conclude that the choice to use junctions with I_c = 10 μA for the synaptic receivers of Sec. II was probably misguided, and it may be the case that all the circuits of Secs. II and III can be produced with junctions of I_c = 40 μA.
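Eq. (D1) can be evaluated directly. The sketch below uses the parameter values quoted in the text (ζ = 10, ν = 250 THz, χ = 1/3, n_fq = 245, I_c = 10 μA), with the photon production efficiency η left as an argument so the two contributions can be compared:

```python
H = 6.626e-34       # Planck's constant, J*s
PHI_0 = 2.068e-15   # magnetic flux quantum, Wb

def firing_energy(k, zeta=10, nu=250e12, eta=1e-4, chi=1/3,
                  n_fq=245, I_c=10e-6):
    """Total energy of a neuronal firing event, Eq. (D1): E(k) = E_0 * k."""
    photonic = zeta * H * nu / eta        # light production per connection
    fluxonic = chi * n_fq * I_c * PHI_0   # flux generation per connection
    return (photonic + fluxonic) * k
```

With η = 10^−4, the photonic term dominates by several orders of magnitude, consistent with the observation that η must approach ~0.2 before the fluxonic term becomes comparable.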
The power dissipated by a neuron is given by the energy per neuronal firing event multiplied by the number of neuronal firing events per unit time,

P_n = E(k) f,    (D2)

where the subscript n denotes the power of a single neuron as opposed to the total power of the network. We assume that the network is characterized by a power-law degree distribution of the form

p(k) = c_k k^−γ,    (D3)

meaning the probability that a node chosen at random from the network will have degree between k and k + dk is p(k) dk. We further assume that the network is characterized by a power-law frequency distribution of the form

p(f) = c_f f^−μ,    (D4)

meaning the probability that, during a period of observation, a node chosen at random from the network will be observed to spike with frequency between f and f + df is p(f) df. Incorporating these probability distributions, and assuming that there are N_tot nodes in the network, the total network power consumption is given by

P_N = N_tot E_0 ∫ k p(k) dk ∫ f p(f) df,    (D5)

and the power density is P_N/A_N, where A_N is the total area of the network. Considering first the spatial term of Eq. (D5), we see that

P_N ∝ ∫ k^−(γ−1) dk,    (D6)

whereas

A_N ∝ ∫ k^−(γ−1.4) dk.    (D7)

The exponent of 1.4 is explained in Appendix C. Because the integrand of A_N decays more slowly with k than the integrand of P_N, the power density decreases as a function of network size and will, therefore, never limit scaling. The exponents γ and μ can also be chosen to adjust the power density.
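Eq. (D5) factors into the product of the mean degree and the mean firing frequency under their respective power-law distributions, which is straightforward to evaluate numerically. The `E0` value below is a placeholder for the per-connection energy of Eq. (D1):

```python
import numpy as np

def power_law_mean(x_min, x_max, exponent, num=100_000):
    """<x> under p(x) ~ x**-exponent, normalized on [x_min, x_max]."""
    x = np.linspace(x_min, x_max, num)
    p = x ** (-exponent)
    p /= np.trapz(p, x)
    return np.trapz(x * p, x)

def network_power(n_tot, E0, gamma, mu, k_rng, f_rng):
    """P_N = N_tot * E0 * <k> * <f>, a numeric form of Eq. (D5)."""
    return (n_tot * E0
            * power_law_mean(*k_rng, gamma)
            * power_law_mean(*f_rng, mu))
```

For exponent 2, the mean has the closed form ⟨x⟩ = ln(x_max/x_min)/(1/x_min − 1/x_max), which provides a check on the quadrature.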
Considering the frequency term of Eq. (D5), we see that

P_N ∝ ∫ f^−(μ−1) df.    (D8)

The integrand in Eq. (D8) is the power spectral density of the network, which is proportional to f^−(μ−1).
If the exponent in the frequency power-law distribution [Eq. (D4)] is μ = 2, the network-averaged power spectrum takes the 1/f form observed in cortical networks (Ref. 17), believed to be significant for cognition (Ref. 18), and known to be related to self-organized criticality and fractal objects (Refs. 29, 30, and 145). In the hardware platform under consideration, μ can be adjusted through the threshold bias currents as well as the light-emitter bias currents, thus providing a means to investigate network operation as a function of power spectral density.
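The 1/f scaling at μ = 2 follows directly from the integrand of Eq. (D8), S(f) ∝ f^−(μ−1), as a minimal sketch shows:

```python
def psd(f, mu=2.0):
    """Network-averaged power spectral density (unnormalized),
    S(f) ~ f**-(mu - 1) from the integrand of Eq. (D8)."""
    return f ** -(mu - 1.0)

# With mu = 2, S(f) = 1/f: each decade increase in frequency
# drops the spectral density by one decade.
```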
Using Eq. (D5) to calculate the power consumption of the network of 8100 neurons, and assuming f_min = 100 Hz and f_max = 20 MHz, we find the network will consume 2 mW. A network of 1 × 10^6 neurons on a 300 mm wafer will consume 2 W. To calculate these numbers, we have assumed γ = 1.4 and μ = 2.