Neuromorphic encoding of image pixel data into rate-coded optical spike trains with a photonic VCSEL-neuron

Driven by the increasing significance of artificial intelligence, the field of neuromorphic (brain-inspired) photonics is attracting increasing interest, promising new, high-speed, and energy-efficient computing hardware for key applications in information processing and computer vision. Widely available photonic devices, such as vertical-cavity surface emitting lasers (VCSELs), offer highly desirable properties for photonic implementations of neuromorphic systems, such as high-speed and low energy operation, neuron-like dynamical responses, and ease of integration into chip-scale systems. Here, we experimentally demonstrate encoding of digital image data into continuous, rate-coded, up to GHz-speed optical spike trains with a VCSEL-based photonic spiking neuron. Moreover, our solution makes use of off-the-shelf fiber-optic components with operation at telecom wavelengths, therefore making the system compatible with current optical network and data center technologies. This VCSEL-based spiking encoder paves the way toward optical spike-based data processing and ultrafast neuromorphic vision systems. © 2021 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/5.0048674


I. INTRODUCTION
In recent years, the fields of artificial intelligence (AI) and machine learning (ML) have grown rapidly, fueling interest and research efforts in unconventional, "beyond von Neumann" computing hardware. Neuromorphic computing is among these promising new approaches, taking inspiration from the extraordinary computational power of the human brain. Neuromorphic (brain-inspired) systems aim to process information by reproducing the massively parallel architecture, dynamics, and signaling observed in the neuronal networks of the brain. Current neuromorphic computing platforms include, among others, Loihi by Intel, 1 SpiNNaker by the University of Manchester, 2 TrueNorth by IBM, 3 Akida by BrainChip, 4 and BrainScaleS by the University of Heidelberg. 5 These platforms exhibit varying degrees of biological plausibility and hold great promise for efficient operation of AI algorithms or computational neuroscientific models. 6 There is also a wide range of novel experimental technologies that can power neuromorphic systems, including memristors 7,8 and nano-oscillators, 9,10 among others. Besides the aforementioned fully electronic approaches, photonic realizations of neuromorphic hardware are attracting increasing interest, given the key inherent advantages of optical systems. 11 These advantages include signaling via optical pulses, whose non-interacting, bosonic nature allows for both low-loss, high-speed waveguiding and wavelength-division multiplexing, thereby increasing communication capacity. Photonic devices can also operate at very high frequencies, can exhibit the nonlinear dynamical responses required for neuromimetic information processing, and offer promising prospects for low-power, energy-efficient operation. A wide array of photonic devices is currently being investigated for neuromorphic architectures.
These include micropillar lasers, 12 distributed feedback lasers, 13,14 electro-optic modulators, 15 micro-ring resonators with phase change materials, 16 Kerr microcomb sources, 17 quantum-dot lasers, 18 hybrid resonant tunneling diode circuits, 19-21 superconducting Josephson junctions, 22 and coherent meshes of Mach-Zehnder modulators (MZMs). 23 Among the available photonic technologies for light-powered neuromorphic processing systems, VCSELs (vertical-cavity surface emitting lasers) are gathering significant research interest due to their mature fabrication technology and favorable inherent attributes (such as low-power and high-speed operation, reduced costs, and compactness). Recent experimental 24-26 and numerical 27,28 works have reported that commercially available VCSELs and VCSELs with integrated saturable absorbing regions are capable of eliciting controllable spiking responses and exhibit behavior analogous to the leaky integrate-and-fire (LIF) neuronal model. 29 Crucially, VCSELs are also capable of representing information using spiking rate coding, 30 as observed in certain classes of biological neurons. This functionality makes them a promising solution for interfacing and for converting data into a suitable spike-based representation, a key challenge in the field of neuromorphic engineering. 31 One of the domains where neuromorphic approaches are attracting growing interest is event-based computer vision. Event-based vision systems offer many attractive features, such as low power consumption due to the sparse, low-redundancy information representation via events (spikes) and high temporal resolution with continuous response, all while not suffering from drawbacks such as motion blur. 32 These systems readily lend themselves to combination with deep learning approaches, 33 convolutional neural networks, 34 and spiking neural networks 35 for fast, efficient computer vision solutions.
While photonic neuromorphic computing has growing appeal, the suitability of optical brain-inspired spiking hardware for computer vision remains mostly unexplored territory. The human eye features two types of photosensitive cells, cones and rods, which convert the incoming visual stimuli into electric signals that are transmitted to neurons in the retina and the inner brain for processing and interpretation. The former operate over fixed wavelength ranges, whereas the latter are more sensitive to light intensity. In this work, we propose a system using a single VCSEL operating as a spiking photonic neuron, which emulates the operation of a cone cell. We demonstrate that the VCSEL-neuron is able to encode the R/G/B color channel intensity from a spatial area of interest (a camera pixel) into a continuous train of spikes, where the color component intensity dictates the spiking rate. Moreover, using time-division multiplexing (TDM) techniques, we are able to process (at different time instants) complex grayscale (GS) and RGB color source images with a large number of pixels using just one VCSEL-neuron. Therefore, this work experimentally demonstrates a highly hardware-friendly neuromorphic photonic system for the spiking representation of standard digital images, where the pixel values of each color channel are encoded into rate-coded optical spike trains. Furthermore, our implementation uses an off-the-shelf VCSEL and fiber-optic components operating at the standard telecom wavelength of 1310 nm, therefore making our approach fully compatible with optical communication networks and data center technologies. The reported system offers enticing prospects for operation as an input layer device in future photonic spiking neural network architectures for light-enabled machine learning and artificial intelligence hardware.

II. METHODS
The VCSEL-based photonic neuron operates as an analog, coherent optical device. The information input is provided as an intensity-modulated light signal from an external laser source.
The input data (R/G/B digital images) are pre-processed prior to injection into the VCSEL-neuron, as described in Sec. II A. The VCSEL-neuron encodes the pixel values into a spike-based representation, from which the image is later reconstructed following the steps described in Sec. II B. Finally, the optical setup used to operate the VCSEL-neuron is described in Sec. II C. In a fully parallelized realization of a neuromorphic vision system, each VCSEL-neuron would encode a single color component of a single pixel of the input image. For such an implementation, the number of required VCSEL-neurons ($3N^2$ VCSEL-neurons in the input layer for RGB square images of size N × N) grows very quickly even for small image sizes. Instead, we take the approach of sequentially processing the pixels by means of time-division multiplexing (TDM). This allows us to reduce the number of required VCSELs to just a single device. Using this technique, we are able to deliver a neuromorphic photonic system for spike-based image data encoding that benefits from a hardware-friendly implementation while retaining high operational speeds for image encoding. For future larger-scale implementations of neuromorphic photonic systems, integrated VCSEL arrays 36 offer the promise of network scalability and parallelized operation.
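As a short worked illustration of this scaling, the count can be evaluated for the image sizes used later in this work (the 8 × 8 single-channel GS image and the 32 × 32 RGB image); the arithmetic below follows directly from the $3N^2$ expression and its single-channel analog $N^2$:

```latex
\begin{align}
  \text{GS, } N = 8:  \quad & N^2  = 64 \ \text{VCSEL-neurons (fully parallel)}, \\
  \text{RGB, } N = 32: \quad & 3N^2 = 3 \times 1024 = 3072 \ \text{VCSEL-neurons (fully parallel)}, \\
  \text{TDM}:          \quad & 1 \ \text{VCSEL-neuron, at the cost of serial processing time.}
\end{align}
```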
A. Pre-processing of the image data for in-VCSEL spike encoding
In this work, the VCSEL-neuron encodes pixel information from digital images into spike trains. Multi-channel RGB images can be split into individual color channels, where each channel can be processed independently, either serially by adding a time-division multiplexing step or in parallel using individual VCSEL-neurons for each color channel. Here, all the inspected color channels are processed serially to allow for the use of a single VCSEL-neuron for the processing of the complete color information. A single color channel of the image is represented as an N × N array of independent pixel values with a standard eight-bit depth. The $n = N^2$ pixels in the single-channel array are serialized to enable processing with a single VCSEL-neuron device, thus greatly reducing the required experimental system complexity. The pixel serialization is realized either on a row-by-row basis or using a randomized processing order, where the n pixels are shuffled before injection into the VCSEL-neuron. Each of the n pixels is time multiplexed using a fixed time period $T = T_P + T_0$, referred to in this work as a single cycle. During this cycle, the information of a single input pixel is injected into the VCSEL-neuron in the form of a square-wave-shaped power drop with temporal length $T_P$, followed by a quiescent "zero" state for the remaining time period $T_0$ (effectively realizing a return-to-zero coding scheme). The amplitude $A_n$ of the power drop represents the pixel color intensity. The mapping between each pixel value $P_n$ and its respective drop amplitude $A_n$ is linear across the non-zero values, where the min-max range of the produced waveform corresponds to the available digital-to-analog converter (DAC) range. Pixel values $P_n = 0$ are assigned $A_n = 0$. For eight-bit pixel values, the mapping can be described as

where b sets the baseline value of the linear mapping for non-zero pixel values. In this work, we selected b = 0.4. This mapping was selected to achieve a fully injection-locked dynamical state of the VCSEL-neuron for zero (dark) pixel values and a gradually increasing perturbation depth for all non-zero pixel color values. This can be considered a basic model of a neuromorphic vision system studied over the time period [0, T]. In this model, a single cycle of length T represents the behavior of a photodetector (PD) (or a single-pixel camera) exposed to constant light intensity during $[0, T_P]$ and subject to darkness (P = 0) otherwise. The electric output of the photodetector is then processed by the VCSEL-neuron into a rate-coded train of spikes in the optical domain. Without light exposure, the detector receives no signal and the spiking neuron remains quiescent [Fig. 1(c)].
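The pre-processing steps of this section (serialization, pixel-to-amplitude mapping, and return-to-zero waveform generation) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the experiment. In particular, the mapping form $A_n = b + (1 - b) P_n / 255$ for $P_n > 0$ (with amplitudes normalized to the DAC range) is an assumption chosen to be consistent with the description above, and the function names, the negative sign convention for power drops, and the sample counts per cycle are all hypothetical.

```python
import numpy as np

def pixels_to_amplitudes(pixels, b=0.4):
    """Map 8-bit pixel values to normalized drop amplitudes in [0, 1].

    ASSUMED form (not taken from the source): A = 0 for P = 0,
    otherwise A = b + (1 - b) * P / 255, i.e. linear above baseline b.
    """
    p = np.asarray(pixels, dtype=float)
    return np.where(p > 0, b + (1.0 - b) * p / 255.0, 0.0)

def serialize(channel, shuffle=False, seed=0):
    """Flatten an N x N channel row by row, optionally in a fixed random order.

    Returns the serialized pixels and the index order that was used, so the
    same order can be reused for other channels and for reconstruction.
    """
    flat = np.asarray(channel).reshape(-1)
    order = np.arange(flat.size)
    if shuffle:
        order = np.random.default_rng(seed).permutation(flat.size)
    return flat[order], order

def build_waveform(amplitudes, samples_on, samples_off):
    """Return-to-zero waveform: one cycle per pixel.

    Each cycle holds a square drop of depth A_n for samples_on samples (T_P),
    followed by samples_off samples at zero (T_0). Drops are written as
    negative values; this sign convention is an illustrative choice.
    """
    cycle = samples_on + samples_off
    wave = np.zeros(len(amplitudes) * cycle)
    for i, a in enumerate(amplitudes):
        wave[i * cycle : i * cycle + samples_on] = -a
    return wave
```

For example, `build_waveform(pixels_to_amplitudes(serialize(img)[0]), 720, 360)` would produce one waveform per channel for an image array `img`; the values 720 and 360 correspond to 60 ns and 30 ns only under the assumption of a 12 GSa/s sample clock.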

B. Reconstructing the images from spike counts
The temporal evolution of the system plays a crucial role in the representation of information in spiking neural networks. Therefore, a 2D pixel array response over a given time period would ideally require a 3D visualization. For simplicity, we instead show the response of each individual pixel (a single cycle) as a 2D temporal map. In each of the n recorded cycles, the produced spikes are counted as individual events that cross a given measurement threshold value, returning the cycle spike counts $c_n$. Every channel is injected into the VCSEL multiple times under the same set of conditions, and the counts are averaged over these independent encoding trials to obtain an average spike count $\bar{c}_n$, delivering a clearer representation of the system's response. The average spike count is converted into an average spiking rate as $r_n = \bar{c}_n / T_P$. This value is normalized by the maximum measured spiking rate in the image, $\max(r_n)$, and directly converted into a component intensity value for the given pixel as $\hat{P}_n = r_n / \max(r_n)$. In the final step, the reconstructed pixel intensity values $\hat{P}_n$ are assigned to their original positions in the image pixel array, creating a direct reconstruction of the source image.
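The reconstruction chain described in this section (threshold-crossing spike counts, trial averaging, conversion to rate, normalization, and reassignment to pixel positions) can be sketched as follows. This is an illustrative sketch under stated assumptions: spikes are taken to be positive-going excursions above the threshold, the recorded data are assumed to be already split into per-pixel cycles, and all function and argument names are hypothetical.

```python
import numpy as np

def count_spikes(trace, threshold):
    """Count upward threshold crossings in a 1D trace.

    A spike is counted where a sample is above the threshold and the
    previous sample is not (positive-going spikes are ASSUMED).
    """
    above = np.asarray(trace) > threshold
    return int(np.count_nonzero(~above[:-1] & above[1:]))

def reconstruct_channel(trials, threshold, t_p, shape, order=None):
    """Rebuild one image channel from recorded spiking responses.

    trials : array of shape (n_trials, n_pixels, samples_per_cycle)
    t_p    : active encoding time T_P per pixel, in seconds
    shape  : (N, N) shape of the output channel
    order  : index order used when serializing (None = row by row)
    """
    trials = np.asarray(trials)
    counts = np.array(
        [[count_spikes(cycle, threshold) for cycle in trial] for trial in trials],
        dtype=float,
    )
    mean_counts = counts.mean(axis=0)   # average spike count per pixel
    rates = mean_counts / t_p           # average spiking rate per pixel
    peak = rates.max()
    intensity = rates / peak if peak > 0 else np.zeros_like(rates)
    if order is None:
        order = np.arange(intensity.size)
    image = np.empty(intensity.size)
    image[order] = intensity            # undo the serialization order
    return image.reshape(shape)
```

The returned array holds values in [0, 1]; multiplying by 255 and rounding would give an eight-bit image if that representation is preferred.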

C. Experimental setup for injection-locked VCSEL-neuron
The experimental setup of this work is shown in Fig. 1(a). The setup consists of off-the-shelf fiber-optic components, where a commercially available telecom-wavelength (1310 nm) VCSEL is used to implement the photonic spiking neuron. The VCSEL is operated at a constant temperature of 298 K and with an applied bias current of I = 5 mA, well above the device's lasing threshold current ($I_{thr}$ = 2.9 mA). The VCSEL's spectrum [shown in Fig. 1(b)] reveals a characteristic two-peak emission that corresponds to the two orthogonal polarizations of the main transverse mode of the device. These are referred to as the parallel (the main lasing peak, at $\lambda_y$) and orthogonal (the subsidiary attenuated peak, at $\lambda_x$) polarized modes. The VCSEL used in this work does not exhibit polarization switching (PS) with applied bias current.
The operation of the VCSEL-neuron relies on coherent injection locking of one of the VCSEL's polarized modes to an intensity-modulated signal from an external tunable laser source (TL, Santec TSL210V). The intensity of the tunable laser light represents the information (image pixel data) input to the VCSEL-neuron. We want to emphasize that the VCSEL used in this work as an artificial spiking photonic neuron is a commercially sourced, inexpensive device operating at the standard telecom wavelength of 1310 nm. No modifications of any kind to the factory design were implemented. In this work, we injected the external light signal (encoded with image information) into the subsidiary orthogonally polarized mode to achieve the characteristic high-speed spiking operation. Light intensity modulation is realized with a 10 GHz Mach-Zehnder modulator (MZM) controlled by RF signals generated with a 12 GSa/s arbitrary waveform generator (AWG, Keysight M8190A). Amplification of the electric modulation signal is provided by a 10 dBm wide bandwidth amplifier (Mini-Circuits ZX60-14012L-S+). First, the signal from the TL passes through an optical isolator (ISO) to ensure unidirectional coupling. The signal is attenuated with a variable optical attenuator (VOA) and polarization matched to the MZM using a polarization controller (PC). The operation point of the MZM is set between the quadrature and the peak, where the electric modulation results in optical power drops while preserving the full dynamic range and an approximately linear response with high output power. The modulated optical signal is polarization matched to one of the polarized modes of the VCSEL (here, the orthogonally polarized mode at $\lambda_x$) and injected into the laser through a 50:50 optical coupler and an optical circulator (CIRC). The second coupler branch is used to record the average injection power $P_{inj}$ using an optical power meter (PM). The input signal wavelength is shifted with respect to the $\lambda_x$ mode by an initial frequency detuning $\delta f$. The detuning and injection power allow for adjustments of the spiking threshold in the VCSEL-neuron, with higher detuning and lower power values decreasing the threshold distance from the quiescent state (where the VCSEL is injection locked to the external signal). Input stimuli (power drops) that are strong enough to push the system out of the injection locking regime result in the VCSEL-neuron responding with fast (sub-nanosecond) optical spiking responses, including single spikes 37 [Fig. 1(d)] and continuous spike trains 30 [Fig. 1(e)]. Following the behavior of excitable systems, the spike shape does not depend on the input perturbation and only requires the perturbation (drop) to cross the spiking threshold.
However, during continuous modulation, the spiking frequency is a function of the modulation amplitude. The detuning and power values used throughout this work (indicated in the figure captions) were found experimentally, starting from values used in previous works (see Refs. 30 and 38).
The VCSEL output passes through the circulator into a 50:50 coupler for analysis using an optical spectrum analyzer (OSA) and a 13 GHz real-time oscilloscope (RT OSC). For the oscilloscope readout, the optical signal is converted using a 9 GHz amplified photodetector (PD, Thorlabs PDA8GS). The modulated signal injected into the VCSEL-neuron is recorded from the power-meter (PM) branch. The response of the VCSEL-neuron is then recorded and time de-multiplexed to retrieve responses corresponding to individual serialized pixels.
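The time de-multiplexing step mentioned above amounts to cutting the continuous oscilloscope record into equal-length cycles, one per serialized pixel. A minimal sketch is given below; it assumes that each cycle spans an integer number of oscilloscope samples and that the start of the first cycle in the record is known, and the function name and `offset` parameter are hypothetical.

```python
import numpy as np

def demultiplex(trace, n_pixels, samples_per_cycle, offset=0):
    """Split a continuous record into one row per serialized pixel.

    trace             : 1D array of oscilloscope samples
    n_pixels          : number of serialized pixels (cycles) in the record
    samples_per_cycle : samples spanning one cycle T = T_P + T_0 (ASSUMED integer)
    offset            : sample index where the first cycle starts (ASSUMED known)
    """
    trace = np.asarray(trace)
    end = offset + n_pixels * samples_per_cycle
    if end > trace.size:
        raise ValueError("trace is shorter than n_pixels full cycles")
    return trace[offset:end].reshape(n_pixels, samples_per_cycle)
```

Row i of the returned array then contains the response to the i-th injected pixel; if a randomized processing order was used, the rows still need to be mapped back to their original pixel positions.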

III. RESULTS AND DISCUSSION
To assess the viability of the VCSEL-neuron for encoding pixel color intensity information into optical spike trains, we investigate its operation using multiple input digital images. We use a single-channel grayscale (GS) image containing a vertical gradient, a two-channel RB image containing a two-color (red-to-blue) diagonal gradient, and a complex, 32 × 32 pixel RGB image.

A. Grayscale (GS) and red-blue (RB) images
The source images, individual channels, and corresponding input modulation waveforms for the grayscale (GS) image and the red (R) and blue (B) channels in the RB image are shown in Fig. 2. Figure 3 provides the experimentally measured results for the vertical GS gradient image. The input modulation waveform is shown for the GS image in Fig. 2(c). As the pixel intensity gradually changes from black (0) to white (255), the amplitude of the injected modulated power drops increases, pushing the VCSEL-neuron beyond the spiking activation threshold into the continuous spiking regime. The approximate spiking threshold is illustrated as a red solid line in Fig. 3(a). The response for the GS modulation case is shown in the temporal map in Fig. 3(b), where each row in the map corresponds to the response to a single given pixel in the GS image. The major gridlines in the map in Fig. 3(b) separate groups of eight processed pixels, hence visualizing each individual row of the 8 × 8 pixel image. In the case of the GS image, all eight pixels in a given row have an identical value and should therefore produce equivalent spiking rates. Similar to the stochastic nature of certain biological neurons, where spike firing can be considered as governed by Poisson statistics, 39 the individual spiking events can appear at random instants while still correctly representing the input GS image data in the average spiking rate, which increases monotonically with the input pixel intensity. 30 This is illustrated by both the increasing density of the events (spikes) in the temporal map of Fig. 3(b) and the increase in the measured spiking rate shown in Fig. 3(c). Specifically, the spiking rates in Fig. 3(c) are obtained by averaging the rates from $n_{meas}$ = 3 independent oscilloscope readouts taken for the same GS image input under the same operating conditions. The different colors used in Fig. 3(c) highlight the spiking rates produced by the pixels from each individual row in the image.
Figure 3(d) shows the relation between the amplitude of the injected optical power drops encoding the pixel GS intensity and the corresponding average spiking rate from the VCSEL-neuron, clearly confirming a monotonically increasing trend. Note that for certain pixels at the start of the rows [see, for instance, the rows depicted in purple, pink, and green in Fig. 3(c)], we observed a sudden, higher-than-expected spiking rate that gradually decreases toward a constant value. We believe this transient effect is due to the stateful (memory-exhibiting) nature of the VCSEL-neuron, where the response to an incoming perturbation depends on the previous state of the device. This is in agreement with earlier reports on the leaky integrate-and-fire (LIF) functionality in VCSEL-neurons. 29 A high degree of consistency was observed among the readouts used for the averaging, confirming that the VCSEL-neuron can process input images reliably in a single-shot fashion.
To demonstrate the optical spike encoding of multi-channel images with the VCSEL-neuron, we processed the red-blue (RB) diagonal gradient image shown in Fig. 2. The individual red (RB-R) and blue (RB-B) image channels and their corresponding input waveforms are shown, respectively, in the middle (d)-(f) and bottom (g)-(i) rows of Fig. 2. The frequency detuning set for both the RB-R and RB-B channels was equal to $\delta f$ = −5.59 GHz, while the injected power was set to $P_{inj}$ = 115.6 μW and $P_{inj}$ = 128.4 μW for the RB-R and RB-B channels, respectively. Each individual pulse (representing single-pixel information) in both modulation waveforms for the RB-R and RB-B channels has a set temporal length $T_P$ = 60 ns with a return-to-zero period of $T_0$ = 30 ns, resulting in an encoding time of 5.76 μs per single iteration of a channel. For both the RB-R and RB-B cases, the VCSEL-neuron is able to successfully encode the pixel intensity of each color channel into the average spiking rate. Notably, we can again observe the short-term memory of the VCSEL-neuron influencing the continuous spiking responses, with low-to-high transitions in the RB-R channel causing over-firing with a higher-than-expected spiking frequency, while high-to-low transitions in the RB-B channel do not exhibit such an effect. Therefore, abrupt, high-amplitude incoming stimuli arriving after a quiescent or low-spiking-rate state can cause an accentuated response with an increased spiking frequency output. This functionality could hold further promise for spike-based signaling of rapidly changing events.
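The quoted per-channel encoding time follows from the cycle length and the number of serialized pixels. The worked arithmetic below uses only the pulse parameters stated above; the pixel count n = 64 (an 8 × 8 channel, as for the GS image) is inferred from the quoted total rather than stated explicitly for the RB image.

```latex
\begin{align}
  T &= T_P + T_0 = 60\,\mathrm{ns} + 30\,\mathrm{ns} = 90\,\mathrm{ns}, \\
  T_{\mathrm{channel}} &= n\,T = 64 \times 90\,\mathrm{ns} = 5760\,\mathrm{ns} = 5.76\,\mu\mathrm{s}.
\end{align}
```

Of each 90 ns cycle, only the 60 ns active window carries pixel information, so two thirds of the channel time is spent on active encoding.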
To verify the resulting encoded images obtained from the VCSEL-neuron, we have directly reconstructed the source images from the average spiking rates shown in Figs. 3(c), 4(c), and 4(g). In these cases, both the individual channels and the composite images show very good overall agreement, clearly preserving the color gradients.

B. Complex RGB image
Furthermore, we have also investigated the processing of larger-scale full-color images using a 32 × 32 pixel RGB image with more complex features. The pixels are processed in a fixed randomized order, where the same order was used for all three color channels. The input waveform pulse parameters were $T_P$ = 53.3 ns and $T_0$ = 26.6 ns, resulting in a single-channel processing time of 81.9 μs. The frequency detuning was set to $\delta f$ = −4.9 GHz for RGB-R, $\delta f$ = −4.9 GHz for RGB-G, and $\delta f$ = −5.25 GHz for RGB-B. The injected power was set to $P_{inj}$ = 120.9 μW for RGB-R, $P_{inj}$ = 119.6 μW for RGB-G, and $P_{inj}$ [...] independently processed R/G/B channels, is in good agreement with the source and preserves the key features from the original image.
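As a consistency check on the quoted single-channel processing time, the arithmetic can be written out for the n = 32 × 32 = 1024 serialized pixels. The second line rests on an assumption that is not stated in the text: that the rounded values 53.3 ns and 26.6 ns correspond to 640 and 320 samples of a 12 GSa/s waveform generator, i.e., to 53.33 ns and 26.67 ns.

```latex
\begin{align}
  \text{rounded values:} \quad
    & 1024 \times (53.3 + 26.6)\,\mathrm{ns} = 1024 \times 79.9\,\mathrm{ns} \approx 81.8\,\mu\mathrm{s}, \\
  \text{assumed unrounded values:} \quad
    & 1024 \times \frac{640 + 320}{12\,\mathrm{GSa\,s^{-1}}} = 1024 \times 80\,\mathrm{ns} = 81.92\,\mu\mathrm{s} \approx 81.9\,\mu\mathrm{s}.
\end{align}
```

Under that assumption, the quoted 81.9 μs is reproduced, and the small difference in the first line is a rounding effect of the stated pulse parameters.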

IV. CONCLUSIONS
This work demonstrates that a VCSEL-based photonic spiking neuron can be used to encode digital image data into optical spike trains with variable spike firing rates, thereby exhibiting the behavior of certain biological neurons at speeds that are multiple orders of magnitude faster. We use this functionality to rate-encode and reconstruct grayscale and RGB color digital images, showing, in all cases, very good agreement between source and reconstructed image data. The use of time-division multiplexing allows us to greatly reduce system requirements, permitting operation with a single VCSEL and enabling an extremely hardware-friendly implementation. This approach yields a simple neuromorphic photonic platform serving as a data encoder (interface) for spike-based machine vision hardware, which also delivers high speed, thanks to the GHz-rate optical spikes, short timescales (an active encoding time of 50-60 ns per pixel is used in this work), and low energy consumption. For future approaches, we envision systems using multiple independent and array-integrated VCSEL-neurons for parallel pixel processing, offering faster operation speeds and enhanced functionalities. Our results further demonstrate that the VCSEL-neuron implementation, built with a commercially available and telecom-compatible 1310 nm VCSEL and off-the-shelf fiber-optic components, can realize spike-rate information encoding of complex analog (image) data streams, making it a viable option for operation as an input layer device in future photonic spiking neural network hardware architectures.