Design, fabrication and metrology of 10$\,\times\,$100 multi-planar integrated photonic routing manifolds for neural networks

We design, fabricate and characterize integrated photonic routing manifolds with 10 inputs and 100 outputs using two vertically integrated planes of silicon nitride waveguides. We analyze manifolds via top-view camera imaging. This measurement technique allows the rapid acquisition of hundreds of precise transmission measurements. We demonstrate manifolds with uniform and Gaussian power distribution patterns with mean power output errors (averaged over 10 sets of 10 inputs) of 0.7 and 0.9 dB, respectively, establishing this as a viable architecture for precision light distribution on-chip. We also assess the performance of the passive photonic elements comprising the system via self-referenced test structures, including high-dynamic-range beam taps, waveguide cutback structures, and waveguide crossing arrays.


A. Background
The development of highly compact and energy-efficient optical interconnects [1] has been a major research objective for integrated photonics. Applications of optical interconnects range from telecommunications [2] to energy-efficient, high-bandwidth cross-chip communications in CMOS systems [3,4]. Photonic communication is attractive as a replacement for electrical communication because light experiences no charge-based parasitics and can therefore achieve higher fan-out, as well as long-range communication with lower power and higher speed. However, the relatively large size of photonic components presents challenges to their integration. In a system with both photonic and electronic components, the chip area consumed by photonics grows rapidly as the number of communicating nodes and their degree of connectivity increase. For densely connected systems, the requisite number of waveguides can increase to the point where they cannot fit on one plane. Wavelength-division multiplexing (WDM) [4,5] or mode-division multiplexing [6] can partially alleviate this problem, since only one or a small number of master communication buses is then required to satisfy the information bandwidth requirements. When the number of nodes and their degree of connectivity are small, this provides an elegant and cost-effective solution to mitigating the von Neumann bottleneck [7].
However, neural computing departs significantly from the von Neumann architecture. In a neural system, each processing node (neuron) contains local memory and communicates to many other nodes of the network across local and global spatial scales [8,9]. The information processing of biological neural systems is approximated in feed-forward neural networks, which have proven technologically useful [10]. A feed-forward neural network consists of multiple layers of neurons, which each integrate several inputs and transmit a signal when a threshold condition is reached. Each layer consists of some number of neurons, which have directed connections to the next downstream layer of neurons. Computation and memory are distributed, largely eliminating the bottleneck of processor-memory communication, but necessitating significant communication to and from each neuron.
Light is naturally suited to perform this communication. Because photons are uncharged and massless, they avoid charge-based wiring parasitics. Using light for communication in neural systems is very promising [11–16], but constructing a network of nodes each with thousands of connections presents a formidable routing challenge. Using WDM alone is untenable, as it would require an extremely fine and precise wavelength spacing to be constantly maintained. The ability to scale to greater connectivities thus depends on the number of waveguides that can be integrated on a substrate. A suitable solution is the use of multiple planes of photonic waveguides, a field which has seen significant progress over the last decade [17–22]. The stacking of waveguides allows for dense integration with low-loss and low-crosstalk waveguide crossings. Here, we present the design and implementation of a two-plane signal distribution network routing 10 input nodes in one network layer to 100 connections on 10 output nodes. This routing manifold accomplishes the routing between two layers of a feed-forward neural network with 10 neurons per layer and all-to-all connectivity. We recently reported a theoretical analysis of the performance and scaling of multi-planar routing strategies for neural computing [23].

B. Design
Feed-forward neural networks commonly leverage topologies where a given layer has order $N^2$ synaptic connections, where $N$ is the number of neurons in a layer. In this work, we design, fabricate and experimentally characterize a distributed passive photonic routing manifold capable of realizing connectivities of order $N^2$. The routing network can be pruned to achieve any subset of connections. Communication with this manifold requires neither wavelength nor time multiplexing, yet can be straightforwardly extended to utilize either.
The design of the proposed manifold (Fig. 1) is based on two vertically integrated planes of waveguides. The lower plane ($P_1$) predominantly runs east, while the second plane ($P_2$) runs south, thus avoiding in-plane crossings. The light in $P_1$ bus waveguides originating from each input node is tapped sequentially into $P_2$ waveguides as the light propagates eastward.
This Manhattan-like routing architecture reduces the number of waveguides relative to a scheme where each input is immediately fanned-out with a star coupler.
The manifold implements two layers of a feed-forward neural network with 10 upstream neurons (first layer), 10 downstream neurons with 100 synapses (second layer), and all-to-all connectivity. Figure 1(c) provides this perspective for a reduced section of the manifold.
Throughout the remainder of this paper, we will use the labeling scheme shown in Fig. 1. The crossbar-like network allows each input node to be routed into a group of 10 outputs representing the whole input array (see the single-input case shown in Fig. 1(b)). Each output group acts as the synapses (receivers) for that downstream neuron.
The goal of the manifold is to route each input to one synapse on each output, following a pre-determined power distribution pattern. Here, we pursue two schemes to demonstrate control of the output intensity: uniform (each output synapse receives the same power) and Gaussian (the synapses from middle neurons of the upstream layer receive the most power, and the synapses from peripheral neurons receive much less). A script was developed to automatically generate the layouts for the manifolds in both cases; variables in the script set neuron numbers as well as intensity distribution profiles. The core element of the manifold is the tap-and-transition device shown in Fig. 1(e). It comprises a beam-tap and an interplanar coupler (IPC) in close proximity. Its function is to divert a certain fraction of the bus power into a perpendicular waveguide on the upper plane. Between bends, gratings, and tap-and-transition devices, the $P_1$ and $P_2$ waveguides are adiabatically tapered to and from a larger width (1.5 µm) to minimize scattering loss over most of their length. The IPC is a similar design to the one presented in Ref. 17. In the present work, the input waveguide (on $P_1$) is tapered down to a width of 400 nm over a distance of 12 µm and is then routed at a constant width for a distance of 18 µm. It is finally tapered down to a minimum width of 200 nm over 12 µm. The other waveguide (on $P_2$, receiving from $P_1$) follows the same pattern in reverse over the same length. The total IPC length is 42 µm.
In a network of this size, significant dynamic range is required in the power-tap coefficients to achieve either uniform or Gaussian distributions. If only a single coupling gap is utilized, two limits are encountered: (1) the finite size of the sine bend in the tap waveguide results in a certain minimum coupling coefficient, and (2) an excessively long interaction length is required to achieve a high coupling coefficient. To address these issues, the manifold makes use of three coupling gaps and variable coupling lengths to improve the dynamic range of the power distribution network. The layout script selects the coupling gap from a look-up table generated from prior measurements of the tap coefficients. The three gap values are 300 nm, 400 nm, and 500 nm. Coupling lengths range from 2.7 µm to 19 µm.
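To illustrate why a wide range of tap coefficients is needed, the following minimal sketch computes the fraction of the remaining bus power that each sequential tap must divert to reach a target output pattern. It assumes an idealized lossless bus (no propagation, crossing, or coupler loss); the function name `tap_fractions` is hypothetical and is not taken from the layout script described above.

```python
def tap_fractions(targets):
    """Fraction of the remaining bus power that each sequential tap diverts.

    targets: desired relative output powers, ordered along the bus.
    Assumption: the bus is lossless, so power not tapped stays on the bus.
    """
    remaining = sum(targets)
    fractions = []
    for power in targets:
        fractions.append(power / remaining)  # share of what is still on the bus
        remaining -= power                   # power left for downstream taps
    return fractions


# Uniform pattern for 10 outputs: 1/10, 1/9, ..., 1/2, then all remaining power.
uniform = tap_fractions([1.0] * 10)
```

Under these assumptions, the first nine taps of a uniform 10-output bus span coefficients from 0.1 to 0.5, and the final output receives whatever power remains. A real design would additionally need to compensate for the path-dependent losses, which this sketch ignores.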

B. Characterization
The manifolds under consideration each have 10 input ports and 100 output ports. While it is possible to measure these devices with the common approach of aligning optical fibers to grating couplers or facet-terminated waveguides, that measurement technique has significant limitations. First, repeatability strongly depends on the operator's ability to consistently optimize the fiber position on both ends using micro-positioning stages. Second, sample and fiber position drift are likely to disturb any power normalization by the time all the output ports are measured. V-groove arrays of fibers may alleviate the problem, but cannot accommodate densely packed structures, nor can the inter-fiber spacing be readily adjusted for different device configurations. Realizing precise fiber array alignment to the sub-dB level is challenging.
Here we pursue an alternative method of transmission measurements for this experiment: top-view imaging with a microscope and a camera. We couple transverse-electric (TE) polarized laser light near λ = 1320 nm onto the chip through a fiber-to-waveguide grating coupler, and light is coupled out through one or more grating couplers designed for vertical emission. Instead of collecting the light with fibers, we focus it onto a 640 × 512 pixel, 12-bit-depth indium gallium arsenide image sensor array through a microscope objective. The light from each output port is integrated over a small window and normalized to the brightest port in the frame, allowing simultaneous acquisition of many outputs. For most of the devices, a reference port is included near the input to allow straightforward normalization of the input power. An in situ image of this arrangement is shown in Fig. 3(a). To obtain low-noise and repeatable measurements, we take care to meet several conditions during all measurements: (1) the camera's gamma (intensity curve) is always fixed at 1.0 to ensure linear power dependence and no gain is applied, (2) a pixel correction mask is applied to remove bright pixels and nonuniformities, (3) background light is filtered out via an 1150 nm long-pass filter inserted in the microscope tube, and (4) all output ports utilize an identical grating design and orientation. Proximity effect correction is applied during lithography to prevent distortion of the gratings in densely loaded areas. Any measurements with saturated pixels are rejected and repeated at a lower exposure time. Likewise, measurements that are too close to the noise floor are repeated at a higher exposure time. The grating coupler efficiency was not characterized, but it is more than sufficient to conduct the measurements with a high SNR. Most measurements were conducted with a laser power of only a few hundred microwatts exiting the input fiber.
Images from the camera are analyzed with in-house software, which locates the optical modes of the output ports and extracts a relative power measurement from the set. The data analysis proceeds as follows: (1) [...] three-pixel window is applied to locate bright spots; and (5) power is integrated near each port, and the integration window is expanded until convergence to a specified residual is achieved. The output of the script is an array of power values, normalized to the largest value in the set. This measurement technique allows many photonic devices to be analyzed in parallel with high precision. In this work, we investigate up to 10 ports at once, but many more ports can be analyzed, limited mainly by the imaging performance of the optics and camera, which dictate some minimum spacing between ports. Consider the test device in Fig. 3(a), which starts with an input grating coupler. Light is then split into two paths in a 50:50 power splitter (based on a Y-junction). The path on the left leads to a reference output grating coupler. On the right, the path leads to the device under test, in this case a beam-tap. The coupling coefficient of the beam-tap is simply the ratio of the tap output power to the reference port's power. The losses of the grating couplers, input waveguide section, and 50:50 splitter are normalized out. Consequently, the measurement has high throughput (fully parallel measurement of many ports) and is robust to alignment errors.
Most structures reported in this work, with the exception of the manifolds, were designed with this configuration. In the case of the manifolds, the output ports (synapses) for a given input are measured relative to each other.
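The final integration step and the self-referenced ratio can be sketched as follows. This is an illustrative reconstruction rather than the in-house software: it assumes a background-subtracted frame stored as a list of rows, a square integration window, and a relative-change convergence test. The names `integrate_port` and `make_spot_image`, the default `residual`, and the synthetic spot profile are all hypothetical.

```python
def integrate_port(image, row, col, residual=0.01, max_half_width=20):
    """Sum pixels in a square window centred on (row, col), growing the window
    until the relative increase of the sum drops below `residual`."""
    def window_sum(half_width):
        r0, r1 = max(0, row - half_width), min(len(image), row + half_width + 1)
        c0, c1 = max(0, col - half_width), min(len(image[0]), col + half_width + 1)
        return sum(sum(image[r][c0:c1]) for r in range(r0, r1))

    previous = window_sum(1)
    for half_width in range(2, max_half_width + 1):
        current = window_sum(half_width)
        if current <= 0 or (current - previous) / current < residual:
            return current  # converged: a larger window adds almost nothing
        previous = current
    return previous


def make_spot_image(size, spots):
    """Synthetic, background-free frame with one 5x5 spot per (row, col, peak)."""
    image = [[0.0] * size for _ in range(size)]
    profile = {0: 1.0, 1: 0.5, 2: 0.1}  # made-up brightness vs. ring distance
    for row0, col0, peak in spots:
        for r in range(max(0, row0 - 2), min(size, row0 + 3)):
            for c in range(max(0, col0 - 2), min(size, col0 + 3)):
                ring = max(abs(r - row0), abs(c - col0))
                image[r][c] += peak * profile[ring]
    return image


# Reference port at (10, 10); tap port at (10, 30), four times dimmer.
frame = make_spot_image(40, [(10, 10, 100.0), (10, 30, 25.0)])
reference_power = integrate_port(frame, 10, 10)
tap_power = integrate_port(frame, 10, 30)
coupling = tap_power / reference_power  # self-referenced coupling coefficient
```

Because both ports are integrated from the same frame, the ratio is independent of the input coupling efficiency, which is the property that makes the measurement robust to fiber alignment.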

Passive components
First, we characterized the performance of the different passive components that are used in the manifold. The most critical feature is the high-dynamic-range power distribution system. To analyze the constituent components, we measure an array of beam-tap test devices. Across the array, the three coupling gaps of 300 nm, 400 nm, and 500 nm are implemented with a variety of coupling lengths. Each test device comprises a 50:50 splitter and reference port followed by two device output ports: the tap output, and the drop output (indicating the untapped power). The measured data are plotted in Fig. 3(f), along with a sine-squared fit of the coupling coefficient to the coupling length. A tight fit is observed for all three coupling gaps, providing a reliable model for future routing manifold designs based on the same platform.
Next, we analyzed the performance of the $P_1$/$P_2$ waveguide crossings. The distribution of these crossings is not uniform in the manifold design presented here, so some waveguides experience more crossing loss than others. The $T_1$ bus waveguide (Fig. 1(a,c)) encounters 81 crossings, the maximum in this design. A test structure for waveguide crossings is shown in Fig. 3(d). It consists of a meandered $P_1$ waveguide passing under a cluster of $P_2$ waveguides above. It crosses the $P_2$ waveguide cluster a total of 8 times. Test structures with a total of 200, 400, 600 and 800 crossings were measured (Fig. 3(g)). The $P_2$ waveguides are 800 nm wide (same width as the $P_1$ waveguides) and are spaced by a nominal period of 4 µm, with a random variation between ± 400 nm to ensure no grating effects are introduced. The data are fit with linear regression to a loss of 6 ± 1 mdB per crossing. Considering the worst case of 81 crossings (path $S_{1,10}$), this constitutes a maximum link loss contribution of 0.49 dB. In the manifolds presented later in this work, waveguide crossings occur between 1500 nm-wide waveguides, which may have slightly lower crossing losses due to tighter optical confinement; nevertheless, this measurement places a conservative bound on the loss value.
Waveguide propagation loss is also important to consider when trying to fabricate precision routing manifolds. Cutback test structures are shown in Fig. 3(e). Eight different path lengths between 1.2 and 13.0 mm were tested, and identical structures were fabricated for both the $P_1$ and $P_2$ planes. The data are shown in Fig. 3(h). A good fit via linear regression is again observed, indicating propagation losses of 6.5 ± 0.4 and 3.9 ± 0.4 dB per cm for the $P_1$ and $P_2$ waveguides, respectively. The higher $P_1$ loss could be from mechanical degradation of its top oxide cladding in successive processing steps, which can be addressed with dense and robust sputtered oxide films. Future studies will include co-optimization of the optical and material properties of the SiN film to enable scaling to larger numbers of waveguiding planes.
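Both the crossing and cutback analyses reduce to fitting a straight line to insertion loss (in dB) versus a structural parameter and reading off the slope. The sketch below shows an ordinary least-squares fit using only the standard library. The data points are synthetic: they are generated from the slopes reported above plus an arbitrary fixed offset, with hypothetical path lengths, and are not the measured data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic cutback data: hypothetical lengths in cm, 6.5 dB/cm slope,
# and an arbitrary 3 dB length-independent offset (e.g. couplers).
lengths_cm = [0.12, 0.29, 0.46, 0.63, 0.80, 0.97, 1.14, 1.30]
cutback_loss_db = [6.5 * length + 3.0 for length in lengths_cm]
propagation_loss_db_per_cm, _ = linear_fit(lengths_cm, cutback_loss_db)

# Synthetic crossing data: 200 to 800 crossings at 6 mdB per crossing.
crossings = [200, 400, 600, 800]
crossing_loss_db = [0.006 * count + 3.0 for count in crossings]
loss_per_crossing_db, _ = linear_fit(crossings, crossing_loss_db)

# Worst-case contribution for a bus waveguide with 81 crossings.
worst_case_crossing_loss_db = 81 * loss_per_crossing_db
```

Since the offset is absorbed into the intercept, the slope is insensitive to any loss that is common to all test structures, which is why the fit isolates the per-length or per-crossing loss.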
Finally, we discuss the characterization of the IPCs. On this mask, the IPC test structures were placed too far from the optimal zone in the middle of the wafer (where the planarization was on-target) resulting in a larger inter-planar pitch and higher than anticipated losses.
Since there were 64 IPCs back-to-back, the total loss exceeded the dynamic range possible in the measurement. Fortunately, the IPC performance could still be straightforwardly characterized by comparing power transmission through two particular synapses on the manifolds: $S_{1,2}$ and $S_{2,2}$. The only difference between them is that $S_{2,2}$ has two IPCs and 180 µm extra $P_1$ propagation length. We carefully aligned the fiber to each of the two inputs and recorded the power transmitted through the respective synapse. At λ = 1320 nm (the nominal wavelength for most tests in this work), a value of 0.6 dB per IPC is measured (after subtracting the 0.1 dB loss acquired from the extra propagation length). This is sufficiently low loss to enable good power uniformity, since any two synapses may differ only by up to two IPCs in their routed paths. Still, the loss is higher than anticipated, probably due to a deviation in the fabricated dimensions from the design. In future work, we expect pre-compensation information to improve this to levels similar to our previous work on amorphous silicon [17]. At this point, we can also make an informed estimate of the total link loss experienced in two representative paths through the manifold. First, we consider the path $S_{2,9}$ (Fig. 1), which encounters a relatively large loss compared to the other connections. It has a long propagation length (2.9 mm, all on $P_1$), 72 waveguide crossings, and 2 IPCs. Utilizing the information collected from the passive measurements earlier in this section, we estimate the $S_{2,9}$ link loss to be 3.5 dB. The smallest link loss occurs on $S_{1,1}$, which consists of 1.1 mm of $P_1$ propagation length, leading to 0.7 dB loss. However, it should be noted that these losses are probably larger than the actual values, because we have used the propagation loss value from an 800 nm-wide waveguide.
In reality, the manifolds employ 1500 nm-wide waveguides over most of the propagation length, which could reduce the link loss in the $S_{2,9}$ case.
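The two link-loss estimates can be reproduced by summing the per-element contributions measured in this section. The sketch below is a simple additive budget; the function name `link_loss_db` is hypothetical, and the budget deliberately excludes the intentional power division at the beam-taps.

```python
P1_LOSS_DB_PER_CM = 6.5    # P1 propagation loss from the cutback measurement
CROSSING_LOSS_DB = 0.006   # 6 mdB per waveguide crossing
IPC_LOSS_DB = 0.6          # loss per interplanar coupler at 1320 nm


def link_loss_db(p1_length_mm, crossings, ipcs):
    """Additive loss budget for one path through the manifold, in dB."""
    propagation = P1_LOSS_DB_PER_CM * p1_length_mm / 10.0  # mm -> cm
    return propagation + CROSSING_LOSS_DB * crossings + IPC_LOSS_DB * ipcs


# S_{2,9}: 2.9 mm on P1, 72 crossings, 2 IPCs -> about 3.5 dB
loss_s_2_9 = link_loss_db(2.9, 72, 2)

# S_{1,1}: 1.1 mm on P1 only -> about 0.7 dB
loss_s_1_1 = link_loss_db(1.1, 0, 0)
```

For $S_{2,9}$ the terms are roughly 1.9 dB of propagation, 0.4 dB of crossings, and 1.2 dB of IPC loss, so propagation on $P_1$ dominates the budget.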

Uniform-distribution manifold
The first type of routing manifold we analyze is the uniform distribution pattern. For any given input, the power delivered to each connected output synapse should be equal; for example, after applying input light to port $T_x$, we should observe a power distribution of $S_{x,1} = S_{x,2} = S_{x,3} = \cdots = S_{x,10}$. To satisfy this requirement, the tap coefficients range from 0.1 to 0.5. An infrared image of the manifold under test is shown in Fig. 3(b), showing light emerging from the output ports. The measured intensities (normalized for each input case) are plotted together in Fig. 4(a), as well as the errors in Fig. 4(b). While there are a few outliers, the vast majority of synapses exhibit good uniformity. The measured power uniformity of the outputs for input $T_8$ is shown in Fig. 4(c) as a representative case. Error is calculated as the deviation of each point from the mean of that set. In Fig. 4(d), the mean is calculated for the absolute value of the errors in each row in Fig. 4(b). The grand mean of this data results in an overall average error of 0.7 dB.
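The error metric described above can be sketched as follows. The text does not state whether the mean of each set is taken over linear powers or over dB values; this sketch assumes the latter, and the function names are hypothetical. The example powers are arbitrary illustrative numbers, not measured data.

```python
import math


def uniformity_errors_db(powers):
    """Deviation of each output from the mean of its set, in dB.

    Assumption: the mean is computed over the dB values of the set.
    """
    powers_db = [10.0 * math.log10(p) for p in powers]
    mean_db = sum(powers_db) / len(powers_db)
    return [value - mean_db for value in powers_db]


def mean_abs_error_db(powers):
    """Mean of the absolute errors for one input (one row of outputs)."""
    errors = uniformity_errors_db(powers)
    return sum(abs(e) for e in errors) / len(errors)


def grand_mean_error_db(rows):
    """Average of the per-input mean absolute errors over all inputs."""
    return sum(mean_abs_error_db(row) for row in rows) / len(rows)


# Arbitrary example: one perfectly uniform row and one row with a 2:1 imbalance.
example_rows = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.5, 1.0, 0.5]]
example_error = grand_mean_error_db(example_rows)
```

Because the errors are defined relative to the mean of each set, the metric is insensitive to the absolute input power and captures only the spread among outputs of the same input.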
Next, we consider the spectral dependence of the uniform routing manifold. For this study, we couple into a single input node, $T_8$, and observe the changes to output uniformity while scanning the wavelength. The power dependence on wavelength is plotted in Fig. 5(a), and the error in Fig. 5(b). The lowest mean error of 0.46 dB is observed at a wavelength of 1320 nm (Fig. 5(c)), and the value remains below 1 dB over a bandwidth of at least 50 nm, providing sufficient tolerance for many applications. We note that the mean error value differs by only 0.1 dB from the measurement of that same node, $T_8$, in the earlier series of measurements (see Fig. 4(d), input number 8). This indicates that the measurement approach is highly repeatable.

Gaussian-distribution manifold
We continue the analysis with the Gaussian-distribution routing manifold. This manifold is designed such that the synapses receive power following a Gaussian envelope. The designed envelope is plotted on top of the experimentally measured synaptic power distribution for input node 8 in Fig. 6(c), showing good agreement. The rest of the analysis follows the same pattern as for the uniform case. Measured intensities are plotted together in Fig. 6(a), as well as the errors in Fig. 6(b). For this manifold, the normalization for each input is done by least-squares fitting of the amplitude $a$ of the Gaussian power envelope $P(k)$ according to
$$P(k) = a \exp\left[-4 \ln 2 \, \frac{(k-b)^2}{w^2}\right],$$
where $k$ is the index of the output synapse, $b$ is the index of the peak value, and $w$ is the FWHM of the Gaussian envelope (both $b$ and $w$ are equal to 6 and are not fitted in the analysis). Once $a$ is fitted, the output powers are normalized to that amplitude, so the envelopes remain in-line despite the occasional bright or dark synapse. In Fig. 6(d), the mean is calculated for the absolute value of the errors in each row in Fig. 6(b). The grand mean of the errors results in an overall average error of 0.9 dB.
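With $b$ and $w$ fixed, the least-squares amplitude has a closed form. The sketch below assumes the envelope takes the standard FWHM-parameterized form $P(k) = a \exp[-4 \ln 2 \, (k-b)^2 / w^2]$ and that the fit is performed on linear (not dB) powers; neither detail is confirmed by the text, and the function names are hypothetical. The "measured" values are synthetic, generated from the envelope itself.

```python
import math


def gaussian_envelope(k, peak=6.0, fwhm=6.0):
    """Unit-amplitude Gaussian envelope with a given peak index and FWHM."""
    return math.exp(-4.0 * math.log(2.0) * (k - peak) ** 2 / fwhm ** 2)


def fit_amplitude(measured, peak=6.0, fwhm=6.0):
    """Closed-form least-squares amplitude a minimising sum((m_k - a*g_k)^2)."""
    envelope = [gaussian_envelope(k, peak, fwhm) for k in range(1, len(measured) + 1)]
    numerator = sum(m * g for m, g in zip(measured, envelope))
    denominator = sum(g * g for g in envelope)
    return numerator / denominator


def envelope_errors_db(measured, peak=6.0, fwhm=6.0):
    """Deviation of each output from the fitted envelope, in dB."""
    a = fit_amplitude(measured, peak, fwhm)
    return [
        10.0 * math.log10(m / (a * gaussian_envelope(k, peak, fwhm)))
        for k, m in enumerate(measured, start=1)
    ]


# Synthetic check: ten outputs that follow the envelope exactly with a = 2.
synthetic = [2.0 * gaussian_envelope(k) for k in range(1, 11)]
fitted_a = fit_amplitude(synthetic)
synthetic_errors = envelope_errors_db(synthetic)
```

Setting the derivative of the squared residual with respect to $a$ to zero gives $a = \sum_k m_k g_k / \sum_k g_k^2$, which is what `fit_amplitude` evaluates. Fitting only the amplitude keeps a single bright or dark synapse from shifting the reference envelope's position or width.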
The spectral dependence of the Gaussian routing manifold is analyzed with a similar method to the uniform manifold. As before, light is coupled solely into $T_8$. The power dependence on wavelength is plotted in Fig. 7(a) and the error in Fig. 7(b). A trend in the

III. SUMMARY
In this work, we propose, fabricate and characterize an integrated photonic routing manifold capable of distributing light with high precision across a 10 × 100 network. The approach utilizes multiple planes of waveguides and a distributed routing scheme to make efficient use of area. The manifold can instantiate custom power distribution patterns, such as uniform or Gaussian, based on the values of beam tap coefficients. This design is topologically equivalent to a feed-forward, 10 × 10, all-to-all-connected neural network. In analyzing the network and its sub-components, we employ a method for rapidly acquiring insertion loss measurements. Using fiber-based input coupling and vertical grating emission onto an InGaAs imaging sensor, photonic routing manifolds with 100 output ports are fully characterized in less than 4 minutes. At a wavelength of 1320 nm, the uniform and Gaussian manifolds were found to have mean output power errors (averaged over 10 rows of 10 inputs) of 0.7 and 0.9 dB, respectively. These routing and measurement techniques offer new opportunities for complex integrated photonic systems in computing, telecommunications, and other applications.
Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.