Deep learning-based design of broadband GHz complex and random metasurfaces

We are interested to explore the limit in using deep learning (DL) to study the electromagnetic response for complex and random metasurfaces, without any specific applications in mind. For simplicity, we focus on a simple pure reflection problem of a broadband electromagnetic (EM) plane wave incident normally on such complex metasurfaces in the frequency regime of 2 to 12 GHz. In doing so, we create a deep learning (DL) based framework called metasurface design deep convolutional neural network (MSDCNN) for both the forward and inverse design of three different classes of complex metasurfaces: (a) Arbitrary connecting polygons, (b) Basic pattern combination, and (c) Fully random binary patterns. The performance of each metasurface is evaluated and cross-benchmarked. Dependent on the type of complex metasurfaces, sample size, and DL algorithms used, MSDCNN is able to provide good agreements and can be a faster design tool for complex metasurfaces as compared to the traditional full-wave electromagnetic simulation methods. However, no single universal deep convolutional neural network (DCNN) model can work well for all metasurface classes based on detailed statistical analysis (such as mean, variance, kurtosis, mean squared error). Our findings report important information on the advantages and limitation of current DL models in designing these ultimately complex metasurfaces.

There are two general approaches in designing metasurfaces. The first approach is the forward design, which is an iterative process involving parametric studies to explore within a given set of input parameters in order to produce the desired EM response or output. A simulation tool (or forward numerical solver), which solves the underlying governing equations to provide reliable characterization of input parameters to match the calculated outputs.
The cost is determined largely by the simulation time of each trial and error. If the number of input parameters is huge or to avoid computational cost, a designer often has to give up exhaustive exploration of the design space and settle on some trade-offs on the desired output.
The second approach is the inverse design, which is to find an optimal set of input parameters for a given output. This is more difficult than the forward design as there is no definite or unique solution for such a problem. Thus, inverse design is typically formulated to search for the most approximate input conditions within a prescribed domain via an optimization algorithm. Almost all of the inverse design problems are challenging, which require advanced algorithms, such as the heuristic algorithm of ant colony algorithm 26 , genetic algorithm 27 , particle swarm algorithm 28 and topological optimization [29][30][31][32] .
Machine learning (ML) techniques like deep learning (DL) 33 has been successful in various fields involving complexity, such as computer vision, natural language processing, and speech signal processing. Their applications in some traditional scientific disciplines have also grown significantly in recent years, including condensed matter physics 34 , particle physics 35 , chemistry 36 , text mining for materials discovery 37 , discovering physical concepts 38 and many other physics-based problems [39][40][41][42] . DL based approaches for the design of metasurfaces are also gaining a lot of attentions 5,6 , where various types of metasurfaces (with some prescribed regular patterns) have been successful designed [43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59][60]  is well-known to be a more robust alternative to FCNN as the residual function can provide a smooth and stable gradient. The t-CNN is the inverse operation of the normal convolution process, which is typically introduced in the DNN-based inverse design approach, such as the Generative Adversarial Networks (GAN) 62 .
In the following sections, we first provide a short overview of DL based framework used for designing metasurfaces. The complexity of various types of metasurfaces with different degrees of freedom is shown in Fig.1. In this paper, we consider three complex metasurfaces: (a) Arbitrary connecting polygons (PLG), (b) Basic pattern combination (PTN), and (c) Fully random binary patterns (RDN). The goal is to explore DL based design for any given complex metasurfaces for a broadband EM response. In our experiments, the EM response is focused on the reflection of a broadband EM wave (from 2 to 12 GHz) on the given dataset of complex metasurfaces (PLG, PTN and RDN), where the reflection as a function of frequency can be predicted by using different DL models based on our training procedures for both forward and inverse design. Successful results and limitations will be evaluated and discussed. By considering the subordinate relation in these three metasurfaces, we also use a cross-benchmarking to evaluate the ability of the FCNN model in using different metasurfaces in both training and testing. Finally, we conclude the paper that MSDCNN is able to provide good performance for each of the three complex metasurfaces studied in this paper, however there is no one single universal DCNN model that is able to perform well for all of them simultaneously. Other DL models like the graph neural networks or complex value neural network, will be likely promising candidates for further improvements.

II. METASURFACE DESIGN USING DEEP LEARNING
Many successful applications of DL models have been reported in the literature for the design of different metasurfaces [43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59]63,64 . In metasurface designs, the design parameters and the desired EM responses are of key concerns. The design parameters are properties associated with the metasurfaces such as working frequency and geometrical parameters that describe the physical structure of the metasurfaces. These parameters are normally limited in a continuous variable range R, so the problem can be abstracted as a mapping between R n and R k , where n is the number of controllable/designable variables and k is the dimension of desired response/design target. In this sense, the mapping from R n to R k is the forward prediction of metasurface designs, while the mapping from R k to R n is the inverse-design.
Depending on the complexity of the controllable variables, we can categorize them into two types. The Type-1 design task is of relatively low complexity featuring a template with well prescribed shapes that can be described by a handful of geometrical parameters like thickness, spacing and width.
For k = 1 case, it is the simplest single-regression task. In some cases, the design target is a frequency-dependent response (k > 1), and the controllable parameters dimenstion are limited to small n. As an example, a prior work 43 is focused on the electromagnetic scattering of alternating dielectric thin films with a combination of different thicknesses and materials (also known as the layered model).
For such Type-1 tasks, the fully connected network (FCN) model is sufficient to predict the corresponding EM response which agrees with physics based simulated results (obtained from a numerical solver). However, it is not suitable for the inverse design task that its training becomes unstable and it is also slow due to the inconsistency of the dataset used.
An encoder-decoder network, known as Tandem 43 has been introduced to solve this problem, which quickly becomes a popular framework in the metasurface design. Other improvements includes enhancing the FCN for more robust performance by using deeper networks 44,49-51 .
The Type-2 design task allows a higher degree of complexity, which typically features a 2D metasurface and it can be viewed as a mapping of R n × R m → R k . This task can be converted to the above Type-1 task (R n×m → R k ) if n and m are small 52 .
However, when the size (n, m) is very large, the deep FCN will be too expensive and unstable for computation. Instead of FCN, convolutional neural network (CNN) provides an distinct advantage for such multi-dimensional inputs. The convolutional operation in CNN is ideal in capturing the spatial relationship of the inputs. By using the deeply-stacked convolutional layers, both local and long-range effects are expected to be captured by CNN. For example, at k = 1 (prediction of the quality factor of a cavity) 53 , CNN with only four layers is capable of producing outstanding prediction. For k = 2, generative adversarial network (GAN) has been successfully applied to design a meta-grating component 54 . Furthermore, GAN has also been incorporated for a physics-driven and data-free neural network 55 . As mentioned above, for Type-1 problem, FCN is found to be effective if the input (R m ) and target output (R k ) have comparable small dimensions. However, it is not suitable for the Type-2 problem, the inverse design becomes troublesome due to the complication of dimensionality. It is reported that GAN can efficiently discover the correct metasurfaces in using user-defined and on-demand spectra as input parameters 58 . A bidirectional neural network is successful in designing three-dimensional chiral metamaterials 59 For a broadband response of metasurface, we may have large k over a large frequency range. Depending on the resolution requirement and frequency range, the value of k can be 1000 or much more is required. For this problem, mathematical transformations like Fourier, Wavelet, or simple down-sampling can help to extract the most significant information from the EM response. 65 Other methods like the contrast-vector 56 designed to emphasize important features like peak location of the EM response, can also improve the efficiency of the inverse design. With these methods, the dimension from k = 1000 can be compressed to k < 100 with good accuracy.
The above discussion suggests that different DL models are required depending on the specific types of the metasurfaces and its required EM response. For complex metasurfaces with irregular patterns (see Fig.1), we are not sure if DL models will work especially for a broadband EM response, in which studied domain space is big with large values of n, m and k. Thus it is the focus of this paper to evaluate the performance of DL models for the broadband response of such complex metasurfaces.
The remainder of the paper is organized as follows. In Sec.III, we will introduce the training procedures to obtain a well-performance forward prediction and inverse design model based on the convolution neural network (CNN). We will report the good prediction of our model for both fast forward and inverse design of PLG based complex metasurfaces. In Sec.IV, we will introduce the other two complex metasurfaces: PTN and RDN. We will extend the original DL model developed for PLG to PTN and RDN. Improvements in using deeper DL models and large datasets are reported. The weak generalization of one unique DL model to all PLG, PTN and RDN is discussed with detailed statistal measure and cross-benchmarking. Finally, Sec.V concludes with a summary and raises some future prospects in using DL models to deal with the complex metasurfaces. We argue that complex metasurfaces studied here may serve as a good platform to test the capability or limit of ML in analyzing a complex design problem even though the EM response of the complex metasurface is well governed by the Maxwell equations.

III. METASURFACE DESIGN DEEP CONVOLUTIONAL NEUTRAL NETWORK (MSDCNN)
We propose a metasurface design deep convolutional neural network (MSDCNN) framework for both forward-design and inverse-design of complex metasurfaces. In particular, a co-polarized reflectance (coPR) of a purely reflective metasurface over a frequency range of  Fig.3(a). The data are randomly split into a training set of 27,000 samples and a test set of 3000 samples for training purposes. The calculated coPR for each PLG has 32 points evenly distributed over a frequency range from 9.5 GHz to 12 GHz as shown in Fig.3(b). In our experiments, we have found that uniform sampling method is better than fast Fourier transform (FFT) and wavelet transform. Having tested several different numbers of sampling points, 32 sampling points has been determined to produce satisfactory outcome with minimal information loss. Resnet18S is adopted in the evaluation branch which consists of eight Resnet blocks with LeakReLU activation function. It has been adapted from popular Resnet18 model 61 . This evaluation branch predicts the corresponding coPR associated with a input 2D image (like PLG pattern). The generation branch has five transposed convolution blocks followed by a tanh activation function. This inverse generator branch suggests a 2D image associated with the input EM response like coPR. The judging branch is made up of a series of regular convolution blocks, which is used to compare the generated image obtained from the generation branch to the input image, in terms of the confidence level of the matching. A successfully trained forward prediction branch is regarded as a replacement for the numerical solver for the forward design and the inverse generator branch is used to perform the inverse design.
In our MSDCNN, we first train a very accurate evaluation model to predict coPR to replace the time-consuming CST simulation. Secondly, we build a high-quality image generator under the GAN framework in order to convert any random sequences to a 2D polygon-like pattern. Using the evaluation and generation branches, we can establish a differentiable mapping between the 2D images and the input coPR, so it becomes a useful metric in quantifying the matching between the two inputs. By enforcing a minimization (or optimization) of this metric in the training procedure, it becomes a pattern-coPR converter. This training procedure encourages the model to search for the best candidate pattern for the evaluation branch. Thus the accuracy of the evaluation branch is critical to the performance of the MSDCNN, which largely depends on the quality and quantity of the dataset used in the training. More details on the architecture of MSDCNN and the training procedure can be found in Supplementary materials.
To measure the accuracy of MSDCNN on the PLG patterns, we use the mean square error (MSE). Our results suggest that the MSDCNN can obtain high accuracy for both the forward prediction and the inverse design tasks from the evaluation and generation branch, respectively. For the forward prediction task, the MSE is defined as s = |C c − C r | 2 , where the C r is the true coPR (obtained from CST) and C c is the predicted coPR from evaluation branch in MSDCNN. In Fig.6, eight randomly selected test cases are plotted to demonstrate the excellent agreement between predicted and true calculated coPR. More importantly, the model can correctly capture both the locations and magnitudes of the peaks or variation in the spectrum from 9.5 to 12 GHz. Here we have coPR = 1 at lower frequency from 2 to 9.5 GHz (not shown). The average error of the evaluation branch is s = 3.7 × 10 −4 , thus confirming that the evaluation branch has been successfully trained with high accuracy and thus can be used as a fast computational tool to replace the traditional EM simulator for the design of complex PLG metasurfaces.
For inverse design, the generation branch suggests a corresponding 2D image of a PLG-like metasurface for a given coPR spectrum. There are 3 metrics used to determine the accuracy: C r is the input (or real) coPR C g is the predicted coPR from the evaluation branch, and C p is the actual coPR computed by CST based on the 2D image created from generation branch. An ideal good inverse design demands a low error between C r and C p determined by e = |C r − C p | 2 . However, C p is calculated from on the time-consuming full wave simulation and avoiding such lengthy simulations will largely reduce the training time required in our model. Thus, we choose to optimize the the alternative error between C r and C g , which is defined as d = |C r − C g | 2 . Finally, the error between C p and C g is b = |C g − C p | 2 . In Fig.7, we show the comparison of 6 cases with their respective values of e, d, and b. The results show that the C r , C p and C g agree well with only small errors in the range of 10 −3 to 10 −4 .
This indicates that the generation branch has been successfully trained with the ability to 8 provide promising inverse design satisfying the input coPR spectrum. It is observed that the designs produced by the generator are not the same with the reference upper image.
This is due to the patterns and the EM responses not having one to one mapping. In this case, the network will provide a design with the closest possible EM response in the context of training dataset. Furthermore, we have used a well-trained evaluate branch as a part of GAN rather than directly employing a full wave simulator to compute mapping from EM response to pattern. Those technologies will help expand the expression capability of the model and overcome the data inconsistency 50 .
The above findings shown in Fig.6 and Fig.7 have proved the possibility of using MSD-CNN in complex metasurface design such as PLG patterns. Firstly, the evaluation branch (forward design) can provide an accurate prediction of a broadband EM response from 2 to 12 GHz. Secondly, the generation branch can suggest corresponding PLG based metasurfaces to satisfy the input broadband EM response. In the following section, we will extended this capability to other types of complex metasurfaces, in order to assess the broader performance of MSDCNN and to understand its limitation. ). The PLG pattern requires connectivity, which are common in manufacturing design 44,[56][57][58] . The PTN pattern is the combination of some basic shapes like square (9 pixels), cross (5 pixels), triangle (4 pixels) with four directions, U-shape (5 pixels), and H-shape (7 pixels). The RDN pattern is the fully random pattern with no constraint.

IV. EXTENSION TO OTHER METASURFACES
The coPR of each created pattern is calculated by CST as a function of frequency from 2 to 12 GHz The statistics of the calculated coPR (mean and variance) for each dataset is shown in Fig.8 ResNa. In our comparison, the performance of generation branch heavily depends on the accuracy of the evaluation branch. Therefore, the discussion below will be centered around the performance of the evaluation branch (forward design) to predict the coPR for a given arbitrary metasurface. The performance is evaluated not only by the MSE but also other statistical measurements, such as mean, variance, and kurtosis.

B. Improvement and Limitation
DCNN is well-known for its great generalization in computer vision tasks. For example, Deeper/better DL models or larger dataset can probably enhance performance. Fig.10 demonstrates these efforts with significant improvements. The details of the adapted deeper ML models can be found in the supplementary materials. The Resnet34S model is the deeper version of the Resnet18S which has more parameters and greater capacity. As expected, the performance of Resnet34S is better in training the PTN27000 dataset as compared to Resnet18S, where the agreement in Fig.10(a) of PTNRes34 is better than Fig.9(b). ResNa model combines CNN with long short-term memory (LSTM) network and it has less layers and fewer parameters than Resnet18S. ResNa is expected to handle a more complex pattern like RDN dataset. Thus the results of RDN27000 in Fig.9(c) has been improved by using ResNa model as shown in Fig.10(b). By expanding the sample size of RDN dataset (based on Resnet18S) from 27000 to 108000 (RDN27000 to RDN108000), we also improve the performance as shown in Fig.10(c) in comparison to Fig.9(c). The improvements reported in Fig.10 confirms that good performance can be achieved by using better DL models and/or large training sets for all 3 complex metasurfaces over a frequency range from 2 to 12 GHz.
For completeness, we show 8 random patterns selected from the RDN108000 dataset in Fig.11 to illustrate their small s (mean square error). Note this good performance is due to same type of complex metasurface used in both training and testing.
Upon further analysis using the high-order statistical measure like kurtosis, we notice that the previous analysis based on mean and variance does not reflect the matching fully.
Note kurtosis is a statistical measure to define how heavily the tails of a distribution differ from the tails of a normal distribution, which helps to determine whether the tails of the distribution containing extreme values. The calculated mean and kurtosis are shown in  Fig.9(a) but also has a matching kurtosis as shown in Fig.12(a). The RDN108000 case although have a good improvement in variance in Fig.10(c) in using larger sample sizes, the disagreement in kurtosis remains significant as shown in Fig.12(c). Such disagreement in kurtosis will lead to high bias error in inverse design. For example, Fig.13 shows the performance of generation branch (or inverse design) when the model is trained on the RDN dataset of 108k samples. The CST calculated coPR C p (green) of the predicted image by the generation branch agree well with the desired input coPR C r (red), but not necessarily in good agreement with the estimated coPR C g (blue) from the evaluation branch using predicted image. This finding reveals that MSDCNN-Resnet18S model has high bias error in the generation branch (inverse-design) despite having superior performance in the evaluation branch (forward design). general purpose yet powerful model that requires less resources to train and less parameters to be tuned in comparison to CNN based models. This section focuses on answering these questions by performing cross benchmarking using different training datasets and models.  PTN and RDN).

C. Cross benchmarking between the models
The results marked in red are the best performance (smallest MSE), and the best performance remains on using the same type of dataset for both training and testing. This finding suggests that the models trained on dataset with larger domain do not necessarily perform better on other datasets even with smaller domain of less complexity For example, the models trained on higher-level dataset (RDN) do not perform well at lower-level dataset (like PTN and PLG). By comparing between the RDN108000 (4X more samples) and RDNResNa/PTN27000, it tends to show weaker performance than an under-fit model.
This situation can be regarded as another type of over-fitting on the large domain level, suggesting that the DL-based model is more like a curve-fitting process than learning the physics behind it, which will otherwise allow equivalent performance in using large datasets in training (see move in discussion below). The weak agreement of the model trained on RDN108000, which are tested on PLG and PTN can be found in Fig. 14(a) and 14(b).
To completely investigate the compatibility on the low-level dataset, we need to check the dependence of sample size in the RDN dataset used in the training, which is plotted in  It is important to note that the focus of this paper is due to the curiosity in understanding the limits of MSDCNN in predicting the EM response of complex and random metasurfaces without specific applications in mind. We also ignore any experimental constraints and feasibility of such complex metasurfaces in any applications, which will require future ex-

VIII. DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request. I r and C r together form the training dataset. The input of the Inverse Generator branch is the actual EM response, C r , appended with a fully random vector. I g is the generated pattern from the Inverse Generator branch as well as the input to the Forward Prediction branch. C g is the EM response predicted by the Forward Prediction branch which is trained to minimize the mean squared error between C g and C r . x-axis for each image is frequency from 2GHz to 12GHz, the y-axis is the reflectance ranging from 0 to 1. Every figure contains the input or the desired coPR C r (red), the estimated coPR of the predicted image (by inverse design) C g (blue), and the CST calculated coPR C p (green) of the predicted image. The MSE are e = |C r − C p | 2 , d = |C r − C g | 2 , and b = |C g − C p | 2 . Upper (lower) image is, respectively, the input image and the predicted image by the generation branch.