Deep-learning of Parametric Partial Differential Equations from Sparse and Noisy Data

Data-driven methods have recently made great progress in the discovery of partial differential equations (PDEs) from spatial-temporal data. However, several challenges remain to be solved, including sparse noisy data, incomplete candidate libraries, and spatially- or temporally-varying coefficients. In this work, a new framework that combines a neural network, a genetic algorithm, and adaptive methods is put forward to address all of these challenges simultaneously. In the framework, a trained neural network is utilized to calculate derivatives and generate a large amount of meta-data, which solves the problem of sparse noisy data. Next, a genetic algorithm is utilized to discover the form of PDEs and the corresponding coefficients with an incomplete candidate library. Finally, a two-step adaptive method is introduced to discover parametric PDEs with spatially- or temporally-varying coefficients. In this method, the structure of a parametric PDE is first discovered, and then the general form of the varying coefficients is identified. The proposed algorithm is tested on the Burgers equation, the convection-diffusion equation, the wave equation, and the KdV equation. The results demonstrate that this method is robust to sparse and noisy data, and is able to discover parametric PDEs with an incomplete candidate library.


Ⅰ. INTRODUCTION
In recent years, great advancements have been made in data-driven discovery of partial differential equations (PDEs), whose goal is to identify governing equations of underlying physical processes directly from noisy and sparse observation data. Sparse regression methods, including least absolute shrinkage and selection operator (LASSO) [1], sequential threshold ridge regression (STRidge) [2], sparse Bayesian regression [3] and sparse group lasso [4], have been used to obtain parsimonious models from data for various physical systems. However, three major challenges remain to be solved.
The first challenge is sparse noisy data, which cause serious problems in the calculation of spatial and temporal derivatives. Derivative smoothing methods, such as polynomial interpolation, total variation regularized derivatives, and integral forms, have been utilized to reduce the impact of noise [1,5,6]. Meanwhile, neural networks have been introduced to discover PDEs. For example, the physics-informed neural network (PINN) was proposed by Raissi [7] to find the corresponding coefficients with high accuracy when the terms of the PDE are known. PDE-NET was developed by Long et al. [8] to discover time-dependent PDEs, although no parsimony of the PDEs is enforced.

A. Parametric PDE discovery
In this work, parametric PDEs with spatially- or temporally-varying coefficients are considered. The PDE with spatially-varying coefficients is taken as an example here, and the PDE with temporally-varying coefficients can be expressed similarly. The form of the parametric PDE can be written as

uT = N(Φ(u); α1(x), α2(x), ..., αn(x)),  (2)

where uT refers to different orders of derivatives of u with respect to t, such as ut and utt; Φ(u) denotes the candidate library of potential terms; αi(x) (i=1,2,...,n) denotes the vector of varying coefficients, with n being the size of the candidate library; and N(·) denotes the linear combination of the terms in Φ(u) with the corresponding varying coefficients αi(x). After discretization on the observation points, the parametric PDE can be written in matrix form as

UT = Θ(U)·α(x),  (3)

where Θ(U) is the (Nx·Nt)×n matrix whose columns are the candidate terms evaluated at the observation points; Nx is the number of x; Nt is the number of t; and n is the number of possible terms. The discovery of a parametric PDE aims to identify the nonzero PDE terms from Θ(u) and the general form of the varying coefficients α(x). Here, a parsimonious model is expected to be obtained since, for most partial differential equations, only a small number of terms, and thus of nonzero coefficients, exist in Eqs. (1) and (2).

B. Architecture of Adaptive DLGA-PDE
In this work, a novel framework, called Adaptive DLGA-PDE, is proposed to discover parametric PDEs from sparse noisy data with an incomplete candidate library. This algorithm constitutes a combination of adaptive methods and DLGA-PDE. Due to the difficulty of discovering the PDE form and the spatially- or temporally-varying coefficients at the same time, a two-step adaptive method is utilized. The structure of the PDE is discovered first, and then the general form of the varying coefficients is identified.
In this framework, sparse noisy data are used to train a fully-connected feedforward neural network. The trained neural network is then utilized to calculate derivatives and generate meta-data. Here, two types of meta-data, local meta-data and global meta-data, are generated. Local meta-data are generated within a local window, which is a small part of the entire domain, and global meta-data are generated within nearly the whole domain. For local meta-data, DLGA-PDE is performed to discover the structure of PDEs, although the coefficients may be either incorrect or non-representative, because no global constraint is enforced at this step. This process is called DLGA-PDE (Structure). Then, the type of varying parameter (spatially-varying or temporally-varying) is identified by comparing the spatial and temporal stability of DLGA-PDE (Structure). Subsequently, the corresponding coefficients at each point (x or t) in the global domain are calculated, based on the discovered structure of the PDE, by global optimization of the neural network with global meta-data. Finally, a new DLGA-PDE method that differs from DLGA-PDE (Structure) is used to discover the general form of the calculated varying coefficients. This process is called DLGA-PDE (Coefficients). The architecture of Adaptive DLGA-PDE is shown in Fig. 1.

C. Neural network
In this work, a deep fully-connected artificial neural network NN1(x,t;θ) is trained to approximate u(x,t). The structure of a typical artificial neural network is shown in Fig. 2. Input of the neural network is spatial-temporal observation data, and output is NN1(xi,tj;θ), where θ refers to the parameters of the neural network, including weights and bias.
The neural network NN1(x,t;θ) is trained by minimizing the loss function

Loss(θ) = (1/(Nx·Nt)) Σi Σj |NN1(xi,tj;θ) − u(xi,tj)|²,

where Nx is the number of x; and Nt is the number of t. The early termination method is utilized to prevent over-fitting. When the training process has been completed, derivatives can be calculated via automatic differentiation. Meanwhile, the trained neural network NN1(x,t;θ) serves as a surrogate model for the underlying physical system that can be employed to generate meta-data. Here, the neural network is only briefly introduced; additional details can be found in Xu et al. [17].
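The role of the trained surrogate can be sketched without a deep-learning framework. In the snippet below, a two-dimensional polynomial least-squares fit stands in for NN1(x,t;θ): it is fitted to sparse noisy samples of u, and, because the fitted model is smooth, its derivatives can be evaluated in closed form on dense meta-data locations, mirroring the role of automatic differentiation. The function names, the degree-3 basis, and the test function u = x²t + 0.3x are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def fit_surrogate(x, t, u, deg=3):
    """Least-squares polynomial fit of u(x,t): a lightweight stand-in
    for the neural surrogate NN1(x,t;theta). The fitted model is smooth,
    so derivatives can be evaluated exactly on a dense meta-data grid."""
    powers = [(i, j) for i in range(deg + 1) for j in range(deg + 1 - i)]
    A = np.column_stack([x**i * t**j for i, j in powers])
    coef, *_ = np.linalg.lstsq(A, u, rcond=None)
    return powers, coef

def eval_derivative(powers, coef, x, t, dx=0, dt=0):
    """Evaluate d^(dx+dt) u / dx^dx dt^dt of the fitted surrogate."""
    out = np.zeros_like(x, dtype=float)
    for (i, j), c in zip(powers, coef):
        if i < dx or j < dt:
            continue  # this term vanishes after differentiation
        fac = np.prod(range(i - dx + 1, i + 1)) * np.prod(range(j - dt + 1, j + 1))
        out += c * fac * x**(i - dx) * t**(j - dt)
    return out

# Sparse noisy "observations" of u(x,t) = x^2*t + 0.3*x.
rng = np.random.default_rng(0)
x_obs = rng.uniform(-1, 1, 400)
t_obs = rng.uniform(-1, 1, 400)
u_obs = x_obs**2 * t_obs + 0.3 * x_obs + 0.01 * rng.standard_normal(400)

powers, coef = fit_surrogate(x_obs, t_obs, u_obs)
x_meta = np.linspace(-1, 1, 5)           # dense meta-data locations
t_meta = np.full_like(x_meta, 0.5)
ux = eval_derivative(powers, coef, x_meta, t_meta, dx=1)  # true value: 2*x*t + 0.3
```

The same pattern (fit once, differentiate the smooth model everywhere) is what makes dense, denoised meta-data cheap to produce.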

D. Genetic algorithm
The genetic algorithm is an important part of Adaptive DLGA-PDE, whose function is to discover the form of PDEs with an incomplete candidate library. With a unique digitization method, terms in the PDE can be expressed by genes. Through mutation and cross-over, the genetic algorithm can produce countless combinations of genes, which greatly expands the search scope of possible PDE terms. In this part, we briefly introduce the procedure of the genetic algorithm utilized in Adaptive DLGA-PDE, including translation, cross-over, mutation, fitness calculation, and selection. Specifically, a new principle, called winner-take-all, is introduced to prevent the search from being trapped in a local minimum. Additional details about the genetic algorithm can be found in Xu et al. [18].

Translation
To digitize PDEs, a principle of translation from the structure of PDEs to genomes is proposed. Firstly, numbers are used to denote different orders of derivatives. For example, 0 refers to u, 1 refers to ux or ut, and 2 refers to uxx or utt. Each such number is defined as a gene, which is the smallest unit in the genetic algorithm. A combination of genes is defined as a gene module, and it is assumed that genes within a module are connected only by multiplication. For example, [0,1] refers to u·ux, and [1,1,2] refers to ux²·uxx. A combination of gene modules is defined as a genome. A genome has two parts, which represent the left-hand terms and the right-hand terms of the PDE, respectively. It is also assumed that the left-hand terms are derivatives with respect to t, and the right-hand terms are derivatives with respect to x. Gene modules are connected by addition. For example, [1],{[0,1],[2]} refers to the structure of the PDE ut=uux+uxx. Here, for the sake of distinction, the right-hand terms are placed in braces. With this principle of translation, a large number of genomes can be generated from several basic genes, and each genome corresponds to a specific PDE.
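The translation rule above is simple to implement. The sketch below (illustrative helper names, not the authors' code) renders a genome as a readable PDE string:

```python
def term(gene, var):
    # 0 -> u, 1 -> ux (or ut), 2 -> uxx (or utt), ... per the encoding above
    return "u" if gene == 0 else "u" + var * gene

def translate(genome):
    left, right = genome      # left: one gene module; right: list of modules
    lhs = "*".join(term(g, "t") for g in left)
    rhs = " + ".join("*".join(term(g, "x") for g in module) for module in right)
    return lhs + " = " + rhs

print(translate(([1], [[0, 1], [2]])))  # ut = u*ux + uxx
```

Because genes multiply within a module and modules add, any genome produced by cross-over or mutation translates directly to a candidate PDE in this way.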

Cross-over
Cross-over is the process in which two parents exchange certain gene modules to produce their children. It is an important way for parents to transfer their genes to the next generation. In this work, the probability of cross-over is 80%.

Mutation
Mutation is a key process for producing new genes, which is critical in the genetic algorithm. There are many ways of mutation; here, three main ways are introduced: order mutation, add-gene mutation, and delete-gene mutation. In order mutation, the order of a certain gene in the genome is reduced by 1; in particular, 0 is changed to the largest order. For example, the genome [1],{[0],[1,3]} may be transformed into [1],{[3],[1,2]} if 3 is the largest order. In add-gene mutation, a new random gene module is added to the genome. In delete-gene mutation, a certain gene module is deleted from the genome. These three ways of mutation constantly generate new genomes during the evolution process, thereby helping to avoid local minima. In this work, the chance of mutation is set to be 80%.
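The three operators can be sketched as follows, acting on the right-hand-side module list of a genome. The constant MAX_ORDER and the single-gene order mutation are simplifying assumptions for illustration:

```python
import random

MAX_ORDER = 3  # assumed largest derivative order in the gene pool

def order_mutation(modules, rng):
    """Reduce the order of one randomly chosen gene by 1; 0 wraps to MAX_ORDER."""
    out = [list(m) for m in modules]
    i = rng.randrange(len(out))
    j = rng.randrange(len(out[i]))
    out[i][j] = MAX_ORDER if out[i][j] == 0 else out[i][j] - 1
    return out

def add_gene_mutation(modules, rng):
    """Append a new random gene module of one or two genes."""
    new = [rng.randrange(MAX_ORDER + 1) for _ in range(rng.randrange(1, 3))]
    return [list(m) for m in modules] + [new]

def delete_gene_mutation(modules, rng):
    """Delete one randomly chosen module (always keep at least one)."""
    if len(modules) == 1:
        return [list(m) for m in modules]
    drop = rng.randrange(len(modules))
    return [list(m) for i, m in enumerate(modules) if i != drop]

rng = random.Random(0)
print(order_mutation([[0], [1, 3]], rng))
```

Applied repeatedly, add-gene mutation is what lets terms absent from the initial library enter the search, which matters for the incomplete-library experiments later.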

Fitness calculation
Fitness refers to the viability of a genome in the environment, measured as the quality of the genome. Fitness is usually calculated by a fitness function, which is defined in this work as

Fitness = MSE1 + ε·len(genome),

where the coefficient vector for the right-hand terms is obtained by least-squares regression, and the mean squared error MSE1 between the left-hand side and the fitted right-hand side is then calculated. To prevent over-fitting, the l0 penalty is utilized: len(genome) is the length of the genome, and ε is a hyper-parameter chosen according to the magnitude of MSE1. It is worth mentioning that the genetic algorithm optimizes fitness in a derivative-free manner, and thus the l0 penalty, which is simpler and more effective, can be applied here. In this work, the smaller the fitness, the better the genome.
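On arrays of derivative values, the fitness computation reduces to one least-squares solve plus the l0 penalty. The data below are synthetic stand-ins generated exactly by ut = −1.0·u·ux + 0.1·uxx; the value of ε and the genome lengths are illustrative:

```python
import numpy as np

def fitness(lhs, candidate_cols, genome_len, eps=1e-3):
    """Fitness = MSE1 + eps * len(genome).

    lhs            : left-hand-side derivative values (e.g. ut)
    candidate_cols : columns are the right-hand terms encoded by the
                     genome (e.g. u*ux and uxx for [1],{[0,1],[2]})
    """
    coef, *_ = np.linalg.lstsq(candidate_cols, lhs, rcond=None)
    mse1 = np.mean((lhs - candidate_cols @ coef) ** 2)
    return mse1 + eps * genome_len, coef

# Synthetic derivative arrays generated exactly by ut = -1.0*u*ux + 0.1*uxx.
rng = np.random.default_rng(1)
u, ux, uxx = rng.standard_normal((3, 200))
ut = -1.0 * u * ux + 0.1 * uxx

f_good, coef = fitness(ut, np.column_stack([u * ux, uxx]), genome_len=3)
f_bad, _ = fitness(ut, np.column_stack([u * ux]), genome_len=2)  # uxx missing
```

The correct structure attains a near-zero MSE1 and therefore a smaller fitness than the incomplete one, even though the incomplete genome pays a smaller length penalty.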

Selection
In this work, each parent genome crosses over twice, and the best half of the children in each generation is selected to be the next generation of parents. After several generations, the child whose fitness is the smallest is taken as the best model.

Principle of winner-take-all
In some cases, the genetic algorithm may easily fall into a local minimum, which is detrimental to finding the best equation. Repeated trials are an option, but are tedious and unstable. Consequently, in this work, a principle called winner-take-all is proposed, meaning that the winner takes all of the benefits while the others receive none. Under the winner-take-all principle, all genomes except the best one are replaced by new random genomes with a certain probability. This principle helps to avoid local minima and accelerates convergence. In this work, the principle of winner-take-all is utilized in DLGA-PDE (Coefficients).

E. Procedure of DLGA-PDE (Structure) and DLGA-PDE (Coefficients)
In this subsection, the procedures of DLGA-PDE (Structure) and DLGA-PDE (Coefficients) are introduced. DLGA-PDE (Structure) aims to discover the structure of the PDE from local meta-data within a window, while DLGA-PDE (Coefficients) aims to identify the general form of the spatially- or temporally-varying coefficients. The genetic algorithm is utilized in both processes, but the settings are dissimilar. First, 100 genomes are produced from basic genes. Each genome crosses over twice, so that 200 children are generated. Each child mutates with a probability of 20%, and it is worth noting that the three ways of mutation take place independently. After mutation, the fitness of each child is calculated according to the fitness function and the local meta-data. The children are then sorted by fitness, and the best half are kept as the parents of the next generation. This process repeats until convergence or until a maximum number of generations is reached. The best child in the last generation gives the discovered structure. The work-flow of DLGA-PDE (Structure) is shown in Fig. 3.
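The loop above can be sketched end-to-end on synthetic data. Here a genome is simplified to a set of column indices into a fixed candidate library (the true model uses columns 0 and 2), and simple elitism is added so the best genome is never lost between generations; the population size, probabilities, and seed are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(2)
N_TERMS = 6

# Synthetic candidate library; the true model is ut = 0.5*col0 - 0.3*col2.
lib = rng.standard_normal((300, N_TERMS))
ut = 0.5 * lib[:, 0] - 0.3 * lib[:, 2]

def fitness(genome):
    cols = lib[:, sorted(genome)]
    coef, *_ = np.linalg.lstsq(cols, ut, rcond=None)
    return np.mean((ut - cols @ coef) ** 2) + 1e-3 * len(genome)

def crossover(a, b):
    pool = list(a | b)
    child = {g for g in pool if rng.random() < 0.5}
    return child or {pool[0]}          # never return an empty genome

def mutate(g):
    g = set(g)
    if rng.random() < 0.4:             # add-gene mutation
        g.add(int(rng.integers(N_TERMS)))
    if rng.random() < 0.2 and len(g) > 1:  # delete-gene mutation
        g.discard(next(iter(g)))
    return g

pop = [{int(rng.integers(N_TERMS))} for _ in range(20)]
for generation in range(60):
    children = [min(pop, key=fitness)]     # elitism: keep the current winner
    for parent in pop:                     # each parent crosses over twice
        for _ in range(2):
            mate = pop[int(rng.integers(len(pop)))]
            children.append(mutate(crossover(parent, mate)))
    children.sort(key=fitness)             # best half become the next parents
    pop = children[: len(pop)]

best = min(pop, key=fitness)
print(sorted(best))
```

Because the l0 penalty charges every extra column, the evolved population settles on exactly the two true terms rather than a larger superset.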

Identifying the type of varying parameters and discovering the best structure
Although the structure of PDE can be discovered by DLGA-PDE (Structure) for a local window in the above, it remains unclear whether the coefficients vary spatially or temporally. Therefore, the type of varying coefficients has to be identified.
Firstly, the coefficients are assumed to be spatially-varying. As shown in Fig. 4, a large number of local meta-data are generated from the neural network NN1(x,t;θ) within several local windows. For each local window, DLGA-PDE (Structure) is performed, and a respective structure is discovered. Among these structures, the structure that occurs stably and most frequently is termed the best (or most probable) structure. Then, the spatial stability of DLGA-PDE (Structure) can be examined, which is defined as

Sx = Nx^best / Nx,

where Nx is the number of local windows, and Nx^best is the number of occurrences of the best structure in space. The coefficients are then assumed to be temporally-varying, and the respective best structure in time and the corresponding temporal stability St can be obtained in a similar manner.
If the assumption of either spatially- or temporally-varying coefficients is correct, the coefficients do not change substantially within each local window, and the structures discovered in multiple local windows will be stable. In contrast, if the assumption is incorrect, DLGA-PDE (Structure) will be relatively unstable, because different local windows will lead to different structures of the PDE. Therefore, if Sx > St, the coefficient is spatially-varying; otherwise, the coefficient is temporally-varying. After identifying the type of varying parameter, the finally discovered structure is the corresponding best structure.
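Given the lists of structures discovered in the spatial and temporal windows, the stability comparison reduces to a frequency count. The structure strings below are hypothetical placeholders for discovery output, not real results:

```python
from collections import Counter

def stability(structures):
    """Return (best structure, frequency of the best structure)."""
    best, count = Counter(structures).most_common(1)[0]
    return best, count / len(structures)

# Hypothetical discovery output over 10 spatial and 10 temporal windows.
spatial = ["ut=u*ux+uxx"] * 3 + ["ut=uxx"] * 4 + ["ut=u*ux"] * 3
temporal = ["ut=u*ux+uxx"] * 8 + ["ut=uxx"] * 2

best_x, S_x = stability(spatial)
best_t, S_t = stability(temporal)
kind = "spatially-varying" if S_x > S_t else "temporally-varying"
print(kind, best_t)
```

Here the temporal windows agree far more often (S_t = 0.8 vs. S_x = 0.4), so the coefficients are classified as temporally-varying and the stable structure is kept.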

FIG. 4.
Diagram of the proposed algorithm, with black dots referring to data. The observation data are sparse and noisy, while meta-data produced by the neural network are dense.

Calculating varying coefficients
Although the structure of the PDE has been discovered, the particular values and the corresponding expression of the varying coefficients are unknown. As shown in Fig. 4, a large number of global meta-data are generated within nearly the entire domain. With the discovered structure and the global meta-data, the process of calculating the corresponding coefficients at each point of x or t in the global domain constitutes an inverse modelling problem, which can be solved by neural network methods [7] or data-assimilation methods, such as the ensemble Kalman filter (EnKF) [19,20]. In this work, the corresponding values of the varying coefficients are calculated by global optimization of the neural network. The neural network NN2(x,t;θ2) is constructed in a similar way to NN1(x,t;θ), while the loss function is different. Here, the loss function penalizes the residual of the discovered structure over the global meta-data, consistent with Eq. (3),

Loss2 = ||UT − Θ(U)·α||²,

so that the coefficient values at each point can be obtained via optimization. It is worth noting that, although the corresponding coefficients at specific locations are obtained in this step, the general form, which is more representative, remains undetermined.
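In the paper this step is carried out by global optimization of the neural network; as a dependency-light sketch of the same idea, the coefficient value at each spatial point can be obtained from a small least-squares problem over all meta-data sharing that x. The arrays are synthetic and noise-free, and a(x) = 1 + 0.5·sin(x) is an assumed example coefficient:

```python
import numpy as np

# Dense synthetic meta-data for ut(x,t) = a(x)*uxx(x,t),
# with the spatially-varying coefficient a(x) = 1 + 0.5*sin(x).
x = np.linspace(0, 2 * np.pi, 50)
a_true = 1.0 + 0.5 * np.sin(x)
rng = np.random.default_rng(3)
uxx = rng.standard_normal((50, 200))      # 50 x-points, 200 t-points
ut = a_true[:, None] * uxx

# One least-squares solve per spatial point: all meta-data sharing the
# same x jointly constrain the single coefficient value a(x_i).
a_est = np.array([
    np.linalg.lstsq(uxx[i][:, None], ut[i], rcond=None)[0][0]
    for i in range(len(x))
])
```

The pointwise values a_est are exactly what DLGA-PDE (Coefficients) then compresses into a general functional form.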

DLGA-PDE (Coefficients)
In DLGA-PDE (Coefficients), it is assumed that the general form of the varying coefficients is composed of elementary functions, such as k1xⁿ, sin(k2x), cos(k3x) and e^(k4x), where n is a positive integer and ki (i=1,2,3,4) are real numbers. As a consequence, the basic genes of DLGA-PDE (Coefficients) are

[αi]{1(0), k1xⁿ(1), sin(k2x)(2), cos(k3x)(3), e^(k4x)(4)}.  (4)

Here, the upper sequence refers to the possible functions in the general form of the varying parameters, and the lower one refers to the corresponding coefficients of these terms. In this work, ki = δ·N (i=1,2,3,4), where δ is the interval of ki, which is set to be 0.001, and N is a random integer from [-10000,10000]. Therefore, the value range of ki is [-10,10], which is sufficient for most common parametric PDEs. For more complex situations, in which the coefficients may vary with a higher frequency, a larger value range of ki can be adopted.
After the genomes are generated, the principle of winner-take-all is applied prior to cross-over and mutation. In this step, the best genome remains, while the other genomes are replaced by new random genomes with a certain probability, which is 80% in this work. Due to the multiple possibilities produced by the two genome sequences, the genetic algorithm is easily trapped in a local minimum, which may lead to incorrect forms of the equation. The principle of winner-take-all can effectively avoid the local minimum without repeated trials.
Cross-over and mutation are then performed. It is worth noting that the two genome sequences cross over and mutate synchronously. In particular, ki mutates by randomly selecting a new value to replace itself. Subsequently, the fitness is calculated. For each genome, the corresponding function can be obtained by translation, and the terms in the function can be calculated. Here, the fitness function is slightly different, and is defined as

Fitness2 = MSE2 + ε1·len(genome) + ε2·if_not_constant,

where the vector of possible functions, of size 1×n, is evaluated at the N points of x in the global meta-data, with n being the number of terms in the function. The vector of size n×1 containing the coefficients of the corresponding functions is obtained by least-squares regression, and the mean squared error MSE2 between the calculated varying coefficients and the fitted function is then computed. Here, if_not_constant is an index distinguishing constant from varying coefficients: if the discovered form of the coefficient is a constant, if_not_constant=0; otherwise, if_not_constant=1. This penalty is applied to avoid over-fitting constant coefficients with a series of complex formulas. ε1 and ε2 are hyper-parameters, chosen according to the magnitude of MSE2. Here, the spatially-varying coefficient is taken as an example, and the fitness for the temporally-varying coefficient can be calculated similarly.
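The coefficient-form fitness can be sketched on a synthetic temporally-varying coefficient. The candidate forms below stand in for translated genomes, with their ki values already fixed; the true form 2.0·e^(−0.5t), the candidate set, and the ε values are all illustrative assumptions:

```python
import numpy as np

# Pointwise coefficients from the previous step (synthetic example):
# alpha(t) = 2.0 * exp(-0.5*t), sampled on the global meta-data grid.
t = np.linspace(0, 4, 200)
alpha = 2.0 * np.exp(-0.5 * t)

def coeff_fitness(library, is_constant, eps1=1e-4, eps2=1e-2):
    """Fitness2 = MSE2 + eps1*len(genome) + eps2*if_not_constant."""
    Phi2 = np.column_stack(library)                    # N x n function matrix
    c, *_ = np.linalg.lstsq(Phi2, alpha, rcond=None)   # least-squares fit
    mse2 = np.mean((alpha - Phi2 @ c) ** 2)
    return mse2 + eps1 * Phi2.shape[1] + eps2 * (0 if is_constant else 1)

candidates = {                                  # candidate general forms
    "constant":           ([np.ones_like(t)], True),
    "k1*t":               ([t], False),
    "sin(k2*t), k2=1":    ([np.sin(t)], False),
    "exp(k4*t), k4=-0.5": ([np.exp(-0.5 * t)], False),
}
scores = {name: coeff_fitness(lib, const)
          for name, (lib, const) in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

The exponential candidate fits the pointwise coefficients exactly, so even after paying the if_not_constant penalty it beats the constant form, illustrating how the penalty only tips the balance when a varying form offers no real improvement.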
Finally, selection is done according to fitness. When the evolution has converged, the best child is the discovered general form of varying coefficients. The work-flow of DLGA-PDE (Coefficients) is presented in Fig. 5.

Ⅲ. RESULTS
To test the performance of Adaptive DLGA-PDE, three PDEs with spatially-or temporally-varying coefficients are considered, including the Burgers equation, the convection-diffusion equation, and the wave equation. Meanwhile, the KdV equation, which has high-order derivatives, is also investigated. Although the general form of varying coefficients in the KdV equation can be discovered, accuracy needs to be improved. Additional details and discussions are provided in Appendix A.
In this work, it is supposed that the type of coefficients is consistent within a PDE, which means that the PDE has only temporally-varying coefficients or only spatially-varying coefficients. Two neural networks are utilized: NN1(x,t;θ) for calculating derivatives and generating meta-data, and NN2(x,t;θ2), which is constructed in the same way as NN1(x,t;θ), for calculating the varying coefficients. In addition to the basic case with clean data, four noise levels, including 1%, 5%, 10% and 15%, are added to the data.
To identify the type of varying parameters and discover the structure of the PDE, local meta-data are generated from multiple different local windows; in total, 160,000 local meta-data are generated for each local window. DLGA-PDE (Structure) is performed on each local window to calculate Sx and St. Sensitivity analysis shows that DLGA-PDE (Structure) is insensitive to the local window position under the correct assumption, but is sensitive to it under the incorrect assumption, as detailed in Appendix B2. It is found that St>Sx in all experiments, which means that the coefficients are temporally-varying. Therefore, the best structure is the structure that occurs most frequently under the assumption of temporally-varying coefficients, as shown in Table I. The discovered best structure is correct. The general form of the varying coefficients identified by DLGA-PDE (Coefficients) is also listed in Table I, and it is found that the general form is discovered with high accuracy, even if the noise level is 15%. The calculated varying coefficients and the identified general form are shown in Fig. 6(a) and (b) to better demonstrate the effect of DLGA-PDE (Coefficients). From the figure, it can be seen that the varying coefficients calculated by global optimization are robust to noise and relatively accurate. Furthermore, the curve that represents the identified general form is smoother and more accurate. To illustrate the accuracy of the discovered PDEs, the error is defined as the relative deviation between the identified and the true coefficients.

Parametric convection-diffusion equation
Next, the parametric convection-diffusion equation with a spatially-varying coefficient a(x) is considered. To generate training data, the convection-diffusion equation is solved numerically using the precise integration method with the initial condition (8-x)sin(x) and boundary condition u(0,t)=u(8,t). The spatial and temporal stability are presented in Table II, and it can be found that Sx>St in all experiments, which means that the coefficients are spatially-varying. Meanwhile, the best structure is obtained, which is also shown in Table II. The calculated varying coefficients and the identified general form are shown in Fig. 6(c) and (d). It can be seen that, although the calculated coefficients have some error when the noise level is 15%, the general form can be discovered by DLGA-PDE (Coefficients) accurately. The discovered results are listed in Table Ⅲ, from which it is found that the general form of the varying coefficients is discovered accurately.

B. Discovery of parametric PDE with sparse data
In this part, the performance of Adaptive DLGA-PDE in discovering parametric PDEs under different data volumes is investigated. The Burgers equation, the convection-diffusion equation, and the wave equation are again considered, with the same settings as in Section Ⅲ. Different amounts of data are randomly selected to train the neural network. For the Burgers equation and the convection-diffusion equation, 25,000, 15,000, 5,000, and 1,000 data points are randomly chosen to form new datasets. For the wave equation, 30,000, 15,000, 5,000, and 1,000 data points are randomly chosen. Adaptive DLGA-PDE is performed to discover these three PDEs, and the results are shown in Tables IV, V, and VI, respectively. From these tables, it can be seen that Adaptive DLGA-PDE is able to discover parametric PDEs with extremely sparse data (e.g., 1,000 data points), which account for only about 2% of the total data. This means that Adaptive DLGA-PDE is robust to sparse data.

C. Discovery of parametric PDE with an incomplete candidate library
Finally, the performance of Adaptive DLGA-PDE for discovering parametric PDEs with an incomplete candidate library is investigated. The ability of DLGA-PDE (Structure) to discover the structure of PDEs with an incomplete candidate library has been tested in our previous work, and additional details can be found in Xu et al. [18]. In this part, the ability of DLGA-PDE (Coefficients) to discover the general forms of coefficients in PDEs with incomplete basic genes is discussed.
Here, the Burgers equation is considered again. The basic genes of DLGA-PDE (Coefficients) are changed to [αi]{1(0),k1t(1),sin(k2t)(2),cos(k3t)(3)}, with other conditions unchanged. In this case, the correct function e^(k4t) is not contained in the basic genes, which means that it can only be produced by mutation. The best child in each generation is recorded, and the results are presented in Table VII. From the table, it can be seen that Adaptive DLGA-PDE discovers an incorrect form at first, due to the absence of the correct function from the basic genes. However, after two generations, the correct function emerges via mutation, and Adaptive DLGA-PDE finally converges and discovers the correct form of the coefficient. This indicates that the method has the ability to discover parametric PDEs with an incomplete candidate library. Additional details are provided in Appendix C, where the convection-diffusion equation and the wave equation are also tested, and satisfactory outcomes are obtained.

Ⅳ. SUMMARY AND OUTLOOK
In this work, we proposed a new framework combining adaptive methods and DLGA-PDE, called Adaptive DLGA-PDE, which aims to discover the general form of parametric PDEs from sparse noisy data. This algorithm utilizes DLGA-PDE (Structure) to discover the structure of PDEs, and employs DLGA-PDE (Coefficients) to identify the general form of the varying coefficients. Faced with the three main challenges mentioned in Section I, Adaptive DLGA-PDE provides a systematic and comprehensive solution. In Adaptive DLGA-PDE, a neural network is utilized to calculate derivatives and generate meta-data, which solves the problem of sparse noisy data. Meanwhile, a genetic algorithm is applied to discover PDEs with an incomplete candidate library via mutation and combination of genes, which solves the problem of an incomplete candidate library. Finally, a two-step adaptive method is developed to discover parametric PDEs, which solves the problem of varying coefficients. Numerical experiments demonstrated that Adaptive DLGA-PDE still performs well when the noise level is 15%, except for the KdV equation, which means that it is robust to noise and is able to handle various types of parametric PDEs.
Compared with PDEs with constant parameters, the discovery of parametric PDEs is more complex and more easily affected by noise. It is therefore notable that Adaptive DLGA-PDE is able to discover the correct form of parametric PDEs with high accuracy at 15% noise, which previous work has not achieved. The performance of our proposed method, however, is not satisfactory when discovering the parametric KdV equation. This is because the KdV equation has third-order derivatives, which may bring a relatively large error when calculating derivatives. To handle the problem of high-order derivatives, the weak form of PDEs mentioned in Section I may constitute a viable solution [12,13]. The weak form of a PDE is expressed in integral form, which reduces the order of the derivatives that need to be calculated. Integration is indispensable and important in the weak form, but it is difficult to calculate with discrete data. Our proposed algorithm is particularly suitable for performing integration because it can generate a large amount of meta-data on a regular grid, which assists in calculating integrals. How to combine the weak form and Adaptive DLGA-PDE constitutes a worthy topic for future work.
Numerical experiments also indicate that Adaptive DLGA-PDE is able to discover parametric PDEs with sparse data, even with only 2% of the total data. Moreover, the experiment in which the true term is not contained in the basic genes indicates that this method can discover parametric PDEs with an incomplete candidate library. Different from previous works, which rely on prior knowledge about whether the coefficients are spatially-varying or temporally-varying, Adaptive DLGA-PDE is able to identify the type of varying parameters, which greatly increases its potential for application.
To further investigate the stability of Adaptive DLGA-PDE and its application to more complex problems, other experiments are carried out. Sensitivity analysis shows that the selection of the local window size is important for discovering a correct structure. It is also observed that DLGA-PDE (Structure) is insensitive to the position of the local window under the correct assumption about the varying coefficients. Overall, Adaptive DLGA-PDE is very stable. Inspired by this stability, a more complex problem, in which the structure of the PDE differs between domains, is considered, and the results demonstrate that Adaptive DLGA-PDE performs well by producing local meta-data in different domains. Additional details about this problem are provided in Appendix D.
At the same time, Adaptive DLGA-PDE possesses certain limitations. For example, if the coefficient oscillates rapidly with a large amplitude, it may be difficult for DLGA-PDE (Structure) to discover the correct structure, because such a coefficient cannot be regarded as constant even in a small local domain, which leads to failure in discovering the true structure of the PDE. In addition, the choice of hyper-parameters, including ε, ε1 and ε2 in the fitness functions, is important for discovering the true PDE, but it is currently decided by experience and is complicated to adjust. Meanwhile, if the form of the varying coefficient is complex and cannot be represented by a combination of elementary functions, it may be difficult for Adaptive DLGA-PDE to discover the true general form of the varying coefficients. To solve this problem, discovering the form of a Taylor expansion or an approximate substitution function may be a viable choice. Moreover, it may be difficult for Adaptive DLGA-PDE to directly discover PDEs whose coefficient is a random field. Dimension reduction techniques, such as the Karhunen-Loève expansion, singular value decomposition, and auto-encoders, may be needed to parameterize the random field. Further investigation of these issues is necessary.

APPENDIX A: DISCOVERY OF PARAMETRIC KDV EQUATION BY ADAPTIVE DLGA-PDE
In this section, the parametric KdV equation, which has a third-order derivative, is considered. The KdV equation is solved numerically using the finite difference method, with the initial condition u(0,x)=cos(πx) and boundary condition u(-1,t)=u(1,t). The discovered results are presented in Table AI.
From the table, it can be seen that, although the structure and the general form of the varying coefficients are discovered successfully, the error is relatively large. Many reasons contribute to the large error. Firstly, the parametric KdV equation has a third-order derivative, which is difficult to calculate accurately. Although automatic differentiation of the neural network is utilized to calculate derivatives, the error of high-order derivatives is still relatively large and affects the performance of Adaptive DLGA-PDE. Secondly, for the parametric KdV equation, which is a complex dynamic system, the structure of the neural network and the choice of the activation function are important for learning its characteristics; the neural network NN1(x,t;θ) used in this work may not work well. Therefore, finding a more appropriate neural network structure and activation function will help learn the parametric KdV equation and calculate derivatives more accurately. Finally, for equations with high-order derivatives, more data are needed. Rudy et al. [16] investigated the Kuramoto-Sivashinsky (KS) equation, which has a fourth-order derivative. In their work, 262,144 data points are used, and the results are robust to only 0.01% noise.

1. Sensitivity to local window size
To investigate the sensitivity of DLGA-PDE (Structure) to the local window size, which is the size of the domain where local meta-data are generated, the Burgers equation is first taken as an example. In the sensitivity analysis, the type of varying parameter is supposed to be known, and the local window size, referred to as L, is varied. When generating local meta-data, 400 spatial observation points are uniformly selected within the local window. Other conditions are the same as those in Section IIIA. The structure discovered by DLGA-PDE (Structure) is recorded and compared with the true structure, as shown in Fig. B1. In this figure, if the fitness of the discovered structure equals that of the true structure, DLGA-PDE (Structure) has discovered the true structure. From this figure, it can be seen that the true structure can be discovered even if the ratio of the window size to the entire length of the interval is nearly 50%.
Next, the convection-diffusion equation is considered, with other conditions the same as those in Section IIIB. The discovered structure is recorded and compared with the true structure, as shown in Fig. B2. It is obvious that, when the ratio of the window size to the entire length of the interval is less than 60%, the true structure can be successfully discovered.
Finally, the wave equation is considered. When generating local meta-data, 400 spatial observation points are uniformly selected within the local window. Other conditions are the same as those in Section IIIC. The discovered structure is recorded and compared with the true structure, as shown in Fig. B3. It can be seen that the true structure can be discovered even if the window size is as large as the entire length of the interval.
In general, DLGA-PDE (Structure) is able to discover the true structure when the local window size varies over a wide range, which indicates that it is not sensitive to the local window size.

2. Sensitivity to local window position
Next, the sensitivity of DLGA-PDE (Structure) to the local window position is investigated. The parametric Burgers equation with temporally-varying coefficients is first taken as an example, in the basic case with clean data. Conditions are the same as those in Section IIIA. Under the assumptions of spatially-varying and temporally-varying coefficients, respectively, the structure discovered by DLGA-PDE (Structure) within each local window is recorded. The outcome is shown in Table BI. For this case, it can be calculated that Sx=0.3 and St=0.8; obviously, St>Sx. It can be found that, if the assumption is correct, the structures discovered in multiple local windows are stable; if the assumption is incorrect, the discovered structure changes with the position of the local window. This means that DLGA-PDE (Structure) is insensitive to the local window position under the correct assumption, but is sensitive to it under the incorrect assumption.