Extending the reach of quantum computing for materials science with machine learning potentials

Solving electronic structure problems represents a promising field of application for quantum computers. Currently, much effort has been spent in devising and optimizing quantum algorithms for quantum chemistry problems featuring up to hundreds of electrons. While quantum algorithms can in principle outperform their classical equivalents, the runtime, which scales polynomially with the number of constituents, can still prevent quantum simulations of large scale systems. We propose a strategy to extend the scope of quantum computational methods to large scale simulations using a machine learning potential trained on quantum simulation data. The challenge of applying machine learning potentials in today's quantum setting arises from the several sources of noise affecting the quantum computations of electronic energies and forces. We investigate the trainability of a machine learning potential under various sources of noise: statistical, optimization and hardware noise. Finally, we construct the first machine learning potential from data computed on actual IBM Quantum processors for a hydrogen molecule. This already allows us to perform arbitrarily long and stable molecular dynamics simulations, outperforming all current quantum approaches to molecular dynamics and structure optimization.


I. INTRODUCTION
In view of the exponential speed-up that can be achieved in solving electronic structure problems, quantum computers have the potential to revolutionize the field of quantum chemistry [1][2][3][4]. Substantial experimental and theoretical advances have been made in recent years, both concerning the realization of quantum computing platforms [5][6][7][8] and the development of new generations of quantum algorithms [9][10][11][12].
In a nutshell, by means of quantum computers one can in principle accurately solve the Schrödinger equation with an algorithm that scales polynomially in both memory and runtime, while exact classical configuration interaction methods scale exponentially with the system size. All the various quantum algorithms proposed to solve this problem, from eigenstate projection methods using quantum phase estimation 2,13 to variational approaches 14,15, feature an O(N^4) scaling, where N is essentially the system size (or, more specifically, the number of basis functions used to represent the system).
The origin of this scaling is the number, O(N^4), of terms present in an electronic Hamiltonian in second quantization 4. This means that the complexity of one Trotter step in the Hamiltonian evolution primitive also scales as O(N^4). Concerning the variational approach, the number of measurements needed to compute the energy necessarily scales with the number of terms in the Hamiltonian 16. Moreover, the most promising variational circuit to represent chemical systems, the unitary coupled cluster (UCC) ansatz, also features a complexity of O(N^4) gates, for the same reason.

a) Electronic mail: gma@zurich.ibm.com
b) Electronic mail: ita@zurich.ibm.com
Much of the effort spent so far has been focused on the solution of medium-sized chemical systems, while virtually no attempts have been made to propose a feasible strategy for the simulation of extended systems, which are central in materials science.
Crucially, the asymptotic exponential speed-up that a quantum computation can offer may not be sufficient to simulate bulk systems. While end-to-end resource estimates confirm that quantum computers equipped with optimized quantum algorithms can perform chemistry calculations that are completely unfeasible for current classical solvers 17,18, the O(N^4) scaling can still represent a practical barrier to achieving quantum speed-up in first-principle simulations of extended systems.
For instance, approximate classical electronic solvers like Density Functional Theory (DFT) 19 or quantum Monte Carlo 20,21, which feature an O(N^2)-O(N^3) scaling, stand at the edge of what we can really consider large scale simulations. Indeed, first-principle molecular dynamics (MD) simulations, powered by DFT, can routinely tackle systems featuring on the order of 10^3 electrons on picosecond (ps) time scales. These sizes and timescales (not to speak of accuracy) are often not sufficient to realize a converged setup for the study of many physical systems, including chemical reactions in solution, nucleation processes, and phase transitions.
On top of that, the quantum gate frequency is typically orders of magnitude slower than a classical CPU clock 17,22. This means that a large prefactor should also be taken into account when considering the runtime of a quantum algorithm. All in all, it seems unlikely that an exact quantum-powered electronic structure method featuring a runtime of O(N^4) will achieve large scale simulations of bulk systems, when even a classical approximate solver like DFT, with a better scaling of O(N^3) as well as a much better prefactor, struggles. Furthermore, the "time" dimension of the problem will likely constitute a major bottleneck for first-principle simulations powered by quantum electronic structure solvers, as much as it is for present classical ones. An MD simulation implies that a sequence of electronic structure calculations needs to be performed serially 23, one for each newly generated displacement, and the total runtime of the numerical experiment also grows with n_T, the number of time steps. Given that the typical integration time step of an MD simulation for molecular systems is 0.2 femtoseconds (fs), n_T should be at least of order 10^4 for a ps-long dynamics. However, that can still be too short to produce predictive results for many condensed matter problems of interest. For instance, phase transitions and complex chemical reactions may take place on a timescale of nanoseconds or more.
Recently, machine learning (ML) solutions have been put forward to overcome such size and timescale barriers in first-principle simulations driven by DFT electronic solvers [24][25][26][27][28]. The last two years have witnessed an exceptional increase in the quantity and quality of ML-powered numerical experiments, such that this approach is on the path to becoming a standard in materials science [29][30][31][32].
In this manuscript, we propose that first-principle quantum computing simulations of materials should follow the same approach. Namely, data coming from quantum hardware should be used to harvest high-quality electronic structure datasets to generate a machine learning potential (MLP), rather than to drive an MD trajectory directly.
However, the combination of the two approaches is less straightforward compared to the case where the dataset is generated through DFT calculations, as present quantum computation is subject to several forms of noise, which will impact the quality of the dataset.In this work, we focus on three types of quantum noise: (i) the statistical noise, closely connected to the problem of measurements in quantum mechanics; (ii) the variational optimization noise, which is inherent to variational approaches; and (iii) the hardware noise, which originates from device errors.
The manuscript is structured as follows. Section II reviews the general idea of MLPs. Section III introduces the general concepts for quantum electronic structure calculations. In Section IV we discuss the effect of the different noise sources on quantum computations. Section V provides a brief description of the applied neural-network force-field approach. We present and discuss the results in Section VI and conclude in Section VII.

II. MACHINE LEARNING POTENTIALS
Machine learning approaches are a cornerstone of many technological applications, ranging from image and speech recognition, search and recommendation engines, to data detection and filtering 33.
While in the past ML methods, due to their power of compressing high-dimensional data into low-dimensional representations 34,35, have been mostly applied to data science, we have recently witnessed an increased interest in applications in the physical sciences, and particularly in quantum mechanics 28,36. For instance, several ML methods have been put forward to solve the many-body Schrödinger equation (in a reinforcement learning fashion) [37][38][39][40], to learn quantum states from measurements (unsupervised learning) [41][42][43][44], or to learn materials or chemical properties from datasets (supervised learning) 28,29. The general idea in these approaches is to search large databases for non-trivial relationships between the molecular or crystal structures (i.e. atomic positions and nuclear charges {R, Z}) and several properties of interest. These include, for instance, semiconductor band gaps and dielectric constants 45, atomization energies of organic molecules 46,47, formation energies of crystals 48, or the thermodynamic stability of solids and liquids 49,50. To do so, one first trains the ML algorithm on a finite subset of known solutions ({R, Z} → p), and then predicts the properties p of interest for new, unseen structures, which differ in composition and geometry.
The construction of MLPs, first pioneered by Blank et al. 51 , and later by Behler & Parrinello 52,53 falls within this class.
In essence, the approach works as follows: (i) one generates a training dataset of M configurations {R, Z}. Each sample contains atomic positions (the number of atoms in the sample can be limited, provided that interactions are local in space), charges, and the energy E (or forces f) calculated with an ab initio electronic structure method, such as DFT. (ii) The learning process consists of generalizing the mapping {R, Z} → E to out-of-sample configurations, bypassing the need to solve the electronic structure problem at each MD iteration, and thus achieving a considerable speed-up in simulations.
As a result, first-principle modelling can now reach size and time scales which were previously possible only with computationally cheap, but approximate, empirical force-fields. The technique has already been applied to several long-standing problems in materials modelling, such as the phase diagrams of liquid and solid water 31,50, silicon 32,54, and dense hydrogen 30, to name a few.
In this work, we adopt the MLP approach based on a neural-network architecture 52, as implemented in the software package n2p2 (version 2.1.1) 55, in combination with a training based on a quantum computing evaluation of the electronic structure. Details of the setup will be outlined in Sect. V. Before moving to this rather technical part, let us first discuss the quantum computing aspect of the work.

III. QUANTUM COMPUTING FOR ELECTRONIC STRUCTURE PROBLEMS
The starting point of most electronic structure problems in chemistry or materials science is the electronic Hamiltonian written in the second quantized representation 3,4,56,

H(R) = Σ_rs h_rs(R) a†_r a_s + (1/2) Σ_pqrs g_pqrs(R) a†_p a†_q a_s a_r + E_nn(R),    (1)

with h_rs(R) denoting the one-electron integrals and g_pqrs(R) the two-electron integrals, respectively. The vector of nuclear coordinates R = (R_1, R_2, ..., R_{N_I}) ∈ R^{3N_I} of N_I nuclei parameterizes the electronic Hamiltonian. The operators a†_r (a_r) represent the fermionic creation (annihilation) operators for electrons in N molecular spin-orbitals (MOs). The term E_nn(R) represents the classical nuclear repulsion energy.
The implementation of Eq. (1) requires the translation of each fermionic operator into a qubit operator that can be interpreted by a quantum computer. This can be achieved by several fermion-to-qubit mappings, such as the Jordan-Wigner or the Bravyi-Kitaev mapping (we refer to standard reviews such as Refs. 3, 4, and 56 for more details). After this mapping, the Hamiltonian operator takes the form

H = Σ_k c_k P_k,    (2)

where each N-qubit Pauli string P_k is an element of the set {I, X, Y, Z}^⊗N (tensor products of N single-qubit Pauli operators). As discussed above, the total number of Pauli terms scales as O(N^4). There exist several methods to solve for the ground state of Eq. (1) using a quantum computer. The method of choice, once fault-tolerant quantum computers become available, is to perform quantum phase estimation to project onto eigenstates of the Hamiltonian 4,13,57. Another strategy is to obtain a variational approximation of the ground state using the variational quantum eigensolver (VQE) 4,14. This heuristic method features parameterized quantum circuits, defined in terms of parametric gates. This generates a variational quantum state |Ψ(θ)⟩, often called the trial state, defined by the array of parameters θ. The parameters are then optimized classically to reach the minimum of the energy

E(θ) = ⟨Ψ(θ)|H|Ψ(θ)⟩ = Σ_k c_k ⟨P_k⟩.    (3)

The expectation value in Eq. (3) is calculated as the sum of the expectation values ⟨P_k⟩ of the single Pauli operators, multiplied by the respective scalar coefficients c_k. Each ⟨P_k⟩ value is obtained through sampling from the prepared state |Ψ⟩ using S_k measurements, hence S_k repetitions of the same circuit (see Ref. 58 for details). The statistical error associated with the evaluation of ⟨P_k⟩ decreases as 1/√S_k. Finally, to construct a training dataset using the VQE algorithm, one just needs to create the second quantized Hamiltonian of Eq. (1) for a set of generated atomic structures {R, Z}, perform the fermion-to-qubit mapping to generate the qubit Hamiltonian of Eq. (2), and finally optimize a parameterized circuit |Ψ(θ({R, Z}))⟩ to obtain the energies of the atomic structures.
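This pipeline can be illustrated end to end on a toy problem. The sketch below runs a statevector VQE on a made-up two-qubit Pauli-decomposed Hamiltonian (the coefficients and the simple RY-CNOT-style trial circuit are illustrative stand-ins, not the Hamiltonians or circuits used in this work):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative 2-qubit Hamiltonian H = 1.0*ZZ + 0.5*XI + 0.5*IX
# expressed as (coefficient, Pauli-string matrix) pairs, as in Eq. (2).
I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
terms = [(1.0, np.kron(Z, Z)), (0.5, np.kron(X, I2)), (0.5, np.kron(I2, X))]
H = sum(c * P for c, P in terms)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])

def ansatz(theta):
    """Trial state |Psi(theta)>: three RY layers interleaved with two CNOTs."""
    psi = np.array([1., 0., 0., 0.])
    for layer in range(3):
        psi = np.kron(ry(theta[2 * layer]), ry(theta[2 * layer + 1])) @ psi
        if layer < 2:
            psi = CNOT @ psi
    return psi

def energy(theta):
    """E(theta) as the weighted sum of Pauli-term expectation values, Eq. (3)."""
    psi = ansatz(theta)
    return sum(c * (psi @ P @ psi) for c, P in terms)

# Classical outer loop: minimize the energy over theta, with random restarts.
rng = np.random.default_rng(0)
best = min(minimize(energy, rng.uniform(0, 2 * np.pi, 6), method="BFGS").fun
           for _ in range(20))
exact = np.linalg.eigvalsh(H)[0]  # reference from exact diagonalization
```

By the variational principle, `best` can never fall below `exact`; with a sufficiently expressive circuit it approaches it from above.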

IV. ERROR SOURCES AND TRAINING WITH NOISY DATASETS
The combination of VQE with ML for force-field generation would be trivial if not for the presence of error sources that are absent in classical DFT calculations. These errors are specific to the quantum nature of the hardware, and while they can be systematically reduced, some of them will remain finite in practical setups.
In this manuscript, we consider three main noise sources that can affect the quality of the dataset generated with a quantum computer.

A. Statistical noise in expectation values
This type of noise is linked with the way observables are computed in the quantum setting. As discussed above for the case of the energy, the expectation value of an operator is computed as the sum of the expectation values of its Pauli terms. The variance becomes

Var[H] = Σ_k c_k² Var[P_k],    (4)

where Var[P_k] = ⟨P_k²⟩ − ⟨P_k⟩² is the variance of the Pauli string P_k. It is easy to see that the variance is always finite, even if we consider the exact ground state of H: since the ground state is an eigenstate of the sum, but not of each single Pauli operator P_k, the total variance is always positive 59. The error in the estimation is therefore given by

ε = sqrt( Σ_k c_k² Var[P_k] / S_k ),    (5)

where S_k is the number of measurements used to estimate the k-th term, with Σ_{k=1}^{K} S_k = M, and M the total number of measurements. For instance, for an 8-qubit Hamiltonian operator representing the H2 molecule at equilibrium bond distance in the 6-31G basis set, the number of shots M required to compute the energy within chemical accuracy (1.6 mHa) is on the order of 10^8 44.
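The 1/√S_k behaviour of the statistical error can be checked numerically by emulating the sampling of a single Pauli expectation value; the outcome probability below is arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def estimate_pauli(p_plus, shots):
    """Estimate <P> for a Pauli operator whose +1 outcome has probability
    p_plus, from `shots` single-shot +/-1 measurement outcomes."""
    outcomes = rng.choice([1.0, -1.0], size=shots, p=[p_plus, 1 - p_plus])
    return outcomes.mean()

p_plus = 0.9             # illustrative outcome probability
exact = 2 * p_plus - 1   # true expectation value <P> = 0.8
var = 1 - exact ** 2     # single-shot variance of a +/-1 outcome (0.36)

def empirical_error(shots, repeats=2000):
    """Root-mean-square deviation of the estimator over many repetitions."""
    est = np.array([estimate_pauli(p_plus, shots) for _ in range(repeats)])
    return np.sqrt(np.mean((est - exact) ** 2))

err_100 = empirical_error(100)      # expected near sqrt(var/100)  = 0.06
err_10000 = empirical_error(10000)  # expected near sqrt(var/10000) = 0.006
```

Increasing the shot count a hundredfold shrinks the statistical error by roughly a factor of ten, as Eq. (5) predicts term by term.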
As of today, many strategies have been put forward to at least mitigate this issue 44,[60][61][62][63][64][65][66][67][68][69]. To the best of our knowledge, these methods can save at most three orders of magnitude in the number of shots 44, but cannot entirely remove the problem.
Without loss of generality, we can therefore assume that the estimate Õ of the expectation value of an operator O decomposed as a sum of Pauli terms (Eq. (2)) will always take the form

Õ = ⟨O⟩ + ε_stat,    (6)

where ⟨O⟩ is the exact (noiseless) expectation value and ε_stat is a zero-mean statistical error, even if the exact ground state can be represented by the quantum circuit. The operators of our interest are the energy E and the set of N_I atomic forces F = (F_1, F_2, ..., F_{N_I}), which can also be decomposed into Pauli strings and measured alongside the energy 70. This labeling error needs to be taken into account when training an ML model. Finally, we notice that statistical errors in the expectation values of energies and forces are present also in some classical electronic solvers, like quantum Monte Carlo 71.

B. Variational and optimization error
The second type of error we consider is the variational error. It arises because the exact ground state generally lies outside the region of the Hilbert space that can be represented by the variational ansatz.
So far, different types of circuits have been employed in quantum computing calculations. They can be roughly divided into two classes. The first class contains chemically inspired circuits. These circuits generally feature few variational parameters but a fairly large depth, and are therefore still unsuited for present-day hardware. The most popular of these is the unitary coupled cluster (UCC) circuit 4, with a depth growing as O(N^4) in the commonly used version where the excitations are truncated at the second order (UCCSD).
The second class consists of hardware-efficient ansätze 15. These circuits prepare entangled states while minimizing the circuit depth. They usually feature many more variational parameters, and therefore offload part of the computational burden to the classical optimizer.
Indeed, even in the case when the exact ground state, or a close approximation of it, is theoretically within the representability range of the ansatz, a suboptimal energy minimization can lead to poor results. Eq. (6) is thus modified as

Õ = ⟨O⟩ + ε_stat + ε_var,    (7)

where ε_var is the error coming from a non-ideal variational optimization. In this work, we will study how this error depends on the chosen circuit and how it impacts the training of an MLP.

C. Hardware noise
In the era of noisy quantum devices, errors occur in the execution of a quantum circuit on actual quantum processors. As a result, datasets prepared via quantum computing methods will be affected by inaccuracies even in the ideal case of a perfect choice of the ansatz (see Sec. IV B above). It is therefore important to assess the possibility of successfully training a good MLP even in the presence of these effects. Incoherent errors and readout noise, which may increase fluctuations and bias in the energy evaluations and can even hinder the optimization of VQE ansätze 72, are particularly important in this context.
Errors belonging to the first class, namely incoherent noise, are primarily due to unwanted and uncontrolled interactions between the qubits and their environment throughout the whole computation. These formally translate into finite relaxation and coherence times, named T_1 and T_2 respectively, which essentially correspond to amplitude and phase damping effects.
Readout errors instead affect the qubit measurement process: these may be modelled as bit flip channels which stochastically produce erroneous assignments while the state of a qubit is being probed.
Finally, coherent errors may also arise in the implementation of single- and two-qubit logic gates, primarily due to imperfect device calibration and manipulation. These typically result in systematic infidelities of the individual operations.
Two observations are in order. On the one hand, it seems prudent to expect that standard ML techniques will not be able by themselves to compensate for hardware noise, unless specifically designed for this purpose [73][74][75]: as a result, a minimal well-posed target is to show trainability of an MLP up to an overall model error (with respect to noiseless exact values) matching as closely as possible the characteristic inaccuracy induced by noise on the training points. On the other hand, one should also keep in mind that fast technological advancements, possibly in combination with error mitigation techniques 64,[76][77][78][79][80][81][82][83][84], will progressively reduce the impact of hardware noise. It is therefore interesting to investigate the possible improvements that ML-generated potentials could enjoy in the future, showing that their quality could closely follow the increased accuracy of the available datasets.

V. NEURAL-NETWORK FORCE FIELDS
In this paper, we adopt the high-dimensional neural-network potential (HDNNP) architecture of Behler and Parrinello 52 for the machine learning potential (MLP). For the general motivation and the description of this ML model we refer to a review by Jörg Behler 53. Here, we provide a detailed discussion of some non-trivial aspects of the architecture, which are also important to reproduce our results.
We use the following procedure for the training of an MLP. (i) Prepare a training and a validation dataset; this should be done with VQE as explained in Sect. III. (ii) Fix the neural-network architecture (neural-network geometry, learning parameters, symmetry functions). (iii) Train an MLP using the training dataset. (iv) Evaluate the MLP on the validation dataset. (v) Repeat steps (ii)-(iv) for different sets of hyperparameters. (vi) Choose the MLP with the lowest prediction error on the validation dataset.
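The selection loop (steps ii-vi) can be sketched as a plain hyperparameter scan. In the toy version below, a polynomial fit on a made-up one-dimensional "potential" stands in for the HDNNP training of step (iii); everything except the loop structure is an illustrative assumption:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Step (i) stand-in: synthetic labelled data from a hypothetical 1D potential.
pes = lambda x: x ** 4 - x ** 2
x_train = rng.uniform(-1, 1, 60)
x_valid = rng.uniform(-1, 1, 30)
y_train = pes(x_train) + rng.normal(0, 0.01, x_train.size)  # noisy labels
y_valid = pes(x_valid)                                       # noiseless reference

def fit(degree):
    """Step (iii) stand-in: a polynomial fit replaces the HDNNP training."""
    return Polynomial.fit(x_train, y_train, degree)

def rmse(model):
    """Step (iv): prediction error on the validation dataset."""
    return np.sqrt(np.mean((model(x_valid) - y_valid) ** 2))

# Steps (v)-(vi): scan the hyperparameter (here the degree), keep the best.
scores = {deg: rmse(fit(deg)) for deg in (1, 2, 4, 8)}
best_degree = min(scores, key=scores.get)
```

The model family matching the underlying quartic curve wins on the validation set, which is exactly the selection criterion of step (vi).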
The prediction error is measured in terms of the root-mean-square error (RMSE), which is defined for the energy (E) as

RMSE(E) = sqrt( (1/M) Σ_{i=1}^{M} [ (E_i − E_i^ref) / N_a^i ]² ),    (8)

and for the forces (F) as

RMSE(F) = sqrt( (1 / (3 Σ_{i=1}^{M} N_a^i)) Σ_{i=1}^{M} Σ_{j=1}^{N_a^i} Σ_μ (F_{ij}^μ − F_{ij}^{μ,ref})² ),    (9)

where the explicit dependence of N_a^i on the sample index i comes from the fact that in general the dataset contains structures with different numbers of atoms. Notice that the dataset labels are normalized as explained in Appendix A.
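A direct transcription of these two error measures, assuming (as stated in the text) that per-structure energy errors are normalized by the atom count N_a^i:

```python
import numpy as np

def rmse_energy(E_pred, E_ref, n_atoms):
    """Per-atom energy RMSE over M structures; n_atoms holds N_a^i for
    each structure i."""
    diff = (np.asarray(E_pred) - np.asarray(E_ref)) / np.asarray(n_atoms)
    return np.sqrt(np.mean(diff ** 2))

def rmse_forces(F_pred, F_ref):
    """Force RMSE over all Cartesian components; inputs are flat arrays
    collecting the 3*N_a^i components of every structure."""
    diff = np.asarray(F_pred, dtype=float) - np.asarray(F_ref, dtype=float)
    return np.sqrt(np.mean(diff ** 2))
```

For example, two 3-atom structures with total-energy errors of 0.3 and 0.6 give per-atom errors of 0.1 and 0.2, hence an energy RMSE of sqrt(0.025) ≈ 0.158.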
Another crucial ingredient of this procedure is the set of symmetry functions. These are many-body functions that capture, in a compact fashion, the structural information in the local environment of an atom. The symmetry-function values, rather than the raw Cartesian coordinates of the atoms, are the actual inputs of the NN. The main motivation behind this choice is that translational and rotational invariance can be easily implemented 53.
In this work, we adopt the so-called G2 and G3 symmetry function classes. The first is a family of radial symmetry functions made of two-body terms, while the second also contains three-body terms, which are needed to encode the three-dimensional structure of an atomic configuration. We provide in Appendix B the explicit functional form of these functions, as well as other details needed to reproduce our settings.
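As a minimal sketch, a radial G2 function in the common Behler-Parrinello form reads G2_i = Σ_j exp(-η(R_ij − R_s)²) f_c(R_ij), with a cosine cutoff f_c; the parameter values below are illustrative, and the exact forms used here are those of Appendix B:

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff: f_c(r) = 0.5*(cos(pi*r/r_c) + 1) for r < r_c, else 0."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g2(positions, i, eta, r_s, r_c):
    """Radial (two-body) G2 symmetry function of atom i:
    sum over neighbors j of exp(-eta*(R_ij - r_s)^2) * f_c(R_ij)."""
    pos = np.asarray(positions, dtype=float)
    d = np.linalg.norm(pos - pos[i], axis=1)
    d = np.delete(d, i)  # exclude the atom's zero distance to itself
    return float(np.sum(np.exp(-eta * (d - r_s) ** 2) * cutoff(d, r_c)))

# A hypothetical 3-atom configuration (arbitrary units):
conf = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0]])
val = g2(conf, 0, eta=1.0, r_s=0.0, r_c=3.0)
```

Because only interatomic distances enter, the value is unchanged under rigid translations and rotations of the configuration, which is the point of the construction.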
It is important to notice that one would like to avoid redundancies in the symmetry-function set. In this work, we first define a set of candidate symmetry functions, and then select a small subset that still enables us to capture the structural information of a given dataset. To this end, we adopt the automatic selection of symmetry functions proposed by Imbalzano et al. 85, which is detailed in Appendix C.
However, in the case of the bulk systems of Sect. VI A, which have already been studied in Refs. 30 and 86, we adopt the symmetry functions already used in the respective publications.

VI. RESULTS AND DISCUSSION
We now present the results of the proposed HDNNP approach trained with electronic structure calculations performed with a quantum algorithm and affected by typical noise sources compatible with near-term quantum computers. We proceed systematically by analyzing the impact of each noise source on the quality of the predictions for a series of model systems. For the statistical error analysis, we start with the study of the effect of a Gaussian distributed noise model on the energies and forces evaluated for liquid and solid water. We then proceed to the investigation of a smaller system, namely a single water molecule, which can be implemented on today's quantum devices and for which a resource assessment is possible. Finally, we validate our approach for the case of the H2-H2 cluster, where the sampling of intermolecular distances and orientations is required. The impact of the optimization errors on the quality of the HDNNP predictions is investigated for the same water molecule system introduced above, using different wavefunction ansätze. Finally, the effect of hardware noise is investigated for the simplest molecular system, namely H2, for which we can efficiently perform the required sampling of the intramolecular distance both in simulations and in hardware experiments. Details on the simulation parameters and system setups will be introduced in order of appearance in the following sections.
A. Resilience against statistical noise

1. A bulk system example: liquid and solid water

The first of our assessments concerns the trainability of an MLP in the presence of the statistical noise alone (see Sect. IV A). This study can be performed already in a prototypical bulk system, which is the end goal of the whole technique. Indeed, the statistical noise in the labels of the training dataset can be easily and rigorously emulated by adding a Gaussian distributed random variable with zero mean. For each structure in the training dataset the reference energy and forces are modified according to

Ẽ = E + N(0, Δ_E),    (10)
F̃_i^μ = F_i^μ + N(0, Δ_F),    (11)

where E is the energy of the structure and F_i^μ is the force corresponding to atom i and component μ = {x, y, z}. Δ_E and Δ_F correspond to the variance of the statistical noise that is introduced for the energy and the forces, respectively.
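The noise-injection step can be emulated in a few lines. Here Δ_E and Δ_F are taken as the widths (standard deviations) of the added Gaussians, which is one possible reading of the convention above:

```python
import numpy as np

rng = np.random.default_rng(7)

def add_label_noise(energy, forces, delta_e, delta_f):
    """Emulate statistical noise: add zero-mean Gaussian noise of width
    delta_e to the energy label and delta_f to every force component."""
    noisy_e = energy + rng.normal(0.0, delta_e)
    noisy_f = forces + rng.normal(0.0, delta_f, size=np.shape(forces))
    return noisy_e, noisy_f

# Check the injected noise statistics on a dummy 5-atom structure.
forces = np.zeros((5, 3))  # 5 atoms, x/y/z components
samples = [add_label_noise(0.0, forces, delta_e=0.01, delta_f=0.1)
           for _ in range(20000)]
e_noise = np.array([s[0] for s in samples])
f_noise = np.array([s[1] for s in samples])
```

Over many draws the injected noise is zero-mean with the prescribed widths, so the noisy labels are unbiased estimators of the reference ones, exactly as in Eq. (6).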
In this study, we consider a bulk water system. The dataset is taken from Ref. 87 and contains 7241 configurations of ice and liquid water. The energies and forces were calculated with DFT, using the RPBE functional 88 with D3 corrections 89. The mean energy in the dataset is −694.47 eV/atom with a standard deviation of 0.11 eV/atom. The standard deviation of the forces is 1.225 eV/Å. Here we follow the reasonable assumption that the potential energy surface obtained with a DFT model is in qualitative agreement with the exact one, and that the remaining difference does not play any role in this particular assessment concerning the learnability of an MLP from a noisy dataset.
We then use the noisy training datasets to fit the MLPs and a noiseless validation dataset to assess their accuracy. The amount of noise in the energies and the forces is varied independently. The values considered for the energy noise are {1000, 100, 10, 1, 0.1} meV/atom. For the force noise, the values {10, 1, 0.1, 0.01, 0.001} eV/Å are used. As a reference, we also consider the training with no noise in the energy and/or the force labels. The symmetry functions are taken from Ref. 55. The resulting prediction RMSE values of the trained MLPs are shown in Fig. 1.
First of all, we observe that the prediction accuracy for the training on the dataset without noise (cell in the top right corner of the plots in Fig. 1) is consistent with Ref. 90 (0.7 meV/atom for the energy and 0.036 eV/Å for the forces). Most importantly, we notice that there exists a limiting value of Δ_E and Δ_F below which the prediction error is as low as in the noiseless case. The important consequence is that one is not forced to reduce the statistical error bars in the dataset to zero, enabling in principle a practical implementation of the method.

2. A single water molecule
The goal of this section is to introduce a smaller system, a single water molecule, for which a quantum resource assessment is feasible. We will translate the error threshold Δ_E into a quantum measurement resource estimate. The single molecule configurations are extracted from the bulk water dataset used in the previous section.
We first repeat the above assessment using simulated noise. Concerning the training, we select 20 symmetry functions for hydrogen and 15 symmetry functions for oxygen with the CUR feature selection method 85 (see Appendix C). In this case, we also consider the possibility of training the MLP without the use of the forces, for which it is also interesting to assess the dependence of the RMSE on the dataset size. The results of the training both with and without using the forces are shown in Fig. 2. We observe a qualitatively similar behavior as for the bulk water.
In view of the non-trivial computational cost of an electronic structure calculation on a quantum computer, we aim to reduce the number of configurations in the training and validation datasets as much as possible, using the CUR decomposition 85. In Fig. 2 (right panel) we observe that we can reduce this number down to 100 configurations in the training set without a noticeable increase in the RMSE.
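A simplified, deterministic sketch of leverage-score-based CUR row selection follows; the actual method of Ref. 85 is more elaborate, and this only conveys the idea of ranking configurations by their weight in the top singular subspace of the feature matrix:

```python
import numpy as np

def leverage_scores(X, k):
    """Rank-k leverage scores of the rows of X (one row per configuration,
    columns holding descriptor values); they sum to 1 by construction."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sum(U[:, :k] ** 2, axis=1) / k

def cur_select(X, k, n_select):
    """Pick the n_select rows with the largest rank-k leverage scores
    (a simplified, deterministic variant of CUR selection)."""
    scores = leverage_scores(X, k)
    return np.argsort(scores)[::-1][:n_select]

# Small synthetic example: 3 distinct "configurations" plus near-duplicates,
# from which 3 representatives are selected.
rng = np.random.default_rng(3)
base = rng.normal(size=(3, 6))
X = np.vstack([base, base + 1e-6 * rng.normal(size=(3, 6))])
picked = cur_select(X, k=3, n_select=3)
```

Near-duplicate rows share their leverage, so the selection retains the diversity of the dataset while discarding redundant configurations.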
In terms of the dependence on the energy noise, the behaviour of the RMSE in the training with and without using the forces is qualitatively the same, with an energy noise threshold of 10 meV/atom. However, there is a small advantage if the forces are used in the training. The calculation of the forces with VQE was already proposed by Sokolov et al. 70. However, due to technical limitations, and since the training without the forces is possible, we focus on the energy noise threshold in the following discussion.
The next step is to assess the number of shots M (see Sect. IV A) required to achieve the desired accuracy. We use the energy noise threshold of 10 meV/atom.
To estimate the total number of measurements M needed to reach a certain accuracy ε in the energy estimation of an N-qubit Hamiltonian, we consider the probability p(δ < ε) that the deviation δ of the energy estimate from the ground state energy E_0 is smaller than the desired accuracy. Following Ref. 44, this probability is given by

p(δ < ε) = erf( sqrt(S/2) ε / σ[H] ),    (12)

where S = M/K is the number of measurements per Pauli operator P_k and σ²[H] is the measurement variance of the Hamiltonian. The estimate for the total number of measurements is then given by the number of measurements required to reach p(δ < ε) ≈ 1.
A loose upper bound p_max of the resource estimate can be obtained by evaluating the variance in the equation above with

σ²_max[H] = Σ_k c_k² σ²[P_k] ≤ Σ_k c_k²,    (13)

where c_k are the coefficients of the qubit Hamiltonian in Eq. (2) 59. However, a more realistic estimate should be obtained by directly emulating the quantum measurement process, where σ²[P_k] is the variance of the samples obtained from the measurement of the Pauli string P_k. The water molecule consists of three atoms, and therefore the desired accuracy to be inserted in the previous formula is ε = 30 meV, which is comparable with chemical accuracy (1.6 mHa ≈ 43.5 meV).
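Under a Gaussian (central-limit) approximation, the success probability and the required shots per term can be computed and inverted as sketched below; the accuracy and variance values are purely illustrative, not those of the water molecule Hamiltonian:

```python
import numpy as np
from scipy.special import erf, erfinv

def p_success(eps, sigma_h, shots_per_term):
    """CLT estimate of the probability that the sampled energy deviates
    from the true value by less than eps, given S shots per Pauli term."""
    return erf(np.sqrt(shots_per_term / 2.0) * eps / sigma_h)

def shots_needed(eps, sigma_h, p_target):
    """Invert the expression above for the shots per term S;
    the total budget is then M = K * S for K Pauli groups."""
    return 2.0 * (sigma_h * erfinv(p_target) / eps) ** 2

# Illustrative numbers: target accuracy 0.030, Hamiltonian 'width' 1.0
# (both in the same, arbitrary energy units).
eps, sigma = 0.030, 1.0
S = shots_needed(eps, sigma, 0.99)
```

Doubling the shots pushes the success probability above the target, reflecting the monotone 1/√S shrinkage of the statistical error.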
We then define the second quantized Hamiltonian using the molecular orbitals obtained from the minimal STO-3G atomic basis set. The fermion-to-qubit mapping is then achieved using the parity mapping 91. This results in a Hamiltonian encoded on 12 qubits. We further reduce this requirement down to 9 qubits by exploiting the mapping-specific two-qubit reduction, the planar structure of the molecule (this feature holds even in the presence of distortions), as well as the freezing of the core orbitals. The probability p(δ < ε) that the deviation of the ground state energy estimate of the H2O molecule from the exact ground state energy is less than ε = 30 meV is shown in Fig. 3.
From Fig. 3, we observe that a probability of p(δ < ε) ≈ 1 is reached for a total number of about 10^10 measurements. However, advances in quantum measurement protocols are expected to improve this estimate by some orders of magnitude 15,44,65,66,69,92.

3. H2-H2 cluster
This model system features two hydrogen molecules, with intramolecular distances sampled from a Gaussian distribution having mean μ = 1.42 Bohr and standard deviation σ = 0.03 Bohr. The intermolecular distances are instead sampled from a skewed distribution with two Gaussian tails of different widths. This corresponds to distances between about 4.5 Bohr and 10 Bohr, with a mean value of 6.0 Bohr. The respective molecular orientations are also sampled randomly. This system is particularly challenging, since it is either unbound or weakly bound, depending on the level of theory 93. We perform the same kind of assessment and investigate the RMSE on the energy and the forces as a function of the strength of the artificially generated statistical noise. We use a training dataset of 1000 configurations, and we compute the labels using DFT, applying the PBE functional with D3 corrections. The mean energy of the dataset is −15.8066 eV/atom with a standard deviation of 2.75 meV/atom. The standard deviation in the force labels is 0.318 eV/Å. Fig. 4 shows qualitatively similar results as for the bulk water and the single water molecule cases. However, this time the energy noise threshold to be met in order to obtain an RMSE comparable to the noiseless-label case is about 0.1 meV/atom. This target is more demanding than in the single water molecule case, as the energy scale of the bound cluster is also much smaller (about 1.9 meV/atom). For this reason, the number of shots required to compute the energy within an error of 0.1 meV/atom is about 10^12.
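The geometry sampling can be sketched with a two-piece ("split") normal for the intermolecular distance; the two tail widths below are assumptions, since the text only quotes the range and the mean:

```python
import numpy as np

rng = np.random.default_rng(11)

def split_normal(mode, sig_lo, sig_hi, size):
    """Sample a two-piece ('split') normal: Gaussian tails of widths
    sig_lo (below the mode) and sig_hi (above), joined at the mode.
    The side is picked with probability proportional to its width so
    the density is continuous at the mode."""
    side_hi = rng.random(size) < sig_hi / (sig_lo + sig_hi)
    mag = np.abs(rng.normal(0.0, 1.0, size))
    return np.where(side_hi, mode + sig_hi * mag, mode - sig_lo * mag)

# Intramolecular distances: Gaussian with the quoted mean and width.
intra = rng.normal(1.42, 0.03, 1000)        # Bohr
# Intermolecular distances: skewed, mode 6.0 Bohr; tail widths assumed.
inter = split_normal(6.0, 0.5, 1.3, 1000)   # Bohr
```

With a heavier right tail the distribution is skewed toward large separations, so its mean sits above the mode, qualitatively matching the description above.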

B. Resilience against optimization errors
In this section, we discuss the kind of error we can expect when using variational approaches for the calculation of energy expectation values. More specifically, we consider the case when the electronic structure calculations are not fully converged. In particular, we test two types of variational circuits: the unitary coupled cluster ansatz and the heuristic ansatz.
Clearly, the MLP is not supposed to improve upon the level of theory at which the dataset has been computed. Therefore, here we focus on the impact of unconverged variational optimization on the training. For instance, an ansatz featuring many variational parameters can be more difficult to optimize than others, thus producing a dataset with scattered labels.
We compare the performance of UCCSD 94 with a heuristic ansatz, the so-called RY-CNOT circuit, with linear connectivity and depth 24, meaning that the circuit features 24 repeating subunits of RY single-qubit gates followed by a cascade of CNOT (or CX) gates, which represents an entangling block. This depth was necessary to obtain results comparable with the UCCSD ansatz. These results refer to the 9-qubit single water molecule model introduced above. The UCCSD ansatz features 58 variational parameters with a total CNOT gate count of 4056, while the heuristic ansatz contains 225 parameters but a less complex circuit, made of 192 CNOT gates.
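The quoted resource counts for the heuristic circuit follow directly from its layered layout. A small sketch, assuming one final RY rotation layer after the 24 repeating subunits (a common convention for this ansatz, not spelled out in the text):

```python
def ry_cnot_resources(n_qubits, depth):
    """Parameter and CNOT counts of a hardware-efficient RY-CNOT ansatz
    with linear connectivity: each of the `depth` repeating subunits
    holds one RY rotation per qubit plus a linear CNOT cascade, and a
    final RY layer closes the circuit (assumed layout)."""
    n_params = (depth + 1) * n_qubits   # rotation angles
    n_cnots = depth * (n_qubits - 1)    # entangling gates
    return n_params, n_cnots

params, cnots = ry_cnot_resources(n_qubits=9, depth=24)
# -> 225 parameters and 192 CNOT gates, matching the numbers quoted above
```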
In Fig. 5 we observe that the MLPs trained on datasets coming from the two different variational circuits give good results on the respective validation datasets, reaching an RMSE of 0.212 meV/atom and 21.2 meV/atom for the UCCSD and the heuristic ansatz, respectively. As expected, the UCCSD ansatz outperforms the heuristic ansatz, since the circuit optimization proceeds more smoothly in that case.
It is important to stress that the energies in the validation dataset, reported in the top panels of Fig. 5 (i.e. on the x-axis), are also computed with VQE, meaning that they are affected by the same optimization errors. As will become clear in the following, this explains, for the most part, the deviations in the top-right panel of Fig. 5.
If we focus on the heuristic ansatz and compare the energy labels obtained by VQE with the MLP predictions on the same validation dataset, but using the exact energies as a benchmark, we observe that the MLP fit achieves a significant reduction of the energy variance (i.e., less scattered points). Indeed, the RMSE of the validation dataset with respect to the exact energy benchmark (obtained via exact diagonalization) is 96.0 ± 63.9 meV/atom, while the RMSE of the MLP on the same benchmark is 95.

C. Resilience against hardware noise
In this section, we assess the third type of error, which is due to the uncorrected hardware errors typical of state-of-the-art noisy quantum computers. Similarly to the variational noise discussed in section VI A, we do not expect the MLP to improve upon the energies calculated under the effect of hardware noise. Therefore, the focus in this section is on the effect of hardware noise on the learned MLP.
Our assessment includes both noisy simulations and real hardware experiments. Simulations are useful to investigate different levels of gate errors, also beyond the current values. Real hardware experiments are important as they include all possible sources of error, beyond the ones considered in the simulations.

Noisy hardware simulations
We simulate the actual hardware noise using a custom Qiskit 95 noise model whose baseline parameters are derived from the calibration data of current IBM Quantum backends. The custom noisy backend consists of identical qubits and identical gates, meaning that each type of gate behaves identically on all qubits (I, RZ, SX, X) and on all pairs of qubits (CX). The parameters of the custom backend are listed in Tab. I.
We specifically focus on two types of hardware noise: gate error and readout error. To make the full analysis suitable for hardware calculations, we limit our investigation to a simpler model, namely the hydrogen molecule, H 2 .
a. Gate error. We model the gate error using coherence-limited fidelity for individual gates, i.e. assuming that the reported gate error is due solely to thermal relaxation and dephasing effects, parameterized by the thermal relaxation time T1 and the dephasing time T2, respectively. While this simplified scenario does not entirely reflect the actual experimental conditions, where other effects (e.g., coherent control and calibration errors) and noise channels (including correlated multi-qubit noise) may be present, it is nevertheless sufficient to capture the dominant behavior of current noisy processors without complicating the analysis. The baseline value for both T1 and T2 is 100 µs (see Tab. I). However, in simulations it is also possible to assess different scenarios corresponding to future expected technical improvements in device fabrication. To this end, we systematically increase T1 and T2 to investigate the effect of a future gate error reduction on the calculation of the energies. More specifically, we extend T1 up to a maximum value of 2 ms, which is a realistic prediction for the coming years according to recent hardware developments.
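The coherence-limit assumption can be sketched as follows: the error of a gate of duration t is taken to be the average infidelity of a single-qubit relaxation channel with times T1 and T2. The gate duration below is an illustrative assumption, not the Tab. I value.

```python
import math

def coherence_limited_error(t_gate, t1, t2):
    """Average gate error assuming the only noise during a gate of
    duration t_gate is thermal relaxation (T1) and dephasing (T2).
    Uses the standard average-fidelity expression for a single-qubit
    relaxation channel; a simplification, as described in the text."""
    f_avg = 0.5 + math.exp(-t_gate / t2) / 3.0 + math.exp(-t_gate / t1) / 6.0
    return 1.0 - f_avg

# Baseline: T1 = T2 = 100 us, with an assumed 35 ns single-qubit gate
err_base = coherence_limited_error(35e-9, 100e-6, 100e-6)
# Improved device: T1 = T2 = 2 ms
err_future = coherence_limited_error(35e-9, 2e-3, 2e-3)
```

Extending T1 and T2 from 100 µs to 2 ms reduces this coherence-limited error by roughly the same factor of 20, which is the knob varied in Fig. 6.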
For this analysis, we randomly create 20 training datasets and one validation dataset for the hydrogen molecule, each with 20 configurations characterized by different bond lengths. Each molecular configuration is randomly rotated in space and features an intramolecular bond distance in the range [0.6, 4.2] Bohr. The H 2 wavefunction is encoded in the STO-3G basis set using the parity mapping and the mapping-specific two-qubit reduction, which results in a Hamiltonian on 2 qubits. The VQE calculations feature a simple variational ansatz tailored to the system, which contains only one variational parameter and one CNOT gate. For each dataset, we calculate the energy of the configurations at different levels of the gate errors and train an MLP on each set of labels. As a reference, we also train an MLP on each training dataset with noiseless energies. In Fig. 6, we report the energy RMSE at different gate errors for the configurations in the validation dataset (solid blue line), the average energy RMSE of the MLP predictions (orange dots) and the average energy RMSE of the reference (noiseless) MLP predictions (dashed green line). All RMSE values are given with respect to energies obtained with noiseless VQE calculations.
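Generating the random H 2 configurations described above can be sketched as follows (placing the molecule symmetrically around the origin is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_h2_configuration(d_min=0.6, d_max=4.2):
    """One randomly rotated H2 molecule with a bond length drawn
    uniformly from [d_min, d_max] (Bohr). The random orientation comes
    from a normalized Gaussian vector, which is uniform on the sphere."""
    d = rng.uniform(d_min, d_max)
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    # place the two atoms symmetrically around the origin
    return np.stack([0.5 * d * axis, -0.5 * d * axis])

# one dataset of 20 configurations
dataset = [random_h2_configuration() for _ in range(20)]
```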
We first notice that the MLP predictions closely follow the blue line (noisy VQE), and we therefore conclude that the MLPs faithfully learn the noisy potential energy landscape. As expected, reducing the gate error leads to more accurate MLPs in the absolute sense. The crossing point, at which the error due to the gate noise becomes smaller than the model error of the MLP, is at about T1 ≈ 1.75 ms. Above this value, the MLP error saturates to the model error.
Some comments are in order. On the positive side, these results show that, in principle, even a finite gate noise can produce MLPs which are as accurate as the best MLP trained on noiseless data. On the other side, the hydrogen molecule is one of the simplest systems we can study, and the circuit used is very shallow. Larger molecules will require much deeper circuits, and we therefore expect that the effect of the gate errors will increase significantly. This assessment thus represents a best-case scenario for MLP training on quantum data.
b. Readout error. The second type of error we simulate is the readout error. The baseline parameters for the readout errors are listed in Tab. I. The readout error is best probed by emulating the measurement process. To highlight readout inaccuracies, we suppress the statistical fluctuations (see Sec. IV A) by using 10^5 shots per circuit to measure the energy.
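Emulating the measurement process with a classical misassignment model can be sketched as below; the error rates are illustrative placeholders, not the Tab. I baseline values:

```python
import numpy as np

rng = np.random.default_rng(7)

def apply_readout_error(bits, p01, p10):
    """Emulate classical readout misassignment on an array of ideal
    measurement outcomes: p01 = P(read 1 | true 0) and
    p10 = P(read 0 | true 1)."""
    flip_prob = np.where(bits == 0, p01, p10)
    flips = (rng.random(bits.shape) < flip_prob).astype(bits.dtype)
    return bits ^ flips

# 10^5 shots of a qubit that would ideally always read 0
ideal = np.zeros(100_000, dtype=np.int64)
noisy = apply_readout_error(ideal, p01=0.02, p10=0.04)
```

With 10^5 shots the statistical fluctuations of the estimated error rate are small compared to the rate itself, which is the point of the shot count chosen in the text.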
In this section, we simulate the readout error at the baseline level and at a level where the error is reduced by a factor of 100, creating a training and a validation dataset of the hydrogen molecule, each with 20 configurations. For demonstration purposes, we also train and evaluate an MLP on the resulting datasets. Their performance is reported in Fig. 7. In the figure, the black solid line shows the exact energies along the hydrogen molecule dissociation path, the dashed lines show the predictions of the trained MLPs, and the dots show the energies of the configurations in the validation datasets. The MLPs achieve an energy RMSE of 19.7 meV/atom and 13.5 meV/atom on the validation dataset with the baseline and the reduced readout error, respectively. Compared to the exact energy, they show an error of 607 meV/atom and 210 meV/atom. As expected, reducing the readout error leads to more accurate energy estimations and therefore to more accurate MLPs.

Hardware experiments
Finally, we also run experiments on IBM Quantum superconducting processors, where all actual error sources are present. We run the hardware calculations for a training and a validation dataset of the hydrogen molecule, each with 20 configurations. For each configuration we perform 4, 5 and 10 VQE runs on the IBM Quantum devices ibmq_toronto, ibmq_bogota and ibmq_manila, respectively. All these quantum processors feature a quantum volume of 32. The final energy label is obtained by averaging over the different experimental realizations, after excluding clearly unconverged runs. Data measured on different devices contribute to separate datasets. All energy expectation values are computed with 8192 measurements (or shots).
The results are summarized in Fig. 8. The plot on the left reports the curves obtained with ibmq_toronto (blue) and ibmq_bogota (orange), while the plot on the right shows the results for ibmq_manila (blue). In all cases, the predictions of the MLPs closely follow the data points of the training and validation datasets (cross and point markers, respectively). The difference between the curves is due to the different properties of each device, with ibmq_bogota outperforming ibmq_toronto. With ibmq_manila, we also apply measurement error mitigation techniques; to this end, we apply the full calibration matrix method 96 on 10 additional VQE runs for each selected configuration, where the calibration matrix is refreshed every 30 minutes. To estimate the final energy of each configuration, we apply the same procedure as described above; as usual, we discard unconverged VQE runs and average over the remaining energy measurements. The energy estimates of the MLP trained on the obtained datasets are shown in the right panel of Fig. 8. We observe that the mitigated energies are much closer to the exact energies, in agreement with the error simulations of section VI C 1 b.
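The full calibration matrix method amounts to inverting a measured confusion matrix M[i, j] = P(read i | prepared j) on the vector of outcome probabilities. A minimal sketch with an assumed 2-qubit confusion matrix:

```python
import numpy as np

def mitigate_counts(raw_probs, calibration_matrix):
    """Full calibration-matrix readout mitigation: solve M x = p_raw for
    the mitigated probability vector x, where M[i, j] = P(read i |
    prepared j) is estimated from calibration circuits. Negative
    entries caused by noise are clipped and the result renormalized."""
    mitigated = np.linalg.solve(calibration_matrix, raw_probs)
    mitigated = np.clip(mitigated, 0.0, None)
    return mitigated / mitigated.sum()

# assumed 2-qubit confusion matrix (columns sum to 1)
M = np.array([[0.95, 0.03, 0.03, 0.00],
              [0.02, 0.94, 0.00, 0.03],
              [0.02, 0.00, 0.94, 0.03],
              [0.01, 0.03, 0.03, 0.94]])
raw = M @ np.array([0.7, 0.1, 0.1, 0.1])   # noisy readout of a known state
recovered = mitigate_counts(raw, M)
```

In practice the calibration circuits only estimate M up to statistical error, which is why the text refreshes the calibration matrix every 30 minutes.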
We stress once more that the goal of training an MLP on quantum data is not hardware error mitigation per se, but rather to obtain a smooth and re-usable interpolation of the noisy data. The computational gain compared to the straightforward molecular dynamics approach of Ref. 70, even when performed on the same single molecule, is substantial. The standard method requires a new VQE calculation for each iteration, while the training of an MLP in this specific case only requires on the order of O(10) single-point VQE runs. Moreover, in Ref. 70 a costly Lanczos error mitigation scheme was sometimes needed, making every single-point VQE calculation O(10^2) times more expensive compared to the present work, where the stability of the dynamics is ensured by the smoothness of the MLP surface. Finally, this cost needs to be multiplied by the total number of time steps of the MD. To summarize, for a 100 fs simulation of the H 2 molecule, assuming a time step of 0.2 fs, the total cost of a stable quantum-powered MLP simulation is reduced by a factor of 10^5 compared to the straightforward approach.

VII. CONCLUSIONS
We propose the usage of classical machine learning potentials (MLP) trained on quantum electronic structure data to enable large-scale materials simulations. The motivation behind this is quite simple: while quantum computing algorithms can outperform their classical counterparts for electronic structure problems, they still feature a polynomial runtime (possibly with a large prefactor) that can prevent applications to bulk materials.
MLPs have been successfully introduced in materials simulations powered by classical approximate electronic structure solvers, enabling truly large-scale and equilibrated simulations 30,32 . Here, we assess the trainability of an MLP using quantum data obtained from a variational calculation. In particular, we study the impact of three types of noise that are characteristic of the quantum algorithm: the statistical noise, the optimization error, and the hardware errors.
These errors impact the training and validation energy labels as

E_label = E_exact + ∆ + η,

where ∆ is a systematic error and η is a fluctuation around this offset. While the MLP is not intended to compensate for any systematic error ∆, it can greatly mitigate the random fluctuations η affecting the labels. These may arise from the statistical error in the evaluation of the energy and force estimators, the VQE optimization error that may affect some dataset points more than others, as well as any non-systematic component of the hardware noise.
Here we use an MLP based on state-of-the-art neural-network potentials, and show that there exists a threshold value for the noise strength below which we can achieve a training as good as in the noiseless case. The resulting MLP features a smooth energy surface that allows for stable molecular dynamics simulations or structural optimizations.
We substantiate our findings through simulations of the separate sources of noise. Finally, we generate training and validation datasets using actual quantum hardware and obtain the first MLP trained on electronic structure calculations performed on a real quantum computer.
While in our assessment we consider a neural-network type of MLP, future research directions include the possibility of using kernel-based models, which tend to perform better when only little training data is available 97 .
VIII. ACKNOWLEDGEMENTS
I.T. acknowledges financial support from the Swiss National Science Foundation (SNF) through grant No. 200021-179312. IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade.

Appendix A: Dataset normalization
Normalizing the labels in the training dataset is a common practice to improve the training of a neural network 90 . In the case of a machine learning potential (MLP), the normalization defines internal units which are independent of the physical unit system. Fortunately, the normalization process is integrated into the training procedure of n2p2 55 . Given a dataset of atomic structures with energies (E) and forces (F), the normalization transformation is parameterized by three parameters: the mean energy per atom ⟨E⟩, a conversion factor for the energies c_energy = 1/σ_E, and a conversion factor for distances c_length = σ_F/σ_E, where σ_E and σ_F are the standard deviations of the energies and the forces, respectively. Applying the transformation to each configuration in the dataset ensures that the transformed labels (E*, F*) have zero mean and unit standard deviation 90 , i.e. ⟨E*⟩ = 0, σ*_E = 1 and σ*_F = 1 (the forces should already have zero mean). The normalization is successful as long as we input both the energies and the forces. However, using the VQE algorithm we only calculate the energy of the atomic configurations and set all the forces to zero. For the training this is no issue, as training on energy labels only is supported by n2p2. However, during the normalization process the conversion factor for distances is then set to c_length = 0 (since σ_F = 0), which leads to problems in subsequent steps of the training process. Therefore, if the forces are not available for the normalization process, we manually set c_length = 1.
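The three normalization parameters can be sketched as follows, including the c_length = 1 fallback for energy-only labels (the dataset below is synthetic):

```python
import numpy as np

def normalization_parameters(energies_per_atom, forces):
    """Compute the three n2p2-style normalization parameters: the mean
    energy per atom, c_energy = 1/sigma_E and c_length = sigma_F/sigma_E.
    If no forces are available (as for energy-only VQE labels), fall
    back to c_length = 1, as described in the text."""
    mean_e = energies_per_atom.mean()
    sigma_e = energies_per_atom.std()
    sigma_f = forces.std() if forces is not None else 0.0
    c_energy = 1.0 / sigma_e
    c_length = sigma_f / sigma_e if sigma_f > 0 else 1.0
    return mean_e, c_energy, c_length

rng = np.random.default_rng(3)
E = rng.normal(-15.8, 0.003, size=500)        # synthetic per-atom energies
mean_e, c_energy, c_length = normalization_parameters(E, None)
E_star = (E - mean_e) * c_energy              # zero mean, unit standard deviation
```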
A list of other cutoff functions can be found in Ref. 98.
The second type of symmetry function we consider is the angular symmetry function, which is a sum of three-body terms. It is defined as

G3_i = 2^(1−ζ) Σ_{j,k≠i} (1 + λ cos θ_ijk)^ζ exp[−η(R_ij² + R_ik² + R_jk²)] f_c(R_ij) f_c(R_ik) f_c(R_jk), (B3)

where θ_ijk = arccos[(R_ij · R_ik)/(R_ij R_ik)] is the angle between the three atoms i, j and k. The parameters determining the shape of the function are η, λ and ζ. The parameter η, again, determines the width of the Gaussian part of the function. The parameter λ can only take the values 1 and −1, which shifts the maximum of the cosine part either to θ_ijk = 0° or θ_ijk = 180°, respectively. The parameter ζ determines the angular resolution.
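A single three-body contribution of this form can be sketched as follows, using the cosine cutoff as one possible choice of f_c (Ref. 98 reviews others):

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff function, one common choice for f_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def angular_term(r_i, r_j, r_k, eta, lam, zeta, r_c):
    """One three-body contribution to an angular symmetry function for
    the triplet (i, j, k); a sketch following the standard
    Behler-Parrinello form."""
    v_ij, v_ik = r_j - r_i, r_k - r_i
    d_ij, d_ik = np.linalg.norm(v_ij), np.linalg.norm(v_ik)
    d_jk = np.linalg.norm(r_k - r_j)
    cos_theta = np.dot(v_ij, v_ik) / (d_ij * d_ik)
    return (2.0 ** (1.0 - zeta)
            * (1.0 + lam * cos_theta) ** zeta
            * np.exp(-eta * (d_ij**2 + d_ik**2 + d_jk**2))
            * cutoff(d_ij, r_c) * cutoff(d_ik, r_c) * cutoff(d_jk, r_c))

# lam = +1 makes collinear triplets (theta = 0) weigh more than bent ones
```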
Normalization. Similarly to the labels, the input to the NN is also normalized. This balances the impact of the different symmetry functions on the first hidden layer of the NN.
The normalization transformation centers each symmetry function G_i around its mean ⟨G_i⟩ and rescales it to the interval [−1, 1] 90 .
Forces. The force component F_{i,k} of atom i is calculated from the total energy by taking the derivative with respect to the component k of the position R_i of the atom,

F_{i,k} = −∂E/∂R_{i,k}.

This expression can be evaluated by applying the chain rule,

F_{i,k} = −Σ_{j=1}^{N_sym,i} (∂E/∂G_{ij}) (∂G_{ij}/∂R_{i,k}),

where N_sym,i is the number of symmetry functions for atom i and G_{ij} is the j-th symmetry function of atom i.
The first partial derivative is given by the functional form of the NN.The second partial derivative is given by the functional form of the symmetry functions and can be calculated analytically.For the two symmetry function types given above, the derivatives can be found in the Supporting Information of Ref. 98.
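The chain rule above can be checked on a toy one-dimensional model, where both the "network" E(G) and the "symmetry function" G(R) are assumed illustrative forms:

```python
import numpy as np

# Toy model: G(R) = exp(-eta R^2) plays the symmetry function,
# E(G) = a G^2 + b G plays the network (both assumed, for illustration).
eta, a, b = 0.5, 2.0, -1.0
G = lambda R: np.exp(-eta * R**2)
E = lambda R: a * G(R)**2 + b * G(R)

def force_chain_rule(R):
    """F = -dE/dR = -(dE/dG)(dG/dR), with both factors analytic."""
    dE_dG = 2.0 * a * G(R) + b          # derivative of the network
    dG_dR = -2.0 * eta * R * G(R)       # analytic symmetry-function derivative
    return -dE_dG * dG_dR

def force_finite_diff(R, h=1e-6):
    """Central finite-difference reference for the same force."""
    return -(E(R + h) - E(R - h)) / (2.0 * h)
```

The two evaluations agree to finite-difference accuracy, which is the standard sanity check when implementing analytic symmetry-function derivatives.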

Appendix C: Automatic selection of symmetry functions
This section provides a short review of the method to automatically select symmetry functions proposed in Ref. 85 and adopted in this work. The algorithm is based on a feature selection method called the CUR decomposition, which creates a low-rank approximation of the initial feature matrix X in terms of its columns and rows. The first step of the procedure is the construction of a pool of N candidate symmetry functions {Φ_j}. Given a dataset of M configurations {A_i}, the feature matrix is defined as X_ij = Φ_j(A_i). In the second step we apply the feature selection to the columns (rows) of the feature matrix to select a small subset of the N symmetry functions (M configurations) which captures the important structural information of the considered system. Below, the two steps are reviewed in more detail.
The creation of candidate symmetry functions is done by generating values for the parameters that determine the shape of the symmetry functions. For the radial symmetry functions G2 (Eq. (B1)), two parameter sets are created. In the first set, the Gaussians are centered at the reference atom (R_s = 0) and have widths chosen according to

η_m = (n^(m/n)/r_c)², (C1)

where n is the number of desired parameters in this parameter set and m = 0, 1, . . ., n. In the second set, the Gaussian centers are placed at

R_{s,m} = r_c/n^(m/n), (C2)

with the corresponding widths given by Eq. (C3). The method to select the most important features of a feature matrix X produces an approximation of the form

X ≈ C U R,

where C and R are matrices that consist of a subset of columns and rows of the original feature matrix X. We execute the following steps for the selection of the subset of columns (C).
• Calculate the singular value decomposition (SVD) of X.
• Calculate an importance score for each column c,

π_c = (1/k) Σ_{j=1}^{k} (v_c^(j))²,

where v_c^(j) is the c-th coordinate of the j-th right singular vector and k is the number of singular vectors that are considered for the score. A value of k = 1 is proposed for an efficient selection.
• Pick column l with the highest importance score.
• Orthogonalize the remaining columns of the feature matrix X with respect to the l-th column X_l,

X_j ← X_j − X_l (X_l · X_j)/|X_l|². (C6)

• Repeat the steps above on the orthogonalized matrix until the desired number of columns is reached or the error of the approximation (Eq. (C7)) is below a desired threshold.
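The selection loop above can be sketched with NumPy (a compact illustration of the procedure, not the implementation used in this work):

```python
import numpy as np

def select_columns(X, n_select, k=1):
    """Iterative CUR-style column selection: score columns by the
    leverage of the first k right singular vectors, pick the best,
    orthogonalize the remaining columns against it, and repeat."""
    X = X.astype(float).copy()
    selected = []
    for _ in range(n_select):
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        scores = np.mean(vt[:k] ** 2, axis=0)
        scores[selected] = -np.inf               # never pick a column twice
        l = int(np.argmax(scores))
        selected.append(l)
        col = X[:, l].copy()
        norm2 = col @ col
        if norm2 > 0:
            # X_j <- X_j - X_l (X_l . X_j) / |X_l|^2 for all columns j
            X -= np.outer(col, col @ X) / norm2
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X[:, 2] *= 10.0                                  # make one column dominant
cols = select_columns(X, n_select=2)
```

With the fixed seed, the dominant (rescaled) column is selected first, as one would expect from the leverage score.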
The extracted columns form the matrix C. Similarly, the matrix R can be constructed by using the algorithm above to select a subset of columns of X^T (rows of X). The matrix U is then defined as U = C⁺ X R⁺ (⁺ indicates the pseudoinverse). The accuracy of the approximation is quantified by the error of Eq. (C7).

Appendix D: Creating reference datasets

For the bulk water system we use a dataset that is already established in the literature 86,87,90 . The dataset for the single water molecule is derived from the bulk water dataset by extracting H 2 O atom groups. The initial training dataset was then created by randomly selecting 1000 configurations from the extracted H 2 O configurations. We could further reduce the number of configurations by applying the CUR decomposition 85 (see Sect. C) to select a subset of the configurations. We found that 100 is a convenient dataset size. The validation dataset is created in the same way, by choosing a different (distinct) set of extracted H 2 O configurations.
We created our own reference datasets for the H 2 -H 2 cluster and the hydrogen molecule, H 2 . For the creation of a reference dataset it is recommended to follow the procedure reviewed in Ref. 53. However, the considered systems are very simple, and we expect sufficient coverage of the configuration space already from random sampling. In both cases, the H 2 -H 2 cluster and the hydrogen molecule, we created an initial dataset with 1000 configurations. We also tried to reduce the number of configurations in the dataset with the CUR decomposition. For the H 2 -H 2 cluster the training accuracy gradually worsened when reducing the number of configurations, so we kept all 1000 configurations in the training dataset. In the case of the hydrogen molecule we found that 20 is a convenient dataset size. For both systems the validation datasets are also created by randomly sampling from the configuration space of the respective system.
FIG. 1: Bulk water system: RMSE of the MLPs trained using noisy labels. Left: Energy RMSE as a function of the noise level (see text) affecting the energy labels (x-axis) and the force labels (y-axis) in the training dataset. The standard noiseless case corresponds to the top-right entry. Right: Same assessment but targeting the MLP force RMSE.

FIG. 3 :
FIG. 3: Probability of obtaining an energy estimate with a statistical error δ smaller than E, as a function of the number of measurements. Here E is 30 meV for the single water molecule model (see text), and the probability reaches p ≈ 1 when the number of shots is about 10^10 using the standard Pauli measurement technique (blue line). The upper bound defined in Eq. (13) would exceed 10^12 (orange line).

FIG. 5:
FIG. 5: Top: MLP prediction on the validation dataset for the single water molecule model, where the energy labels have been computed using VQE with the UCCSD ansatz (left) and the heuristic ansatz (right). The VQE error is also present in the validation dataset labels. In the bottom panels we plot both data series as a function of the exact energy instead. Bottom left: VQE energy labels for the validation dataset plotted against their exact values. The positive offset shows the residual variational error of the ansatz, while the fluctuations around it are due to the optimization noise, namely the energy of some configurations is optimized better than that of others. Bottom right: MLP energy predictions for the validation dataset plotted against their exact values. While the MLP (correctly) cannot improve the average variational error of the ansatz, it strongly reduces the fluctuations. The data of the bottom panels refer to the heuristic ansatz only.

FIG. 6 :
FIG. 6: Average energy RMSE at different levels of gate errors. The gate errors are characterized by the thermal relaxation time T1 and the dephasing time T2, which are set to the same value and varied simultaneously. The blue solid line shows the energy RMSE of the validation configurations, where the energies are obtained at the corresponding level of gate errors and compared to the respective noiseless energies. Each orange point is an average over 20 MLPs that were trained on different training datasets. The error bars show one standard deviation of the energy predictions. The green dashed line serves as a reference and shows the average energy RMSE of the MLP predictions where no gate errors were present in the energy calculations of the reference datasets.

FIG. 7 :
FIG. 7: Predicted hydrogen molecule dissociation path at different levels of readout error assumed in the calculation of the reference energies. The dashed lines show the predictions of the MLPs, the dots show the energies of the configurations in the validation datasets, and the black line shows the dissociation path obtained by exact diagonalization. The baseline readout error values used for the data in blue are listed in Tab. I. For the data in orange, the readout error is reduced by a factor of 100.

FIG. 8 :
FIG. 8: Left: Prediction of the hydrogen molecule dissociation path by an MLP that was trained and evaluated on datasets obtained with the IBM Quantum devices ibmq_toronto (blue) and ibmq_bogota (orange). The energies in the training and validation datasets are a filtered average over 4 (5) VQE runs for ibmq_toronto (ibmq_bogota). Right: Prediction of the hydrogen molecule dissociation path by an MLP that was trained and evaluated on datasets obtained with the IBM Quantum device ibmq_manila, using no readout error mitigation (blue) and using readout error mitigation (orange). The energies in the training and validation datasets are a filtered average over 10 VQE runs.

η_{s,m} = 1/(R_{s,n−m} − R_{s,n−m−1})², (C3)

which creates a set of Gaussians that are narrow close to the reference atom and wider as the distance increases. For the angular symmetry functions G3 (Eq. (B3)) only one set of parameters is created. The values for η are chosen according to Eq. (C1), λ takes the values {−1, 1}, and for ζ a few values on a logarithmic scale are chosen, e.g. {1, 4, 16}.
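The parameter grids described in this appendix can be sketched as follows; the functional forms reflect our reading of Eqs. (C1)-(C3) and should be checked against Ref. 85:

```python
import numpy as np

def radial_parameter_grids(n, r_c):
    """Parameter grids for the radial symmetry functions, assuming the
    recipe of Eqs. (C1)-(C3): a centered set of widths eta_m, shifted
    Gaussian centers R_{s,m}, and shifted-set widths determined by the
    spacing of neighboring centers (all assumed forms)."""
    m = np.arange(n + 1)
    eta_centered = (n ** (m / n) / r_c) ** 2     # Eq. (C1): centered Gaussians
    r_shift = r_c / n ** (m / n)                 # Eq. (C2): shifted centers
    # Eq. (C3): widths from the spacing between consecutive shifted centers
    ordered = r_shift[::-1]                      # R_{s,n-m} ordering
    eta_shifted = 1.0 / (ordered[1:] - ordered[:-1]) ** 2
    return eta_centered, r_shift, eta_shifted

# example: n = 5 parameters with an assumed cutoff radius of 12 Bohr
eta_c, r_s, eta_s = radial_parameter_grids(5, 12.0)
```

The centered widths grow with m (narrower Gaussians probe shorter distances), while the shifted centers move inward from the cutoff, matching the qualitative description in the text.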
FIG. 4: H 2 -H 2 cluster: Energy RMSE as function of the noise level (see text) affecting the energy labels (x-axis)

TABLE I :
Baseline parameters for the custom noise backend. The values are either taken directly from a specific qubit (frequency and anharmonicity) or inspired by an average over different IBM Quantum devices (all remaining values). * The RZ gate is applied virtually, and therefore its gate error and duration are both 0.