W2020: A Database of Validated Rovibrational Experimental Transitions and Empirical Energy Levels of H 2 16 O

A detailed understanding of the complex rotation–vibration spectrum of the water molecule is vital for many areas of scientific and human activity, and thus, it is well studied in a number of spectral regions. To enhance our perception of the spectrum of the parent water isotopologue, H 2 16 O, a dataset of 270 745 non-redundant measured transitions is assembled, analyzed, and validated, yielding 19 204 rovibrational energy levels with statistically reliable uncertainties. The present study extends considerably an analysis of the rovibrational spectrum of H 2 16 O, published in 2013, by employing an improved methodology, considering about one-third more new observations (often with greatly decreased uncertainties), and using a highly accurate first-principles energy list for validation purposes. The database of experimental rovibrational transitions and empirical energy levels of H 2 16 O created during this study is called W2020. Some of the new transitions in W2020 allow the improved treatment of many parts of the dataset, especially considering the uncertainties of the experimental line positions and the empirical energy values. The W2020 dataset is examined to assess where measurements are still lacking even for this most thoroughly studied isotopologue of water, and to provide definitive energies for the lower and upper states of many yet-to-be-measured transitions. The W2020 dataset allows the evaluation of several previous compilations of spectroscopic data of water and the accuracy of previous effective Hamiltonian fits. © 2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). https://doi.org/10.1063/5.0008253


Introduction
Several hundred papers exist, spanning almost 100 years of studies, which report laboratory measurements of gas-phase rovibrational transitions within the ground electronic state of water isotopologues. Many of the relevant data have been cited and evaluated in Refs. 1-5, detailing the work of an International Union of Pure and Applied Chemistry (IUPAC) Task Group (TG) on "A Database of Water Transitions from Experiment and Theory" (Project No. 2004-035-1-100). This TG carefully analyzed all the measured and assigned transitions available to them and recommended a large number of validated experimental transitions and the corresponding empirical energy levels for nine major isotopologues of water. 5 The study on the main water isotopologue, H 2 16 O, 3 was published in 2013. Since then, a lot of new developments have been made concerning water spectroscopy,  which impelled us to revisit the task of validating and analyzing an enlarged set of experimental transitions and empirical energy levels of H 2 16 O. Furthermore, despite the extensive list of sources considered in Ref. 3, hereafter called Part III (of the IUPAC TG's efforts), a number of spectroscopic studies prior to 2013 have come to our attention, which were not dealt with explicitly during the Part III study.  Note that the TG's work was built on a previous comprehensive compilation of empirical energy levels of H 2 16 O. 84 Empirical rovibrational energy values of water isotopologues, which can be obtained from the experimental line positions, for example, via the MARVEL (Measured Active Rotational-Vibrational Energy Levels) methodology, [85][86][87] employed, in fact, during this study, have a number of important applications. The energy levels have been used to supplement, and often replace, the observed transition wavenumbers in spectroscopic databases designed for applications to room-temperature [88][89][90][91] and hot 31,92 planetary atmospheres. 
The availability of accurate energies allows the computation of highly accurate ideal-gas partition functions and related thermodynamic quantities, [93][94][95] provides input for spectral assignments, 27,32,44 facilitates the testing and construction of revised potential energy surfaces (PESs), [96][97][98][99] and expedites the validation of theoretical models, in general. 100 There are also more specific applications, such as the search for ortho-para transitions for the H 2 16 O molecule 101 and a reliable prediction for transition frequencies of various water isotopologues. 92 Finally, we mention that limited updates to the Part III data were published as part of our ongoing attempts to improve certain aspects of the MARVEL algorithm. 86,87,102 This study, dedicated to the high-resolution spectroscopy of the parent water isotopologue, H 2 16 O, provides the first major extension of the Part III 3 results. Based on the list of experimental H 2 16 O lines available from the literature, here we present new recommendations for old energy levels, as well as a number of new rovibrational states, by assembling a significantly updated line-by-line spectroscopic database for H 2 16 O. Due to the large number of technical developments during the MARVEL analysis and the much enlarged experimental dataset employed, it is suggested that the database collated during this study, called W2020, should replace the list reported earlier by the IUPAC TG (Part III) and should be used exclusively by practitioners of water spectroscopy and those who need rovibrational transitions and energy levels of H 2 16 O. In particular, it is expected that the new and improved data of this study will enter the next edition of the canonical spectroscopic database, HITRAN. 88

Spectroscopic networks
To obtain the best possible estimates for the rovibrational energy values of H 2 16 O, all of the observed high-resolution rovibrational lines taken from the literature were processed simultaneously, yielding a spectroscopic network (SN). 85,103,104 SNs consist of nodes (energy levels) linked by edges (measured or computed transitions), which are oriented from their lower energy levels to their upper ones (independently of whether they were recorded by absorption, emission, or action spectroscopy). Since often there are repeated observations for the same transition, these multiple measurements are represented with multiple edges in the SN. The lines sharing the same upper and lower states form coincidence classes.
The energy levels of SNs often form distinct components, that is, sets not connected by spectral lines. If a component of the SN contains the lowest-energy state of a nuclear-spin isomer of the molecule examined, this component is called a principal component (PC); otherwise, it is a floating component (FC). Excluded energy levels, 87 forming isolated nodes (whose transitions have all been deleted), are all special FCs of the SN. For H 2 16 O, no transitions linking energy levels of its two (ortho and para) nuclear-spin isomers have been observed; 101 thus, the energy separation of the ortho and para PCs is not known experimentally, though it can be estimated with considerable accuracy. 44 As dictated by their edge distributions, 104
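The network concepts introduced above (coincidence classes, components, and the distinction between principal and floating components) can be illustrated with a minimal Python sketch. The level labels, wavenumbers, and function names are hypothetical placeholders chosen for illustration; they are not W2020 data and do not reproduce the MARVEL implementation. Isolated nodes left behind by deleted transitions are not modeled here.

```python
from collections import defaultdict

# Hypothetical toy transitions: (lower label, upper label, wavenumber in cm-1).
# Labels and numbers are illustrative placeholders, not W2020 data.
transitions = [
    ("p0", "p1", 100.0),
    ("p0", "p1", 100.1),  # repeated measurement of the same line
    ("p1", "p2", 50.0),
    ("o0", "o1", 80.0),
    ("x0", "x1", 10.0),   # linked to neither lowest state
]
lowest_states = {"p0", "o0"}  # lowest level of each nuclear-spin isomer


def coincidence_classes(lines):
    """Group repeated measurements sharing the same lower and upper state."""
    classes = defaultdict(list)
    for lower, upper, wavenumber in lines:
        classes[(lower, upper)].append(wavenumber)
    return dict(classes)


def components(lines):
    """Return the connected components of the network via union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for lower, upper, _ in lines:
        parent[find(lower)] = find(upper)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def classify(component, lowest):
    """Principal if the component holds a lowest-energy state, else floating."""
    return "PC" if component & lowest else "FC"


for comp in components(transitions):
    print(sorted(comp), classify(comp, lowest_states))
```

In this toy network, the two components containing "p0" and "o0" play the role of the para and ortho PCs, while the pair "x0"/"x1" forms an FC.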

extMARVEL
During the present study, the extended MARVEL 85-87 scheme (extMARVEL) 87 and the implementation of its automated part (intMARVEL) were utilized to determine accurate empirical energy values for H 2 16 O with well-defined uncertainties from the collated set of observed and assigned rovibrational transitions with specified uncertainties. As shown in Ref. 87, the use of the intMARVEL code helps to retain the high accuracy of the best measurements for the derived energy levels. Since this feature was not available during the Part III compilation and analysis, 3 the W2020 dataset constructed with the extMARVEL protocol exhibits much improved energy values (with more dependable uncertainties), especially for the low-lying rovibrational states of H 2 16 O. In the interest of space, only the four most relevant characteristics of the extMARVEL procedure are outlined here; for all the other terms and technical aspects, the reader is referred to Ref. 87.
First, in contrast to the standard MARVEL algorithm, the extMARVEL protocol is based on the use of segments (sets of transitions coming from the same data source with approximately the same experimental uncertainties). For each segment, an estimated segment uncertainty (ESU) is given; its refined value, provided by an analysis of the cycles 106 in the SN, serves as an initial uncertainty value for the lines of this segment. Although line-by-line input uncertainties are not required, originally published uncertainties, if accessible, are still utilized by the intMARVEL code. Second, inversion and the weighted least-squares refinement of the line uncertainties are built upon consecutive addition of transition blocks of decreasing accuracy, leading to highly accurate empirical energy values. The extMARVEL-predicted wavenumbers determined in a particular block are not permitted to be changed by the inclusion of less accurate observations. Third, automatic recalibration of the ill-calibrated segments is performed. As demonstrated earlier, 3 MARVEL is well suited to complete this important task. Fourth, synchronization 87 of the combination difference relations is executed to reduce the uncertainties of the empirical energies.
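The inversion step at the heart of any MARVEL-type analysis can be sketched as a weighted least-squares problem in which each measured wavenumber is modeled as the difference between an upper and a lower energy, with weights given by the inverse squared uncertainties. The sketch below is a single-shot fit on hypothetical toy data with one reference level fixed at zero; it is only an assumption-laden illustration and does not implement the block-wise refinement, recalibration, or synchronization of extMARVEL.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_energies(lines, levels, reference):
    """Weighted least-squares energies with the reference level fixed at zero."""
    unknowns = [lv for lv in levels if lv != reference]
    index = {lv: i for i, lv in enumerate(unknowns)}
    n = len(unknowns)
    normal = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for lower, upper, wavenumber, unc in lines:
        weight = 1.0 / unc**2
        # Design-matrix row: +1 for the upper state, -1 for the lower state.
        row = {}
        if upper in index:
            row[index[upper]] = 1.0
        if lower in index:
            row[index[lower]] = -1.0
        for i, a in row.items():
            rhs[i] += weight * a * wavenumber
            for j, b in row.items():
                normal[i][j] += weight * a * b
    solution = solve_linear(normal, rhs)
    energies = {reference: 0.0}
    energies.update({lv: solution[index[lv]] for lv in unknowns})
    return energies


# Hypothetical lines: (lower, upper, wavenumber / cm-1, uncertainty / cm-1).
toy_lines = [
    ("A", "B", 10.0, 0.001),
    ("B", "C", 5.0, 0.001),
    ("A", "C", 15.3, 0.1),  # less accurate line closing the cycle A-B-C
]
energies = fit_energies(toy_lines, ["A", "B", "C"], reference="A")
print(energies)
```

Because the third line carries a much larger uncertainty, the fitted energies of B and C stay very close to the values implied by the two accurate lines, which mirrors the motivation for protecting accurate blocks from less accurate observations.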
In Ref. 107, the original six-grade classification scheme 87 of the extMARVEL approach was simplified. Here, we make again a slight change to MARVEL's classification scheme: energy levels are now associated with four so-called resistance labels: R+, strongly resistant; R−, weakly resistant; S, semi-resistant; and U, unresistant. Resistance labels reflect the reliability of the empirical energy levels, which is extra, qualitative information beyond that offered by uncertainties, and their resistance toward changes forced upon them by some (at present, less precise) spectral lines. Energy values with the resistance label R+ are deemed to be fully reliable, and the availability of new transitions is not expected to influence them outside of their present uncertainty limits. The other energy values may change when further, even more accurate transitions are added to the transition database. Although it is likely that even the non-R+ rovibrational states (particularly those with R− classification) have reliable energy values, they are not fully certified by the extMARVEL analysis. We emphasize again that the uncertainties of the extMARVEL empirical rovibrational energy levels and their resistance labels provide complementary information about the reliability of the levels.

XML-based management of the transition database
For the traditional MARVEL code, 85,86 the input transitions have to be provided in the form of a plain text file. If each column of the text file consists of the same type of data, this input representation is a reasonable choice for storing the spectral lines processed by MARVEL. However, when different kinds of information (e.g., measured line positions with various units, comments on the transition entries, or some characteristics of the data sources and their segments 87 having approximately the same experimental accuracy) are collected in the same column, a simple text file becomes unsuitable for handling the complex data structure. To support the annotation and the transparent administration of the collated MARVEL transitions, we decided to replace the original input format with a file structure following the rules of the eXtensible Markup Language (XML). An XML-based input helps to (a) maintain provenance of the data; (b) have a timestamp on the individual transitions, reflecting their occasional relabeling; and (c) give a description of the "original" literature information. The XML-based extMARVEL protocol is called xMARVEL. This novel feature of the present study is reflected in the related entries of the supplementary material.
The general structure of MARVEL XML is defined in Fig. 1. Briefly, the root element of MARVEL XML is named MARVEL, which contains the source elements. A source element comprises information about a particular data source [tag name and reference, the latter corresponding to the digital object identifier (DOI) for journal articles]. Each source element must include at least one segment element, storing data of the related data-source segment. The children of the segment element are the input transitions collected in the line elements. A line element should consist of the following entries: line position (in element pos) and its uncertainty (in element unc), labels of the upper and lower states (in elements us and ls, respectively), and a unique tag for the identification of the underlying transition entry. A much more detailed discussion of the MARVEL XML format will be given in an upcoming paper. 108
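To make the described hierarchy concrete, the following Python sketch builds and reads back a small document with the element names given in the text (MARVEL, source, segment, line, pos, unc, us, ls). The attribute names (tag, reference, esu), the label strings, and all values are assumptions made for illustration; the authoritative schema is the one defined in Fig. 1 and the supplementary material.

```python
import xml.etree.ElementTree as ET

# Element names follow the text; attribute names and all values are assumptions.
root = ET.Element("MARVEL")
source = ET.SubElement(
    root, "source", tag="00Example", reference="10.0000/placeholder"
)
segment = ET.SubElement(source, "segment", esu="1e-4")
line = ET.SubElement(segment, "line", tag="00Example.1")
ET.SubElement(line, "pos").text = "100.0000"  # line position (placeholder)
ET.SubElement(line, "unc").text = "0.0001"    # its uncertainty (placeholder)
ET.SubElement(line, "us").text = "upper-state-label"
ET.SubElement(line, "ls").text = "lower-state-label"


def read_lines(marvel_root):
    """Flatten the source/segment/line hierarchy into a list of dictionaries."""
    records = []
    for src in marvel_root.findall("source"):
        for seg in src.findall("segment"):
            for ln in seg.findall("line"):
                records.append(
                    {
                        "source": src.get("tag"),
                        "line": ln.get("tag"),
                        "pos": float(ln.findtext("pos")),
                        "unc": float(ln.findtext("unc")),
                        "us": ln.findtext("us"),
                        "ls": ln.findtext("ls"),
                    }
                )
    return records


text = ET.tostring(root, encoding="unicode")
parsed = read_lines(ET.fromstring(text))
print(parsed)
```

Keeping the source and segment information as ancestors of each line is what allows provenance and segment-level uncertainties to travel with every transition.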

Construction of the W2020 Database
As mentioned in the Introduction, the Part III database 3 of transitions and energy levels served as the starting point of the present xMARVEL study of the rovibrational spectrum of H 2 16 O. Extension and improvement of the Part III dataset includes five major, to some extent interrelated, tasks: (a) based on a careful search of the literature, construct the most complete catalog of published experimental lines; (b) set the best possible (often unreported) initial uncertainties for the observed line positions and segment 87 the data sources appropriately to help the uncertainty refinement process; (c) certify the existence of the empirical energy levels by a comparison with their first-principles counterparts (most important here is the PoKaZaTeL 31 list); (d) relying on the xMARVEL energy values, the related assignments, and reliable first-principles results, 31,109 expand the transition database with certain unmeasured, unreported, or even artificial transitions, inspired by well-founded spectroscopic information, and derive further energy levels; and (e) create the best possible, self-consistent labels for the rovibrational states.
As to task (a), the initial, Part III-based dataset was (i) reduced since two of the 98 sources were removed (81Kyro 110 contains only redundant lines, while the transitions of 05ZoShPoTe 111 proved to be not accurate enough to combine them with high-resolution data of other sources), (ii) extended with data sources published after the Part III article, and (iii) enlarged with earlier sources not treated explicitly in Part III. It should also be noted that sources where some lost transitions or lines of unidentified origin were recognized were reentered afresh into the W2020 database. As a result, 78 new data sources were included in the W2020 database, leading to 174 experimental sources in total.
Concerning task (b), the initial line uncertainties, indispensable for the successful execution of the xMARVEL method, were selected with great care. This is an especially important step as these values determine in which transition block the individual lines should be refined. Thus, all the data sources were divided into segments, and these segments were supplied with estimated segment uncertainties. The ESU values serve as educated guesses for the initial uncertainties of the line positions, except for those transitions, mainly microwave and terahertz, characterized by originally reported individual uncertainties. For the accuracy of particular segments, we tried to adopt the smallest reasonable ESU values to facilitate the generation of correct uncertainty estimates for the derived empirical energies. Accordingly, some segments ended up with higher accuracy than those given in Part III. For example, the expected uncertainties of the segments 06MaToNaMo, 112 85BrTo, 113 05Toth, 114 and 99Toth 115 were modified as follows: 3 × 10 −4 → 3 × 10 −6, 1 × 10 −3 → 1 × 10 −4, 1 × 10 −3 → 2 × 10 −4, and 1 × 10 −3 → 3 × 10 −4 (all values in cm −1 ), respectively. In other cases, slightly larger uncertainties had to be assigned to certain segments for which a systematic increase occurred in the unsigned least-squares residuals (differences between the observed and calculated wavenumbers). An example is 08ZoShOvPo, 116 for which a change (in cm −1 ) of 5 × 10 −3 → 2 × 10 −2 was made to improve the overall consistency of the transition database. Clearly, many of the ESU values are optimized for the W2020 dataset; thus, some of them might need to be modified when future measurements become available.
After the generation of the rovibrational empirical energies, an automated check [task (c)] was performed to certify that they have unique counterparts in the PoKaZaTeL list (based just on J and parity information; see Table 1 of Ref. 3), within a tolerance of 10 −4 × E, where E is the energy of the xMARVEL energy level examined and J is the rotational quantum number. For E ≤ 30 000 cm −1 , those empirical rovibrational states that could not be matched within this tolerance were excluded from the database, together with the corresponding transitions. Due to these exclusions and some missing links, 148 energy levels became part of FCs. Above 30 000 cm −1 , all the energy levels were left intact because, in this region, the PoKaZaTeL PES needs further refinement to ensure that it can be properly utilized for this validation task.
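The matching criterion of task (c) can be sketched in a few lines of Python. The function name, the tuple layout of the reference list, and the encoding of parity as "+" or "-" are assumptions; treating a level with several candidates inside the window as unmatched is also an assumption of this sketch, since the text only requires a unique counterpart. The numbers are placeholders, not PoKaZaTeL energies.

```python
def match_level(energy, j, parity, reference_levels, rel_tol=1e-4):
    """Return the unique reference energy within rel_tol * energy, or None.

    reference_levels is an iterable of (J, parity, energy) tuples; only J and
    parity are used to select candidates, as described in the text.
    """
    window = rel_tol * energy
    candidates = [
        ref_energy
        for ref_j, ref_parity, ref_energy in reference_levels
        if ref_j == j
        and ref_parity == parity
        and abs(ref_energy - energy) <= window
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical first-principles list (J, parity, energy in cm-1).
reference = [
    (1, "+", 1000.05),
    (1, "-", 1000.02),
    (2, "+", 2000.50),
]

print(match_level(1000.0, 1, "+", reference))  # window is 0.1 cm-1
print(match_level(2000.0, 2, "+", reference))  # window is 0.2 cm-1
```

With these toy numbers, the first level is matched because its counterpart lies inside the 0.1 cm −1 window, whereas the second one would be flagged for exclusion because its only candidate is 0.5 cm −1 away.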
During the execution of task (d), several explicitly unmeasured (and unmeasurable) transitions were placed into the W2020 database. First, the disconnected ortho and para states were linked with a forbidden line, (0 0 0)1 0,1 ← (0 0 0)0 0,0 , for which an extremely accurate, empirical wavenumber value, 44 23.794 361 22(25) cm −1 , was adopted. The rovibrational lines of H 2 16 O are designated hereafter using the standard normal-mode-rigid-rotor notation where ′ and ″ refer to the upper and lower state of the given transition, respectively, while the (v 1 v 2 v 3 )J Ka,Kc descriptors represent a rovibrational state. Here, v 1 , v 2 , and v 3 are standard 117 vibrational quantum numbers for the symmetric stretch, bend, and asymmetric stretch modes of H 2 16 O, respectively, while J Ka,Kc are rigid-rotor asymmetric-top quantum numbers. 118 Second, nearby ortho and para states were searched for the construction of additional lines, in order to maximize the number of empirically known energy levels. These so-called virtual transitions, collected in source "20virt," were obtained from the PoKaZaTeL energy list, together with their wavenumbers. Third, similar to Part III, close-lying ortho-para transition doublets were also utilized. If the PoKaZaTeL energy-level set revealed that the (so-called) complementary (usually para) transition (separated by not more than 5 × 10 −3 cm −1 based on first-principles information) is not reported in the data source of an experimental (mainly ortho) line with a σ wavenumber, then this complementary transition was added to the source "20compl" with the same σ wavenumber.
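As a small illustration of the notation just introduced, the Python sketch below parses a label of the form (v 1 v 2 v 3 )J Ka,Kc and assigns the nuclear-spin isomer. The classification uses the standard rule for H 2 16 O, which is not spelled out in the text above: states with even v 3 + K a + K c are para and those with odd v 3 + K a + K c are ortho. The regular expression, the function names, and the exact label spelling are assumptions of this sketch.

```python
import re

# Assumed label spelling: "(v1 v2 v3)J Ka,Kc", e.g. "(0 0 0)1 0,1".
LABEL = re.compile(r"\((\d+) (\d+) (\d+)\)\s*(\d+) (\d+),(\d+)")


def parse_label(label):
    """Split a normal-mode/rigid-rotor label into its six quantum numbers."""
    match = LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    v1, v2, v3, j, ka, kc = (int(group) for group in match.groups())
    return {"v1": v1, "v2": v2, "v3": v3, "J": j, "Ka": ka, "Kc": kc}


def spin_isomer(state):
    """Para for even v3 + Ka + Kc, ortho for odd v3 + Ka + Kc (H2 16O)."""
    return "ortho" if (state["v3"] + state["Ka"] + state["Kc"]) % 2 else "para"


for label in ("(0 0 0)0 0,0", "(0 0 0)1 0,1"):
    print(label, spin_isomer(parse_label(label)))
```

The two labels in the loop are the lower and upper states of the forbidden line quoted above, which therefore connects the lowest para state to the lowest ortho state.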
After the derivation of reliable empirical energy values, conflicts arising from the use of different labeling schemes were resolved, and a self-consistent set of normal-mode and rigid-rotor quantum numbers was provided for the energy levels [task (e)]. Note that in the local-mode notation, the standard normal-mode v 1 , v 2 , and v 3 vibrational quantum numbers are replaced by (nm) ± v 2 , where n and m are the number of quanta of OH stretch, while v 2 retains its meaning. The ± sign also serves as a symmetry label, and it is usual not to indicate + for the n = m cases. For sources 08GrMaZoSh 119 and 09GrBoRiMa, 120 where the vibrational states were originally represented with local-mode quantum numbers, one-to-one conversion (see Table I.1 of Ref. 121) to normal-mode notation was applied. In the case of 09GrBoRiMa, where the v 2 values are missing for some of the transitions, v 2 = 0 was assumed. 122 Unfortunately, there are no theoretical techniques that can yield unambiguous labels for high-lying vibrational bands (say, above 11 000 cm −1 , the barrier to linearity of water [123][124][125] ). What makes the situation even more complicated is that it is not unusual that the same authors change their labeling scheme between consecutive publications. For example, the rovibrational state at 10 492.2 cm −1 was labeled as (1 2 0)14 8,7 in 11MiKaWaCa, 76 while in 17MoMiKaBe, 25 this designation was changed to (2 0 0)14 8,7 . In the end, we decided to accept, whenever possible, the rovibrational labels of the latest source, hoping that in this way we can minimize labeling conflicts with future measurements. As an additional check, the empirical energies were plotted as an approximately quadratic function of K a [at fixed v 1 , v 2 , v 3 , J, and (−1)^(v3+Ka+Kc)], and these quadratic trends were slightly smoothed, where feasible, by relabeling some rough outlier states. Note that the original labels are included in the MARVEL XML file of the supplementary material.
Finally, we have to point out that there are states where we know that the label attached is incorrect, but we were unable to come up with a feasible label. For example, for the states (3 1 0)11 8,4 and (3 0 1)11 8,3 , part of the lines 18TaMiWaLi.900 and 18TaMiWaLi.901, 36 which can be well matched with the PoKaZaTeL data by J and parity, we could not determine the correct labels. While noting the problem in the XML file, we decided not to exclude these "existing" levels from the W2020 dataset. Table 1 provides a summary of all the data-source segments used during this work, along with some relevant statistical parameters [ESU values, median segment uncertainties (MSUs), and largest segment uncertainties (LSUs)]. 87 This table updates Table 2 of Part III 3 for all the common entries. The data sources applied are partitioned into 200 segments, of which 7 had to be recalibrated (see column 7 of Table 1). For all the segments, the ESUs and MSUs are very similar, implying that the initial uncertainties did not change considerably during the xMARVEL refinement. The final transition database includes 270 745 non-redundant lines from 289 070 experimentally assigned transitions among which 267 289 could be validated during this study.
The number of new transitions, with respect to the Part III dataset, cannot be determined in a reliable way due to two significant changes between the Part III and W2020 data. First, a large number of lines were relabeled at the time of compiling the extended dataset of this study. Second, during the revision of the Part III transitions, we often returned to the original assigned lines of the experimental data sources. This change also implies that originally unassigned transitions, labeled during the operation of the IUPAC TG, were removed from the related sources. Nevertheless, to avoid losing energy levels, those lines assigned in Part III and specifying unique rovibrational states were later reinstated in the database as an extra source, "20extra."

Updated xMARVEL energy levels
The W2020 dataset includes 19 427 energy levels of which 19 204 relate to the PCs of the underlying SN, while the others are distributed among 117 FCs. Within these FCs, the number of isolated nodes is 91. Since the publication of the Part III database in 2013, several tens of thousands of new lines were detected, remeasured, or reassigned. It is notable that they determine only about 800 new energy levels. Thus, only a small fraction of the measured transitions leads to new rovibrational states, even though the energies of the great majority of the bound rovibrational states of H 2 16 O are still not known empirically (H 2 16 O has close to one million bound rovibrational states, 31 of which fewer than 20 000 are determined by experiments). In this sense, the most useful new sources are 18RuFoScJo, 32 14CoMaPi, 11 and 15CaMiLoKa, 16 with 86, 79, and 63 new rovibrational states, respectively. Table 2 shows the (empirical) vibrational band origins (VBOs) derived from the W2020 transition database. The uncertainties of the VBOs of W2020 are significantly better than those determined in Part III. The vibrational fundamentals of H 2 16 O are now known with remarkable accuracy (≈10 −5 cm −1 ). In fact, all the VBOs with P = 1, 2, and 3, where P = 2v 1 + v 2 + 2v 3 is the polyad number, have a similarly high precision. The complete list of the 240 vibrational bands along with the polyad numbers and the number of xMARVEL energy levels associated with the specific bands is given in the supplementary material.
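The polyad number defined above is easily evaluated for any vibrational label. The short Python sketch below groups a few low-lying vibrational states by P; the function names and the selection of bands are illustrative choices and do not reproduce Table 2.

```python
from collections import defaultdict


def polyad_number(v1, v2, v3):
    """Polyad number P = 2*v1 + v2 + 2*v3 for a (v1 v2 v3) vibrational state."""
    return 2 * v1 + v2 + 2 * v3


def group_by_polyad(bands):
    """Collect (v1, v2, v3) labels under their polyad number."""
    groups = defaultdict(list)
    for band in bands:
        groups[polyad_number(*band)].append(band)
    return dict(groups)


# A few low-lying vibrational labels (illustrative selection).
bands = [(0, 1, 0), (0, 2, 0), (1, 0, 0), (0, 0, 1), (0, 3, 0), (1, 1, 0), (0, 1, 1)]
for p, members in sorted(group_by_polyad(bands).items()):
    print(p, members)
```

The bending fundamental is the only member of the P = 1 polyad, while the two stretching fundamentals share P = 2 with the bending overtone (0 2 0).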

Validation of the W2020 database
As an independent validation of the transition wavenumbers, the derived xMARVEL energies, and the labels collated into the W2020 database, systematic and mostly automated comparisons were made with the results of variational nuclear-motion computations 31,109 and previously reported energies, mostly corresponding to effective Hamiltonian (EH) fits. 11,179,227,228 These comparisons were executed in order to exclude those transitions from the W2020 database that would lead to energy levels with large deviations from well-established first-principles or EH values.

Comparison with first-principles energy levels
As described for task (c) of Sec. 3, the xMARVEL energy levels were matched with their first-principles counterparts listed in both the recent PoKaZaTeL 31 and the older BT2 109 energy lists. An important property of the W2020 rovibrational energy-level set is that it is complete up to 9724 cm −1 . This increases the completeness limit from the Part III value of about 7500 cm −1 . 93 The unsigned (energy) deviations (UD) for these datasets are plotted in Fig. 2. Although the median values of the UDs are 0.085 cm −1 and 0.110 cm −1 for PoKaZaTeL and BT2, respectively, the xMARVEL-PoKaZaTeL energy differences become significantly smaller than their xMARVEL-BT2 counterparts above 15 000 cm −1 . Partitioning the 0-30 000 cm −1 range, where energy levels are present from both the PoKaZaTeL and the BT2 energy lists, into six intervals of equal widths of 5000 cm −1 , the following median unsigned deviations are obtained for BT2/PoKaZaTeL: 0.07/0.03 cm −1 , 0.09/0.06 cm −1 , 0.11/0.11 cm −1 , 0.13/0.09 cm −1 , 0.23/0.05 cm −1 , and 0.50/0.08 cm −1 .
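The interval-wise statistics quoted above can be reproduced with a short routine once the empirical and first-principles energies have been paired. The Python sketch below computes median unsigned deviations in intervals of equal width; the function name, the use of half-open intervals, and the toy energy pairs are assumptions made for illustration and are not W2020, PoKaZaTeL, or BT2 data.

```python
from statistics import median


def binned_median_ud(pairs, width=5000.0, upper=30000.0):
    """Median unsigned deviation per energy interval [k*width, (k+1)*width).

    pairs is an iterable of (empirical, computed) energies in cm-1; levels at
    or above `upper` are ignored.
    """
    bins = {}
    for empirical, computed in pairs:
        if 0.0 <= empirical < upper:
            start = int(empirical // width) * width
            bins.setdefault(start, []).append(abs(empirical - computed))
    return {start: median(values) for start, values in sorted(bins.items())}


# Hypothetical (empirical, first-principles) pairs in cm-1.
toy_pairs = [
    (1000.0, 1000.03),
    (2000.0, 1999.95),
    (3000.0, 3000.01),
    (7000.0, 7000.06),
]
result = binned_median_ud(toy_pairs)
for start, value in result.items():
    print(f"{start:.0f}-{start + 5000:.0f} cm-1: {value:.3f}")
```

The median is preferred over the mean in such comparisons because a handful of mislabeled or poorly computed levels would otherwise dominate the statistic.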
These comparisons verify that each W2020 state has a variational counterpart with a deviation not larger than 1.5 cm −1 up to 30 000 cm −1 . For the PoKaZaTeL dataset, the highest UD is 1.32 cm −1 , but almost all UDs are well below 1 cm −1 in the 0-30 000 cm −1 range (see Fig. 2). The larger xMARVEL-PoKaZaTeL deviations (∼10 cm −1 ) above 30 000 cm −1 indicate that further empirical adjustment of the spectroscopic PES of H 2 16 O is required to secure that all first-principles PoKaZaTeL energies are below the venerable 0.03 cm −1 limit, characteristic of the 0-5000 cm −1 range. It is remarkable how well the PoKaZaTeL energies reproduce nearly all of their W2020 counterparts with deviations less than the level spacings, allowing unambiguous matching for all the xMARVEL states.

Comparison with 01LaCoCa-04CoPiVeLa
Following the first-principles validation, the empirical Part III and W2020 energy values were compared with those obtained in 01LaCoCa 227 and 04CoPiVeLa. 179 To find the appropriate pairs, we relied exclusively on the J values and the parities of the rovibrational states (that is, the normal-mode vibrational and rigid-rotor rotational labels were not utilized directly in the matching procedure).
The UDs for the (Part III, 01LaCoCa-04CoPiVeLa) and (W2020, 01LaCoCa-04CoPiVeLa) pairs are shown in Fig. 3 as a function of energy. The median and the largest UDs are 6.72 × 10 −4 /1.16 × 10 −3 and 9.78 × 10 −2 /1.85 × 10 −1 cm −1 for the W2020/Part III energy levels, respectively, implying that the W2020 dataset represents a significant improvement over the Part III database. These results also implicitly confirm the considerable accuracy of the energy levels of 01LaCoCa 227 and 04CoPiVeLa. 179

Comparison with Part III energy levels
Since the 01LaCoCa-04CoPiVeLa dataset does not cover the whole energy range represented by the W2020 database, it is important to evaluate the differences between the W2020 energy values and their Part III 3 analogs. The list of those 213 Part III levels that could not be matched within 0.1 cm −1 can be found in the supplementary material. Some of the 213 states are not present in the W2020 database because they caused large deviations from the PoKaZaTeL energies and therefore were excluded from W2020. In most other cases, another subset of transitions, in conflict with those chosen in Part III, were selected based on the differences, down to a few times 0.01 cm −1 , from the PoKaZaTeL energies.
The pattern of the UDs, calculated for the matched (Part III, W2020) twins, is shown in Fig. 4. The overwhelming majority of the UDs (14 370) are smaller than 0.005 cm −1 , demonstrating that the Part III and the W2020 data agree adequately. Nonetheless, the number of levels with 0.005 cm −1 < UD ≤ 0.05 cm −1 and 0.05 cm −1 < UD ≤ 0.1 cm −1 (3778 and 124, respectively) indicates that the improvement over the Part III data is significant. Figure 5 displays the unsigned deviations of the 14CoMaPi 11 energies from their W2020 counterparts. While the overall agreement is remarkably good, there are a few energy levels between 9000 and 13 000 cm −1 where the UDs are larger than 1.0 cm −1 . Since the variationally computed energies, from both the PoKaZaTeL and the BT2 lists, confirm the W2020 energies to a few 0.01 cm −1 , there is no reason to exclude the corresponding transitions from the xMARVEL analysis. In other words, the effective-Hamiltonian-based predictions appear to be slightly incorrect for these outliers, typically having K a > 27. It is likely that these deviant effective-Hamiltonian-based estimates are due to the lack of transitions characterized with high J and K a values used during the refinement of the spectroscopic parameters in the bending rotational Hamiltonian of 14CoMaPi.

Comparison with JPL data
On the JPL (Jet Propulsion Laboratory) website 228 a list of energy values is maintained, complemented with a timestamp of "Oct 31 21:54:56 2005," which were deduced from a fit with an Euler-type Hamiltonian. The parameterization of this model is quite similar to that introduced in 05PiPeMi. 229 Figure 6 exhibits the unsigned deviations of the JPL data 228 from their W2020 analogs up to J = 22. Although there are extrapolated energy values for J = 23 in the JPL set, we decided to ignore them during this analysis due to their considerably increased deviations. As can be seen from Fig. 6, most of the JPL energies reproduce their xMARVEL counterparts well, to better than 0.01 cm −1 . Similar to 14CoMaPi, 11 the W2020 energies corresponding to the outlier points of this figure were carefully checked and found to be dependable within the stated uncertainties, suggesting that there are issues for these energy values within the JPL list.

HITRAN 2016 issues
Employing the W2020 database of H 2 16 O transitions, one can check the reliability of the data of the canonical spectroscopic database HITRAN 2016. 88 The latest HITRAN version 88  In this case, the HITRAN 2016 lines should be reassigned and given the W2020 label of the upper state, assuming that the labels of the lower states agree. It is also possible that a HITRAN 2016 line comes from a theoretical source or that its W2020 counterpart is considerably more accurate. In this case, the next edition of HITRAN should use the W2020 line position. We found altogether 5539 problematic lines in HITRAN 2016. In 878 cases, the labels should be reassigned, while for the rest of the problematic cases, the line positions should be changed to the W2020 positions. It is important to emphasize that the largest intensity in these problematic cases is 5 × 10 −24 cm molecule −1 , which means that all strong lines in HITRAN 2016 are correct, though the W2020 line positions may have considerably higher accuracy.

Highly accurate pure rotational lines
A common feature of most present-day line-by-line spectroscopic databases, including HITRAN 2016, 88 is that they are based almost exclusively on lines obtained from Doppler-broadened spectra. The usual Doppler linewidth is about 1-5 GHz, strongly limiting the accuracy of the line centers of the high-resolution spectra measured. Modern laser spectroscopy measurements, relying on frequency combs, cavity enhancement, saturation, and other recent experimental advances, allow Doppler-free measurement of certain lines. The accuracy of these measurements approaches the 1 kHz region, though, in principle, it can be even better.
As shown in Table 3, which lists H₂¹⁶O rotational lines protected by certain bodies of the National Academies of Sciences, Engineering, and Medicine, 230 the W2020 database includes several sources reporting transitions with an accuracy of 10 kHz or better. It is expected that more and more such highly accurate transitions will become available in the near future for water isotopologues and beyond. Recent studies reporting highly accurate data for H₂¹⁶O are 20ToFuSiCs, 44 18KaStCaDa, 29 and 18ChHuTaSu 26 with uncertainties of about 2 × 10⁻⁷, 4 × 10⁻⁷, and 8 × 10⁻⁷ cm⁻¹, respectively. While 18KaStCaDa 29 and 18ChHuTaSu 26 recorded only a few ultraprecise lines, 20ToFuSiCs 44 published 156 extremely accurate transitions in the near infrared, leading (by design) to benchmark energy levels within the vibrational ground state and those corresponding to the P = 4 and P = 5 polyads. Consequently, the uncertainties of most of the (0 0 0) rotational levels of H₂¹⁶O with J ≤ 8 are lowered in W2020 by about two orders of magnitude compared to those of Part III.
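Since this section mixes frequency units (kHz) and wavenumber units (cm⁻¹), a short conversion helper may be useful. The sketch below relies only on the exact SI value of the speed of light; the function names are chosen here for illustration.

```python
C_CM_PER_S = 29_979_245_800  # speed of light in cm/s (exact by SI definition)


def wavenumber_to_khz(value_cm1):
    """Convert a wavenumber (or wavenumber uncertainty) in cm^-1 to kHz."""
    return value_cm1 * C_CM_PER_S / 1e3


def khz_to_wavenumber(value_khz):
    """Convert a frequency (or frequency uncertainty) in kHz to cm^-1."""
    return value_khz * 1e3 / C_CM_PER_S


# Uncertainties quoted in the text, in cm^-1, expressed in kHz.
quoted_cm1 = (2e-7, 4e-7, 8e-7)
quoted_khz = [wavenumber_to_khz(u) for u in quoted_cm1]
```

With these helpers, the quoted uncertainties of 2 × 10⁻⁷, 4 × 10⁻⁷, and 8 × 10⁻⁷ cm⁻¹ correspond to about 6, 12, and 24 kHz, respectively, and 10 kHz corresponds to about 3.3 × 10⁻⁷ cm⁻¹.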
As to the data in Table 3, we note that there exists a regularly updated extended catalog of frequency allocations and spectrum protection for scientific uses. 230 This catalog is important for astronomical applications and includes a number of H₂¹⁶O lines, whose approximate rest frequencies and assignments are reproduced in Table 3. For all these protected H₂¹⁶O frequencies, several independent, highly accurate experimental determinations are available (see the last column of Table 3); all are included in the W2020 database. However, owing to the efforts of Ref. 44, the protected transition frequencies can often be obtained more accurately from xMARVEL-based predictions than from direct, and in fact less precise, THz observations (see the fourth column of Table 3). Overall, the W2020 frequencies listed in the third column of Table 3 are the most accurate estimates of the rest frequencies of certain protected rotational lines of H₂¹⁶O, which are important for a number of scientific and engineering applications.
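The idea of predicting a transition frequency from empirical energy levels can be sketched as follows. The frequency is simply the difference of the upper and lower level energies; the uncertainty estimate used here (a quadrature sum, which assumes independent level uncertainties) is an assumption of this sketch and is not necessarily how xMARVEL derives the uncertainties of its predictions. The function names and numbers are illustrative.

```python
import math


def predicted_transition(e_upper, u_upper, e_lower, u_lower):
    """Return (value, uncertainty) of a transition predicted from two levels.

    All four inputs share one unit (e.g., cm^-1 or kHz).
    The quadrature sum assumes uncorrelated level uncertainties (an assumption).
    """
    return e_upper - e_lower, math.hypot(u_upper, u_lower)


def prediction_is_more_accurate(u_predicted, u_measured):
    """True if the predicted value has a smaller uncertainty than the measured one."""
    return u_predicted < u_measured


# Toy numbers (made up for illustration): two levels and one direct measurement.
value, uncertainty = predicted_transition(10.0, 3e-7, 4.0, 4e-7)
better = prediction_is_more_accurate(uncertainty, u_measured=2e-6)
```

In this toy case, the predicted uncertainty is 5 × 10⁻⁷, smaller than the assumed measurement uncertainty of 2 × 10⁻⁶, so the prediction would be preferred.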

Table 3 caption (fragment): … line, respectively (the latter two values with the uncertainties of their last few digits). A factor of 100, compared to the uncertainties of the xMARVEL-predicted frequencies, is used as a cutoff for reporting W2020 entries in this table. The f(expt.) values of multiple measurements follow the order of their increased uncertainties. Where the xMARVEL predictions have smaller uncertainties than the most accurate literature observations, the corresponding values are highlighted in boldface. All transitions belong to the ground vibrational state.

Summary and Conclusions
After the latest improvements associated with the MARVEL protocol, 87,106 in particular, the introduction of the extMARVEL scheme 87 and the XML-based data management developed during the present study, as well as after learning about the large number of new (ultra)high-precision spectroscopic studies on H₂¹⁶O, we started a project aimed at the revision of the H₂¹⁶O database published by an IUPAC TG in 2013. 3 The XML-based line-by-line data treatment devised during this study helps to improve how we store, share, and transport spectroscopic data. Our XML-based input helps to (a) maintain the provenance of the data, (b) have a timestamp on the individual MARVEL entries, and (c) maintain a description of the "original" literature source and its data. It is planned that in the future, all MARVEL-based studies will utilize the considerable advantages of the XML format. To distinguish the new protocol from the old one, we use the name xMARVEL for the improved technique and code. Utilizing xMARVEL, revision and significant extension of the IUPAC Part III database 3
Second, we "optimized" the uncertainties of the observed lines. The aim of what we call optimization has been to decrease the uncertainties as much as possible within the experimental limitations, including occasional recalibration. These optimized uncertainties are used to classify the recorded transitions of a given source into segments, as required by the extMARVEL protocol. Note that extMARVEL was designed to retain the accuracy of the most precise experiments during the MARVEL analysis and transfer that to the energy values. This means that a large number of the empirical energy levels of this study have uncertainties characteristic of the high-precision measurements considered during the construction of the W2020 database. This also implies that some of these transitions surpass the accuracy of those maintained by the National Academies of Sciences, Engineering, and Medicine 230 and the International Astronomical Union, updating our previous related compilation. 87 Third, we attempted to create the best set of self-consistent labels for the rovibrational energy levels. Occasionally, this meant modifications of some of the published labels.
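To illustrate what an XML entry carrying provenance, a timestamp, and a source description might look like, the following Python sketch builds one with the standard library. The element and attribute names, the function `make_entry`, and all values are hypothetical; they are not the actual xMARVEL schema, which is not specified in this section.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_entry(source_tag, wavenumber, uncertainty, upper, lower, note, timestamp=None):
    """Build a hypothetical XML record for one measured transition."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    # (a) provenance and (b) timestamp are stored as attributes of the entry.
    entry = ET.Element("transition", attrib={"source": source_tag, "timestamp": timestamp})
    ET.SubElement(entry, "wavenumber", attrib={"unit": "cm-1"}).text = repr(wavenumber)
    ET.SubElement(entry, "uncertainty", attrib={"unit": "cm-1"}).text = repr(uncertainty)
    ET.SubElement(entry, "upper").text = upper
    ET.SubElement(entry, "lower").text = lower
    # (c) free-text description of the original literature source and its data.
    ET.SubElement(entry, "note").text = note
    return entry


# Toy record: the source tag follows the naming style used in the text,
# while the numbers and labels are made up for illustration.
toy_entry = make_entry("20ToFuSiCs", 1234.5, 2e-7, "upper-label", "lower-label",
                       "toy record, not a real measurement",
                       timestamp="2020-01-01T00:00:00+00:00")
toy_xml = ET.tostring(toy_entry, encoding="unicode")
```

Keeping the source tag and timestamp as attributes makes it straightforward to filter entries by origin or by date when a dataset is revised.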
Although only the J quantum number and the parity of a state are unique descriptors, we feel that most of the labels of this study provide at least an approximate picture of the rovibrational motions characterizing the quantum states of H₂¹⁶O.
It is worth noting that, although the W2020 database contains significantly more entries than the IUPAC database, the number of energy levels these transitions determine is about the same in the two databases. This observation is in itself an important warning that high-resolution experiments should be designed much more efficiently if the goal is to extend our knowledge about the rovibrational states of H₂¹⁶O. The number of empirical energy levels is still less than 20 000, which should be compared to the roughly one million bound rovibrational states of H₂¹⁶O indicated by the best first-principles computations. 31,93
Construction of the W2020 database of the rovibrational transitions of H₂¹⁶O involved their in-depth validation. During this process, we compared the W2020 energies, obtained via the xMARVEL protocol, with first-principles (PoKaZaTeL 31 and BT2 109 ) energy lists in order to check whether all the W2020 states can be matched with their theoretical counterparts within appropriate symmetry and J values. As a result of this comparison and due to some missing links, 229 energy levels had to be removed from the PCs. We also compared the W2020 energies with the previously published rovibrational datasets of H₂¹⁶O (01LaCoCa, 227 04CoPiVeLa, 179 14CoMaPi, 11 and JPL 228 ), augmented with a thorough matching of the W2020 and Part III datasets. These tests show that the W2020 dataset not only represents the original literature sources better but also provides empirical energy levels that are more accurate and reliable than their Part III counterparts.
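The matching of empirical levels to first-principles counterparts within the same J and symmetry block can be sketched as a nearest-neighbor search. This is a simplified illustration: the function name, the tuple layout, the tolerance, and the toy energies are assumptions, and the sketch does not enforce a one-to-one assignment as a careful validation would.

```python
def match_levels(empirical, theoretical, tolerance):
    """Match each empirical level to the nearest theoretical level in its block.

    Both inputs are lists of (J, symmetry, energy) tuples, energies in cm^-1.
    Returns (matched, unmatched); matched holds (empirical_level, theory_energy).
    """
    blocks = {}
    for j, symmetry, energy in theoretical:
        blocks.setdefault((j, symmetry), []).append(energy)

    matched, unmatched = [], []
    for j, symmetry, energy in empirical:
        candidates = blocks.get((j, symmetry), [])
        best = min(candidates, key=lambda e: abs(e - energy), default=None)
        if best is not None and abs(best - energy) <= tolerance:
            matched.append(((j, symmetry, energy), best))
        else:
            unmatched.append((j, symmetry, energy))
    return matched, unmatched


# Toy data (made-up energies and symmetry labels, not W2020 or PoKaZaTeL values).
toy_theory = [(1, "A1", 100.0), (1, "A1", 150.0), (1, "B2", 100.2)]
toy_empirical = [(1, "A1", 100.05), (1, "B2", 140.0), (2, "A1", 50.0)]

toy_matched, toy_unmatched = match_levels(toy_empirical, toy_theory, tolerance=0.1)
```

Levels left in the unmatched list are the candidates for closer inspection, for example, for a possible mislabeling or for removal from the dataset.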
Finally, note that the much improved accuracy of the J ≤ 8 rotational lines present in W2020, resulting mostly from the precision-spectroscopy measurements of Ref. 44, means that the upper energy levels deduced in several sources 8,26,173,231 reporting highly accurate transitions would change significantly. This secondary effect of the W2020 dataset must be taken into account in future studies.

Supplementary Material
See the supplementary material for listings of transitions and energy levels characterizing the W2020 dataset.