NIST 23: The Largest Increases in Compound Coverage for the Tandem and NIST/EPA/NIH EI Libraries Since NIST Became Curator


O. David Sparkman (July 2023)


In Houston, Texas, on the morning of June 5, 2023, at the American Society for Mass Spectrometry's 71st Annual Conference on Mass Spectrometry and Allied Topics, the Mass Spectrometry Data Center of the National Institute of Standards and Technology (née National Bureau of Standards), a part of the United States Department of Commerce, announced the largest increase ever in compound coverage for the Tandem and Electron Ionization (EI) Mass Spectral Libraries, NIST 23. Along with these expanded libraries, NIST announced a new version (v.3.0) of the NIST Mass Spectral Search Program for use with Microsoft Windows. In addition to introducing NIST 23, NIST personnel presented several oral presentations and posters.

The NIST 23 release of the Tandem Library has both integer and accurate m/z data for 51,501 compounds. This is 60% more compounds than were in the 2020 release, NIST 20. This increased coverage yields 400 thousand precursor ions and 2.4 million spectra (more than double the number of spectra in the NIST 20 Tandem Library). The number of spectra measured for the Tandem Library grew by over one million.

The NIST/EPA/NIH EI Library now has 394,054 spectra for 347,100 compounds. The accompanying GC Method/Retention Index Database has 492 thousand entries for 180,620 compounds; 153 thousand of these compounds have EI mass spectra. This is an increase of more than 40 thousand in EI spectral coverage. In addition, there are artificial intelligence (AI) estimated retention indexes (RI) for all the compounds in the EI Library. These AI RI values, first introduced with NIST 20, have median deviations of ±13 units from measured values, whereas the previously estimated values have an average deviation of ±84 units (see Figure 1). These AI RI values may now be used in Match Factor adjustments when there are no experimental values for the compound in the NIST/EPA/NIH Library.

fig_1_retention_index_estimation

Figure 1: Comparison of deviation from measured values between the AI method for estimation and the previous method.

In addition to the large number of spectra measured by NIST that were added to the EI Library, another significant collection was added to NIST 23. This was the Robert Adams Identification of Essential Oil Components by Gas Chromatography/Mass Spectrometry collection, previously available only through Diablo Analytical (Antioch, CA). This added more than 2,000 spectra.

In 1988, NIST, under the direction of Dr. Stephen Stein, became the steward of the EI Mass Spectral Library that had been managed jointly by the US Environmental Protection Agency and the National Institutes of Health. At that time, this library could be accessed using the rapidly expanding computer data systems that accompanied many gas chromatograph-mass spectrometers (GC-MSs) or through a time-share system using voice-grade telephone lines and modems. Access through instrument data systems was limited to submitting a measured spectrum and retrieving a matching spectrum along with its metadata from the library. Users wanted to be able to access data without having to submit a measured spectrum, for example by entering a name, chemical formula (elemental composition), CAS number, or nominal mass. Dr. Stein quickly wrote a program to do this. This was the genesis of what today is known as the NIST Mass Spectral Search Program. One of the first additions to the EI Library was a structure for each compound. This was necessary to begin a thorough evaluation of the mass spectral data. The first program was written for the IBM PC DOS system. As computers evolved, this software grew into an ever-improved Microsoft Windows program.

Until NIST became the steward of what today is known as the NIST/EPA/NIH Electron Ionization Mass Spectral Library, data were accumulated from a private collection, which often used techniques other than gas chromatography to introduce analytes into the mass spectrometer's ion source. This was because many spectra were measured starting in the early 1940s, well before the commercial use of gas chromatography, which began in the mid-1950s. In 1992, NIST began measuring EI spectra in addition to collecting and evaluating spectra from other sources. All these measurements have been made with sample introduction by GC. Today, in addition to the EI spectrum, the GC method and measured retention index are recorded with the compound's metadata. Of the 394,054 spectra in the NIST 23 EI Library (a count that includes a primary spectrum for each compound and one or more replicates for the more commonly encountered compounds), 116,308 (about 29.5%) have been measured by NIST. When a spectrum is submitted by an outside contributor for inclusion in the NIST EI Library, NIST first checks whether the compound can be acquired. If it can, it is measured. The user-submitted spectrum may be used as a replicate, provided it passes a rigorous inspection by one or more evaluators.

When NIST began the expansion of the EI Library, the process of selecting which compounds should be measured was simple. The NIST MS Lab would take the Sigma-Aldrich catalog and order what looked like an organic compound that could be measured by gas chromatography/mass spectrometry (GC/MS). After receiving the compound, an attempt was made to measure the spectrum. Each of these measured spectra was evaluated by a computer and at least two people trained in mass spectrometry. As pointed out by Tytus Mak at the NIST 23 Introduction Breakfast at ASMS, the question should not be "What new compounds were added?" but rather "How and why were these new compounds selected?" Following the NIST 14 release, a new policy was established to select compounds to be added to both the EI and Tandem Libraries. This involved looking at non-mass spectral databases for compounds of relevance. Figure 2 is a representation of some of the relevant sources. These cover areas from ethical and illicit drugs to food and food additives to pollutants to metabolites to Wikipedia. Every compound in Wikipedia has relevance because someone had to create a page for that compound. A compound that appears in several of these databases is more likely to be relevant. Figure 3 illustrates the lengths to which NIST has gone to determine what compounds should be added by examining various non-mass spectral databases.

fig_2_sources_chemical_compounds

Figure 2: Examples of sources of chemical compounds studied to see which compounds appeared the most often and were therefore relevant to inclusion in an NIST Mass Spectral Library.

Once relevant compounds are identified, a determination is made, first, as to their current inclusion in an NIST Library, and second, as to their availability for purchase. Those that are not in the current libraries and can be purchased are acquired, and their mass spectra are measured by EI and product-ion LC-MS/MS analysis. The spectra are evaluated and then included in the appropriate library. In the case of EI, when appropriate, a derivative of the compound is formed and measured. Figure 3 shows the type of coverage that has been obtained for NIST 23.

fig_3_ei_tandem_examples

Figure 3: Example of sources of compounds added to the two NIST 23 Libraries.
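
The selection logic described above (count how many databases mention a compound, then keep those not yet in a library that can be purchased) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `min_hits` threshold, and the placeholder database contents are hypothetical and do not come from NIST.

```python
from collections import Counter

def rank_candidates(databases, in_library, purchasable, min_hits=2):
    """Rank compounds by how many source databases list them.

    databases:   dict mapping a database name to an iterable of compound IDs
    in_library:  set of compound IDs already present in the library
    purchasable: set of compound IDs that can be bought
    min_hits:    hypothetical relevance threshold (an assumption of this sketch)
    """
    counts = Counter()
    for compounds in databases.values():
        counts.update(set(compounds))  # count each database at most once
    candidates = [
        (compound, hits)
        for compound, hits in counts.items()
        if hits >= min_hits
        and compound not in in_library
        and compound in purchasable
    ]
    # Most widely listed compounds first; ties broken alphabetically
    return sorted(candidates, key=lambda item: (-item[1], item[0]))

# Placeholder data, invented for illustration only
example_databases = {
    "drugs": ["A", "B"],
    "food": ["A", "C"],
    "wikipedia": ["A", "B", "C", "D"],
}
ranked = rank_candidates(example_databases, in_library={"A"}, purchasable={"B", "C", "D"})
print(ranked)  # [('B', 2), ('C', 2)]
```

Compound A is the most widely listed but is skipped because it is already in the library, and D is skipped because it appears in only one database.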

Another important issue with respect to selecting compounds for the EI Library is an emphasis on per- or poly-fluoroalkyl substances (PFASs). These have become very important in environmental chemistry. Over 9,000 compounds are on the United States Environmental Protection Agency's Master List of PFAS. When polymers and salts (not amenable to GC/MS) are removed, the list is reduced to 6,695 compounds. An examination of the NIST/EPA/NIH EI Library in NIST 20 showed that it had spectra for 942 PFASs. Of these, 201 spectra were considered to be of low quality or did not have retention indexes. NIST then determined that 830 PFAS compounds not in the current EI Library were commercially available. The first batch of 231 compounds has been obtained and measured. In many cases, derivatives of these compounds were measured at the same time the pure compounds were analyzed by EI mass spectrometry. As a result, more than 400 spectra for the 231 ordered compounds were added to the NIST 23 EI Library.

In the early 2000s, NIST began the acquisition and curation of product-ion mass spectra produced by MS/MS from precursor ions formed in LC/MS of small molecules. The first edition of this, the NIST Tandem Library, appeared as part of NIST 05. This release contained 5,191 spectra of 1,832 compounds and included both positive and negative ion spectra. As the Tandem Library has evolved, it has come to contain nearly equal numbers of accurate and integer m/z-value spectra measured by tandem-in-space instruments (tandem quadrupoles and quadrupole-TOF instruments) and tandem-in-time instruments (ion trap instruments like the linear ion trap and Orbitrap instruments). Figure 4 shows the growth in compounds and spectra of the NIST Tandem Library since its inception.

fig_4_tandem_library_size_evolution

Figure 4: Growth in the number of compounds and the number of spectra in the NIST Tandem Library since its inception.

Beginning with NIST 17, all spectra added to the Tandem Library have been measured by NIST. These are spectra of pure compounds from verified sources, introduced by direct infusion. Figure 5 is an illustration of the Tandem Library's diversity.

fig_5_tandem_library_diversity

Figure 5: Illustration of the diversity exhibited by the NIST 23 Tandem Library (Subject: 1,5-Pentanediamine).

It is said that a picture is worth a thousand words. When it comes to the illustration of the process used in the development and curation of the NIST Tandem Library, this statement could not be truer. The process of producing the NIST Tandem Library is illustrated in Figure 6.

fig_6_tandem_library_curation

Figure 6: The process of curating the NIST Tandem Library.

Software Enhancements

In earlier versions of the NIST MS Search Program, the InChIKey was hot-linked to PubChem. In v.3.0, this link launches a Google Search instead, which provides far more information about the individual compound.

NIST Mass Spectral Search Program and Other Software

As NIST continues to grow the size and quality of these two libraries of mass spectra, which have become the standards for GC/MS and LC/MS/MS, the NIST Mass Spectral Search Program has evolved into the standard for searching spectra of unknown compounds against all mass spectral libraries. The program has added utilities such as Mass Spectrometry (MS) Interpreter (used to correlate spectra of all types with structures to illustrate their relevance to one another) and AMDIS (used for the generation of pure EI spectra and associated retention indices produced by GC/MS used in the identification of unknowns).

Specifying Data Type

Features have been added to MS Search that are specific to searching either the EI or Tandem Library. Depending on which libraries are being searched, features not specific to that type of data could cause confusion, especially among users who are new to the technique. Version 3.0 of MS Search has implemented a utility that will hide features of the type of search (EI or Tandem) not being used. This greatly increases the usability of each of the NIST libraries. For those users desiring both types of data, switching back and forth is now very easy.

Changes in Searching

One major change in MS Search v.3.0 is in the area of the Spectrum Search Option. Previous versions allowed for a Reverse Search, which displayed the Hit List of a submitted spectrum in decreasing order of the Reverse Match Factor (a Match Factor that disregards any sample-spectrum peaks that are not in the library spectrum). This proves very useful when the sample spectrum represents more than a single compound. Another search type, the Partial Spectrum Search, has been added to v.3.0. The search method is selected from a dropdown list box containing the following three options: 1) Full Spectrum Search (sorts the Hit List by decreasing Match Factor), 2) Impurity Tolerant Search (the new name for the Reverse Search of previous versions), and 3) Partial Spectrum Search (displays the Match Factor calculated by disregarding any peak in the library spectrum that is not in the sample spectrum). The Partial Spectrum Search proves to be very valuable in cases of low concentrations of coeluting compounds.

As an example, a sample containing metabolites was analyzed by GC/MS, and the resulting data file was processed using AMDIS. A spectrum, identified as that of desmosterol, TMS derivative, was submitted for a Normal EI Search using the NIST 23 NIST/EPA/NIH EI Library and selecting the Spectrum Search Option of Partial Spectrum Search. Because the overall signal was very weak, a number of the low-intensity peaks in the spectrum were not saved by AMDIS. The Full Spectrum Search Match Factor (MF) was 565, as was the Reverse Match Factor (RMF). However, the Partial Spectrum Search (PSS) MF was 935. This value for the PSS MF, along with a matching retention index sent over by AMDIS with the sample spectrum, provided good confidence for an identification.

fig_7_partial_spectrum_search

Figure 7: Results of a Partial Spectrum Search
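
The three search options differ only in which peaks are allowed to count against a match. The following Python sketch illustrates that difference. It assumes a plain cosine similarity scaled to 999 in place of the scoring MS Search actually uses (which this article does not describe), and the function name, mode labels, and spectra are hypothetical.

```python
import math

def match_factor(sample, library, mode="full"):
    """Simplified cosine-style match factor on a 0-999 scale.

    sample, library: dicts mapping integer m/z to intensity.
    mode: "full"    - every peak in either spectrum is compared
          "reverse" - sample peaks absent from the library spectrum are ignored
          "partial" - library peaks absent from the sample spectrum are ignored
    """
    if mode == "full":
        mz_values = set(sample) | set(library)
    elif mode == "reverse":
        mz_values = set(library)
    elif mode == "partial":
        mz_values = set(sample)
    else:
        raise ValueError(f"unknown mode: {mode}")

    dot = sum(sample.get(mz, 0.0) * library.get(mz, 0.0) for mz in mz_values)
    norm_s = math.sqrt(sum(sample.get(mz, 0.0) ** 2 for mz in mz_values))
    norm_l = math.sqrt(sum(library.get(mz, 0.0) ** 2 for mz in mz_values))
    if norm_s == 0 or norm_l == 0:
        return 0
    return round(999 * dot / (norm_s * norm_l))

# Invented spectra: the weak sample has lost its two smallest peaks
library_spectrum = {43: 100, 57: 80, 71: 60, 85: 40}
weak_sample = {43: 100, 57: 80}

for mode in ("full", "reverse", "partial"):
    print(mode, match_factor(weak_sample, library_spectrum, mode))
```

With these invented spectra, the full and reverse scores are identical (the sample has no extra peaks to ignore), while the partial score is 999 because the missing library peaks no longer count against the match. This mirrors the pattern in the desmosterol example, where the MF and RMF were equal and the PSS MF was much higher.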

NEW Incremental Name Search

Another significant enhancement is a major change in the Incremental Names Search (the Names tab). The left-hand vertical window, which displays compound names as text or structures, has been divided. The upper portion still displays names that are entered in the text-entry box at the top of the view; the lower part is now the Spectrum window. The number of characters allowed in the text-entry box has been expanded from 16 to 249 in v.3.0 of MS Search.

The spectrum highlighted in the Spectrum window is displayed in the Plot and Text Information windows on the right, as seen in Figures 8 and 9. When the NIST/EPA/NIH EI Library is being used, the spectra listed in the Spectrum window are mainlib and all the replib spectra, followed by stereoisomers and derivatives of that compound. Each spectrum is clearly designated. When the Tandem Library (both the High-Resolution and Low-Resolution Libraries simultaneously) is selected, the Spectrum window contains spectra for the collision energies used, instrument type, and fragmentation type.

fig_8_names_tab_view_ei

Figure 8: The Names tab view, displaying the Names and Spectrum windows for the EI data.

fig_9_names_tab_view_tandem

Figure 9: The Names tab view, displaying the Names and Spectrum windows for the tandem data.

Inclusion in Other (Non-Mass Spectral) Databases

From the beginning of the development of MS Search, it was decided, where possible, to provide information on the inclusion of compounds in other non-mass spectral databases to indicate the compound's significance. For many years, these databases were limited to nine. Users quickly became aware that when two hits with high Match Factors were obtained, the one found in multiple other non-mass spectral databases was more likely to be the unknown than the one not found in any of these databases. As the new method of selecting compounds for inclusion was developed, it was decided that all the databases searched for inclusion candidates would become a part of the metadata for entries (present and future) in the NIST libraries. This did eliminate the ability to constrain a search based on a specific database inclusion; however, it did provide a much clearer picture of the compound's significance. There are seven classes of Other Databases in v.3.0 of MS Search: Contaminants (C), Drugs (D), Environmental (E), Food (F), General (G), Metabolite (M), and Wikipedia (W). There are 59 generic titles in total. As seen on the left side of Figure 10, the generic titles are listed in the Text Information windows of the various displays below the specific categories. When the display of Other Databases is selected in the Properties dialog box of a Hit List, the total number of Other Databases where the compound is found precedes the letters for the seven categories (right side of Figure 10).

fig_10_text_information_window

Figure 10: Partial view of Text Information window (left) for the display of the EI mass spectrum of cocaine in the NIST/EPA/NIH EI Library, NIST 23. The right side is a Library Search Hit List window showing the display of Other Databases.

The NIST Hybrid Search

With the release of NIST 17, the Hybrid Search for EI and product-ion mass spectra was introduced. This search is designed to aid in the identification of unknowns when there are no spectra for the unknown in any of the searched libraries. The algorithm looks at the neutral losses exhibited by the spectrum of the unknown. These are then compared to spectra in the searched libraries. Models are then selected, and hybrid spectra are created. The difference in the neutral losses of the model compounds is compared with those of the unknown, and that delta mass is reported. From the delta masses, the changes to the structure of the model to produce the proposed hybrid structure can be elucidated. This structure is then associated with the mass spectrum of the unknown, and the structure and spectrum are evaluated with MS Interpreter to see the compatibility of the two. The one disadvantage of the Hybrid Search is its necessity for the precursor ion of the unknown. With a product-ion spectrum, this is always known. However, only somewhat more than 80% of the compounds in the NIST/EPA/NIH EI Library exhibit an unambiguous molecular ion peak. The nominal mass of the unknown is a requirement of the Hybrid Search of an EI spectrum. This value can be obtained either through a soft ionization technique like chemical or field ionization (CI or FI) or through derivatization. In some cases, an educated guess as to the unknown's nominal mass may be possible.
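
The neutral-loss comparison described above can be illustrated with a small Python sketch. It assumes integer (nominal) m/z values and treats a library peak as matched if the unknown has a peak either at the same m/z or at that m/z shifted by the delta mass, which corresponds to the same neutral loss from the precursor. The function name and the peak lists are hypothetical, and the real Hybrid Search scoring is more involved than this simple peak pairing.

```python
def hybrid_peak_pairs(unknown_peaks, library_peaks, unknown_mass, library_mass):
    """Pair library peaks with unknown peaks directly or via the delta mass.

    A library fragment at m/z f has neutral loss (library_mass - f). The same
    neutral loss in the unknown appears at unknown_mass - (library_mass - f),
    which equals f + delta, where delta = unknown_mass - library_mass.
    """
    delta = unknown_mass - library_mass
    unknown = set(unknown_peaks)
    # Fragments that do not contain the modification stay at the same m/z
    direct = [mz for mz in library_peaks if mz in unknown]
    # Fragments that contain the modification are shifted by the delta mass
    shifted = [
        mz for mz in library_peaks
        if mz not in unknown and (mz + delta) in unknown
    ]
    return delta, direct, shifted

# Invented nominal-mass peak lists for a model compound (mass 150)
# and an unknown that is 14 u heavier (mass 164)
model_peaks = [77, 105, 135, 150]
unknown_peaks = [77, 119, 149, 164]

delta, direct, shifted = hybrid_peak_pairs(unknown_peaks, model_peaks, 164, 150)
print(delta, direct, shifted)  # 14 [77] [105, 135, 150]
```

In this made-up case, one fragment matches directly and three match only after the 14 u shift, which suggests that the unknown differs from the model by a 14 u modification located on the part of the molecule retained in those three fragments.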

Since the introduction of the NIST Hybrid Search, there have been publications illustrating its utility in the identification of a substance from a mass spectrum (EI or product-ion) when there is no corresponding spectrum in the searched library. Figure 11 shows a partial list of the peer-reviewed publications.

fig_11_nist_publications

Figure 11: A sampling of the publications about the NIST Hybrid Search from inside and outside of NIST.

One last (but not least) new feature of NIST 23 is its all-electronic manual. It contains links throughout, including links to outside sources, such as a list of commonly noted delta mass values for use with the Hybrid Search, and to the complete list of other (non-mass spectral) databases.

Support

Today, the NIST MS Search Program, its various utilities, and the NIST/EPA/NIH EI and Tandem Libraries together form a comprehensive yet complex tool for the use of mass spectrometry in qualitative analyses. This can be overwhelming, particularly for new users. James Little (a retired Eastman Chemical Co. mass spectrometry scientist and contract spectrum evaluator to NIST) has created a series of YouTube videos, along with handouts, on various aspects of using the NIST mass spectrometry tools.

Both NIST Libraries associated with the NIST MS Search Program are tools used in the qualitative analysis of compounds using mass spectral data. The introduction of the Hybrid Search extends the value of the libraries. The Hybrid Search improves as the NIST Libraries grow because more model compounds become available. If you are using NIST 14 or an earlier version, an upgrade is recommended. This not only significantly increases the number of compounds for both the NIST/EPA/NIH EI (by almost 45%) and Tandem (by a factor of 5) Libraries, but also provides the full advantages of the Hybrid Search. If you have retention indices for your GC/MS data, an immediate upgrade is highly recommended because each identification can now be validated by retention index matching and scoring. Upgrading to the Full version (NIST/EPA/NIH EI and Tandem Libraries) will allow you to install the EI Library and NIST software on one computer and the Tandem Library and NIST software on another computer using the same license. When upgrading the NIST EI Library, a separate license is required for each computer; the previous version cannot be installed on another computer.

Those using NIST 17 or NIST 20 of either Library should consider an upgrade because of the increased coverage. There is an upgrade path for the NIST/EPA/NIH EI Library or the Full NIST 23 package. The NIST Tandem Library has been priced very low; therefore, there is no upgrade price for it.

New licenses and upgrades are available only through NIST-authorized distributors. Prices are set by the distributor and can vary widely. The NIST website includes an alphabetical list of these distributors.

The NIST Mass Spectrometry Data Center operates under the auspices of the Standard Reference Data Act of 1968, passed into law by the 90th United States Congress on July 11, 1968. This law states:

The Congress hereby finds and declares that reliable standardized scientific and technical reference data are of vital importance to the progress of the Nation's science and technology. It is therefore the policy of the Congress to make critically evaluated reference data readily available to scientists, engineers, and the general public. It is the purpose of this Act to strengthen and enhance this policy.

The Act also states: 

Standard reference data conforming to standards established by the Secretary may be made available and sold by the Secretary or by a person or agency designated by him. To the extent practicable and appropriate, the prices established for such data may reflect the cost of collection, compilation, evaluation, and publication, and dissemination of the data, including administrative expenses; and the amounts received shall be subject to the Act of March 3, 1901, as amended.

This means that all costs of measuring and evaluating the mass spectra are covered by the fees the authorized distributors are charged for each license to the NIST data. Also, in accordance with this Act, the data are covered by the United States Copyright assigned to the Department of Commerce.

O. David Sparkman is Director of the Mass Spectrometry Facility in the Chemistry Department at the University of the Pacific in Stockton, CA, USA, a consultant in mass spectrometry, and contractor to NIST. He is also coauthor (with J. Throck Watson) of Introduction to Mass Spectrometry, 4th Ed.
