MS Solutions #13: NIST 11: What’s New and What Value Does it Offer? Part 3

The previous two installments provided information on the NIST/EPA/NIH Mass Spectral Database, past and present, and on the use of the NIST MS Search Program in identifying a compound from its mass spectrum using both the NIST EI Database and the NIST Database of spectra obtained using MS/MS techniques. In Part 3, the use of the searches in the Other Search tab view is examined. These searches can be beneficial in the identification of compounds from mass spectra obtained by ionization techniques other than EI and from data that provide a higher accuracy of the measured m/z value than is usually available for EI data.

The Other Search tab view, shown in Figure 1, has a dropdown list-box on the upper left side of the display just to the right of the binoculars icon. Selecting the dropdown button for this list-box results in a list of the various types of search that can be executed from this view. The first step in any of the various searches is to display a search dialog box. This is done by selecting a search type from the dropdown list-box or, if the desired search type appears in the display, by clicking on the binoculars icon. Any of the up to 127 databases in the MS Search Program folder (\NIST11\MSSearch\, by default) can be searched by Nominal Mass, Exact Mass (monoisotopic mass), Formula (elemental composition), Any Peaks (types, m/z values, and intensity range), a combination of these (the Sequential Search), Chemical Abstracts Service Registry Number (CASrn), NIST number (a unique number assigned to each spectrum in the NIST EI Mass Spectral Archive), or position in a selected database. Just like the spectral matching search performed in the Lib Search tab view, many of these searches can be constrained; the CAS #, ID, and NIST # Searches cannot be constrained. The dialog boxes for each of the constraints are shown in Figure 2. The dialog boxes for each search are similar to the constraint dialog boxes except that in some cases (the Nominal Mass, Formula, and Sequential Searches) the search dialog box has two or three tabs. The first tab on the left at the top of the dialog box is often labeled as the Options tab. In the Exact Mass and Any Peaks Searches, this tab is labeled as the Libraries tab and is in the second position rather than the first. The primary purpose of these Options/Libraries tab views is to specify the databases to be searched and to specify the search order.
The Sequential and Any Peaks Searches limit the number of Hits that can be displayed. This number is settable (on the Peak tab in the Any Peaks Search and on the Libraries tab in the Sequential Search). The dialog boxes for the CAS #, NIST #, and ID Number Searches do not have tabs. The database to be searched is selected in the only view available for these three dialog boxes; only one database at a time can be searched.

   When any search is performed, it is the Hit List display that is constrained; the search itself is not. If a maximum number of Hits is specified, only that number of Hits from the top of the Hit List is retained; the constraints are then applied to this truncated list, and only the Hits that remain are displayed. If only 10 of the first 500 Hits for a Sequential Search meet the constraint criteria, only those 10 Hits will be displayed. Although there may be 200 or 300 Hits in the full, unconstrained Hit List that meet the constraint criteria, those that fall after the first 500 will be ignored. This can lead to some erroneous results because, unlike the Hits from the spectral match searches, these Hits are not ranked. This is why it is so important to be aware of exactly what the Program is doing when performing searches in the Other Search tab view.

Suppose that two peaks have been added in an Any Peaks Search, the Program shows that 2,445 spectra meet both criteria, and the Max Num of Hits is set at 100. All 2,445 Hits will be displayed unless Use Constraints is selected. If Use Constraints is selected, only the first 100 Hits will be displayed; if actual constraints have been selected, then only those members of that first 100 Hits that meet the constraint criteria will be displayed. The maximum number of Hits allowed in any Hit List is 6,000. If the number of Hits exceeds this 6,000 value for any search, an error message will be displayed. The maximum that can be entered into the Max Num of Hits field is 1,000. Attempting to enter a larger value in this field results in the display of an error message.
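
The order of operations described above (truncate first, then constrain) can be illustrated with a minimal Python sketch. This is not code from the MS Search Program; the `Hit` class, the function names, and the hit list itself are hypothetical and serve only to show why unranked Hits beyond the cutoff are lost.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    nominal_mass: int

def truncate_then_constrain(hits, max_hits, constraint):
    """Keep the first max_hits entries, then apply the constraint to that subset."""
    return [h for h in hits[:max_hits] if constraint(h)]

def constrain_then_truncate(hits, max_hits, constraint):
    """Apply the constraint to every hit, then keep the first max_hits survivors."""
    return [h for h in hits if constraint(h)][:max_hits]

# Hypothetical unranked hit list: 1,000 entries, every fourth one has nominal mass 128.
hits = [Hit(f"compound_{i}", 128 if i % 4 == 0 else 130) for i in range(1000)]

def is_128(hit):
    return hit.nominal_mass == 128

shown = truncate_then_constrain(hits, 100, is_128)   # behavior described in the text
possible = [h for h in hits if is_128(h)]            # everything that meets the constraint
print(len(shown), "displayed of", len(possible), "that meet the constraint")
```

With these made-up numbers, only the matches that happen to fall in the first 100 entries are displayed, while `constrain_then_truncate` would have filled all 100 slots. Because the list is not ranked, there is no reason to expect the displayed subset to be the most relevant one.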

   As pointed out in Parts 1 and 2 of this series, one of the more interesting features of v.2.0g of the NIST MS Search Program is the ability to use accurate mass values measured with mass spectrometers such as the Orbitrap (trademark of Thermo Fisher Scientific, San Jose, CA, USA) and the time-of-flight and tandem-in-space quadrupole-TOF instruments that are growing in popularity. Accurate monoisotopic masses measured with such instruments are compared against the exact monoisotopic masses of the compounds in any database. Any database can be indexed to the exact masses of its compounds using the Tools/(Re)Index Exact Mass selection from the Main Menu bar. This means that no matter what database is searched, the search can be constrained to the exact mass with precision limits. Figure 3 shows the dialog boxes for the Exact Mass constraint of the Any Peaks Search and the Exact Mass Search. One unique feature of the Exact Mass Search, or constraint, in MS Search is the ability to use a formula in addition to numeric entries for mass and m/z values. The Program allows the precision to be specified in millimass units (mmu) or in parts per million (ppm). It is also possible to specify whether the mass or m/z value entered represents a specific formula gain or loss. This is only a consideration if a chemical formula (Na, Cl, CH3CO, etc.) is entered in the text entry box next to the Gain/Loss label in this dialog box. If an m/z value is to be searched, then the number of charges on the ion can be specified, along with whether or not the mass of the electron should be considered as a part of the search.
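
The arithmetic behind a precision window and an m/z-to-mass conversion can be sketched as follows. This is generic mass spectrometry arithmetic for a positive ion, not the Program's internal code; the function names and the example m/z value (a protonated C8H16O ion) are assumptions made for illustration.

```python
ELECTRON_MASS = 0.000548579909  # mass of the electron in u
H_ATOM = 1.00782503223          # monoisotopic mass of a hydrogen atom in u

def tolerance_in_u(mass, tol, unit):
    """Convert a tolerance given in 'mmu' or 'ppm' to unified atomic mass units."""
    if unit == "mmu":
        return tol / 1000.0       # 1 mmu = 0.001 u, independent of the mass
    if unit == "ppm":
        return mass * tol / 1e6   # ppm scales with the mass being searched
    raise ValueError(f"unknown unit: {unit!r}")

def mass_window(mass, tol, unit):
    """Return the (low, high) mass limits for a search around `mass`."""
    delta = tolerance_in_u(mass, tol, unit)
    return mass - delta, mass + delta

def neutral_mass_from_mz(mz, charge=1, gain_mass=0.0, electron_correction=True):
    """Neutral monoisotopic mass for a positive ion observed at `mz`.

    gain_mass is the total mass of the neutral atoms added to form the ion
    (use a negative value for a loss).
    """
    ion_mass = mz * charge
    if electron_correction:
        ion_mass += charge * ELECTRON_MASS  # restore the electrons that were removed
    return ion_mass - gain_mass

# Hypothetical measurement: [M+H]+ observed at m/z 129.12739, single charge.
m = neutral_mass_from_mz(129.12739, charge=1, gain_mass=H_ATOM)
low, high = mass_window(m, 5, "ppm")
print(f"{m:.6f} u, 5 ppm window {low:.6f} to {high:.6f}")
```

At this mass a 5 ppm window is about ±0.00064 u, roughly eight times narrower than a 5 mmu window (±0.005 u), which is why the choice of unit matters for low-mass analytes.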

   Normally, the selection in the Find box is Monoisotopic precursor mass. If Among XX most abundant isotopes is selected, then the masses of up to the 16 most abundant isotopes of the compound are included in the search (the actual number is selectable). For some compounds, the exact (monoisotopic) mass may not be among the masses of these isotopes. If the Search value is a chemical formula, then no matter what was selected in the Find box, the exact (monoisotopic) mass corresponding to the formula will be searched. The values of the found isotopic masses are not displayed. They may be calculated using the MS Interpreter.
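
How a chemical formula translates into an exact (monoisotopic) mass can be shown with a short Python sketch. It is an illustration only, assuming simple formulas without parentheses or charges; the small mass table (values for the most abundant isotope of each element, from standard atomic mass tables) and the function names are not part of MS Search.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "Na": 22.9897692820,
    "Cl": 34.968852682,
}

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C8H16O' or 'CH3CO'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())

print(f"C8H16O: {monoisotopic_mass('C8H16O'):.6f} u")
print(f"CH3CO:  {monoisotopic_mass('CH3CO'):.6f} u")
```

Repeated element symbols are summed, so a group written as CH3CO is treated as C2H3O.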

   The ability to use the constraints of Elements Present and the number of atoms of various elements (Element Value) means that the Exact Mass Search in MS Search can be used just like the elemental formula calculators associated with the software of many commercial instruments and products like Mass Spec Tools. The one difference is that the returned Hits are only those compounds that are in the searched database(s), and multiple compounds can be suggested for the same elemental composition.

   It should be noted that, just like the Hit List in the Lib. Search tab view, the Hit List in the Other Search tab view can be displayed as structures as well as a list of compound names. However, when the structures-view is used, the number of synonyms and the presence-in-other-databases information are not visible, nor is it possible to sort the Hit List. In the text-view, the Hit List is sortable by Name, Syn., or DBs.

   In recent years there have been a number of publications relating to the use of accurate mass values in the identification of known unknowns*. Jim Little is believed to be the first to characterize analytes as known knowns (target analytes), known unknowns (compounds that are known to exist, but it is unknown whether they exist in an individual sample), and unknown unknowns (compounds that have never been identified but are found to be in a sample). In all likelihood, even metabolites of new drugs fall into the category of known unknowns, even when the metabolite has never been identified, because it is a derivative of the drug, which has a known chemical composition and structure. When attempting to identify the cause of an off-odor, a miscoloration, a death potentially caused by consumption of a chemical, the reason for an increase in the total organic carbon level of a water sample, etc., usually the identified compound is something that is available commercially or at least has been encountered before. Liao, Draper, and Perera stated something similar when they described proprietary software for identifying compounds in environmental samples that pose a potential health hazard [1].

   The NIST MS Search Program has features to facilitate the identification of known unknowns. As seen in Figure 1, the Hit List contains columns with the number of synonyms for the Hit and the number of other databases where the analyte can be found. The compounds in the NIST/EPA/NIH Mass Spectral Database are indexed as to the other databases where they can be found. This feature is unique to the NIST EI Mass Spectral Database. These other databases are non-mass spectral databases. A list is found in Table 1.

   The Hit List can be sorted by decreasing number of synonyms or by decreasing number of other databases containing the Hit. Other databases besides the NIST EI Database have synonyms associated with their entries. For example, the Wiley Registry of Mass Spectral Data has synonyms for many of its entries; however, only the NIST EI Mass Spectral Database contains information as to other databases where the compound can be found. This ability to sort by the number of synonyms or the number of other databases can be very useful in selecting candidates for known unknowns. The larger the number of other databases containing the compound, the more known the Hit. The same is true for the number of synonyms, especially when some of the synonyms are trade names or company-proprietary code names such as the synonyms Ciba 2059 and Herbicide C-2059 for N,N-dimethyl-N’-[3-(trifluoromethyl)phenyl]urea. Not only are the synonyms and/or presence in other databases an aid in identifying unknowns from one of the Other Searches using data from non-EI techniques, but they are also useful in selecting between Hits of similarly high Match Factors and Reverse Match Factors resulting from a spectral matching search of an EI spectrum against the NIST EI Database.

   Another example of how useful it is to sort a Hit List by the number of synonyms a compound has, or by the number of other databases it is in, is seen when a Formula Search for C8H16O is performed against the mainlib of the NIST 11 Database. This results in 137 Hits. Examining each of these Hits would be tedious and time consuming. When sorted by the number of synonyms, only the first four Hits have ten or more synonyms. Only the first 13 Hits have five or more synonyms. When sorted by the number of other databases, the first seven Hits are in seven or more other databases; the first 18 are in five or more other databases, and all of the top Hits from the number-of-synonyms sort are in this list.
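
The sorting strategy described above amounts to ordering candidates by how well documented they are. A minimal Python sketch follows; the `Hit` tuple, the placeholder names, and the counts are invented for illustration and are not taken from the NIST 11 Database.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    name: str
    n_synonyms: int    # value in the Syn. column
    n_other_dbs: int   # value in the DBs column

# Hypothetical Hit List for a single formula; all values are made up.
hits = [
    Hit("hit A", 2, 1),
    Hit("hit B", 12, 8),
    Hit("hit C", 5, 7),
    Hit("hit D", 0, 0),
    Hit("hit E", 10, 5),
]

# Sort by decreasing number of synonyms, and separately by decreasing number of databases.
by_synonyms = sorted(hits, key=lambda h: h.n_synonyms, reverse=True)
by_databases = sorted(hits, key=lambda h: h.n_other_dbs, reverse=True)

# Candidates that are well known by both measures (thresholds chosen arbitrarily).
shortlist = [h.name for h in by_synonyms if h.n_synonyms >= 5 and h.n_other_dbs >= 5]

print([h.name for h in by_synonyms])
print([h.name for h in by_databases])
print(shortlist)
```

Requiring a Hit to rank high in both sorts is one simple way to shrink a long unranked list to a handful of plausible known unknowns before examining spectra or structures.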

   It is obvious that the searches available through the Other Search tab view are valuable in the identification of compounds using mass spectral data other than that obtained by EI. Even when the EI mass spectrum of an analyte is not in the NIST Database, these searches can be used to find the mass spectra of similar compounds, which can be used in reaching a good guess.

   The next installment in this series will be about the Incremental Name Search, replicate spectra, and the NIST GC Methods and Retention Index Database.

* To the best of my knowledge the first reference to the use of the term known unknown in the chemical literature was by Jim Little (Eastman Chemical Company) in Little, J. L.; Cleven, C. D.; Brown, S. D.: Identification of “known unknowns” utilizing accurate mass data and Chemical Abstracts Service databases. J. Am. Soc. Mass Spectrom. 22, 348-359 (2011).  In this article he cited a quote by Donald Rumsfeld, then Secretary of Defense with regards to weapons of mass destruction in Iraq [Department of Defense News Briefing, Feb 12, 2002].

[1] Liao, W.; Draper, W. M.; Perera, S. K.: Identification of unknowns in atmospheric pressure ionization mass spectrometry using a mass to structure search engine. Anal. Chem. 80, 7765-7777 (2008).

O. David Sparkman is currently an Adjunct Professor of Chemistry at the University of the Pacific in Stockton, California; Contractor to the National Institute of Standards and Technology Mass Spectrometry Data Center; President of; and a former American Chemical Society Instructor and American Society for Mass Spectrometry Member-at-large for Education. At the University of the Pacific he teaches courses in mass spectrometry and analytical chemistry and manages the mass spectrometry facility. Over the past 28 years, he has developed and taught five different ACS courses in mass spectrometry. He is the author of Mass Spectrometry Desk Reference, 1st and 2nd editions; Introduction to Mass Spectrometry, 4th ed., with J. Throck Watson; and Gas Chromatography Mass Spectrometry: A Practical Guide, 2nd ed., with Zelda Penton. He also provides general consulting services in mass spectrometry for a number of instrument manufacturers, manufacturing companies, and government agencies.