DOI: 10.1145/3152494.3152522
research-article

Priority based functional group identification of organic molecules using machine learning

Published: 11 January 2018

Abstract

Functional groups in organic compounds determine the properties of those compounds. When multiple functional groups are present, the dominant functional group determines the majority of the compound's properties. Hence, priority-based identification of functional groups is an important problem in chemistry. Fourier-transform infrared (FTIR) spectroscopy is a commonly used spectroscopic method for identifying the presence or absence of functional groups within a compound, and the current approach to this task relies mainly on visual inspection and analysis of the FTIR spectral data. However, visual identification by humans is error-prone, especially when patterns in the FTIR spectrum overlap, so that the features that help identify different functional groups in an unknown sample lose their uniqueness. Therefore, the main goal of this paper is to develop a machine-learning-based classification system that can perform priority-based functional group identification of organic molecules. To the best of our knowledge, this is the first effort to address this problem using machine learning (ML), and a unique aspect of our study is the incorporation of domain-specific information into the classification process by employing a set of priority rules generated from expert knowledge. We have carried out an extensive study on real IR spectral data, first using a rule-based approach and then using ML in an effort to improve the classification accuracy. Our analysis indicates that the basic rule-based method is reasonably effective in predicting the presence (or absence) of functional groups. However, such an approach is not accurate enough in practice for the more challenging problem of priority-based identification, and ML-based classification offers much higher identification accuracy in this case. The primary reason is that an ML algorithm can adaptively exploit data patterns to classify the functional group, unlike the rule-based approach, which uses a fixed set of rules for this purpose. Finally, we have also carried out extensive statistical analysis of the results using confidence intervals and permutation tests, in an effort to gain more descriptive information about the learning process rather than simply treating it as a black box.
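As a rough illustration of the permutation-test idea mentioned in the abstract, the sketch below estimates how often shuffled class labels would match a classifier's predictions at least as well as the true labels do. It is not the authors' procedure: it is a simplified variant that keeps the predictions fixed instead of retraining the classifier on each permutation, and the function names, the label strings, and the toy data are all assumptions made for this example.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Return (observed accuracy, empirical p-value).

    The p-value is the share of label shufflings whose accuracy is at
    least the observed one, with the usual +1 correction so that it is
    never exactly zero. Simplification (assumption): predictions are
    held fixed; the classifier is not retrained per permutation.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical dominant-group labels for 20 spectra (toy data only).
y_true = ["acid", "ketone", "alcohol", "amine"] * 5
y_pred = list(y_true)
y_pred[0] = "ketone"   # two deliberate misclassifications
y_pred[1] = "acid"

observed, p_value = permutation_p_value(y_true, y_pred)
print(f"accuracy={observed:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the predictions track the true labels more closely than chance would explain. A classifier that always predicts the same class gets a p-value of 1.0 under this scheme, because shuffling the labels cannot change its accuracy.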

Published In

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
379 pages
ISBN:9781450363419
DOI:10.1145/3152494

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Fourier transform infrared (FTIR) spectroscopy
  2. chemical bonds
  3. functional group priority
  4. functional groups
  5. machine learning (ML)
  6. organic compounds
  7. pattern identification

Qualifiers

  • Research-article

Conference

CoDS-COMAD '18

Acceptance Rates

CODS-COMAD '18 Paper Acceptance Rate 50 of 150 submissions, 33%;
Overall Acceptance Rate 197 of 680 submissions, 29%
