Graph Kernels for Chemical Informatics

Empirical Inference Article 2005

Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.
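The path-counting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the kernel formulas, so everything here is an assumption: the Tanimoto kernel is taken to be the usual ratio of shared to total distinct path features, MinMax to be the ratio of summed minimum to summed maximum path counts, and paths are taken to be simple (no atom visited twice). The Hybrid kernel is not sketched. The function names (`labeled_paths`, `tanimoto`, `minmax`) and the toy molecule encoding are hypothetical, not the authors' implementation.

```python
from collections import Counter


def labeled_paths(atoms, bonds, d):
    """Count labeled paths with at most d bonds, starting a DFS at every atom.

    atoms: list of atom labels, e.g. ["C", "C", "O"]
    bonds: dict mapping (i, j) atom-index pairs to a bond label, e.g. {(0, 1): "-"}
    Assumption: only simple paths are counted (no atom is revisited).
    """
    # Undirected adjacency list: atom index -> [(neighbor index, bond label)]
    adj = {i: [] for i in range(len(atoms))}
    for (i, j), bond in bonds.items():
        adj[i].append((j, bond))
        adj[j].append((i, bond))

    counts = Counter()

    def dfs(v, visited, label, depth):
        counts[label] += 1          # every path prefix is itself a feature
        if depth == d:
            return
        for w, bond in adj[v]:
            if w not in visited:
                dfs(w, visited | {w}, label + bond + atoms[w], depth + 1)

    for start in range(len(atoms)):
        dfs(start, {start}, atoms[start], 0)
    return counts


def tanimoto(a, b):
    """Assumed Tanimoto kernel: shared distinct paths / all distinct paths."""
    fa, fb = set(a), set(b)
    union = len(fa | fb)
    return len(fa & fb) / union if union else 0.0


def minmax(a, b):
    """Assumed MinMax kernel: sum of min path counts / sum of max path counts."""
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    return sum(min(a[k], b[k]) for k in keys) / den if den else 0.0


# Toy heavy-atom graphs (hydrogens omitted): ethanol C-C-O and dimethyl ether C-O-C
ethanol = labeled_paths(["C", "C", "O"], {(0, 1): "-", (1, 2): "-"}, d=2)
ether = labeled_paths(["C", "O", "C"], {(0, 1): "-", (1, 2): "-"}, d=2)

print("Tanimoto:", tanimoto(ethanol, ether))
print("MinMax:  ", minmax(ethanol, ether))
```

Under these assumptions, Tanimoto only asks whether a path occurs in a molecule, while MinMax also uses how often it occurs, so the two can rank the same pair of molecules differently.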

Author(s):	Ralaivola, L. and Swamidass, JS. and Saigo, H. and Baldi, P.
Journal:	Neural Networks
Volume:	18
Number (issue):	8
Pages:	1093-1110
Year:	2005

Bibtex Type:	Article (article)

DOI:	10.1016/j.neunet.2005.07.009

Electronic Archiving:	grant_archive
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

Links:	PDF

BibTeX

@article{4601,
  title = {Graph Kernels for Chemical Informatics},
  journal = {Neural Networks},
  abstract = {Increased availability of large repositories of chemical compounds is creating new
  challenges and opportunities for the application of machine learning methods to
  problems in computational chemistry and chemical informatics. Because chemical
  compounds are often represented by the graph of their covalent bonds, machine
  learning methods in this domain must be capable of processing graphical structures
  with variable size. Here we first briefly review the literature on graph kernels and
  then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea
  of molecular fingerprints and counting labeled paths of depth up to d using depth-first
  search from each possible vertex. The kernels are applied to three classification
  problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly
  available data sets. The kernels achieve performances at least comparable, and most
  often superior, to those previously reported in the literature, reaching accuracies of
  91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge)
  dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and
  tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D
  representations of molecules, are briefly discussed.},
  volume = {18},
  number = {8},
  pages = {1093--1110},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  year = {2005},
  slug = {4601},
  author = {Ralaivola, L. and Swamidass, JS. and Saigo, H. and Baldi, P.}
}