explainable machine learning

ground truth dataset

Explanations of machine learning models can have limited accuracy and reliability, yet improvements in explanation approaches are held back by small datasets. We developed a larger dataset with 5 million ground truth explanations for molecular graphs. Chemical validity is guaranteed by using chemical operations, and one explanation value per atom or bond is ensured with decision trees. The resulting pairs of molecules differ only in a single atom or bond.
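To make the pairing idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes that both molecules share one atom indexing, that the explanation value for the changed atom or bond can be read as the property difference within the pair, and it omits the chemical operations and decision trees entirely. The names `Mol`, `single_change`, and `explanation_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mol:
    """Toy molecular graph (hypothetical structure, for illustration only)."""
    atoms: tuple       # element symbol per atom index
    bonds: frozenset   # entries (i, j, order) with i < j
    prop: float        # property value of the molecule


def single_change(a, b):
    """Return ('atom', i) or ('bond', (i, j)) if a and b differ in exactly
    one atom label or one bond; return None otherwise.

    Assumption: both molecules use the same atom indexing.
    """
    if len(a.atoms) != len(b.atoms):
        return None
    atom_diffs = [i for i, (x, y) in enumerate(zip(a.atoms, b.atoms)) if x != y]
    bonds_a = {(i, j): order for i, j, order in a.bonds}
    bonds_b = {(i, j): order for i, j, order in b.bonds}
    # A bond counts as changed if it is missing on one side or its order differs.
    bond_diffs = sorted(
        key for key in bonds_a.keys() | bonds_b.keys()
        if bonds_a.get(key) != bonds_b.get(key)
    )
    if len(atom_diffs) == 1 and not bond_diffs:
        return ("atom", atom_diffs[0])
    if len(bond_diffs) == 1 and not atom_diffs:
        return ("bond", bond_diffs[0])
    return None


def explanation_value(a, b):
    """Attribute the property difference to the single changed element.

    Assumption: the explanation value is taken as b.prop - a.prop.
    """
    change = single_change(a, b)
    if change is None:
        return None
    return change, b.prop - a.prop
```

For instance, two three-atom chains that differ only in the element of the last atom would yield `("atom", 2)` together with the difference of their property values; pairs with more than one change are rejected.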

Explanation values differ for individual elements and ring sizes, but we do not observe a dependency on chemical property values.

Similar explanation values for neighboring atoms were observed with topological autocovariance. In contrast to typical synthetic explanation datasets, this dataset can include long-range information.
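As an illustration of how similarity between nearby atoms can be quantified, the sketch below computes a topological autocovariance of per-atom values at a given bond distance (lag). It assumes one common definition, the mean product of mean-centered values over all atom pairs separated by exactly `lag` bonds; the normalization used for the dataset may differ. The function names are hypothetical.

```python
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    distances = []
    for start in range(n_atoms):
        dist = [None] * n_atoms  # None marks atoms not reachable from start
        dist[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        distances.append(dist)
    return distances


def autocovariance(values, bonds, lag):
    """Mean product of centered values over atom pairs `lag` bonds apart.

    Returns None if no atom pair has that topological distance.
    """
    n = len(values)
    mean = sum(values) / n
    dist = topological_distances(n, bonds)
    pairs = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] == lag
    ]
    if not pairs:
        return None
    total = sum((values[i] - mean) * (values[j] - mean) for i, j in pairs)
    return total / len(pairs)
```

Under this definition, a positive value at lag 1 indicates that directly bonded atoms tend to carry similar explanation values, while values at larger lags probe longer-range structure.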

This dataset is available on figshare. Crucially, this explanation extraction method can be applied to any existing dataset with molecules and properties.

First