DFTSurrogates

Density Functional Theory (DFT) is a computational method for investigating the properties of complex multi-electron systems by modeling their quantum mechanical structure. It is one of the most accurate ways of predicting materials properties and is widely applied in physics, chemistry and materials science. It is also computationally expensive: simulating a single crystal structure can take on the order of hours of CPU time.

DFTSurrogates provides machine learning surrogates that predict materials properties with accuracy comparable to DFT while being orders of magnitude faster to run. A popular technique in this area is the Crystal Graph Convolutional Neural Network (CGCNN), which represents a crystal structure as a crystal graph that encodes atomic information and the bonding interactions between atoms. A convolutional neural network is applied to this graph, and the model is trained on data generated by DFT calculations.
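To make the idea of convolving over a crystal graph concrete, here is a minimal sketch of one neighbor-averaging graph convolution step in plain Julia. It is an illustration under simplifying assumptions, not the layer used by DFTSurrogates: the graph, feature values and weights are made up, and the published CGCNN layer also conditions on bond features, which this sketch omits.

```julia
using LinearAlgebra

# Toy graph with 3 atoms: atom 1 is bonded to atoms 2 and 3 (hypothetical example)
A = Float64[0 1 1;
            1 0 0;
            1 0 0]

# One feature column per atom (2 features x 3 atoms); values are made up
X = Float64[1 0 0;
            0 1 1]

# Add self-loops, then normalize each column so that multiplying by P
# averages features over an atom and its bonded neighbors
Ahat = A + I
P = Ahat ./ sum(Ahat, dims=1)

# Hypothetical weights; in a trained model these are learned from DFT data
W = Float64[1 0;
            0 1]

softplus(x) = log1p(exp(x))

# One convolution step: mix features, average over neighbors, apply nonlinearity
H = softplus.(W * X * P)
```

Each column of H is the updated feature vector of one atom. Stacking several such steps lets information travel further along the bond network.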

Consult this material for more information.

Using DFTSurrogates on JuliaHub

DFTSurrogates is built with Chemellia, and is thus compatible with data structures exported by ChemistryFeaturization.jl.

DFTSurrogates is included in the JuliaSim system image, so the only setup required is to run using JuliaSim in the Julia REPL.

The surrogates

Several pre-trained surrogates are available. They are primarily trained on standard datasets such as the Materials Project and QM9, and each is trained to predict a specific property. More details can be found by looking at the individual surrogates. Each surrogate requires its input to be in the same form as the data it was trained on. Typically, this means calling the model with a FeaturizedAtoms object as described by ChemistryFeaturization.jl.

From SMILES strings

SMILES strings represent a molecule in a compact, text-based form and are a popular encoding scheme for organic molecules.

SMILES strings can be read with the smilestomol function:

using DFTSurrogates, MolecularGraph

# Parse the SMILES string for pyridine into a molecule object
mol = smilestomol("c1ccncc1")

We can also convert this Mol object to an AtomGraph.

julia> AtomGraph(mol)
 AtomGraph{GraphMol{SmilesAtom, SmilesBond}}  with 6 nodes, 6 edges
 	atoms: ["C", "C", "C", "N", "C", "C"]
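To show where the atom list above comes from, the following toy reader pulls atom symbols out of a SMILES string. It is only a sketch: toy_smiles_atoms is a hypothetical helper, not part of DFTSurrogates or MolecularGraph, and it handles single-letter atoms only, so two-letter symbols such as Cl or Br and bracketed atoms would be misread.

```julia
# Toy reader: collects atom symbols from a SMILES string limited to
# single-letter atoms (B, C, N, O, P, S, F, I and aromatic b, c, n, o, p, s).
# Ring-closure digits, bond symbols and branches are skipped, not interpreted.
function toy_smiles_atoms(smiles::AbstractString)
    atoms = String[]
    for ch in smiles
        if ch in "BCNOPSFI" || ch in "bcnops"
            push!(atoms, string(uppercase(ch)))
        end
    end
    return atoms
end

atoms = toy_smiles_atoms("c1ccncc1")
```

For the pyridine string this gives the same six symbols, in the same order, as the AtomGraph output. A real parser such as smilestomol also recovers the bonds and ring closures, which this sketch ignores.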

From CIF/XYZ files

To read data from a CIF or XYZ file, pass the filename to the AtomGraph function:

using DFTSurrogates, ChemistryFeaturization

# Build an atom graph directly from a structure file
AtomGraph("/path/to/file.cif")

On first use, ChemistryFeaturization sets up its environment by installing the ase package through Conda, as the log below shows:

julia> ag = AtomGraph("../.julia/packages/ChemistryFeaturization/O2LBl/test/test_data/strucs/mp-224.cif")
 [ Info: Installing ase.io via the Conda ase package...
 [ Info: Running `conda config --add channels conda-forge --file /home/jrun/data/.julia/conda/3/condarc-julia.yml --force` in root environment
 [ Info: Running `conda install -y ase` in root environment
 [...]
 AtomGraph{Crystal} mp-224 with 6 nodes, 6 edges
 	atoms: ["W", "W", "S", "S", "S", "S"]

Using the surrogates

Now that we have our AtomGraph object, we can choose how to featurize the structure. ChemistryFeaturization provides a number of feature types, including element features, species features, and bond and pair features, and it also defines several default featurizations. Because this tutorial uses a graph convolutional model, we use a GraphNodeFeaturization. We call GraphNodeFeaturization with the list of features we want to extract.

julia> fzn = GraphNodeFeaturization(["Group", "Row", "Block", "Atomic mass", "Atomic radius", "X"])
 GraphNodeFeaturization encoding 6 features:
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} Group
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} Row
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} Block
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} Atomic mass
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} Atomic radius
 	ElementFeatureDescriptor{ChemistryFeaturization.Codec.OneHotOneCold} X
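The OneHotOneCold codec named in the output above indicates that each feature is encoded as a one-hot vector. The sketch below illustrates the idea for one categorical feature and one binned continuous feature. The helpers onehot and onehot_binned and the bin edges are hypothetical and chosen for illustration; they are not the ChemistryFeaturization implementation or its actual bins.

```julia
# One-hot encode a categorical value against a fixed list of categories
function onehot(value, categories)
    idx = findfirst(==(value), categories)
    idx === nothing && throw(ArgumentError("unknown category: $value"))
    v = zeros(Float32, length(categories))
    v[idx] = 1
    return v
end

# One-hot encode a continuous value by the bin it falls into.
# `edges` are ascending bin edges; values outside the range are clamped
# to the first or last bin.
function onehot_binned(x::Real, edges::AbstractVector{<:Real})
    nbins = length(edges) - 1
    idx = clamp(searchsortedlast(edges, x), 1, nbins)
    v = zeros(Float32, nbins)
    v[idx] = 1
    return v
end

blocks = ["s", "p", "d", "f"]
block_vec = onehot("d", blocks)   # tungsten is a d-block element

# Hypothetical bin edges for atomic mass (in u); not the package's actual bins
mass_edges = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0]
mass_vec = onehot_binned(183.84, mass_edges)   # atomic mass of tungsten

# Concatenating the per-feature vectors gives the encoded vector for one atom
atom_vec = vcat(block_vec, mass_vec)
```

Concatenating the one-hot vectors of all selected features for each atom yields one column per atom, which is how a feature matrix with one fixed-length column per node can arise.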

Now that we have both a featurization scheme and an atom graph, we are ready to featurize our structure.

julia> fa = featurize(ag, fzn)
 FeaturizedAtoms{AtomGraph{Xtals.Crystal}, GraphNodeFeaturization} with 28 x 6 encoded features:
 	Atoms: AtomGraph{Xtals.Crystal} test/test_data/strucs/mp-224 with 6 nodes, 6 edges
 	Featurization: GraphNodeFeaturization encoding 6 features

We now have our FeaturizedAtoms object and can load the model. DFTSurrogates uses the ModelLibraryBase protocol to load surrogates, consistent with the other surrogates in JuliaSim. To load a model, pass the package and the model name to load_model:

julia> using DFTSurrogates, ModelLibraryBase

 julia> model = load_model(DFTSurrogates, "agn")
 DFTSurrogates.DFTSurrogate{Flux.Chain{Tuple{AGNConv{Float32, typeof(NNlib.softplus)}, AGNConv{Float32, typeof(NNlib.softplus)}, AGNPool, Flux.Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}}(Chain(AGNConv{Float32, typeof(NNlib.softplus)}(Float32[-0.19773944 -4.07479 … -11.3405 0.28272364; 0.17661938 2.7930422 … 2.2066104 0.025649978; … ; -1.9608192 -4.0721436 … -0.48233512 -5.37896; -2.212397 -5.1564727 … 0.6275714 -7.9748564], Float32[-1.2271698 0.34747842 … 0.6574416 1.1068541; 0.27524942 1.706642 … 0.86515605 -1.5620344; … ; -0.088145226 -2.5802314 … -2.304801 -1.030459; -3.001888 -0.7218966 … 0.23606095 6.736839], Float32[-0.6987946; 1.6395504; … ; -0.23941554; -0.8436502], NNlib.softplus), AGNConv{Float32, typeof(NNlib.softplus)}(Float32[1.0775148 -1.5288225 … 1.1802717 3.6067185; -0.70993954 0.4633188 … 2.0744123 1.7245898; … ; 2.0033507 -1.6037053 … -1.1870372 6.400128; -0.29440978 0.7449011 … -1.6382565 0.28559992], Float32[3.433453 0.17542483 … 0.16797212 1.0449519; 0.097464405 0.24745826 … 1.2495623 0.8470874; … ; 0.217869 0.78088385 … 1.6145365 5.0583467; -1.0477351 -1.9897792 … 1.4026865 2.1261296], Float32[-0.4688482; -1.1074967; … ; -4.9761996; -1.4188688], NNlib.softplus), AGNPool(NNlib.meanpool, 8, 2, 3), Dense(40, 1)))

This particular surrogate is trained on the Materials Project dataset and predicts the formation_energy_per_atom property.

We can now call our model with the FeaturizedAtoms object from the previous steps.

julia> model(fa)
 1-element Vector{Float32}:
  -0.6792327
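The model returns a single number because the last stages of the chain printed earlier, a pooling layer followed by Dense(40, 1), reduce per-atom features to one value per structure. A minimal sketch of that reduction follows, assuming a plain mean over atoms and made-up sizes and weights. The actual AGNPool layer takes additional parameters that are not modelled here.

```julia
using Statistics

# Hypothetical per-atom features after the convolution layers:
# 4 features x 3 atoms (the real model uses different sizes)
H = Float32[1 2 3;
            0 1 2;
            2 2 2;
            1 0 1]

# Average over atoms so the result does not depend on atom count or ordering
pooled = vec(mean(H, dims=2))

# Dense readout with identity activation: y = W * pooled + b
W = Float32[0.5 -1.0 0.25 0.0]   # 1 x 4, made-up weights
b = Float32[0.1]

y = W * pooled .+ b
```

Here y is a one-element vector, mirroring the shape of the prediction above. Because the pooling averages over atoms, reordering the columns of H leaves y unchanged.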

We have now run an inference pass on a custom structure using a pre-trained surrogate.
