First example

This very simple tutorial constructs an ACE1 model for Si by fitting to an empirical potential.

Make sure to first read the installation notes. Now start by importing the required packages:

using ACEpotentials
import Random
using LinearAlgebra: norm, Diagonal

Step 1: specify the ACE Model

The parameters have the following meaning:

  • elements: list of chemical species, symbols
  • order : correlation order
  • totaldegree: maximum total polynomial degree used for the basis
  • rcut : cutoff radius (optional, defaults are provided)
model = acemodel(elements = [:Si,],
                 order = 3,
                 totaldegree = 10,
                 rcut = 5.0)
@show length(model.basis);
length(model.basis) = 110

Step 2: Generate a training set

Next we need to generate some training data to estimate the model parameters. Normally one would generate a training set using DFT data, stored as an .xyz file. Here, we create a random training set for simplicity. Please note that this is generally not a good strategy to generate data!

  • gen_dat() generates a single training configuration wrapped in an ACEpotentials.AtomsData structure. Each d::AtomsData contains the structure d.atoms, and energy value and a force vector to train against.
  • train is then a collection of such random training configurations.
data_keys = (energy_key = "energy", force_key = "forces")

function gen_dat()
   sw = StillingerWeber()
   at = rattle!(bulk(:Si, cubic=true) * rand(2:3), 0.3)
   set_data!(at, data_keys.energy_key, energy(sw,at))
   set_data!(at, data_keys.force_key, forces(sw,at))
   return at
end

Random.seed!(0)
train = [gen_dat() for _=1:20];

Step 3: Estimate Parameters

We specify a solver and then let ACEfit.jl to do all the work for us. More fine-grained control is possible; see the ACEfit.jl documentation. For sake of illustration we use a Bayesian Ridge Regression solver. This will automatically determine the regularisation for us.

solver = ACEfit.BLR()
acefit!(model, train; solver=solver, data_keys...);
┌─────────┬──────────┬───────┬────┬──────┬─────┐
│    Type  #Configs  #Envs  #E    #F   #V │
├─────────┼──────────┼───────┼────┼──────┼─────┤
│ default │       20 │  2344 │ 20 │ 7032 │   0 │
├─────────┼──────────┼───────┼────┼──────┼─────┤
│   total │       20 │  2344 │ 20 │ 7032 │   0 │
│ missing │        0 │     0 │  0 │    0 │ 120 │
└─────────┴──────────┴───────┴────┴──────┴─────┘
[ Info: Assembling linear problem.
[ Info:   - Creating feature matrix with size (7052, 110).
[ Info:   - Beginning assembly with processor count:  1.

Progress:  10%|████▏                                    |  ETA: 0:00:08
Progress:  25%|██████████▎                              |  ETA: 0:00:06
Progress:  40%|████████████████▍                        |  ETA: 0:00:05
Progress:  55%|██████████████████████▌                  |  ETA: 0:00:03
Progress:  70%|████████████████████████████▊            |  ETA: 0:00:02
Progress:  85%|██████████████████████████████████▉      |  ETA: 0:00:01
Progress: 100%|█████████████████████████████████████████| Time: 0:00:07
[ Info:   - Assembly completed.
[ Info: Assembling full weight vector.
Iter     Function value   Gradient norm
     0     6.727465e+03     6.769388e+03
 * time: 4.601478576660156e-5
     1    -1.653584e+03     2.072568e+04
 * time: 0.016025781631469727
     2    -4.158362e+03     2.522873e+04
 * time: 0.019546985626220703
     3    -5.498239e+03     1.598170e+04
 * time: 0.023854970932006836
     4    -5.550825e+03     2.287761e+04
 * time: 0.025662899017333984
     5    -5.676271e+03     3.198566e+03
 * time: 0.026741981506347656
     6    -5.705141e+03     6.108063e+03
 * time: 0.0288238525390625
     7    -6.157720e+03     1.698372e+04
 * time: 0.031889915466308594
     8    -6.268922e+03     4.218925e+04
 * time: 0.0346980094909668
     9    -1.078187e+04     4.957756e+04
 * time: 0.03861880302429199
    10    -1.122144e+04     1.255557e+04
 * time: 0.0636439323425293
    11    -1.123452e+04     1.712059e+03
 * time: 0.06508493423461914
    12    -1.123475e+04     2.110054e+02
 * time: 0.06639885902404785
    13    -1.125247e+04     1.433444e+04
 * time: 0.07122278213500977
    14    -1.172092e+04     5.522867e+04
 * time: 0.07428598403930664
    15    -1.198674e+04     3.606533e+04
 * time: 0.07614898681640625
    16    -1.219040e+04     6.010459e+03
 * time: 0.0780940055847168
    17    -1.227600e+04     2.395251e+04
 * time: 0.07994389533996582
    18    -1.251747e+04     3.355479e+04
 * time: 0.08245587348937988
    19    -1.259945e+04     7.580663e+03
 * time: 0.0842738151550293
    20    -1.260342e+04     7.578673e+02
 * time: 0.08553481101989746
    21    -1.260798e+04     1.470364e+03
 * time: 0.08791494369506836
    22    -1.260823e+04     3.313226e+02
 * time: 0.08987092971801758
    23    -1.260823e+04     3.276097e+01
 * time: 0.09171199798583984
    24    -1.260825e+04     1.153499e-01
 * time: 0.09300684928894043
    25    -1.260824e+04     1.764959e-01
 * time: 0.09428286552429199
    26    -1.260824e+04     4.251593e-02
 * time: 0.09607696533203125
    27    -1.260824e+04     2.742839e-02
 * time: 0.09732890129089355
    28    -1.260825e+04     1.320557e-01
 * time: 0.09858298301696777
    29    -1.260824e+04     1.308243e-01
 * time: 0.10042786598205566
    30    -1.260824e+04     3.088911e-02
 * time: 0.10223889350891113
    31    -1.260824e+04     2.574200e-02
 * time: 0.10344791412353516
    32    -1.260825e+04     3.205160e-02
 * time: 0.1046597957611084
    33    -1.260825e+04     6.959670e-02
 * time: 0.10594892501831055
    34    -1.260825e+04     1.800780e-01
 * time: 0.1077878475189209
    35    -1.260824e+04     1.037120e-01
 * time: 0.11023998260498047
    36    -1.260824e+04     2.642961e-01
 * time: 0.11483478546142578
    37    -1.260824e+04     2.253813e+01
 * time: 0.11605691909790039
    38    -1.260824e+04     9.479669e-02
 * time: 0.1172788143157959
    39    -1.260824e+04     2.069538e-03
 * time: 0.11850285530090332
 * Status: success

 * Candidate solution
    Final objective value:     -1.260824e+04

 * Found with
    Algorithm:     L-BFGS

 * Convergence measures
    |x - x'|               = 5.51e-09 ≤ 1.0e-08
    |x - x'|/|x'|          = 5.33e-11 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.13e-03 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 8.99e-08 ≰ 0.0e+00
    |g(x)|                 = 2.07e-03 ≰ 0.0e+00

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    39
    f(x) calls:    198
    ∇f(x) calls:   198

To see the training errors we can use

@info("Training Errors")
ACEpotentials.linear_errors(train, model; data_keys...);
[ Info: Training Errors
[ Info: RMSE Table
┌─────────┬─────────┬──────────┬─────────┐
│    Type  E [meV]  F [eV/A]  V [meV] │
├─────────┼─────────┼──────────┼─────────┤
│ default │   0.396 │    0.037 │   0.000 │
├─────────┼─────────┼──────────┼─────────┤
│     set │   0.396 │    0.037 │   0.000 │
└─────────┴─────────┴──────────┴─────────┘
[ Info: MAE Table
┌─────────┬─────────┬──────────┬─────────┐
│    Type  E [meV]  F [eV/A]  V [meV] │
├─────────┼─────────┼──────────┼─────────┤
│ default │   0.299 │    0.028 │   0.000 │
├─────────┼─────────┼──────────┼─────────┤
│     set │   0.299 │    0.028 │   0.000 │
└─────────┴─────────┴──────────┴─────────┘

Step 4: Run some tests

At a minimum one should have a test set, check the errors on that test set, and confirm that they are similar as the training errors.

@info("Test Errors")
test =  [gen_dat() for _=1:20]
ACEpotentials.linear_errors(test, model; data_keys...);
[ Info: Test Errors
[ Info: RMSE Table
┌─────────┬─────────┬──────────┬─────────┐
│    Type  E [meV]  F [eV/A]  V [meV] │
├─────────┼─────────┼──────────┼─────────┤
│ default │   0.223 │    0.037 │   0.000 │
├─────────┼─────────┼──────────┼─────────┤
│     set │   0.223 │    0.037 │   0.000 │
└─────────┴─────────┴──────────┴─────────┘
[ Info: MAE Table
┌─────────┬─────────┬──────────┬─────────┐
│    Type  E [meV]  F [eV/A]  V [meV] │
├─────────┼─────────┼──────────┼─────────┤
│ default │   0.172 │    0.028 │   0.000 │
├─────────┼─────────┼──────────┼─────────┤
│     set │   0.172 │    0.028 │   0.000 │
└─────────┴─────────┴──────────┴─────────┘

If we wanted to perform such a test ``manually'' it might look like this:

@info("Manual RMSE Test")
potential = model.potential
test_energies = [ JuLIP.get_data(at, "energy") / length(at) for at in test]
model_energies = [energy(potential, at) / length(at) for at in test]
rmse_energy = norm(test_energies - model_energies) / sqrt(length(test))
@show rmse_energy;
[ Info: Manual RMSE Test
rmse_energy = 0.00022336058195964

But in practice, one should run more extensive test simulations to check how robust the fitted potential is.

Step 5: export the model

The fitted model can be exported to a JSON or YAML file, or to a LAMMPs compatible yace file. We won't go through that in this tutoral. See export2json and export2lammps for further information.

Step 6: Using the model

Let's do something very simple: relax a vacancy.

We create a small Si cell, delete an atom and rattle the rest

at = bulk(:Si, cubic=true) * 3
deleteat!(at, 1)
rattle!(at, 0.03 * rnn(:Si))
JuLIP.Atoms with 215 atoms

we can now minimize the ACE energy.

set_calculator!(at, potential);
minimise!(at)
E_ace = energy(at)
-924.9764073956869

If we want a formation energy, we could get it like this.

at0 = bulk(:Si)
E0_ace = energy(potential, at0);
Evac_ace = E_ace - (length(at)-1)/length(at0) * E0_ace
@show Evac_ace;
Evac_ace = 2.7148091746998944

Note that there are no vacancy structures in the training set, so this is a prediction out of sample. We have no guarantee of the accuracy of this prediction. In fact the prediction is quite far off:

sw = StillingerWeber()
set_calculator!(at, sw)
minimise!(at; verbose=false);
E_sw = energy(at)
E0_sw = energy(sw, bulk(:Si))
Evac_sw = E_sw - (length(at)-1)/length(at0) * E0_sw
@show Evac_sw;
Evac_sw = -1.1122891578452254e-5

To obtain accurate predictions on a vacancy structure, we must add it to the training set. This iterative model development process goes beyond the scope of this tutorial.


This page was generated using Literate.jl.