Smoothness Priors
using ACEpotentials, LinearAlgebra, Plots, LaTeXStrings, Unitful
ACEpotentials models make heavy use of smoothness priors, i.e., prior parameter distributions that impose smoothness on the fitted potential. This tutorial demonstrates how to use the smoothness priors implemented in ACEpotentials. We start by reading in a tiny testing dataset and bringing the data into a format that ACEfit likes. Note that using a very limited dataset makes the use of priors particularly important: in general, the larger and more diverse the dataset, the less important the prior becomes.
rawdata, _, _ = ACEpotentials.example_dataset("Si_tiny")
datakeys = (energy_key = "dft_energy", force_key = "dft_force", virial_key = "dft_virial")
rcut = 6.0   # cutoff distance
r_nn = 2.3   # typical nearest-neighbour distance
model = ace1_model(elements = [:Si],
                   order = 3, totaldegree = 12,
                   rcut = rcut, r0 = r_nn,
                   Eref = Dict(:Si => -158.54496821))
data = [ AtomsData(at; datakeys..., v_ref = model.model.Vref) for at in rawdata ]
A, Y, W = ACEfit.assemble(data, model);
[ Info: Assembling linear problem.
[ Info: - Creating feature matrix with size (1052, 223).
[ Info: - Beginning assembly with processor count: 1.
Progress: 100%|█████████████████████████████████████████| Time: 0:00:22
[ Info: - Assembly completed.
[ Info: Assembling full weight vector.
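A quick look at the assembled objects: A is the design matrix, Y the vector of observations and W the corresponding weight vector. The sizes shown below follow from the assembly log above.

@show size(A)                # (1052, 223): observations × basis functions
@show length(Y), length(W)   # one entry per observation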
A positive definite matrix $P$ specifies a normal prior distribution in the Bayesian framework, but for the purpose of this tutorial it is perhaps more intuitive to simply think of it as a regularisation operator. The regularised linear least squares problem is
\[ \| A c - y \|^2 + \lambda \| P c \|^2\]
where $A$ is the design matrix, $y$ is the vector of observations, $c$ is the vector of parameters, and $\lambda$ is a regularisation parameter. The prior matrix $P$ is specified by the user. At present we support diagonal operators $P$: the larger a diagonal entry, the more strongly the corresponding basis function is penalised, and hence the smoother the fitted potential. Although not strictly true, we can think of each basis function as specified by the parameters ${\bf k} = (n_t, l_t)_{t = 1}^N$, where $N$ is the correlation order; the corresponding prior matrix element must then be a function of those $n_t, l_t$ values,
\[ P_{\bf k \bf k} = \gamma({\bf k}).\]
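To make this concrete, here is a minimal sketch (using small random matrices rather than the ACEpotentials API) of how a diagonal prior enters the solve: the substitution $\tilde{c} = P c$ reduces the regularised problem to ordinary ridge regression. The fitting loop later in this tutorial uses exactly this transformation.

using LinearAlgebra
A = randn(10, 4);  y = randn(10)     # hypothetical design matrix and observations
P = Diagonal([1.0, 2.0, 4.0, 8.0])   # larger entries suppress "rougher" basis functions
λ = 0.1                              # regularisation parameter
Ã = A / P                            # equivalent to A * inv(P)
c̃ = (Ã' * Ã + λ * I) \ (Ã' * y)      # ordinary ridge regression in c̃
c = P \ c̃                            # map back to the original parameters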
We currently provide convenient interfaces for three classes: algebraic, exponential and gaussian. Technically (but loosely speaking), an algebraic prior declares that the target function has $p$ derivatives (for some parameter $p$), the exponential prior declares that the target function is analytic, while the gaussian prior declares that the target function is entire. In practice, especially for small datasets where the model is underdetermined, one may use a stronger prior than is justified by our knowledge of the target function. For large and diverse datasets it can happen that too strong a prior adversely affects the fit accuracy. For the definitions of the three main prior classes, see the documentation for
?algebraic_smoothness_prior
?exp_smoothness_prior
?gaussian_smoothness_prior
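The exponential prior is not demonstrated below; assuming its constructor accepts default parameters analogously to the other two (an assumption, consult ?exp_smoothness_prior for the actual keyword arguments), it would be built as

Pexp = exp_smoothness_prior(model)   # assumed default parameters; see ?exp_smoothness_prior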
In the following we demonstrate the usage of algebraic and gaussian priors. The choices for the parameters wl, wn made here may seem "magical", but there is a good justification, and we plan to automate this in future releases.
Pa2 = algebraic_smoothness_prior(model; p = 2)   # "two derivatives" smoothness
Pa4 = algebraic_smoothness_prior(model; p = 4)   # "four derivatives", a stronger prior
Pg  = gaussian_smoothness_prior(model; wl = (2/r_nn)^2, wn = (0.5/r_nn)^2);   # length scales tied to r_nn
Each of the objects Pa2, Pa4, Pg is a diagonal matrix. For each prior constructed above we now solve the regularised least squares problem. Note how the design matrix need only be assembled once if we want to experiment with many different priors. Most of the time, however, we would simply use the defaults, and then these steps are all taken care of behind the scenes.
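Before setting up the fits, we can quickly inspect the priors; this is purely illustrative (assuming the priors are LinearAlgebra.Diagonal objects; the specific numbers depend on the basis):

@show Pa2 isa Diagonal
@show extrema(diag(Pa2))   # spread of penalties across the basis
@show extrema(diag(Pa4))   # larger p gives a wider spread, suppressing rough basis functions more strongly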
priors = Dict("Id" => I, "Algebraic(2)" => Pa2, "Algebraic(4)" => Pa4, "Gaussian" => Pg)
rmse = Dict()
pots = Dict()
for (prior_name, P) in priors
    print("Solving with ", prior_name, " prior ... ")
    # solve the regularised least squares problem in the transformed variable c̃ = P c
    Ã = Diagonal(W) * (A / P)
    ỹ = Diagonal(W) * Y
    c̃ = ACEfit.solve(ACEfit.BLR(; verbose=false), Ã, ỹ)["C"]
    _modl = deepcopy(model)
    set_parameters!(_modl, P \ c̃)   # map the parameters back: c = P \ c̃
    # compute errors for the newly fitted potential and store them for later use
    errs = compute_errors(rawdata, _modl; verbose=false, datakeys...)
    rmse[prior_name] = errs["rmse"]["set"]["F"]
    pots[prior_name] = _modl
    println("force rmse = ", rmse[prior_name])
end
Solving with Id prior ... force rmse = 1.2215538976358178
Solving with Algebraic(4) prior ... force rmse = 1.2215538976358178
Solving with Gaussian prior ... force rmse = 1.2215538976358178
Solving with Algebraic(2) prior ... force rmse = 1.2215538976358178
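To see a full breakdown of energy, force and virial errors for any one of the fitted potentials, compute_errors can be called again without suppressing its output (assuming the default verbosity prints the error tables, as suggested by the verbose=false flag used above):

compute_errors(rawdata, pots["Gaussian"]; datakeys...)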
The force RMSE errors are comparable across the different priors; one would typically expect slightly better training errors for the weaker smoothness priors Algebraic(2) and Id, since those priors are less restrictive.
On the other hand, we expect the stronger priors to generalize better. A typical intuition is that smooth potentials with similar accuracy will be more transferable than rougher potentials. We will show three one-dimensional slices through the fitted potentials: dimer curves, trimer curves and a decohesion curve.
First, the dimer curves: the utility function ACEpotentials.dimers can be used to generate the data for those curves, which are then plotted using Plots.jl. We also add a vertical line to indicate the nearest-neighbour distance. The standard identity prior gives a completely unrealistic dimer curve. The Algebraic(2) regularised potential is missing a repulsive core behaviour. The two remaining smoothness priors give physically sensible dimer curve shapes.
labels = sort(collect(keys(priors)))[[4,1,2,3]]   # order: Id, Algebraic(2), Algebraic(4), Gaussian
plt_dim = plot(legend = :topright, xlabel = L"r [\AA]", ylabel = "E [eV]",
               xlims = (0, rcut), ylims = (-2, 5))
for l in labels
    D = ACEpotentials.dimers(pots[l], [:Si,])
    plot!(plt_dim, D[(:Si, :Si)]..., label = l, lw=2)
end
vline!([r_nn,], lw=2, ls=:dash, label = L"r_{\rm nn}")
plt_dim
Next, we look at a trimer curve, generated using ACEpotentials.trimers. Both the Id and Algebraic(2) regularised models contain fairly significant oscillations, while the Algebraic(4) and Gaussian models are much smoother. In addition, it appears that the Gaussian regularised model is somewhat more physically realistic on this slice, with high energy at small bond angles (though the low energy at angle π seems somewhat strange).
plt_trim = plot(legend = :topright, xlabel = L"\theta", ylabel = "E [eV]",
                xlims = (0, pi), ylims = (-0.6, 0.2))
for l in labels
    D = ACEpotentials.trimers(pots[l], [:Si,], r_nn*u"Å", r_nn*u"Å")
    plot!(plt_trim, D[(:Si, :Si, :Si)]..., label = l, lw=2)
end
vline!(plt_trim, [1.90241,], lw=2, label = "equilibrium angle")
plt_trim
Finally, we plot a decohesion curve, which probes more significant many-body effects. Arguably, none of our potentials perform very well on this test. Usually larger datasets and longer cutoffs help in this case.
using AtomsBuilder # gives us `bulk`
at0 = bulk(:Si, cubic=true)
plt_dec = plot(legend = :topright, xlabel = L"r [\AA]", ylabel = "dE [eV/Å]",
               xlim = (0.0, 5.0))
for l in labels
    rr, E, dE = ACEpotentials.decohesion_curve(at0, pots[l])
    plot!(plt_dec, ustrip.(rr), ustrip.(dE), label = l, lw=2)
end
plt_dec
This page was generated using Literate.jl.