# Fitting a TiAl potential (Julia)

Start by importing the required libraries

using ACE1pack

We need a dataset TiAl_tutorial.xyz for this tutorial which is provided as an artifact. Normally we would get the path to a datset via artifact"TiAl_tutorial but for these tutorial to run from anywhere it is easiest to let ACE1pack load the data for us. The following line will download the dataset, store is somewhere inside ~/.julia/... and return a string with the absolute path to the file.

data_file = joinpath(ACE1pack.artifact("TiAl_tutorial"), "TiAl_tutorial.xyz")
"/home/runner/.julia/artifacts/b437d7d5fac4424b8203d0afc31732879d3da5b2/TiAl_tutorial.xyz"

We can now use IPFitting.Data.read_xyz to load in the training set. This will not only load the structures, but also search for energies and force from a reference model, and all this will then be stored as a Vector{Dat}. We keep only every 10 training structures to keep the regression problem small.

data = IPFitting.Data.read_xyz(data_file, energy_key="energy", force_key="force", virial_key="virial")
train = data[1:5:end];
nothing #hide

The next step is to generate a basis set:

• Here we take 3-correlation, i.e. a 4-body potential,
• a relatively low polynomial degree maxdeg = 6, and
• a cutoff radious rcut = 5.5

These three are the most important approximation parameters to explore when trying to improve the fit-accuracy. In addition there is

• The parameter r0 is just a scalig parameter and the fits should not be very sensitive to its choice. A rough estimate for the nearest-neighbour distance is usually ok.
• The inner cutoff rin is will ensure that the many-body potential becomes zero when atoms get too close. The reason for this is that we usually do not have data against which to fit the potential in this deformation regime and therefore cannot make reliable predictions. Instead we will add a pair potential to model this regime below.

EG: pin should be set to 2?? i.e. bad ace_basis default?

r0 = 2.88
ACE_B = ace_basis(species = [:Ti, :Al],
N = 3,
r0 = r0,
rin = 0.6 * r0,
rcut = 5.5,
maxdeg = 6);

As alluded to above, we now add a pair potential to obtain qualitatively correct repulsive behaviour for colliding atoms. The many-body basis ACE_B and the pair potential Bpair are then combined into a single basis set B.

Bpair = pair_basis(species = [:Ti, :Al],
r0 = r0,
maxdeg = 6,
rcut = 7.0,
pcut = 1, 	# this is not a reasonable default?
pin = 0)
B = JuLIP.MLIPs.IPSuperBasis([Bpair, ACE_B]);

The next step is to evaluate the basis on the training set. Precomputing the basis once (and possibly save it to disk) makes experimenting with different regression parameters much more efficient. This is demonstrated below by showing various different solver options. Similarly once could also explore different data weights (see weights below).

dB = LsqDB("", B, train);
nothing #hide

Vref specifies a reference potential, which is subtracted from the training data and the ACE parameters are then estimates from the difference. I.e. the this reference potential will in the end be added to the ACE model. Here we just use a one-body potential i.e. a reference atom energy for each individual species.

Vref = OneBody(:Ti => -1586.0195, :Al => -105.5954)
OneBody{Float64}(Dict(:Al => -105.5954, :Ti => -1586.0195))

The next line specifies the regression weights: in the least squares loss different observations are given different weights,

$$$\sum_{R} \Big( w_^E_R | E(R) - y_R^E |^2 + w_F^R | {\rm forces}(R) - y_R^F |^2 + w_V^R | {\rm virial}(R) - y_R^V |^2 \Big),$$$

and this is specificed via the following dictionary. The keys correspond to the configtype field in the Dat structure.

weights = Dict(
"FLD_TiAl" => Dict("E" => 30.0, "F" => 1.0 , "V" => 1.0 ),
"TiAl_T5000" => Dict("E" => 5.0, "F" => 1.0 , "V" => 1.0 ))
Dict{String, Dict{String, Float64}} with 2 entries:
"FLD_TiAl"   => Dict("V"=>1.0, "E"=>30.0, "F"=>1.0)
"TiAl_T5000" => Dict("V"=>1.0, "E"=>5.0, "F"=>1.0)

We are finally coming to the parameter estimation. In this tutorial we provide four different algorithms to solve the LLSQ problem: a Krylov method LSQR, rank-revealing QR, scikit-learn BRR solver as well as the scikit-learn ARD solver. TODO: discuss regularisation

solver_type = :lsqr
smoothness_prior = true

if solver_type == :lsqr
solver = Dict(
"solver" => :lsqr,
"lsqr_damp" => 1e-2,
"lsqr_atol" => 1e-6)
elseif solver_type == :rrqr
solver = Dict(
"solver" => :rrqr,
"rrqr_tol" => 1e-5)
elseif solver_type == :brr
solver = Dict(
"solver" => :brr,
"brr_tol" => 1e-3)
elseif solver_type == :ard
solver= Dict(
"solver" => :ard,
"ard_tol" => 1e-3,
"ard_threshold_lambda" => 10000)
end
Dict{String, Any} with 3 entries:
"lsqr_atol" => 1.0e-6
"lsqr_damp" => 0.01
"solver"    => :lsqr

ACE1.jl has a heuristic smoothness prior built in which assigns to each basis function Bi a scaling parameter si that estimates how "rough" that basis function is. The following line generates a regularizer (prior) with si^q on the diagonal, thus penalizing rougher basis functions and enforcing a smoother fitted potential.

if smoothness_prior
using LinearAlgebra
q = 3.0
solver["P"] = Diagonal(vcat(ACE1.scaling.(dB.basis.BB, q)...))
end

Once all the solver parameters have been determined, we use IPFitting.lsqfit to estimate the parameters. This routine will return the fitted interatomic potential IP as well as the a dictionary lsqfit with some information about the fitting process.

IP, lsqinfo = lsqfit(dB, solver=solver, weights=weights, Vref=Vref, error_table = true);
nothing #hide

For example,lsqinfo will contain information about the fit accuracy which we can display as follows:

@info("Training Error Table")
rmse_table(lsqinfo["errors"])

We should of course also look at test errors, which can be done as follows. Depending on the choice of solver, and solver parameters, the test errors might be very poor. Exploring different parameters in different applications can lead to significantly improved predictions.

@info("Test Error Table")
test = data[2:10:end]
rmse_table(test)
If we want to save the fitted potentials to disk to later use we can use one of the following command: the first saves the potential as an ACE1.jl compatible potential, while the second line exports it to a format that can be ready by the pacemaker code to be used within LAMMPS.
save_dict("./TiAl_tutorial_pot.json", Dict("IP" => write_dict(IP), "info" => lsqinfo))
ACE1pack.ExportMulti.export_ACE("./TiAl_tutorial_pot.yace", IP)