3.2. Input files
Input scripts contain settings that tell FitSNAP how to perform a fit. Our input scripts are configuration files in the format read by Python’s native ConfigParser class. These configuration files are composed of sections, each of which contains keys with values, for example:
[SECTION1]
key1 = value1
key2 = value2
[SECTION2]
key3 = value3
key4 = value4
key5 = value5
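As a minimal sketch of how a file in this format is read, the snippet below parses the generic example above with Python's standard configparser module. It illustrates the file format only and is not FitSNAP's own parsing code; the EXAMPLE string and the settings dictionary are names chosen here for illustration. Setting optionxform to str is an assumption about how one would keep mixed-case key names intact, since configparser lowercases keys by default.

```python
import configparser

# The generic two-section example from above, held in a string for illustration.
EXAMPLE = """
[SECTION1]
key1 = value1
key2 = value2

[SECTION2]
key3 = value3
key4 = value4
key5 = value5
"""

parser = configparser.ConfigParser()
# configparser lowercases keys by default; assigning str keeps names such as
# numTypes or Displaced_BCC exactly as written (an assumption for this sketch).
parser.optionxform = str
parser.read_string(EXAMPLE)

# Collect every section into a plain dict of {key: value} strings.
settings = {name: dict(parser[name]) for name in parser.sections()}

print(parser.sections())
print(settings["SECTION2"]["key4"])
```

All values come back as strings, so numeric settings have to be converted by whatever code consumes them.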
In FitSNAP, each section declares settings for a certain aspect of the machine learning problem. For example, we have a BISPECTRUM section whose keys determine settings for the bispectrum descriptors that describe interatomic geometry, a CALCULATOR section whose keys determine which LAMMPS computes to use for calculating the descriptors, a SOLVER section whose keys determine which numerical solver to use for performing the fit, and so forth.
There are many examples in the GitHub repo. For example, the linear SNAP tantalum example has the following input script:
[BISPECTRUM]
numTypes = 1
twojmax = 6
rcutfac = 4.67637
rfac0 = 0.99363
rmin0 = 0.0
wj = 1.0
radelem = 0.5
type = Ta
wselfallflag = 0
chemflag = 0
bzeroflag = 0
quadraticflag = 0
[CALCULATOR]
calculator = LAMMPSSNAP
energy = 1
force = 1
stress = 1
[ESHIFT]
Ta = 0.0
[SOLVER]
solver = SVD
compute_testerrs = 1
detailed_errors = 1
[SCRAPER]
scraper = JSON
[PATH]
dataPath = JSON
[OUTFILE]
metrics = Ta_metrics.md
potential = Ta_pot
[REFERENCE]
units = metal
atom_style = atomic
pair_style = hybrid/overlay zero 10.0 zbl 4.0 4.8
pair_coeff1 = * * zero
pair_coeff2 = * * zbl 73 73
[GROUPS]
# name size eweight fweight vweight
group_sections = name training_size testing_size eweight fweight vweight
group_types = str float float float float float
smartweights = 0
random_sampling = 0
Displaced_A15 = 1.0 0.0 100 1 1.00E-08
Displaced_BCC = 1.0 0.0 100 1 1.00E-08
Displaced_FCC = 1.0 0.0 100 1 1.00E-08
Elastic_BCC = 1.0 0.0 1.00E-08 1.00E-08 0.0001
Elastic_FCC = 1.0 0.0 1.00E-09 1.00E-09 1.00E-09
GSF_110 = 1.0 0.0 100 1 1.00E-08
GSF_112 = 1.0 0.0 100 1 1.00E-08
Liquid = 1.0 0.0 4.67E+02 1 1.00E-08
Surface = 1.0 0.0 100 1 1.00E-08
Volume_A15 = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09
Volume_BCC = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09
Volume_FCC = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09
[EXTRAS]
dump_descriptors = 1
dump_truth = 1
dump_weights = 1
dump_dataframe = 1
[MEMORY]
override = 0
We explain the sections and their keys in more detail below.
3.2.1. [BISPECTRUM]
This section contains settings for the SNAP bispectrum descriptors from Thompson et al.
numTypes
number of atom types in your set of configurations located in the [PATH] section.
type
contains a list of element type symbols, one for each type. Make sure these are ordered correctly, e.g. if your LAMMPS type 1 atoms are Ga and your LAMMPS type 2 atoms are N, list this as Ga N.
The remaining keywords are thoroughly explained in the LAMMPS docs on computing SNAP descriptors but we will give an overview here. These are hyperparameters that *could* be optimized for your specific system, but this is not a requirement. You may also use the default values, or values used in our examples, which are often well behaved for other systems.
twojmax
determines the number of bispectrum coefficients for each element type. Give an argument for each element type, e.g. for two element types we may use 6 6, declaring twojmax = 6 for each type. Higher twojmax increases the number of bispectrum components for each atom, thus potentially giving more accuracy at an increased cost. We recommend using a twojmax of 4, 6, or 8. These correspond to 14, 30, and 55 bispectrum components, respectively. Default value is 6.
rcutfac
is a cutoff radius parameter. One value is used for all element types. We recommend a cutoff between 4 and 5 Angstroms for most systems. Default value is 4.67 Angstroms.
rfac0
is a parameter used in distance to angle conversion, between 0 and 1. Default value is 0.99363.
rmin0
is another parameter used in distance to angle conversion, between 0 and 1. Default value is 0.
wj
list of neighbor weights. Give one argument for each element type, e.g. for two element types we may use 1.0 0.5, declaring a weight of 1.0 for neighbors of type 1, and 0.5 for neighbors of type 2. We recommend taking values from the existing multi-element examples.
radelem
list of cutoff radii, one for each element type. Each value is multiplied by 2 * rcutfac to give the effective cutoff radius of that type, i.e. 2 * rcutfac * radelem.
wselfallflag
is 0 or 1, determining whether self-contribution is for elements of a central atom or for all elements, respectively.
chemflag
is 0 or 1, determining whether to use explicit multi-element SNAP descriptors as explained in Cusentino et al., and used in the InP example. This increases the number of SNAP descriptors to resolve multi-element environment descriptions, and therefore gives higher accuracy at an increased cost. This option is not required for multi-element systems; the default value is 0.
bzeroflag
is 0 or 1, determining whether or not B0, the bispectrum components of an atom with no neighbors, are subtracted from the calculated bispectrum components.
quadraticflag
is 0 or 1, determining whether or not to use quadratic descriptors in a linear model, as done by Wood and Thompson, and illustrated in the Ta_Quadratic example.
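To make two of the relationships above concrete, the following sketch counts bispectrum components for a given twojmax and evaluates the effective cutoff 2 * rcutfac * radelem. The function names are hypothetical, and the counting loop is based on our understanding of the index selection rules used for SNAP in LAMMPS, so treat it as an illustration rather than the reference implementation. It assumes chemflag = 0 and quadraticflag = 0.

```python
def num_bispectrum_components(twojmax: int) -> int:
    """Count (j1, j2, j) index triples kept for one value of twojmax.

    Assumption: the selection rules are j2 <= j1 <= j <= twojmax, with j
    stepping by 2 between j1 - j2 and j1 + j2.
    """
    count = 0
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:
                    count += 1
    return count


def effective_cutoff(rcutfac: float, radelem: float) -> float:
    """Effective cutoff radius of one element type, as described for radelem."""
    return 2.0 * rcutfac * radelem


# Recommended twojmax values from the text above.
for twojmax in (4, 6, 8):
    print(twojmax, num_bispectrum_components(twojmax))

# Values taken from the tantalum example input script.
print(effective_cutoff(rcutfac=4.67637, radelem=0.5))
```

With radelem = 0.5, as in the tantalum example, the effective cutoff equals rcutfac itself.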
The following keywords are necessary for extracting per-atom descriptors and individual derivatives of bispectrum components with respect to neighbors, required for neural network potentials. See more info in PyTorch Models.
bikflag
is 0 or 1, determining whether to compute per-atom bispectrum descriptors instead of bispectrum components summed over atoms. We use the summed form for linear fitting because of the nature of the linear problem, which saves memory, but per-atom descriptors are required for neural networks.
dgradflag
is 0 or 1, determining whether to compute individual derivatives of descriptors with respect to neighboring atoms, which is required for neural networks.
3.2.2. [CALCULATOR]
This section houses keywords determining which calculator to use, i.e. which descriptors to calculate.
calculator
is the name of the LAMMPS connection for getting descriptors, e.g. for SNAP descriptors use LAMMPSSNAP.
energy
is 0 or 1, determining whether to calculate descriptors associated with energies.
force
is 0 or 1, determining whether to calculate descriptor gradients associated with forces.
stress
is 0 or 1, determining whether to calculate descriptor gradients associated with virial terms for calculating and fitting to stresses.
per_atom_energy
is 0 or 1, determining whether to use per-atom energy descriptors in association with bikflag = 1.
nonlinear
is 0 or 1, and should be 1 if using nonlinear solvers such as PyTorch models.
3.2.3. [ESHIFT]
This section declares an energy shift applied to each atom type. You are free to choose these values. For example, they could come from the energy of an isolated atom in vacuum predicted by ab initio calculations. These values may also be treated as hyperparameters.
3.2.4. [SOLVER]
This section contains keywords associated with specific machine learning solvers.
solver
name of the solver. We recommend using SVD for linear solvers and PYTORCH for neural networks.
3.2.5. [SCRAPER]
This section declares which file scraper to use for gathering training data.
scraper
is either JSON or XYZ.
If using the XYZ scraper, each Group of configurations has its own XYZ file containing configurations of atoms concatenated together, in extended XYZ format. Follow the example in examples/Ta_XYZ.
If using the JSON scraper, each Group may have its own directory containing separate JSON files for each configuration. Guarantee compatibility with FitSNAP by using our tools/VASP2JSON.py conversion script; this requires that your DFT training data be in VASP OUTCAR format. Likewise for tools/VASPxml2JSON.py.
We are also working on a scraper that directly reads VASP output; more documentation on this coming soon.
3.2.6. [PATH]
This section contains a dataPath keyword that locates the directory of the training data. For example, if the training data is in a directory called JSON in the parent directory of where we run the FitSNAP executable, this section looks like:
[PATH]
dataPath = ../JSON
3.2.7. [OUTFILE]
This section declares the names of output files.
metrics
gives the name of the error metrics markdown file. If using LAMMPS metal units, energy mean absolute errors are in eV and force errors are in eV/Angstrom.
potential
gives the prefix of the LAMMPS-ready potential files to dump.
3.2.8. [REFERENCE]
This section includes settings for an optional potential to overlay our machine learned potential with. We call this a “reference potential”, which is a pair style defined in LAMMPS. If you choose to use a reference potential, the energies and forces from the reference potential will be subtracted from the target ab initio training data. We also declare units in this section.
units
declares units used by LAMMPS, see LAMMPS units docs for more info.
atom_style
the atom style used by the LAMMPS pair style you wish to overlay, see LAMMPS atom style docs for more info.
The minimal working setup uses no reference potential at all, in which case the reference section looks like (using metal units):
[REFERENCE]
units = metal
pair_style = zero 10.0
pair_coeff = * *
The rest of the keywords are associated with the particular LAMMPS pair style you wish to use.
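The subtraction described at the start of this section can be sketched in a few lines. This is only an illustration of the arithmetic, not FitSNAP's internal code; the function name and the numbers are hypothetical, and in practice the reference values would come from evaluating the chosen LAMMPS pair style on each configuration.

```python
def subtract_reference(target, reference):
    """Return element-wise target minus reference.

    target: ab initio values (energies or force components).
    reference: values from the reference potential for the same entries.
    """
    if len(target) != len(reference):
        raise ValueError("target and reference must have the same length")
    return [t - r for t, r in zip(target, reference)]


# Hypothetical energies in eV for two configurations.
dft_energies = [-12.0, -11.5]
reference_energies = [0.5, 0.25]

# What remains is the part the machine learned potential has to fit.
fit_targets = subtract_reference(dft_energies, reference_energies)
print(fit_targets)
```

With the zero pair style shown above, the reference contribution is zero and the fit targets equal the ab initio values.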
3.2.9. [GROUPS]
Each group should be its own sub-directory in the directory given by the dataPath keyword in the [PATH] section. There are a few different allowed syntaxes; subdirectory names in the first column are common to all options.
group_sections
declares which parameters you want to set for each group of configurations.
For example:
group_sections = name training_size testing_size eweight fweight vweight
means you will supply group names, training size as a decimal fraction, testing size as a decimal fraction, energy weight, force weight, and virial weight, respectively. We must also declare the data types associated with these variables, given by
group_types = str float float float float float
Then we may declare the group names and parameters associated with them. For a particular group called Liquid, for example, this looks like:
Liquid = 1.0 0.0 4.67E+02 1 1.00E-08
where Liquid is the name of the group, 1.0 is the training fraction, 0.0 is the testing fraction, 4.67E+02 is the energy weight, 1 is the force weight, and 1.00E-08 is the virial weight.
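The way group_sections and group_types line up with a group entry can be sketched as follows. This is a hypothetical helper written for illustration, not FitSNAP's parser; the CASTS table and the parse_group function are assumptions introduced here.

```python
GROUP_SECTIONS = "name training_size testing_size eweight fweight vweight"
GROUP_TYPES = "str float float float float float"

# Map the type names used in group_types onto Python constructors.
CASTS = {"str": str, "float": float, "int": int}


def parse_group(name, value, sections=GROUP_SECTIONS, types=GROUP_TYPES):
    """Pair each column of one group entry with its declared name and type."""
    fields = sections.split()
    casts = [CASTS[t] for t in types.split()]
    # The group name is the key of the entry; the rest comes from its value.
    raw = [name] + value.split()
    if not (len(raw) == len(fields) == len(casts)):
        raise ValueError(f"group {name!r} does not match group_sections")
    return {field: cast(item) for field, cast, item in zip(fields, casts, raw)}


liquid = parse_group("Liquid", "1.0 0.0 4.67E+02 1 1.00E-08")
print(liquid)
```

Here the force weight 1 is converted to a float because group_types declares that column as float.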
Other available keywords are
random_sampling
is 0 or 1. If 1, configurations in the groups are randomly sampled between their training and testing fractions.
smartweights
is 0 or 1. If 1, we declare statistically distributed weights given your supplied weights.
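One plausible reading of how the training and testing fractions interact with random_sampling is sketched below. This is an assumption made for illustration and may differ from what FitSNAP actually does; the split_group function, its seed argument, and the file names are hypothetical.

```python
import random


def split_group(configs, training_size, testing_size,
                random_sampling=False, seed=0):
    """Split one group's configurations into training and testing lists.

    Assumption: fractions are applied to the group size and rounded to whole
    configurations; with random_sampling the order is shuffled first.
    """
    items = list(configs)
    if random_sampling:
        random.Random(seed).shuffle(items)
    n_train = int(round(training_size * len(items)))
    n_test = int(round(testing_size * len(items)))
    # Never hand out more configurations than the group contains.
    n_test = min(n_test, len(items) - n_train)
    return items[:n_train], items[n_train:n_train + n_test]


# Ten hypothetical configuration files in one group.
files = [f"config_{i}.json" for i in range(10)]
train, test = split_group(files, 0.8, 0.2, random_sampling=True)
print(len(train), len(test))
```

Without random_sampling, this sketch simply takes the first configurations for training and the next ones for testing.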
A few examples are found in the examples directory.
3.2.10. [EXTRAS]
This section contains keywords on optional info to dump. By default, linear models output error metric markdown files that should be sufficient in most cases. If more detailed errors are required, please see the output Pandas dataframe FitSNAP.df used by linear models. Examples and library tools for analyzing this dataframe are found in our Colab Python notebook tutorial.
3.2.11. [MEMORY]
This section contains keywords for dealing with memory. We recommend using defaults.