Input files
===========

Input scripts contain settings that tell FitSNAP how to perform a fit. Our input scripts take the form of
configuration files with a format explained by `Python's native ConfigParser class <configparser_>`_.
These configuration files are composed of sections, each of which contains keys with values, for example::

    [SECTION1]
    key1 = value1
    key2 = value2

    [SECTION2]
    key3 = value3
    key4 = value4
    key5 = value5

.. _configparser: https://docs.python.org/3/library/configparser.html

In FitSNAP, each section declares settings for a certain aspect of the machine learning problem. For example, we
have a :code:`BISPECTRUM` section whose keys determine settings for the bispectrum descriptors that describe
interatomic geometry, a :code:`CALCULATOR` section whose keys determine which LAMMPS computes to use for
calculating the descriptors, a :code:`SOLVER` section whose keys determine which numerical solver to use for
performing the fit, and so forth. There are many examples on the GitHub repo; for example, the linear SNAP
tantalum example has the following input script::

    [BISPECTRUM]
    numTypes = 1
    twojmax = 6
    rcutfac = 4.67637
    rfac0 = 0.99363
    rmin0 = 0.0
    wj = 1.0
    radelem = 0.5
    type = Ta
    wselfallflag = 0
    chemflag = 0
    bzeroflag = 0
    quadraticflag = 0

    [CALCULATOR]
    calculator = LAMMPSSNAP
    energy = 1
    force = 1
    stress = 1

    [ESHIFT]
    Ta = 0.0

    [SOLVER]
    solver = SVD
    compute_testerrs = 1
    detailed_errors = 1

    [SCRAPER]
    scraper = JSON

    [PATH]
    dataPath = JSON

    [OUTFILE]
    metrics = Ta_metrics.md
    potential = Ta_pot

    [REFERENCE]
    units = metal
    atom_style = atomic
    pair_style = hybrid/overlay zero 10.0 zbl 4.0 4.8
    pair_coeff1 = * * zero
    pair_coeff2 = * * zbl 73 73

    [GROUPS]
    # name size eweight fweight vweight
    group_sections = name training_size testing_size eweight fweight vweight
    group_types = str float float float float float
    smartweights = 0
    random_sampling = 0
    Displaced_A15 = 1.0 0.0 100 1 1.00E-08
    Displaced_BCC = 1.0 0.0 100 1 1.00E-08
    Displaced_FCC = 1.0 0.0 100 1 1.00E-08
    Elastic_BCC = 1.0 0.0 1.00E-08 1.00E-08 0.0001
    Elastic_FCC = 1.0 0.0 1.00E-09 1.00E-09 1.00E-09
    GSF_110 = 1.0 0.0 100 1 1.00E-08
    GSF_112 = 1.0 0.0 100 1 1.00E-08
    Liquid = 1.0 0.0 4.67E+02 1 1.00E-08
    Surface = 1.0 0.0 100 1 1.00E-08
    Volume_A15 = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09
    Volume_BCC = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09
    Volume_FCC = 1.0 0.0 1.00E+00 1.00E-09 1.00E-09

    [EXTRAS]
    dump_descriptors = 1
    dump_truth = 1
    dump_weights = 1
    dump_dataframe = 1

    [MEMORY]
    override = 0

We explain the sections and their keys in more detail below.

[BISPECTRUM]
^^^^^^^^^^^^

This section contains settings for the SNAP bispectrum descriptors from `Thompson et al. <snappaper_>`_.

.. _snappaper: https://www.sciencedirect.com/science/article/pii/S0021999114008353

- :code:`numTypes` is the number of atom types in the configurations found in the directory given by the
  `[PATH]`_ section.

- :code:`type` contains a list of element symbols, one for each type. Make sure these are ordered correctly,
  e.g. if LAMMPS type 1 atoms are :code:`Ga` and LAMMPS type 2 atoms are :code:`N`, list this as :code:`Ga N`.

The remaining keywords are thoroughly explained in the
`LAMMPS docs on computing SNAP descriptors <lammpssnap_>`_, but we give an overview here.
**These are hyperparameters that could be optimized for your specific system, but this is not a requirement.
You may also use the default values, or values used in our examples, which often work well for other systems.**

- :code:`twojmax` determines the number of bispectrum coefficients for each element type. Give an argument for
  each element type, e.g. for two element types we may use :code:`6 6`, declaring :code:`twojmax = 6` for each
  type. Higher :code:`twojmax` increases the number of bispectrum components for each atom, potentially giving
  more accuracy at an increased cost. We recommend a :code:`twojmax` of 4, 6, or 8, which corresponds to 14, 30,
  and 55 bispectrum components, respectively. The default value is 6.

- :code:`rcutfac` is a cutoff radius parameter. One value is used for all element types. We recommend a cutoff
  between 4 and 5 Angstroms for most systems. The default value is 4.67 Angstroms.

- :code:`rfac0` is a parameter used in the distance-to-angle conversion, between 0 and 1. The default value is
  0.99363.

- :code:`rmin0` is another parameter used in the distance-to-angle conversion, between 0 and 1. The default
  value is 0.

- :code:`wj` is a list of neighbor weights, one for each element type, e.g. for two element types we may use
  :code:`1.0 0.5`, declaring a weight of 1.0 for neighbors of type 1 and 0.5 for neighbors of type 2. We
  recommend taking values from the existing multi-element examples.

- :code:`radelem` is a list of cutoff radii, one for each element type. For each element, the effective cutoff
  radius is :code:`2 * rcutfac * radelem`.

- :code:`wselfallflag` is 0 or 1, determining whether the self-contribution is for elements of a central atom
  or for all elements, respectively.

- :code:`chemflag` is 0 or 1, determining whether to use explicit multi-element SNAP descriptors as explained in
  `Cusentino et al. <chemsnappaper_>`_ and used in the InP example. This increases the number of SNAP
  descriptors in order to resolve multi-element environments, and therefore comes at an increased cost but can
  give higher accuracy. This option is not required for multi-element systems; the default value is 0.

- :code:`bzeroflag` is 0 or 1, determining whether or not B0, the bispectrum components of an atom with no
  neighbors, are subtracted from the calculated bispectrum components.

- :code:`quadraticflag` is 0 or 1, determining whether or not to use quadratic descriptors in a linear model,
  as done by `Wood and Thompson <quadsnappaper_>`_, and illustrated in the :code:`Ta_Quadratic` example.
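
Because the input script is an ordinary ConfigParser file, the relationships above can be checked directly in
Python. The sketch below is illustrative only and is not part of FitSNAP: it reads a hypothetical two-element
:code:`[BISPECTRUM]` section (the :code:`Ga N` values are made up) with the standard :code:`configparser`
module, counts bispectrum components per element by looping over :code:`(j1, j2, j)` index triples in the way
we assume LAMMPS does (this reproduces the 14, 30, and 55 components quoted for :code:`twojmax` of 4, 6, and 8),
and evaluates the effective cutoff :code:`2 * rcutfac * radelem` for each element. The helper names
:code:`n_bispectrum` and :code:`summarize_bispectrum` are hypothetical.

.. code-block:: python

    import configparser
    import textwrap

    # Hypothetical two-element [BISPECTRUM] section, for illustration only.
    TEXT = textwrap.dedent("""
        [BISPECTRUM]
        numTypes = 2
        twojmax = 6 6
        rcutfac = 4.67637
        radelem = 0.5 0.4
        type = Ga N
    """)


    def n_bispectrum(twojmax):
        """Count bispectrum components for one element type.

        Assumption: loop over (j1, j2, j) triples with j2 <= j1 and j >= j1,
        mirroring the indexing we believe LAMMPS compute sna/atom uses.
        """
        count = 0
        for j1 in range(twojmax + 1):
            for j2 in range(j1 + 1):
                for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                    if j >= j1:
                        count += 1
        return count


    def summarize_bispectrum(text):
        """Return per-element component counts and effective cutoffs."""
        config = configparser.ConfigParser()
        config.optionxform = str  # keep key case, e.g. numTypes
        config.read_string(text)
        section = config["BISPECTRUM"]
        types = section["type"].split()
        twojmax = [int(v) for v in section["twojmax"].split()]
        radelem = [float(v) for v in section["radelem"].split()]
        rcutfac = section.getfloat("rcutfac")
        return {
            element: {
                "ncomponents": n_bispectrum(tj),
                "cutoff": 2.0 * rcutfac * r,  # effective cutoff = 2 * rcutfac * radelem
            }
            for element, tj, r in zip(types, twojmax, radelem)
        }


    if __name__ == "__main__":
        for tj in (4, 6, 8):
            print(tj, n_bispectrum(tj))  # 14, 30, 55 components
        print(summarize_bispectrum(TEXT))

With the tantalum example's single type (:code:`rcutfac = 4.67637`, :code:`radelem = 0.5`), the same arithmetic
gives an effective cutoff equal to :code:`rcutfac` itself.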

The following keywords are necessary for extracting per-atom descriptors and individual derivatives of
bispectrum components with respect to neighbors, which are required for neural network potentials. See the
*PyTorch Models* documentation for more information.

- :code:`bikflag` is 0 or 1, determining whether to compute per-atom bispectrum descriptors instead of summing
  the components over all atoms in each configuration. Linear fits use the sums, because the energy of a linear
  model depends only on the summed descriptors, and this saves memory; neural networks, however, require
  per-atom descriptors.

- :code:`dgradflag` is 0 or 1, determining whether to compute individual derivatives of descriptors with respect
  to neighboring atoms, which is required for neural networks.

.. _lammpssnap: https://docs.lammps.org/compute_sna_atom.html
.. _quadsnappaper: https://aip.scitation.org/doi/full/10.1063/1.5017641
.. _chemsnappaper: https://www.sciencedirect.com/science/article/pii/S0021999114008353

[CALCULATOR]
^^^^^^^^^^^^

This section houses keywords determining which calculator to use, i.e. which descriptors to calculate.

- :code:`calculator` is the name of the LAMMPS connection for getting descriptors, e.g. for SNAP descriptors use
  :code:`LAMMPSSNAP`.

- :code:`energy` is 0 or 1, determining whether to calculate descriptors associated with energies.

- :code:`force` is 0 or 1, determining whether to calculate descriptor gradients associated with forces.

- :code:`stress` is 0 or 1, determining whether to calculate descriptor gradients associated with virial terms
  for calculating and fitting to stresses.

- :code:`per_atom_energy` is 0 or 1, determining whether to use per-atom energy descriptors together with
  :code:`bikflag = 1`.

- :code:`nonlinear` is 0 or 1, and should be 1 if using nonlinear solvers such as PyTorch models.

[ESHIFT]
^^^^^^^^

This section declares an energy shift applied to each atom type. These values are free to choose however desired.
For example, these values could come from the energy of an isolated atom in vacuum computed with *ab initio*
methods. These values may also be treated as hyperparameters.

[SOLVER]
^^^^^^^^

This section contains keywords associated with specific machine learning solvers.

- :code:`solver` is the name of the solver. We recommend using :code:`SVD` for linear models and
  :code:`PYTORCH` for neural networks.

[SCRAPER]
^^^^^^^^^

This section declares which file scraper to use for gathering training data.

- :code:`scraper` is either :code:`JSON` or :code:`XYZ`.

If using the XYZ scraper, each group of configurations (see `[GROUPS]`_) has its own XYZ file containing
configurations of atoms concatenated together, in extended XYZ format. Follow the example in
:code:`examples/Ta_XYZ`.

If using the JSON scraper, each group (see `[GROUPS]`_) may have its own directory containing separate JSON files
for each configuration. To guarantee compatibility with FitSNAP, use our :code:`tools/VASP2JSON.py` conversion
script; this requires that your DFT training data be in VASP OUTCAR format. Likewise,
:code:`tools/VASPxml2JSON.py` converts VASP XML output. We are also working on a scraper that directly reads
VASP output; more documentation on this is coming soon.

[PATH]
^^^^^^

This section contains a :code:`dataPath` keyword that locates the directory of the training data. For example,
if the training data are in a directory called :code:`JSON` one level above the directory where we run the
FitSNAP executable, this section looks like::

    [PATH]
    dataPath = ../JSON

[OUTFILE]
^^^^^^^^^

This section declares the names of output files.

- :code:`metrics` gives the name of the error metrics markdown file. If using LAMMPS metal units, energy mean
  absolute errors are in eV and force errors are in eV/Angstrom.

- :code:`potential` gives the prefix of the LAMMPS-ready potential files to dump.

[REFERENCE]
^^^^^^^^^^^

This section includes settings for an *optional* potential to overlay on our machine-learned potential.
We call this a "reference potential", which is a pair style defined in LAMMPS. If you choose to use a reference
potential, the energies and forces from the reference potential will be subtracted from the target *ab initio*
training data. We also declare units in this section.

- :code:`units` declares the units used by LAMMPS; see the `LAMMPS units docs <lammpsunits_>`_ for more info.

- :code:`atom_style` is the atom style used by the LAMMPS pair style you wish to overlay; see the
  `LAMMPS atom style docs <lammpsatomstyle_>`_ for more info.

The minimal working setup uses no reference potential at all; with metal units, the section looks like::

    [REFERENCE]
    units = metal
    pair_style = zero 10.0
    pair_coeff = * *

The rest of the keywords are associated with the particular LAMMPS pair style you wish to use.

.. _lammpsunits: https://docs.lammps.org/units.html
.. _lammpsatomstyle: https://docs.lammps.org/atom_style.html

[GROUPS]
^^^^^^^^

Each group should be its own subdirectory in the directory given by the :code:`dataPath` keyword in the
`[PATH]`_ section. There are a few different allowed syntaxes; subdirectory names in the first column are
common to all options.

:code:`group_sections` declares which parameters you want to set for each group of configurations. For example::

    group_sections = name training_size testing_size eweight fweight vweight

means you will supply group names, training size as a decimal fraction, testing size as a decimal fraction,
energy weight, force weight, and virial weight, respectively. We must also declare the data types associated
with these variables, given by::

    group_types = str float float float float float

Then we may declare the group names and parameters associated with them.
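
To illustrate how :code:`group_sections` and :code:`group_types` line up with each group row, here is a short,
hypothetical Python sketch using the standard :code:`configparser` module. It is not FitSNAP's own parser: it
assumes that every key other than the settings shown in the tantalum example (:code:`group_sections`,
:code:`group_types`, :code:`smartweights`, :code:`random_sampling`) is a group row, prepends the group name to
that row as the first column, and converts each field with the matching type. The two rows are copied from the
tantalum example; :code:`parse_groups` is a hypothetical helper.

.. code-block:: python

    import configparser
    import textwrap

    # Two rows copied from the Ta example's [GROUPS] section.
    TEXT = textwrap.dedent("""
        [GROUPS]
        # name size eweight fweight vweight
        group_sections = name training_size testing_size eweight fweight vweight
        group_types = str float float float float float
        smartweights = 0
        random_sampling = 0
        Displaced_A15 = 1.0 0.0 100 1 1.00E-08
        Liquid = 1.0 0.0 4.67E+02 1 1.00E-08
    """)

    # Assumption: these keys hold settings rather than group rows.
    SETTINGS = {"group_sections", "group_types", "smartweights", "random_sampling"}
    CASTS = {"str": str, "int": int, "float": float}


    def parse_groups(text):
        """Map each group name to a dict of its typed parameters."""
        config = configparser.ConfigParser()
        config.optionxform = str  # preserve the case of group names
        config.read_string(text)
        section = config["GROUPS"]
        names = section["group_sections"].split()
        casts = [CASTS[t] for t in section["group_types"].split()]
        groups = {}
        for key, value in section.items():
            if key in SETTINGS:
                continue
            fields = [key] + value.split()  # the group name is the first column
            if len(fields) != len(names):
                raise ValueError(f"group {key!r} has {len(fields)} fields, expected {len(names)}")
            groups[key] = {n: cast(f) for n, cast, f in zip(names, casts, fields)}
        return groups


    if __name__ == "__main__":
        for name, params in parse_groups(TEXT).items():
            print(name, params)

Checking the field count against :code:`group_sections` catches a row with a missing or extra weight before it
silently shifts the remaining values into the wrong columns.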

For a particular group called :code:`Liquid`, for example, this looks like::

    Liquid = 1.0 0.0 4.67E+02 1 1.00E-08

where :code:`Liquid` is the name of the group, :code:`1.0` is the training fraction, :code:`0.0` is the testing
fraction, :code:`4.67E+02` is the energy weight, :code:`1` is the force weight, and :code:`1.00E-08` is the
virial weight.

Other available keywords are:

- :code:`random_sampling` is 0 or 1. If 1, configurations in each group are randomly assigned to the training
  and testing sets according to the group's training and testing fractions.

- :code:`smartweights` is 0 or 1. If 1, we declare statistically distributed weights based on your supplied
  weights.

A few examples are found in the examples directory.

[EXTRAS]
^^^^^^^^

This section contains keywords for optional data to dump. By default, linear models output error metric markdown
files that should be sufficient in most cases. If more detailed errors are required, see the output Pandas
dataframe :code:`FitSNAP.df` used by linear models. Examples and library tools for analyzing this dataframe are
found in our `Colab Python notebook tutorial <tutorialnotebook_>`_.

[MEMORY]
^^^^^^^^

This section contains keywords for dealing with memory. We recommend using the defaults.

.. _tutorialnotebook: https://colab.research.google.com/github/FitSNAP/FitSNAP/blob/master/tutorial.ipynb