3. Library
The FitSNAP library provides a high level connection to FitSNAP methods in external Python scripts.
The library is designed to provide effective and massively parallel tools for solving atomistic machine
learning problems. Examples include parallel scraping and calculation of atomistic features to fit
a potential, or extraction of this data for other unsupervised and supervised learning tasks with
external libraries. Familiar users can craft custom atomistic machine learning workflows suited to
their particular needs, such as automated active learning procedures and hyperparameter optimizers.
The overall goal of the API is to supply tools needed for solving a wide range of atomistic machine
learning problems in a flexible manner. API use is based on instances of FitSnap objects,
noting some important points:
- Each FitSnap instance possesses its own settings, such as hyperparameters.
- Each FitSnap instance possesses its own optional MPI communicator, over which appropriate
  operations such as calculating descriptors are parallelized and memory is shared between MPI ranks.
- All results of collating data, calculating descriptors, and fitting a potential are therefore
  contained within a FitSnap instance; this improves organization of fits and reduces confusion
  about where a trained model came from.
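Because each instance carries its own settings, several fits with different hyperparameters can be prepared side by side. The sketch below is only an illustration of that idea, assuming settings are held as a nested dictionary of sections and keywords (one of the two input forms this section describes). The helper name `settings_variants` is hypothetical and not part of FitSNAP.

```python
import copy


def settings_variants(base, section, key, values):
    """Return one independent deep copy of `base` per value of section/key.

    Hypothetical helper for illustration; it is not part of the FitSNAP API.
    """
    variants = []
    for value in values:
        variant = copy.deepcopy(base)  # nested section dicts are not shared
        variant[section][key] = value
        variants.append(variant)
    return variants


# Trimmed-down settings, using keywords that appear in this section.
base = {"BISPECTRUM": {"numTypes": 1, "twojmax": 6, "rcutfac": 4.67637}}
variants = settings_variants(base, "BISPECTRUM", "twojmax", [4, 6, 8])
print([v["BISPECTRUM"]["twojmax"] for v in variants])
```

Each resulting dictionary could then be handed to its own FitSnap instance, so that the results of one fit never mix with those of another.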
To use the library we must first import FitSnap:
from fitsnap3lib.fitsnap import FitSnap
We will create an instance of FitSnap with specific input settings.
First we need to define the settings used by FitSnap. This can be a path to a traditional
input script, or a dictionary containing sections and keywords. For example, a settings
dictionary to perform a fit can be defined like:
settings = {
    "BISPECTRUM": {
        "numTypes": 1,
        "twojmax": 6,
        "rcutfac": 4.67637,
        "rfac0": 0.99363,
        "rmin0": 0.0,
        "wj": 1.0,
        "radelem": 0.5,
        "type": "Ta"
    },
    "CALCULATOR": {
        "calculator": "LAMMPSSNAP",
        "energy": 1,
        "force": 1,
        "stress": 1
    },
    "SOLVER": {
        "solver": "SVD"
    },
    "SCRAPER": {
        "scraper": "JSON"
    },
    "PATH": {
        "dataPath": "/path/to/FitSNAP/examples/Ta_Linear_JCP2014/JSON"
    },
    "REFERENCE": {
        "units": "metal",
        "atom_style": "atomic",
        "pair_style": "hybrid/overlay zero 6.0 zbl 4.0 4.8",
        "pair_coeff1": "* * zero",
        "pair_coeff2": "* * zbl 73 73"
    },
    "GROUPS": {
        "group_sections": "name training_size testing_size eweight fweight vweight",
        "group_types": "str float float float float float",
        "Displaced_FCC": "1.0 0.0 100 1 1.00E-08",
        "Volume_FCC": "1.0 0.0 1.00E+00 1.00E-09 1.00E-09"
    }
}
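In the GROUPS section, each group row has to line up with the columns declared in group_sections and group_types. The following is a minimal sanity check written with the standard library only. It assumes, based on the example above, that the dictionary key plays the role of the name column, so each row string carries one entry fewer than the header. The function `check_groups` is a hypothetical helper, not a FitSNAP function.

```python
def check_groups(groups):
    """Return the names of group rows whose column count does not match the header.

    Hypothetical helper; assumes the dictionary key supplies the "name" column.
    """
    n_sections = len(groups["group_sections"].split()) - 1  # exclude "name"
    n_types = len(groups["group_types"].split()) - 1        # exclude its "str"
    if n_sections != n_types:
        raise ValueError("group_sections and group_types declare different column counts")
    mismatched = []
    for name, row in groups.items():
        if name in ("group_sections", "group_types"):
            continue  # header keywords, not data groups
        if len(row.split()) != n_sections:
            mismatched.append(name)
    return mismatched


groups = {
    "group_sections": "name training_size testing_size eweight fweight vweight",
    "group_types": "str float float float float float",
    "Displaced_FCC": "1.0 0.0 100 1 1.00E-08",
    "Volume_FCC": "1.0 0.0 1.00E+00 1.00E-09 1.00E-09",
}
print(check_groups(groups))  # prints [] because both rows have five entries
```

Running a check like this before constructing an instance catches a missing weight early, rather than partway through a fit.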
Create a FitSnap instance using these settings like:
# The --overwrite command line arg lets us overwrite possible output files.
fs = FitSnap(settings, arglist=["--overwrite"])
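Since the settings may also be given as a path to a traditional input script, it can be handy to write a settings dictionary out as text. The sketch below uses Python's standard configparser module and assumes the traditional input script is INI-style, with [SECTION] headers and key = value lines. That file layout is an assumption here and should be checked against an existing FitSNAP input file. The function `settings_to_ini` is a hypothetical helper.

```python
import configparser
import io


def settings_to_ini(settings):
    """Render a nested settings dictionary as INI-style text.

    Hypothetical helper; the INI layout of the input script is an assumption.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve keyword case, e.g. "numTypes"
    for section, keywords in settings.items():
        # configparser stores strings only, so convert every value.
        parser[section] = {key: str(value) for key, value in keywords.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


example = {
    "BISPECTRUM": {"numTypes": 1, "twojmax": 6},
    "SOLVER": {"solver": "SVD"},
}
print(settings_to_ini(example))
```

The text can be saved to a file, and the path to that file supplied in place of the dictionary.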
Then use the high level functions for (1) scraping data, (2) calculating descriptors, and (3) performing a fit:
# Scrape fitting data.
fs.scrape_configs()
# Calculate descriptors.
fs.process_configs()
# Fit the model.
fs.perform_fit()
# Observe the errors.
print(fs.solver.errors)
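For workflows that repeat these three steps many times, such as a hyperparameter search, they can be wrapped in one function. The sketch below uses only the calls shown above. The function name `run_fit` and its `fitsnap_cls` parameter are hypothetical; the parameter exists only so that a stand-in class can be substituted when FitSNAP is not installed.

```python
def run_fit(settings, fitsnap_cls=None, arglist=("--overwrite",)):
    """Scrape, calculate descriptors, fit, and return the solver errors.

    Hypothetical wrapper around the calls shown in this section.
    """
    if fitsnap_cls is None:
        # Imported lazily so this function can be defined without FitSNAP installed.
        from fitsnap3lib.fitsnap import FitSnap as fitsnap_cls
    fs = fitsnap_cls(settings, arglist=list(arglist))
    fs.scrape_configs()    # (1) scrape fitting data
    fs.process_configs()   # (2) calculate descriptors
    fs.perform_fit()       # (3) fit the model
    return fs.solver.errors
```

With FitSNAP installed, calling `run_fit(settings)` on the dictionary defined earlier would create a fresh instance per call, keeping each fit's data and results separate.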
Each FitSnap instance contains its own settings for defining an entire machine learning fit
from start to finish.
This can include training data and hyperparameters all the way to the final fitting coefficients or
model and error metrics.
This design is similar to scikit-learn, where users make instances out of model classes like
instance = Ridge(alpha) and call class methods such as instance.fit(A, b).
With FitSnap, however, we have many more settings and hyperparameters.
It therefore improves organization to contain all these attributes in a single FitSnap
instance to reduce confusion about where a fit came from.
Most methods such as calculating descriptors and performing fits are methods of a particular
instance, and the actions of these methods depend on the state or settings of that instance.
These methods and the rest of the API are detailed below.