3.1. FitSnap
The FitSnap class houses all objects needed for performing a fit. These objects
are instantiated from other core classes described in the rest of the docs. The two main
inputs to a FitSnap instance are (1) settings and (2) an MPI communicator. The
settings can be a nested dictionary as shown in some examples, while the MPI communicator
is typically the world communicator containing all resources dictated by the mpirun
command. After instantiating FitSnap with these inputs, the instance will contain
its own instance of ParallelTools, which houses functions and data structures that
operate on the resources in the input communicator. The settings will be stored in an instance
of the Config class. The FitSnap class is documented below.
- class fitsnap3lib.fitsnap.FitSnap(input=None, comm=None, arglist: list = [])
This class houses the objects needed for machine learning a potential, start to finish.
- Parameters
input (str) – Optional dictionary or path to input file when using library mode; defaults to None for executable use.
comm – Optional MPI communicator when using library mode; defaults to None.
arglist (list) – Optional list of cmd line args when using library mode.
- pt
Instance of the ParallelTools class for helping MPI communication and shared arrays.
- Type
class ParallelTools
- config
Instance of the Config class for initializing settings, initialized with a ParallelTools instance.
- Type
class Config
- scraper
Instance of the Scraper class for gathering configs.
- Type
class Scraper
- data
List of dictionaries, where each configuration of atoms has its own dictionary.
- Type
list
- calculator
Instance of the Calculator class for calculating descriptors and fitting data.
- Type
class Calculator
- solver
Instance of the Solver class for performing a fit.
- Type
class Solver
- perform_fit()
Solve the machine learning problem with descriptors as input and energies/forces/etc. as targets.
- process_configs(data: Optional[list] = None, allgather: bool = False, delete_data: bool = False)
Calculate descriptors for all configurations in the data list and store info in the shared arrays.
- Parameters
data – Optional list of data dictionaries to calculate descriptors for. If not supplied, we use the list owned by this instance.
allgather – Whether to gather distributed lists to all processes or just to the head proc. In some cases, such as processing configs once and then using that data on multiple procs, we must allgather.
delete_data – Whether the data list is deleted or not after processing. Since data can retain unwanted memory after processing configs, we delete it in executable mode.
- scrape_configs(delete_scraper: bool = False)
Scrapes configurations of atoms and creates an instance attribute list of configurations called data.
- Parameters
delete_scraper – Boolean determining whether the scraper object is deleted or not after scraping. Defaults to False. Since scraper can retain unwanted memory, we delete it in executable mode.
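In library mode, the three methods documented above are typically called in sequence on one instance. The following is a minimal sketch of that sequence, not a definitive recipe: `run_fit` is a hypothetical helper name, and `settings` is assumed to be a valid nested settings dictionary or input file path whose contents are not shown here.

```python
def run_fit(settings, comm=None):
    """Hypothetical helper: scrape, process, and fit with one FitSnap instance."""
    # Imported lazily so this sketch can be defined without FitSNAP installed.
    from fitsnap3lib.fitsnap import FitSnap

    # `comm` is typically the world communicator (e.g. MPI.COMM_WORLD from mpi4py);
    # None is the documented default.
    fs = FitSnap(settings, comm=comm)

    fs.scrape_configs()   # creates fs.data, one dictionary per configuration
    fs.process_configs()  # calculates descriptors and stores them in shared arrays
    fs.perform_fit()      # solves the fit with descriptors as input
    return fs
```

Returning the instance keeps `fs.pt`, `fs.config`, and `fs.solver` available to the caller after the fit.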
FitSnap contains instances of two classes that help with MPI communication and
settings: ParallelTools and Config, respectively. These two classes are
explained below. In addition to these classes, there are other core classes,
Scraper, Calculator, and Solver, which are explained in later sections.
3.1.1. Parallel Tools
Parallel Tools are a collection of data structures and functions for transferring data in a FitSNAP workflow in a massively parallel fashion.
The ParallelTools instance pt of a FitSNAP instance fs can be used
to create shared memory arrays, for example:
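# Assumes `fs` is a FitSnap instance, `rank` is this process's rank, and `nrows`, `ncols` are ints.
# Create a shared array called `a`.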
fs.pt.create_shared_array('a', nrows, ncols)
# Change the shared array at a rank-dependent element.
fs.pt.shared_arrays['a'].array[rank,0] = rank
# Observe that the change happened on all procs.
print(f"Shared array on rank {rank}: {fs.pt.shared_arrays['a'].array}")
Currently these tools reside in a single file, parallel_tools.py, which houses some
classes described below.
- class fitsnap3lib.parallel_tools.DistributedList(proc_length)
This class is used for distributed memory Python lists. The class wraps Python’s list to ensure its size stays the same, allowing collection at the end. This class is normally used as follows, for example:
pt.add_2_fitsnap("Groups", DistributedList(nconfigs))
- Parameters
proc_length (int) – Number of elements for the list on current process.
- _len
length of distributed list held by current proc
- Type
int
- _list
local section of distributed list
- Type
list
- get_list()
Returns deepcopy of internal list
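To illustrate the idea of a list wrapper whose length cannot change, here is a small self-contained sketch. `FixedLengthList` is a hypothetical stand-in written for this example, not the library's `DistributedList`; only the `_len` and `_list` attributes and the deep-copying `get_list()` mirror the documentation above.

```python
from copy import deepcopy


class FixedLengthList:
    """Hypothetical stand-in for a per-process list of fixed length."""

    def __init__(self, proc_length):
        self._len = proc_length             # length held by the current process
        self._list = [None] * proc_length   # local section of the list

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        return self._list[index]

    def __setitem__(self, index, value):
        # Only integer indices are allowed in this sketch, so an assignment
        # can overwrite a slot but can never grow or shrink the list.
        if not isinstance(index, int):
            raise TypeError("this sketch only supports integer indices")
        self._list[index] = value

    def get_list(self):
        # A deep copy, so callers cannot modify the internal list.
        return deepcopy(self._list)


groups = FixedLengthList(3)
groups[0] = "bulk"
snapshot = groups.get_list()
snapshot.append("extra")  # changes the copy only
print(len(groups), len(snapshot))
```

Because every process keeps a list of known, fixed length, the pieces can later be collected into one list without bookkeeping about resized sections.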
- exception fitsnap3lib.parallel_tools.GracefulError(*args, **kwargs)
- class fitsnap3lib.parallel_tools.ParallelTools(comm=None)
This class creates and contains arrays used for fitting, across multiple processors.
- check_fitsnap_exist
Checks whether fitsnap dictionaries exist before creating a new one, set to False to allow recreating a dictionary.
- Type
bool
- add_2_fitsnap(name, an_object)
Add an object, such as a DistributedList, to the pt.fitsnap_dict dictionary. This dictionary contains configuration information such as group name, filename, testing bools, etc. This function is normally used in conjunction with the DistributedList class, where a distributed memory list is added with a keyname name.
- Parameters
name (str) – Key name of the object being added.
an_object – A Python object to add, usually an instance of our DistributedList class.
- create_shared_array(name, size1, size2=1, dtype='d')
Create a shared memory array as a key in the pt.shared_arrays dictionary. This function uses the SharedArray class to instantiate a shared memory array under the supplied dictionary key name. If the key name already exists, this function will free the memory associated with the existing array. If not using MPI, i.e. stubs == 0, we create a StubsArray.
- Parameters
name (str) – Name of the array which will be the key name.
size1 (int) – First dimension size.
size2 (int) – Optional second dimension size, defaults to 1.
dtype (str) – Optional data type character, defaults to d for double.
- exception(err)
Gracefully exit with an exception.
- Parameters
err (str) – Error message to exit with.
- free()
Free memory associated with all shared arrays.
- gather_fitsnap(name, allgather: Optional[bool] = None)
Gather distributed lists.
- Parameters
allgather – If True, then we allgather. When the number of procs is large, this will use more memory.
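The memory cost of allgathering can be seen in a small MPI-free simulation. This is only a sketch: `simulate_gather` and `stored_items` are hypothetical helpers, and the behavior modeled is the usual MPI convention (a gather leaves the full list on the head process only, an allgather leaves a copy on every process), not the internals of `gather_fitsnap`.

```python
def simulate_gather(per_rank_lists, allgather=False):
    """Return what each simulated rank holds after the collective call."""
    full = [item for local in per_rank_lists for item in local]
    if allgather:
        # Every rank receives its own copy of the full list.
        return [list(full) for _ in per_rank_lists]
    # Only the head rank (rank 0) receives the full list.
    return [list(full) if rank == 0 else None
            for rank in range(len(per_rank_lists))]


def stored_items(per_rank_result):
    """Count list elements held across all simulated ranks."""
    return sum(len(held) for held in per_rank_result if held is not None)


local_lists = [["a", "b"], ["c"], ["d", "e"]]  # one local list per rank
to_head = simulate_gather(local_lists)
to_all = simulate_gather(local_lists, allgather=True)
print(stored_items(to_head), stored_items(to_all))
```

With three ranks, the allgathered result stores three times as many elements as the gathered one, and the factor grows with the number of processes.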
- get_ncpn(nconfigs)
Get number of configs per node; return nconfigs if stubs.
- Parameters
nconfigs – integer number of configurations on this process, typically length of list of data dictionaries.
- Returns
Number of configs per node, reduced across procs, or just nconfigs if stubs.
- new_slice_a()
Create array to show which sub a matrix indices belong to which proc. For linear solvers, the A matrix may be composed of either summed per-atom descriptors OR per-atom descriptors. For nonlinear solvers, the A matrix is composed of per-atom quantities like bispectrum components, etc.
- new_slice_b()
Create array to show which sub b matrix indices belong to which proc.
- new_slice_c()
Create array to show which sub c matrix indices belong to which proc.
- new_slice_dgrad()
Create array to show which sub dgrad matrix indices belong to which proc.
- new_slice_neighlist()
Create array to show which sub neighlist matrix indices belong to which proc.
- new_slice_t()
Create array to show which sub types matrix indices belong to which proc.
- slice_array(name)
Slices an array using Python’s native slice function. Creates an attribute pt.shared_arrays[name].sliced_array containing the sliced array.
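The bookkeeping behind "which indices belong to which proc" can be sketched with Python's built-in `slice` objects. The helper `rank_slices` and the nested list standing in for a shared array are assumptions made for this illustration; they do not reproduce how the `new_slice_*` methods compute their index ranges.

```python
from itertools import accumulate


def rank_slices(rows_per_rank):
    """Map each rank to the contiguous block of rows it owns (hypothetical helper)."""
    stops = list(accumulate(rows_per_rank))  # running end index for each rank
    starts = [0] + stops[:-1]                # each block starts where the previous ends
    return [slice(start, stop) for start, stop in zip(starts, stops)]


rows_per_rank = [2, 3, 1]  # number of rows contributed by ranks 0, 1, 2
slices = rank_slices(rows_per_rank)

# A nested list stands in for a shared 2D array with one row per entry.
shared = [[row, 0.0] for row in range(sum(rows_per_rank))]

# Rank 1 views only its own rows.
sliced = shared[slices[1]]
print(slices[1], [row[0] for row in sliced])
```

Each process then reads and writes only its own slice, so processes do not overwrite each other's rows in the shared array.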
- class fitsnap3lib.parallel_tools.SharedArray
Instantiating this class will create a shared memory array in the array attribute.
- Parameters
size1 (int) – First dimension of the array.
size2 (int) – Optional second dimension of the array, defaults to 1.
dtype (str) – Optional data type, defaults to d for double.
multinode (int) – Optional multinode flag used for scalapack purposes.
comms (MPI.Comm) – MPI communicator.
- array
Array of numbers that share memory across processes in the communicator.
- Type
np.ndarray
- class fitsnap3lib.parallel_tools.StubsArray(size1, size2=1, dtype='d')
Instantiating this class will create a stubs array in the array attribute. In plain speak, this is just a normal numpy array.
- Parameters
size1 (int) – First dimension of the array.
size2 (int) – Optional second dimension of the array, defaults to 1.
dtype (str) – Optional data type, defaults to d for double.
- array
Array of numbers that share memory across processes in the communicator.
- Type
np.ndarray
3.1.2. Config
The Config class is used for storing settings associated with a FitSNAP instance. Throughout the code and library examples, you may see code snippets like:
fs.config.sections["GROUPS"].group_table
where fs is the FitSNAP instance being accessed. In this snippet, the sections
attribute contains keys, such as "GROUPS", each of which holds attributes, like the group
table, that we can access. In this regard, fs.config stores all the settings relevant
to a particular FitSNAP instance or fit, which can then be easily accessed anywhere else
throughout the code.
- class fitsnap3lib.io.input.Config(pt, input=None, arguments_lst: list = [])
Class for storing input settings in a config instance. The config instance is first created in io/output.py. If given a path to an input script, we use Python’s native ConfigParser to parse the settings. If given a nested dictionary, the sections are determined from the first keys and specific settings from the nested keys.
- Parameters
pt – A ParallelTools instance.
input – Optional input can either be a filename or a dictionary.
arguments_lst – List of args that can be supplied at the command line.
- infile
String for optional input filename. Defaults to None.
- indict
Dictionary for optional input dictionary of settings, to replace input file. Defaults to None.
- parse_cmdline(arguments_lst: list = [])
Parse command line args if using executable mode, or a list if using library mode.
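The correspondence between an input script and a nested dictionary can be illustrated with Python's standard `configparser` alone. This is a sketch of the general idea, not of Config's own parsing: the `EXAMPLE` section and all key names below are hypothetical, and only the `GROUPS` section name appears in the documentation above.

```python
from configparser import ConfigParser

# Input-script form: section headers in brackets, settings as key = value.
input_text = """
[GROUPS]
example_key = example value

[EXAMPLE]
count = 3
"""

# Nested-dictionary form: first-level keys are sections, nested keys are settings.
nested = {
    "GROUPS": {"example_key": "example value"},
    "EXAMPLE": {"count": "3"},
}

from_text = ConfigParser()
from_text.read_string(input_text)

from_dict = ConfigParser()
from_dict.read_dict(nested)

# Both routes give the same sections and the same values.
print(from_text.sections(), from_dict.sections())
print(from_text["EXAMPLE"].getint("count"), from_dict["EXAMPLE"].getint("count"))
```

Keeping both forms equivalent means a fit configured through an input file in executable mode can be reproduced in library mode by passing the matching dictionary.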