3.1. FitSnap

The FitSnap class houses all objects needed for performing a fit. These objects are instantiated from other core classes described in the rest of the docs. The two main inputs to a FitSnap instance are (1) settings and (2) an MPI communicator. The settings can be a nested dictionary as shown in some examples, while the MPI communicator is typically the world communicator containing all resources dictated by the mpirun command. After instantiating FitSnap with these inputs, the instance will contain its own instance of ParallelTools which houses functions and data structures that operate on the resources in the input communicator. The settings will be stored in an instance of the Config class. The FitSnap class is documented below.

class fitsnap3lib.fitsnap.FitSnap(input=None, comm=None, arglist: list = [])

This class houses the objects needed for machine learning a potential, start to finish.

Parameters
  • input (str or dict) – Optional dictionary or path to input file when using library mode; defaults to None for executable use.

  • comm – Optional MPI communicator when using library mode; defaults to None.

  • arglist (list) – Optional list of cmd line args when using library mode.

pt

Instance of the ParallelTools class for helping MPI communication and shared arrays.

Type

class ParallelTools

config

Instance of the Config class for initializing settings, initialized with a ParallelTools instance.

Type

class Config

scraper

Instance of the Scraper class for gathering configs.

Type

class Scraper

data

List of dictionaries, where each configuration of atoms has its own dictionary.

Type

list

calculator

Instance of the Calculator class for calculating descriptors and fitting data.

Type

class Calculator

solver

Instance of the Solver class for performing a fit.

Type

class Solver

perform_fit()

Solve the machine learning problem with descriptors as input and energies, forces, etc. as targets.

process_configs(data: Optional[list] = None, allgather: bool = False, delete_data: bool = False)

Calculates descriptors for all configurations in the data list and stores the results in the shared arrays.

Parameters
  • data – Optional list of data dictionaries to calculate descriptors for. If not supplied, we use the list owned by this instance.

  • allgather – Whether to gather distributed lists to all processes or just to the head proc. In some cases, such as processing configs once and then using that data on multiple procs, we must allgather.

  • delete_data – Whether the data list is deleted after processing. Since data can retain unwanted memory after processing configs, we delete it in executable mode.

scrape_configs(delete_scraper: bool = False)

Scrapes configurations of atoms and creates an instance attribute list of configurations called data.

Parameters

delete_scraper – Boolean determining whether the scraper object is deleted or not after scraping. Defaults to False. Since scraper can retain unwanted memory, we delete it in executable mode.

FitSnap contains instances of two classes that help with MPI communication and settings: ParallelTools and Config, respectively. These two classes are explained below. In addition to these, the other core classes Scraper, Calculator, and Solver are explained in later sections.
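The three methods documented above are typically called in the order scrape, process, fit. The following is a minimal sketch of that sequence in library mode, assuming fitsnap3lib is installed. The wrapper name run_fit is hypothetical and not part of the library, and the contents of settings are left to the caller.

```python
def run_fit(settings, comm=None):
    """Run scrape -> process -> fit with a FitSnap instance (illustrative sketch).

    `settings` is a nested dictionary or a path to an input file, and `comm`
    is an MPI communicator such as mpi4py's MPI.COMM_WORLD. The name
    `run_fit` is hypothetical and not part of the library.
    """
    # Imported lazily so that defining this helper does not require the
    # library to be installed.
    from fitsnap3lib.fitsnap import FitSnap

    fs = FitSnap(settings, comm=comm)

    # Populate fs.data with one dictionary per configuration of atoms.
    fs.scrape_configs()

    # Calculate descriptors for fs.data and store them in the shared arrays.
    fs.process_configs()

    # Solve the fitting problem using the instance's solver.
    fs.perform_fit()
    return fs
```

When memory is a concern, the delete_scraper and delete_data flags documented above can be passed to scrape_configs and process_configs, mirroring what executable mode does.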

3.1.1. Parallel Tools

Parallel Tools is a collection of data structures and functions for transferring data in a FitSNAP workflow in a massively parallel fashion.

The ParallelTools instance pt of a FitSNAP instance fs can be used to create shared memory arrays, for example:

# Create a shared array called `a`.
fs.pt.create_shared_array('a', nrows, ncols)
# Change the shared array at a rank-dependent element.
fs.pt.shared_arrays['a'].array[rank,0] = rank
# Observe that the change happened on all procs.
print(f"Shared array on rank {rank}: {fs.pt.shared_arrays['a'].array}")

Currently these tools reside in a single file parallel_tools.py which houses some classes described below.

class fitsnap3lib.parallel_tools.DistributedList(proc_length)

This class is used for distributed memory Python lists. It wraps Python’s list to ensure the size stays the same, which allows the sections to be collected at the end. This class is normally used like so: pt.add_2_fitsnap("Groups", DistributedList(nconfigs))

Parameters

proc_length (int) – Number of elements for the list on current process.

_len

length of distributed list held by current proc

Type

int

_list

local section of distributed list

Type

list

get_list()

Returns deepcopy of internal list
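The two ideas in this class, a fixed-length local section and a deep-copied accessor, can be sketched with the standard library alone. The class below is a hypothetical stand-in written for illustration; it is not the library's implementation, and it simplifies matters by rejecting slice assignment outright.

```python
from copy import deepcopy


class FixedLengthList:
    """Hypothetical stand-in for the local section of a distributed list."""

    def __init__(self, proc_length):
        self._len = proc_length
        self._list = [None] * proc_length

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        return self._list[index]

    def __setitem__(self, index, value):
        # Single-index assignment replaces an element in place, so the
        # length of the local section never changes.
        if isinstance(index, slice):
            raise TypeError("slice assignment could change the length")
        self._list[index] = value

    def get_list(self):
        # Return a deep copy so callers cannot mutate the internal storage.
        return deepcopy(self._list)


groups = FixedLengthList(3)
groups[0] = {"name": "bulk"}
snapshot = groups.get_list()
snapshot[0]["name"] = "changed"  # only the copy is modified
```

Because get_list returns a deep copy, editing snapshot leaves the stored entries untouched, at the cost of duplicating the data.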

exception fitsnap3lib.parallel_tools.GracefulError(*args, **kwargs)

class fitsnap3lib.parallel_tools.ParallelTools(comm=None)

This class creates and contains arrays used for fitting, across multiple processors.

check_fitsnap_exist

Checks whether fitsnap dictionaries exist before creating a new one; set to False to allow recreating a dictionary.

Type

bool

add_2_fitsnap(name, an_object)

Add an object, such as a DistributedList, to the pt.fitsnap_dict dictionary. This dictionary contains configuration information such as group name, filename, testing bools, etc. This function is normally used in conjunction with the DistributedList class, where a distributed memory list is added under the key given by name.

Parameters
  • name (str) – Key name of the object being added.

  • an_object – A Python object to add, usually an instance of our DistributedList class.

create_shared_array(name, size1, size2=1, dtype='d', tm=0)

Create a shared memory array as a key in the pt.shared_arrays dictionary. This function uses the SharedArray class to instantiate a shared memory array under the supplied dictionary key name.

If the key name already exists, this function will free the memory associated with the existing array.

If not using MPI, i.e. when the stubs flag is set (nonzero), we create a StubsArray.

Parameters
  • name (str) – Name of the array which will be the key name.

  • size1 (int) – First dimension size.

  • size2 (int) – Optional second dimension size, defaults to 1.

  • dtype (str) – Optional data type character, defaults to d for double.
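The name-keyed behaviour described for create_shared_array, including replacement of an existing key, can be sketched in serial with the standard-library array module, whose typecode "d" is also a C double. Everything below (ArrayRegistry, create_array, the dictionary layout) is a hypothetical illustration of the bookkeeping only; it does not use MPI shared memory and is not the library's implementation.

```python
from array import array


class ArrayRegistry:
    """Hypothetical serial stand-in for a name-keyed dictionary of arrays."""

    def __init__(self):
        self.shared_arrays = {}

    def create_array(self, name, size1, size2=1, dtype="d"):
        # Drop any existing array stored under the same key first.
        self.shared_arrays.pop(name, None)
        # Flat, zero-initialized buffer holding size1 * size2 elements.
        self.shared_arrays[name] = {
            "shape": (size1, size2),
            "data": array(dtype, [0] * (size1 * size2)),
        }
        return self.shared_arrays[name]


registry = ArrayRegistry()
registry.create_array("a", 4, 2)
registry.shared_arrays["a"]["data"][0] = 1.5
# Creating "a" again replaces the earlier buffer with a fresh one.
registry.create_array("a", 2, 3)
```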

exception(err)

Gracefully exit with an exception.

Parameters

err (str) – Error message to exit with.

free()

Free memory associated with all shared arrays.

gather_fitsnap(name, allgather: Optional[bool] = None)

Gather distributed lists.

Parameters

allgather – If True, then we allgather. When the number of procs is large, this will use more memory.
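The memory trade-off between gathering to one process and gathering to all can be illustrated without MPI. The function below is a pure-Python simulation with hypothetical names; it says nothing about how gather_fitsnap moves data internally, and only shows what each rank would hold afterwards.

```python
def simulate_gather(local_lists, allgather=False, root=0):
    """Simulate collecting per-process lists without MPI (illustrative only).

    `local_lists[r]` plays the role of the list held by rank r. The return
    value lists what each rank would hold after the collection.
    """
    combined = [item for local in local_lists for item in local]
    results = []
    for rank in range(len(local_lists)):
        if allgather or rank == root:
            # Every receiving rank gets its own full copy.
            results.append(list(combined))
        else:
            results.append(None)
    return results


local_lists = [["c0", "c1"], ["c2"], ["c3", "c4"]]
gathered = simulate_gather(local_lists)
allgathered = simulate_gather(local_lists, allgather=True)
# Number of full copies held across ranks after an allgather.
copies = sum(result is not None for result in allgathered)
```

With three simulated ranks, the plain gather leaves one full copy on the root while the allgather leaves three, which is why the cost grows with the number of procs.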

get_ncpn(nconfigs)

Get number of configs per node; return nconfigs if stubs.

Parameters

nconfigs – integer number of configurations on this process, typically length of list of data dictionaries.

Returns number of configs per node, reduced across procs, or just nconfigs if stubs.

new_slice_a()

Create an array showing which indices of the A matrix belong to which proc. For linear solvers, the A matrix may be composed of either summed per-atom descriptors or per-atom descriptors. For nonlinear solvers, the A matrix is composed of per-atom quantities such as bispectrum components.

new_slice_b()

Create an array showing which indices of the b matrix belong to which proc.

new_slice_c()

Create an array showing which indices of the c matrix belong to which proc.

new_slice_dgrad()

Create an array showing which indices of the dgrad matrix belong to which proc.

new_slice_neighlist()

Create an array showing which indices of the neighlist matrix belong to which proc.

new_slice_t()

Create an array showing which indices of the types matrix belong to which proc.

slice_array(name)

Slices an array using Python’s native slice function. Creates an attribute pt.shared_arrays[name].sliced_array containing the sliced array.
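The idea of recording which rows belong to which proc, and then slicing with Python's native slice, can be sketched as follows. The helper row_slices and the row counts are hypothetical; the sketch assumes each process owns one contiguous block of rows, which the source does not state explicitly.

```python
from itertools import accumulate


def row_slices(rows_per_proc):
    """Return one slice per process covering its rows of a stacked matrix."""
    # Running totals give the exclusive end index of each process's block.
    stops = list(accumulate(rows_per_proc))
    starts = [0] + stops[:-1]
    return [slice(start, stop) for start, stop in zip(starts, stops)]


rows_per_proc = [3, 2, 4]  # hypothetical row counts for three processes
matrix_rows = list(range(sum(rows_per_proc)))
slices = row_slices(rows_per_proc)
# Rows owned by the second process (rank 1).
rank1_rows = matrix_rows[slices[1]]
```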

class fitsnap3lib.parallel_tools.SharedArray(size1, size2=1, dtype='d', multinode=0, comms=None, MPI=None)

Instantiating this class will create a shared memory array in the array attribute.

Parameters
  • size1 (int) – First dimension of the array.

  • size2 (int) – Optional second dimension of the array, defaults to 1.

  • dtype (str) – Optional data type, defaults to d for double.

  • multinode (int) – Optional multinode flag used for ScaLAPACK purposes.

  • comms (MPI.Comm) – MPI communicator.

array

Array of numbers that share memory across processes in the communicator.

Type

np.ndarray

class fitsnap3lib.parallel_tools.StubsArray(size1, size2=1, dtype='d')

Instantiating this class will create a stubs array in the array attribute. In plain terms, this is just a normal numpy array.

Parameters
  • size1 (int) – First dimension of the array.

  • size2 (int) – Optional second dimension of the array, defaults to 1.

  • dtype (str) – Optional data type, defaults to d for double.

array

Array of numbers held as a normal numpy array, used in place of a shared memory array when not using MPI.

Type

np.ndarray

3.1.2. Config

The Config class is used for storing settings associated with a FitSNAP instance. Throughout the code and library examples, you may see code snippets like:

fs.config.sections["GROUPS"].group_table

where fs is the FitSNAP instance being accessed. In this snippet, the sections attribute is keyed by section name, such as "GROUPS", and each section holds attributes such as the group table. In this way, fs.config stores all the settings relevant to a particular FitSNAP instance or fit, which can then be accessed anywhere else in the code.

class fitsnap3lib.io.input.Config(pt, input=None, arguments_lst: list = [])

Class for storing input settings in a config instance. The config instance is first created in io/output.py. If given a path to an input script, we use Python’s native ConfigParser to parse the settings. If given a nested dictionary, the sections are determined from the first keys and specific settings from the nested keys.

Parameters
  • pt – A ParallelTools instance.

  • input – Optional input can either be a filename or a dictionary.

  • arguments_lst – List of args that can be supplied at the command line.
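The mapping from a nested dictionary to sections and settings described above can be illustrated with Python's ConfigParser alone. This is a sketch of the standard-library behaviour, not a statement of how Config handles dictionaries internally. The "GROUPS" section name appears in this documentation, while EXAMPLE_SECTION and all keys and values are hypothetical placeholders.

```python
from configparser import ConfigParser

# Hypothetical settings: first-level keys become sections, nested keys
# become the settings inside each section.
settings = {
    "GROUPS": {"example_key": "example_value"},
    "EXAMPLE_SECTION": {"flag": "1"},
}

parser = ConfigParser()
# Keep option names case-sensitive instead of lowercasing them.
parser.optionxform = str
parser.read_dict(settings)

section_names = parser.sections()
value = parser["GROUPS"]["example_key"]
# Values are stored as strings; typed getters convert on access.
flag = parser["EXAMPLE_SECTION"].getint("flag")
```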

infile

String for optional input filename. Defaults to None.

indict

Dictionary for optional input dictionary of settings, to replace input file. Defaults to None.

parse_cmdline(arguments_lst: list = [])

Parse command line args if using executable mode, or a list if using library mode.
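Parsing either the real command line or a supplied list is a pattern that argparse supports directly. The following is a minimal sketch assuming an argparse-style parser; the function name and both arguments (infile, --verbose) are hypothetical placeholders, not the options that parse_cmdline actually defines.

```python
import argparse


def parse_args(arguments_lst=None):
    """Parse a list of arguments, or sys.argv[1:] when the list is None."""
    parser = argparse.ArgumentParser(description="illustrative parser")
    # Both arguments below are hypothetical placeholders.
    parser.add_argument("infile", nargs="?", default=None)
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(arguments_lst)


# Library-mode style call: arguments are supplied as a list.
args = parse_args(["input.in", "--verbose"])
```

In this sketch, passing None makes argparse read the process's own command line, while passing a list (even an empty one) parses only that list.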