descriptor

Base class for all descriptor calculators.

class Descriptor(params: Parameters | None = None)[source]

Bases: PhysicalData

Base class for all descriptors available in MALA.

Descriptors encode the atomic fingerprint of a DFT calculation.

Parameters: parameters (mala.common.parameters.Parameters) – Parameters object used to create this object.

parameters

MALA descriptor calculation parameters.

Type: mala.common.parameters.ParametersDescriptors

static backconvert_units(array, out_units)[source]

Convert descriptors from MALA units into a specified unit.

Parameters:

array (numpy.ndarray) – Data in MALA units.
out_units (string) – Desired units of output array.

Returns:

converted_array – Data in out_units.

Return type:

numpy.ndarray

calculate_from_atoms(atoms, grid_dimensions, working_directory='.', **kwargs)[source]

Calculate the descriptors based on atomic configurations.

Parameters:

atoms (ase.Atoms) – Atoms object holding the atomic configuration.
grid_dimensions (List) – Grid dimensions to be used, in the format [x,y,z].
working_directory (string) – Directory in which to write the output of the LAMMPS calculation. The local directory usually suffices, provided that no other instance is running in the same directory.
kwargs (dict) –
A collection of keyword arguments, mainly used for debugging and development. Different types of descriptors may support different keyword arguments. Commonly supported are:
- "use_fp64": Enforce 64-bit floating point precision for descriptors.
- "keep_logs": Do not delete the temporary files created during the LAMMPS calculation of descriptors.

Returns:

descriptors – Numpy array containing the descriptors with the dimension (x,y,z,descriptor_dimension)

Return type:

numpy.ndarray
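
As a rough illustration of the documented return shape and of the debugging keyword arguments, the following sketch uses a hypothetical stand-in function, `fake_calculate`, rather than MALA itself. The single-precision default, the value 94 for the descriptor dimension, and the way `use_fp64` is read from `kwargs` are assumptions made for this example only.

```python
import numpy as np


def fake_calculate(grid_dimensions, descriptor_dimension=94, **kwargs):
    """Hypothetical stand-in that only mimics the documented output shape."""
    # Assumption: single precision unless "use_fp64" is requested.
    dtype = np.float64 if kwargs.get("use_fp64", False) else np.float32
    nx, ny, nz = grid_dimensions
    # Layout as documented: (x, y, z, descriptor_dimension).
    return np.zeros((nx, ny, nz, descriptor_dimension), dtype=dtype)


descriptors = fake_calculate([4, 5, 6], use_fp64=True)
print(descriptors.shape, descriptors.dtype)
```

A real call would go through a concrete descriptor class and requires a working LAMMPS installation, which this sketch deliberately avoids.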

calculate_from_json(json_file, working_directory='.', **kwargs)[source]

Calculate the descriptors based on a MALA-generated JSON file.

These JSON files are generated by the MALA DataConverter class and bundle information about a DFT simulation.

Parameters:

json_file (string) – Name of the MALA JSON output file for the snapshot.
working_directory (string) – Directory in which to write the output of the LAMMPS calculation. The local directory usually suffices, provided that no other instance is running in the same directory.
kwargs (dict) –
A collection of keyword arguments, mainly used for debugging and development. Different types of descriptors may support different keyword arguments. Commonly supported are:
- "use_fp64": Enforce 64-bit floating point precision for descriptors.
- "keep_logs": Do not delete the temporary files created during the LAMMPS calculation of descriptors.

Returns:

descriptors – Numpy array containing the descriptors with the dimension (x,y,z,descriptor_dimension)

Return type:

numpy.ndarray

calculate_from_qe_out(qe_out_file, working_directory='.', **kwargs)[source]

Calculate the descriptors based on a Quantum ESPRESSO output file.

Parameters:

qe_out_file (string) – Name of the Quantum ESPRESSO output file for the snapshot.
working_directory (string) – Directory in which to write the output of the LAMMPS calculation. The local directory usually suffices, provided that no other instance is running in the same directory.
kwargs (dict) –
A collection of keyword arguments, mainly used for debugging and development. Different types of descriptors may support different keyword arguments. Commonly supported are:
- "use_fp64": Enforce 64-bit floating point precision for descriptors.
- "keep_logs": Do not delete the temporary files created during the LAMMPS calculation of descriptors.

Returns:

descriptors – Numpy array containing the descriptors with the dimension (x,y,z,descriptor_dimension)

Return type:

numpy.ndarray

convert_local_to_3d(descriptors_np)[source]

Convert the descriptors as done in the gather function, but per rank.

This is useful e.g. for parallel preprocessing. This function removes the 3 extra components that come from parallel processing. For example, with 91 bispectrum descriptors, LAMMPS directly outputs 97 components in parallel mode, and this function returns 94, retaining the 3 x,y,z components that are included by default.

Parameters: descriptors_np (numpy.ndarray) – Numpy array with the descriptors of this rank's local grid.
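
The column arithmetic in the example above (97 components in, 94 out) can be sketched with plain numpy. This is not MALA's implementation: the helper `strip_parallel_columns`, the assumption that the 3 extra parallel components are the leading columns, and the local grid shape are all hypothetical and chosen only to make the bookkeeping visible.

```python
import numpy as np

N_BISPECTRUM = 91  # number of bispectrum descriptors in the example
N_XYZ = 3          # x,y,z components retained by default
N_PARALLEL = 3     # extra components present only in parallel mode


def strip_parallel_columns(raw, local_shape):
    """Drop the parallel-only columns and reshape to the local 3D grid."""
    # Assumption: the extra parallel components are the first columns.
    trimmed = raw[:, N_PARALLEL:]
    return trimmed.reshape(*local_shape, trimmed.shape[1])


local_shape = (2, 3, 4)  # hypothetical local grid of one rank
n_points = int(np.prod(local_shape))
rng = np.random.default_rng(0)
raw = rng.random((n_points, N_PARALLEL + N_XYZ + N_BISPECTRUM))

local_3d = strip_parallel_columns(raw, local_shape)
print(raw.shape, "->", local_3d.shape)
```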

static convert_units(array, in_units='1/eV')[source]

Convert descriptors from a specified unit into the ones used in MALA.

Parameters:

array (numpy.ndarray) – Data for which the units should be converted.
in_units (string) – Units of array.

Returns:

converted_array – Data in MALA units.

Return type:

numpy.ndarray

static enforce_pbc(atoms)[source]

Explicitly enforce the PBC on an ASE atoms object.

QE (and potentially other codes) enforces the PBC internally. This means that the raw positions of atoms (in Angstrom) can lie outside of the unit cell; when the DFT calculation is set up, these atoms are shifted into the unit cell. Since the raw positions are used directly for the descriptor calculation, the atoms in the ASE atoms object have to be placed explicitly inside the unit cell.

Parameters: atoms (ase.Atoms) – The ASE atoms object for which the PBC need to be enforced.
Returns: new_atoms – The ASE atoms object for which the PBC have been enforced.
Return type: ase.Atoms
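
To make the idea of shifting atoms back into the unit cell concrete, here is a minimal numpy-only sketch. The helper `wrap_into_cell` is hypothetical and is not how MALA implements this method; ASE itself offers `Atoms.wrap()` for the same purpose. The sketch assumes the cell vectors are the rows of `cell` and that all three directions are periodic.

```python
import numpy as np


def wrap_into_cell(positions, cell):
    """Map Cartesian positions into the unit cell spanned by the rows of cell."""
    # Cartesian -> fractional coordinates (positions = fractional @ cell).
    fractional = np.linalg.solve(cell.T, positions.T).T
    # Keep only the part in [0, 1) along each cell vector.
    fractional -= np.floor(fractional)
    # Fractional -> Cartesian coordinates.
    return fractional @ cell


cell = np.diag([4.0, 4.0, 4.0])  # cubic cell, edge length 4 Angstrom
positions = np.array([[5.0, -1.0, 2.0],
                      [0.5, 3.5, 8.5]])
wrapped = wrap_into_cell(positions, cell)
print(wrapped)
```

Atoms that already lie inside the cell are left unchanged, so applying the function twice gives the same result as applying it once.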

gather_descriptors(descriptors_np, use_pickled_comm=False)[source]

Gathers all descriptors on rank 0 and sorts them.

This is useful e.g. for parallel preprocessing. This function removes the 3 extra components that come from parallel processing. For example, with 91 bispectrum descriptors, LAMMPS directly outputs 97 components in parallel mode, and this function returns 94, retaining the 3 x,y,z components that are included by default.

Parameters:

descriptors_np (numpy.ndarray) – Numpy array with the descriptors of this rank's local grid.
use_pickled_comm (bool) – If True, the pickled communication route from mpi4py is used. If False, a Recv/Sendv combination is used. It is not clear which is faster: Recv/Sendv should technically be faster, but the current implementation may not be optimal, while the pickled route can use gather(), which should be fairly quick. However, for large grids the pickled route CANNOT be used, because overly large Python objects will break it. Therefore, the Recv/Sendv route is the default.
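
The "gather and sort" step can be illustrated without MPI by pretending that a list of arrays arrived from different ranks. This is only a sketch under stated assumptions: the helper `gather_and_sort` is hypothetical, the first 3 columns of every chunk are assumed to hold the global grid indices (ix, iy, iz), and no actual communication takes place.

```python
import numpy as np


def gather_and_sort(chunks, grid_shape):
    """Concatenate per-rank chunks and order them on the global grid."""
    gathered = np.concatenate(chunks, axis=0)
    # Assumption: columns 0..2 are the global grid indices ix, iy, iz.
    ix, iy, iz = (gathered[:, i].astype(int) for i in range(3))
    # np.lexsort treats the LAST key as primary: sort by ix, then iy, then iz.
    order = np.lexsort((iz, iy, ix))
    features = gathered[order, 3:]  # drop the index columns
    return features.reshape(*grid_shape, features.shape[1])


grid_shape = (2, 2, 3)
# All (ix, iy, iz) triples in row-major order, plus one feature per point.
indices = np.indices(grid_shape).reshape(3, -1).T
values = np.arange(len(indices), dtype=float)[:, None]
full = np.hstack([indices, values])

# Shuffle and split to mimic unordered data held by two ranks.
shuffled = full[np.random.default_rng(1).permutation(len(full))]
chunks = np.array_split(shuffled, 2)

result = gather_and_sort(chunks, grid_shape)
print(result.shape)
```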

read_dimensions_from_json(json_file)[source]

Read only the descriptor dimensions from a JSON file.

These JSON files are generated by the MALA DataConverter class and bundle information about a DFT simulation.

Parameters: json_file (string) – Path to the JSON file.
Returns: dimension_info – If read_dtype is False, then only a list containing the dimensions of the saved array is returned. If read_dtype is True, a tuple containing this list of dimensions and the dtype of the array will be returned.
Return type: List or tuple

setup_lammps_tmp_files(lammps_type, outdir)[source]

Create the temporary LAMMPS input and log files.

Parameters:

lammps_type (str) – Type of descriptor calculation (e.g. bgrid for bispectrum).
outdir (str) – Directory in which the LAMMPS files are kept.

Return type:

None

property descriptors_contain_xyz: Control whether descriptor vectors will contain xyz coordinates.

property feature_size: Get the feature dimension of this data.

property si_dimension

Dictionary containing the SI unit dimensions in OpenPMD format.

Needed for OpenPMD interface.

property si_unit_conversion

Numeric value of the conversion from MALA (ASE) units to SI.

Needed for OpenPMD interface.