data_converter

DataConverter class for converting snapshots into numpy arrays.

class DataConverter(parameters, descriptor_calculator=None, target_calculator=None)

Bases: object

Converts raw snapshots (direct DFT outputs) into numpy arrays for MALA.

These snapshots can be, for example, Quantum ESPRESSO results.

Parameters:

parameters (mala.common.parameters.Parameters) – The parameters object used for creating this instance.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – The descriptor calculator used for parsing/converting fingerprint data. If None, the descriptor calculator will be created by this object using the parameters provided. Default: None
target_calculator (mala.targets.target.Target) – Target calculator used for parsing/converting target data. If None, the target calculator will be created by this object using the parameters provided. Default: None
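As a minimal sketch of the constructor: a converter can be created from a parameters object alone, in which case both calculators stay at None and are created from the parameters as described above. This assumes MALA is installed and that Parameters and DataConverter are importable from the top-level mala package, which is worth checking against your installed version. The helper name make_converter is hypothetical.

```python
def make_converter(descriptor_calculator=None, target_calculator=None):
    """Create a DataConverter from a fresh parameters object (illustrative helper)."""
    # Imported lazily so that defining this helper does not require MALA.
    # Assumption: both names are exported by the top-level `mala` package.
    import mala

    parameters = mala.Parameters()
    # Descriptor and target options would be set on `parameters` here.

    # Calculators left as None are created by the converter from `parameters`.
    return mala.DataConverter(
        parameters,
        descriptor_calculator=descriptor_calculator,
        target_calculator=target_calculator,
    )
```

Passing an existing calculator instead of None makes the converter reuse it rather than build its own.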

descriptor_calculator

Descriptor calculator used for parsing/converting fingerprint data.

Type: mala.descriptors.descriptor.Descriptor

target_calculator

Target calculator used for parsing/converting target data.

Type: mala.targets.target.Target

parameters

MALA data handling parameters object.

Type: mala.common.parameters.ParametersData

parameters_full

MALA parameters object. The full object is necessary for some data handling tasks.

Type: mala.common.parameters.Parameters

add_snapshot(descriptor_input_type=None, descriptor_input_path=None, target_input_type=None, target_input_path=None, simulation_output_type=None, simulation_output_path=None, descriptor_units=None, metadata_input_type=None, metadata_input_path=None, target_units=None)

Add a snapshot to be processed.

Parameters:

descriptor_input_type (string) – Type of descriptor data to be processed. See mala.datahandling.data_converter.descriptor_input_types for options.
descriptor_input_path (string) – Path of descriptor data to be processed.
target_input_type (string) – Type of target data to be processed. See mala.datahandling.data_converter.target_input_types for options.
target_input_path (string or list) – Path of target data to be processed.
simulation_output_type (string) – Type of additional info data to be processed. See mala.datahandling.data_converter.simulation_output_types for options.
simulation_output_path (string) – Path of additional info data to be processed.
metadata_input_type (string) – Type of additional metadata to be processed. See mala.datahandling.data_converter.simulation_output_types for options. This works like simulation_output_type but does not affect saving: the data given here is only stored in OpenPMD files and is not saved separately. This argument is ignored if simulation_output_type is set.
metadata_input_path (string) – Path of additional metadata to be processed. See metadata_input_type for extended info on use.
descriptor_units (string) – Units for descriptor data processing.
target_units (string) – Units for target data processing.
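The following sketch shows one way to register a Quantum ESPRESSO snapshot using the keyword arguments listed above. The type strings "espresso-out" and ".cube" are assumptions and should be checked against the descriptor_input_types, target_input_types and simulation_output_types lists referenced above. The helper name add_qe_snapshot and its arguments are hypothetical.

```python
def add_qe_snapshot(converter, qe_output_path, ldos_cube_path, target_units=None):
    """Register one snapshot on a DataConverter-like object (illustrative helper)."""
    converter.add_snapshot(
        # Assumption: "espresso-out" is a valid descriptor input type.
        descriptor_input_type="espresso-out",
        descriptor_input_path=qe_output_path,
        # Assumption: ".cube" is a valid target input type.
        target_input_type=".cube",
        target_input_path=ldos_cube_path,
        # The same output file also supplies the additional info data.
        simulation_output_type="espresso-out",
        simulation_output_path=qe_output_path,
        target_units=target_units,
    )
```

Because simulation_output_type is set here, metadata_input_type is omitted: as documented above, it would be ignored.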

convert_snapshots(complete_save_path=None, descriptor_save_path=None, target_save_path=None, simulation_output_save_path=None, naming_scheme='ELEM_snapshot*.npy', starts_at=0, file_based_communication=False, descriptor_calculation_kwargs=None, target_calculator_kwargs=None, use_fp64=False)

Convert the snapshots in the list to numpy arrays.

These can then be used by MALA.

Parameters:

complete_save_path (string) – If not None: the directory in which all snapshots will be saved. If set, this overrides descriptor_save_path, target_save_path and simulation_output_save_path.
descriptor_save_path (string) – Directory in which to save descriptor data.
target_save_path (string) – Directory in which to save target data.
simulation_output_save_path (string) – Directory in which to save additional info data.
naming_scheme (string) – String detailing the naming scheme for the snapshots. * symbols will be replaced with the snapshot number.
starts_at (int) – Number of the first snapshot generated using this approach. Default is 0, but may be set to any integer. This is to ensure consistency in naming when converting e.g. only a certain portion of all available snapshots. If set to e.g. 4, the first snapshot generated will be called snapshot4.
file_based_communication (bool) – If True, the LDOS will be gathered using a file-based mechanism. This is drastically less performant than using MPI, but may be necessary when memory is scarce. Default is False, i.e., the faster MPI version will be used.
target_calculator_kwargs (dict) – Dictionary with additional keyword arguments for the calculation or parsing of the target quantities.
descriptor_calculation_kwargs (dict) – Dictionary with additional keyword arguments for the calculation or parsing of the descriptor quantities.
use_fp64 (bool) – If True, data is saved with double precision (FP64). If False (default), single precision (FP32) is used. Single precision is advantageous because torch models are constructed internally with FP32 anyway.
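To illustrate how naming_scheme and starts_at interact, the sketch below reproduces only the documented rule that each * is replaced with the snapshot number. It is not MALA's implementation: the real file names may carry further suffixes, and how ELEM is treated is not specified here. The second helper shows a conversion call that uses only keyword arguments from the signature above. Both helper names are hypothetical.

```python
def expand_naming_scheme(naming_scheme, n_snapshots, starts_at=0):
    """Return one name per snapshot, replacing '*' with the snapshot number."""
    return [
        naming_scheme.replace("*", str(number))
        for number in range(starts_at, starts_at + n_snapshots)
    ]


def convert_all(converter, save_directory, naming_scheme, starts_at=0):
    """Convert all registered snapshots into a single directory (FP32 output)."""
    converter.convert_snapshots(
        complete_save_path=save_directory,  # takes precedence over the per-type paths
        naming_scheme=naming_scheme,
        starts_at=starts_at,
        use_fp64=False,  # documented default: single precision
    )


names = expand_naming_scheme("ELEM_snapshot*.npy", 3, starts_at=4)
print(names)
# ['ELEM_snapshot4.npy', 'ELEM_snapshot5.npy', 'ELEM_snapshot6.npy']
```

Setting starts_at this way keeps numbering consistent when a later batch continues where an earlier conversion stopped.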