data_converter

DataConverter class for converting snapshots into numpy arrays.

class DataConverter(parameters, descriptor_calculator=None, target_calculator=None)

Bases: object

Converts raw snapshots (direct DFT outputs) into numpy arrays for MALA.

These snapshots can be, for example, Quantum ESPRESSO results.

Parameters:
  • parameters (mala.common.parameters.Parameters) – The parameters object used for creating this instance.

  • descriptor_calculator (mala.descriptors.descriptor.Descriptor) – The descriptor calculator used for parsing/converting fingerprint data. If None, the descriptor calculator will be created by this object using the parameters provided. Default: None

  • target_calculator (mala.targets.target.Target) – Target calculator used for parsing/converting target data. If None, the target calculator will be created by this object using the parameters provided. Default: None
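
A minimal construction sketch, assuming MALA is installed and exposes `Parameters` and `DataConverter` at package level. The helper name `make_converter` is hypothetical, and the import is deferred so the definition can be loaded without MALA present. With both calculators left as None, the converter is expected to create them from the parameters, as described above.

```python
def make_converter(descriptor_calculator=None, target_calculator=None):
    """Build a DataConverter from a fresh Parameters object (sketch)."""
    # Deferred import: assumes the `mala` package is installed.
    import mala

    parameters = mala.Parameters()
    # Set descriptor and target options on `parameters` here before
    # constructing the converter; the required settings depend on your data.
    return mala.DataConverter(
        parameters,
        descriptor_calculator=descriptor_calculator,
        target_calculator=target_calculator,
    )
```

Passing an existing calculator for either argument makes the converter use it instead of creating its own.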

descriptor_calculator

Descriptor calculator used for parsing/converting fingerprint data.

Type:

mala.descriptors.descriptor.Descriptor

target_calculator

Target calculator used for parsing/converting target data.

Type:

mala.targets.target.Target

parameters

MALA data handling parameters object.

Type:

mala.common.parameters.ParametersData

parameters_full

MALA parameters object. The full object is necessary for some data handling tasks.

Type:

mala.common.parameters.Parameters

add_snapshot(descriptor_input_type=None, descriptor_input_path=None, target_input_type=None, target_input_path=None, simulation_output_type=None, simulation_output_path=None, descriptor_units=None, metadata_input_type=None, metadata_input_path=None, target_units=None)

Add a snapshot to be processed.

Parameters:
  • descriptor_input_type (string) – Type of descriptor data to be processed. See mala.datahandling.data_converter.descriptor_input_types for options.

  • descriptor_input_path (string) – Path of descriptor data to be processed.

  • target_input_type (string) – Type of target data to be processed. See mala.datahandling.data_converter.target_input_types for options.

  • target_input_path (string) – Path of target data to be processed.

  • simulation_output_type (string) – Type of additional info data to be processed. See mala.datahandling.data_converter.simulation_output_types for options.

  • simulation_output_path (string) – Path of additional info data to be processed.

  • metadata_input_type (string) – Type of additional metadata to be processed. See mala.datahandling.data_converter.simulation_output_types for options. This works like simulation_output_type but does not affect saving: the data given here is stored only inside openPMD files and is not saved separately. Ignored if simulation_output_type is set.

  • metadata_input_path (string) – Path of additional metadata to be processed. See metadata_input_type for extended info on use.

  • descriptor_units (string) – Units for descriptor data processing.

  • target_units (string) – Units for target data processing.
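
The keyword arguments above can be exercised with a small helper like the following. This is a sketch: `queue_snapshots` is a hypothetical name, `converter` is any object with an `add_snapshot` method taking the keywords documented here, and the type strings `"espresso-out"` and `".cube"` are assumptions that should be checked against the `descriptor_input_types`, `target_input_types` and `simulation_output_types` lists.

```python
def queue_snapshots(converter, snapshots, target_units=None):
    """Queue (DFT output file, LDOS cube file) pairs for later conversion.

    Returns the number of snapshots that were queued.
    """
    count = 0
    for dft_output_path, ldos_cube_path in snapshots:
        converter.add_snapshot(
            descriptor_input_type="espresso-out",   # assumed type string
            descriptor_input_path=dft_output_path,
            target_input_type=".cube",              # assumed type string
            target_input_path=ldos_cube_path,
            simulation_output_type="espresso-out",  # assumed type string
            simulation_output_path=dft_output_path,
            target_units=target_units,
        )
        count += 1
    return count
```

Because `simulation_output_type` is set in this sketch, any `metadata_input_type` passed alongside it would be ignored, per the parameter description above.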

convert_snapshots(complete_save_path=None, descriptor_save_path=None, target_save_path=None, simulation_output_save_path=None, naming_scheme='ELEM_snapshot*.npy', starts_at=0, file_based_communication=False, descriptor_calculation_kwargs=None, target_calculator_kwargs=None, use_fp64=False)

Convert the snapshots in the list to numpy arrays.

These can then be used by MALA.

Parameters:
  • complete_save_path (string) – If not None: the directory in which all snapshots will be saved. Overrides descriptor_save_path, target_save_path and simulation_output_save_path if set.

  • descriptor_save_path (string) – Directory in which to save descriptor data.

  • target_save_path (string) – Directory in which to save target data.

  • simulation_output_save_path (string) – Directory in which to save additional info data.

  • naming_scheme (string) – String detailing the naming scheme for the snapshots. * symbols will be replaced with the snapshot number.

  • starts_at (int) – Number assigned to the first snapshot converted in this call. Default is 0, but any integer may be used. This keeps file names consistent when converting, e.g., only a portion of all available snapshots. If set to, e.g., 4, the first snapshot generated will be called snapshot4.

  • file_based_communication (bool) – If True, the LDOS will be gathered using a file-based mechanism. This is drastically less performant than using MPI, but may be necessary when memory is scarce. Default is False, i.e., the faster MPI version will be used.

  • target_calculator_kwargs (dict) – Dictionary with additional keyword arguments for the calculation or parsing of the target quantities.

  • descriptor_calculation_kwargs (dict) – Dictionary with additional keyword arguments for the calculation or parsing of the descriptor quantities.

  • use_fp64 (bool) – If True, data is saved with double precision (FP64). If False (default), single precision (FP32) is used. FP32 is usually sufficient, since torch models are constructed with FP32 internally anyway.
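
The interplay of `naming_scheme`, `starts_at` and `complete_save_path` can be sketched in plain Python. This illustrates the rules described above and is not MALA's implementation. Both helper names are hypothetical, and the `ELEM` prefix is left untouched because this section does not say how it is handled.

```python
def snapshot_file_names(naming_scheme="ELEM_snapshot*.npy", starts_at=0, count=1):
    """Expand the '*' placeholder for `count` consecutive snapshots."""
    return [
        naming_scheme.replace("*", str(starts_at + offset))
        for offset in range(count)
    ]


def resolve_save_paths(complete_save_path=None, descriptor_save_path=None,
                       target_save_path=None, simulation_output_save_path=None):
    """Apply the documented rule: complete_save_path wins when it is set."""
    if complete_save_path is not None:
        return (complete_save_path,) * 3
    return descriptor_save_path, target_save_path, simulation_output_save_path


print(snapshot_file_names(starts_at=4, count=2))
# ['ELEM_snapshot4.npy', 'ELEM_snapshot5.npy']
print(resolve_save_paths(complete_save_path="out", target_save_path="ldos"))
# ('out', 'out', 'out')
```

The first call shows why `starts_at` matters when a data set is converted in several batches: the second batch can continue numbering where the first one stopped.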