data_handler_base

Base class for all data handling (loading, shuffling, etc.).

class DataHandlerBase(parameters: Parameters, target_calculator=None, descriptor_calculator=None)

Bases: ABC

Base class for all data handling (loading, shuffling, etc.).

Parameters:

parameters (mala.common.parameters.Parameters) – Parameters used to create the data handling object.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data. If None, then one will be created by this class.
target_calculator (mala.targets.target.Target) – Used to do unit conversion on output data. If None, then one will be created by this class.

descriptor_calculator: Used to do unit conversion on input data.

nr_snapshots

Number of snapshots loaded.

Type: int

parameters

MALA data handling parameters.

Type: mala.common.parameters.ParametersData

target_calculator: Used to do unit conversion on output data.

add_snapshot(input_file, input_directory, output_file, output_directory, add_snapshot_as, output_units='1/(eV*A^3)', input_units='None', calculation_output_file='', snapshot_type=None)

Add a snapshot to the data pipeline.

Parameters:

input_file (string) – File with saved numpy input array.
input_directory (string) – Directory containing input_file.
output_file (string) – File with saved numpy output array.
output_directory (string) – Directory containing output_file.
input_units (string) – Units of input data. See descriptor classes to see which units are supported.
output_units (string) – Units of output data. See target classes to see which units are supported.
calculation_output_file (string) – File with the output of the original snapshot calculation. This is only needed when testing multiple snapshots.
add_snapshot_as (string) – Must be “tr”, “va” or “te”. The snapshot is added to the snapshot list as a training, validation or testing snapshot, respectively.
snapshot_type (string) – Either “numpy” or “openpmd” based on what kind of files you want to operate on.
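
The way these arguments fit together can be illustrated with a short sketch. It does not use MALA itself: ToySnapshotRegistry is a hypothetical stand-in that only mirrors the documented add_snapshot signature, the file and directory names are made up, and the check on add_snapshot_as is an assumption based on the parameter description above, not the library's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToySnapshotRegistry:
    """Hypothetical stand-in mirroring the documented add_snapshot signature."""

    snapshots: list = field(default_factory=list)

    def add_snapshot(
        self,
        input_file,
        input_directory,
        output_file,
        output_directory,
        add_snapshot_as,
        output_units="1/(eV*A^3)",
        input_units="None",
        calculation_output_file="",
        snapshot_type=None,
    ):
        # Assumption: only the three documented roles are accepted.
        if add_snapshot_as not in ("tr", "va", "te"):
            raise ValueError("add_snapshot_as must be 'tr', 'va' or 'te'")
        # Record the snapshot description; no file is read in this sketch.
        self.snapshots.append(
            {
                "input": (input_directory, input_file),
                "output": (output_directory, output_file),
                "role": add_snapshot_as,
                "input_units": input_units,
                "output_units": output_units,
                "calculation_output_file": calculation_output_file,
                "snapshot_type": snapshot_type,
            }
        )


handler = ToySnapshotRegistry()
# Made-up file names: one training and one validation snapshot.
handler.add_snapshot("snap0.in.npy", "data/", "snap0.out.npy", "data/", "tr")
handler.add_snapshot("snap1.in.npy", "data/", "snap1.out.npy", "data/", "va")
roles = [snapshot["role"] for snapshot in handler.snapshots]
```

With a real data handler, the same positional order applies: input file and directory, output file and directory, then the role string.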

clear_data()

Reset the entire data pipeline.

Useful when running multiple investigations in the same Python file.
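
A minimal sketch of that pattern follows, under stated assumptions: ToyPipeline is a hypothetical stand-in with a simplified add_snapshot, and it assumes a reset simply discards every registered snapshot. What the real clear_data resets is not specified here.

```python
class ToyPipeline:
    """Hypothetical stand-in; not the MALA implementation."""

    def __init__(self):
        self.snapshots = []

    def add_snapshot(self, input_file, output_file, add_snapshot_as):
        # Simplified signature, for illustration only.
        self.snapshots.append((input_file, output_file, add_snapshot_as))

    def clear_data(self):
        # Assumption: a reset discards every registered snapshot.
        self.snapshots = []

    @property
    def nr_snapshots(self):
        return len(self.snapshots)


pipeline = ToyPipeline()
counts = []
# Two independent investigations in one script (made-up file names).
for study in (["a.npy", "b.npy"], ["c.npy"]):
    pipeline.clear_data()  # start each investigation from an empty pipeline
    for name in study:
        pipeline.add_snapshot(name, name, "tr")
    counts.append(pipeline.nr_snapshots)
```

Calling the reset at the start of each investigation keeps snapshots from an earlier run out of the next one.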

delete_temporary_inputs()

Delete temporary data files.

These may have been created during a training or testing process when using atomic positions for on-the-fly calculation of descriptors rather than precomputed data files.

property input_dimension: Feature dimension of input data.

property output_dimension: Feature dimension of output data.