data_handler_base
Base class for all data handling (loading, shuffling, etc.).
- class DataHandlerBase(parameters: Parameters, target_calculator=None, descriptor_calculator=None)
Bases: ABC
Base class for all data handling (loading, shuffling, etc.).
- Parameters:
parameters (mala.common.parameters.Parameters) – Parameters used to create the data handling object.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data. If None, then one will be created by this class.
target_calculator (mala.targets.target.Target) – Used to do unit conversion on output data. If None, then one will be created by this class.
- descriptor_calculator
Used to do unit conversion on input data.
- nr_snapshots
Number of snapshots loaded.
- Type:
int
- parameters
MALA data handling parameters.
- target_calculator
Used to do unit conversion on output data.
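The constructor contract above has two parts. The class is abstract (it derives from ABC), and each calculator is built from the parameters only when the caller does not supply one. The following is a minimal sketch of that pattern. It does not use MALA's own code: `ToyDescriptor`, `ToyTarget`, `ToyDataHandlerBase`, `ToyDataHandler` and the abstract hook `load_data` are hypothetical stand-ins, and a plain dict plays the role of the Parameters object.

```python
from abc import ABC, abstractmethod


class ToyDescriptor:
    """Hypothetical stand-in for a descriptor calculator (input side)."""

    def __init__(self, parameters):
        self.parameters = parameters


class ToyTarget:
    """Hypothetical stand-in for a target calculator (output side)."""

    def __init__(self, parameters):
        self.parameters = parameters


class ToyDataHandlerBase(ABC):
    """Toy abstract base mirroring the documented constructor signature."""

    def __init__(self, parameters, target_calculator=None,
                 descriptor_calculator=None):
        self.parameters = parameters
        # Build a calculator from the parameters only when none is passed in.
        if target_calculator is None:
            target_calculator = ToyTarget(parameters)
        if descriptor_calculator is None:
            descriptor_calculator = ToyDescriptor(parameters)
        self.target_calculator = target_calculator
        self.descriptor_calculator = descriptor_calculator
        self.nr_snapshots = 0

    @abstractmethod
    def load_data(self):
        """Hypothetical hook: each concrete handler decides how to load data."""


class ToyDataHandler(ToyDataHandlerBase):
    def load_data(self):
        return f"loading {self.nr_snapshots} snapshot(s)"


params = {"note": "stand-in for a Parameters object"}
handler = ToyDataHandler(params)
print(type(handler.target_calculator).__name__)  # ToyTarget
print(handler.load_data())  # loading 0 snapshot(s)
```

Passing a calculator explicitly lets several handlers share one calculator instance. Calling `ToyDataHandlerBase(params)` directly raises a TypeError because of the unimplemented abstract method.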
- add_snapshot(input_file, input_directory, output_file, output_directory, add_snapshot_as, output_units='1/(eV*A^3)', input_units='None', calculation_output_file='', snapshot_type=None)
Add a snapshot to the data pipeline.
- Parameters:
input_file (string) – Name of the file containing the saved input (descriptor) array.
input_directory (string) – Directory containing input_file.
output_file (string) – Name of the file containing the saved output (target) array.
output_directory (string) – Directory containing output_file.
input_units (string) – Units of the input data. See the descriptor classes for the supported units.
output_units (string) – Units of the output data. See the target classes for the supported units.
calculation_output_file (string) – File with the output of the original snapshot calculation. This is only needed when testing multiple snapshots.
add_snapshot_as (string) – Must be “tr”, “va” or “te”; the snapshot is added to the snapshot list as a training, validation or testing snapshot, respectively.
snapshot_type (string) – Either “numpy” or “openpmd”, depending on the format of the files to be processed.
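The bookkeeping that add_snapshot performs can be pictured as appending one metadata record per call and rejecting an unknown role. This is a minimal sketch under that assumption, not MALA's implementation. `ToySnapshot` and `ToySnapshotRegistry` are hypothetical, the record field `role` is an invented name, and the toy simply stores snapshot_type as given, including None. It does not attempt to infer a type. The file names are illustrative only, and no files are read.

```python
from dataclasses import dataclass
from typing import List, Optional

# Roles accepted by add_snapshot_as: training, validation, testing.
VALID_ROLES = ("tr", "va", "te")
VALID_TYPES = ("numpy", "openpmd")


@dataclass
class ToySnapshot:
    """Hypothetical record of one registered snapshot."""
    input_file: str
    input_directory: str
    output_file: str
    output_directory: str
    role: str
    output_units: str = "1/(eV*A^3)"
    input_units: str = "None"
    calculation_output_file: str = ""
    snapshot_type: Optional[str] = None


class ToySnapshotRegistry:
    """Stores snapshot metadata only; no arrays are read here."""

    def __init__(self):
        self.snapshots: List[ToySnapshot] = []

    @property
    def nr_snapshots(self) -> int:
        return len(self.snapshots)

    def add_snapshot(self, input_file, input_directory, output_file,
                     output_directory, add_snapshot_as,
                     output_units="1/(eV*A^3)", input_units="None",
                     calculation_output_file="", snapshot_type=None):
        if add_snapshot_as not in VALID_ROLES:
            raise ValueError(f"add_snapshot_as must be one of {VALID_ROLES}")
        # This toy keeps snapshot_type as given (possibly None); it does not guess.
        if snapshot_type is not None and snapshot_type not in VALID_TYPES:
            raise ValueError(f"snapshot_type must be one of {VALID_TYPES}")
        self.snapshots.append(ToySnapshot(
            input_file, input_directory, output_file, output_directory,
            add_snapshot_as, output_units, input_units,
            calculation_output_file, snapshot_type))


registry = ToySnapshotRegistry()
registry.add_snapshot("Be_snapshot0.in.npy", "data", "Be_snapshot0.out.npy",
                      "data", "tr", snapshot_type="numpy")
registry.add_snapshot("Be_snapshot1.in.npy", "data", "Be_snapshot1.out.npy",
                      "data", "va", snapshot_type="numpy")
print(registry.nr_snapshots)  # 2
print([s.role for s in registry.snapshots])  # ['tr', 'va']
```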
- clear_data()
Reset the entire data pipeline.
Useful when running multiple investigations in the same Python file.
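The "multiple investigations" use case can be illustrated as a loop that clears the pipeline before each run, so that snapshots from one run do not leak into the next. The sketch below uses a hypothetical `ToyHandler` whose only state is a list of (name, role) pairs. The run and snapshot names are made up.

```python
class ToyHandler:
    """Hypothetical handler whose only state is a list of snapshots."""

    def __init__(self):
        self.clear_data()

    def clear_data(self):
        # Forget every registered snapshot.
        self.snapshots = []

    @property
    def nr_snapshots(self):
        return len(self.snapshots)

    def add_snapshot(self, name, role):
        self.snapshots.append((name, role))


handler = ToyHandler()
results = {}
for run, files in {"run_a": ["s0", "s1"], "run_b": ["s2"]}.items():
    handler.clear_data()  # start each investigation from an empty pipeline
    for name in files:
        handler.add_snapshot(name, "tr")
    results[run] = handler.nr_snapshots

print(results)  # {'run_a': 2, 'run_b': 1}
```

Without the clear_data() call, run_b would report 3 snapshots instead of 1.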
- delete_temporary_inputs()
Delete temporary data files.
These may have been created during training or testing when descriptors are calculated on the fly from atomic positions instead of being loaded from precomputed data files.
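One way to implement this kind of cleanup is to record the path of every temporary file when it is written, then remove exactly those paths later. The sketch below follows that approach with the standard library. It assumes this tracking scheme; MALA's actual storage layout is not shown here. `ToyOnTheFlyHandler` and `compute_descriptors_on_the_fly` are hypothetical, and the file contents are placeholder bytes rather than real descriptor data.

```python
import os
import tempfile


class ToyOnTheFlyHandler:
    """Hypothetical handler that writes computed inputs to temporary files."""

    def __init__(self):
        self.temporary_files = []

    def compute_descriptors_on_the_fly(self, snapshot_name):
        # Stand-in for computing descriptors from atomic positions: write a
        # temporary file and remember its path for later cleanup.
        fd, path = tempfile.mkstemp(prefix=f"{snapshot_name}_", suffix=".tmp")
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"placeholder descriptor data")
        self.temporary_files.append(path)
        return path

    def delete_temporary_inputs(self):
        # Remove only the files this handler created, then forget them.
        for path in self.temporary_files:
            if os.path.exists(path):
                os.remove(path)
        self.temporary_files = []


handler = ToyOnTheFlyHandler()
path = handler.compute_descriptors_on_the_fly("snapshot0")
print(os.path.exists(path))  # True
handler.delete_temporary_inputs()
print(os.path.exists(path))  # False
```

Tracking the paths explicitly avoids deleting unrelated files in a shared temporary directory.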
- property input_dimension
Feature dimension of input data.
- property output_dimension
Feature dimension of output data.
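For volumetric data stored as a (grid_x, grid_y, grid_z, features) array, the feature dimension is the length of the last axis. This is an assumed layout for the sketch below. These two numbers are what a model typically needs, for example as the sizes of a network's first and last layers, while the grid axes only determine how many samples there are. `ToyDimensions` and the shapes are made up for illustration. The sketch works on shape tuples, so no arrays are allocated.

```python
import math


class ToyDimensions:
    """Hypothetical holder of input/output array shapes."""

    def __init__(self, input_shape, output_shape):
        # Assumed layout: (grid_x, grid_y, grid_z, features).
        self._input_shape = tuple(input_shape)
        self._output_shape = tuple(output_shape)

    @property
    def input_dimension(self):
        # Feature dimension = length of the last axis.
        return self._input_shape[-1]

    @property
    def output_dimension(self):
        return self._output_shape[-1]

    @property
    def nr_grid_points(self):
        # Number of samples = product of the spatial axes.
        return math.prod(self._input_shape[:-1])


dims = ToyDimensions(input_shape=(18, 18, 27, 94), output_shape=(18, 18, 27, 11))
print(dims.input_dimension, dims.output_dimension)  # 94 11
print(dims.nr_grid_points)  # 8748
```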