data_handler_base

Base class for all data handling (loading, shuffling, etc.).

class DataHandlerBase(parameters: Parameters, target_calculator=None, descriptor_calculator=None)

Bases: ABC

Base class for all data handling (loading, shuffling, etc.).

Attributes:
descriptor_calculator

Used to do unit conversion on input data.

nr_snapshots

Number of snapshots loaded.

Type:

int

parameters

MALA data handling parameters.

Type:

mala.common.parameters.ParametersData

target_calculator

Used to do unit conversion on output data.
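Because `DataHandlerBase` derives from `ABC`, it is intended as a shared base for concrete data handlers. The sketch below shows that pattern in plain Python with the attributes listed above. `MiniDataHandlerBase`, `MiniDataHandler` and the abstract `_load_data` hook are hypothetical names chosen for illustration; they are not MALA's actual internals.

```python
from abc import ABC, abstractmethod


class MiniDataHandlerBase(ABC):
    """Hypothetical stand-in mirroring the documented attributes."""

    def __init__(self, parameters, target_calculator=None,
                 descriptor_calculator=None):
        self.parameters = parameters
        # Calculators used for unit conversion of output / input data.
        self.target_calculator = target_calculator
        self.descriptor_calculator = descriptor_calculator
        # No snapshots are registered yet.
        self.nr_snapshots = 0

    @abstractmethod
    def _load_data(self):
        """Hypothetical hook that every concrete subclass must implement."""


class MiniDataHandler(MiniDataHandlerBase):
    def _load_data(self):
        return "loaded"


handler = MiniDataHandler(parameters={})
print(handler.nr_snapshots, handler._load_data())  # prints "0 loaded"

# An ABC with unimplemented abstract methods cannot be instantiated.
try:
    MiniDataHandlerBase(parameters={})
except TypeError as err:
    print("cannot instantiate abstract base:", type(err).__name__)
```

Declaring the hook with `@abstractmethod` makes Python refuse to instantiate the base class itself, so a forgotten override surfaces immediately as a `TypeError`.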

add_snapshot(input_file, input_directory, output_file, output_directory, add_snapshot_as, output_units='1/(eV*A^3)', input_units='None', calculation_output_file='', snapshot_type='numpy')

Add a snapshot to the data pipeline.

Parameters:
  • input_file (string) – File with saved numpy input array.

  • input_directory (string) – Directory containing input_file.

  • output_file (string) – File with saved numpy output array.

  • output_directory (string) – Directory containing output_file.

  • input_units (string) – Units of input data. See the descriptor classes for the supported units.

  • output_units (string) – Units of output data. See the target classes for the supported units.

  • calculation_output_file (string) – File with the output of the original snapshot calculation. This is only needed when testing multiple snapshots.

  • add_snapshot_as (string) – Must be “tr”, “va” or “te”, the snapshot will be added to the snapshot list as training, validation or testing snapshot, respectively.

  • snapshot_type (string) – Either “numpy” or “openpmd”, depending on the format of the snapshot files.
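The parameter rules above (the allowed values of `add_snapshot_as` and `snapshot_type`, plus the defaults from the signature) can be illustrated with a simplified stand-in. `MiniPipeline` and `SnapshotRecord` are hypothetical classes, the file names are invented, and the validation shown is an assumption about what sensible checking looks like, not a copy of MALA's implementation.

```python
from dataclasses import dataclass


@dataclass
class SnapshotRecord:
    """Hypothetical record of the arguments passed to add_snapshot."""
    input_file: str
    input_directory: str
    output_file: str
    output_directory: str
    add_snapshot_as: str  # "tr", "va" or "te"
    output_units: str = "1/(eV*A^3)"
    input_units: str = "None"
    calculation_output_file: str = ""
    snapshot_type: str = "numpy"


class MiniPipeline:
    """Hypothetical, simplified stand-in for a data handler's snapshot list."""

    def __init__(self):
        self.snapshots = []

    def add_snapshot(self, input_file, input_directory, output_file,
                     output_directory, add_snapshot_as,
                     output_units="1/(eV*A^3)", input_units="None",
                     calculation_output_file="", snapshot_type="numpy"):
        # Reject values outside the documented choices.
        if add_snapshot_as not in ("tr", "va", "te"):
            raise ValueError("add_snapshot_as must be 'tr', 'va' or 'te', "
                             f"got {add_snapshot_as!r}")
        if snapshot_type not in ("numpy", "openpmd"):
            raise ValueError("snapshot_type must be 'numpy' or 'openpmd', "
                             f"got {snapshot_type!r}")
        self.snapshots.append(SnapshotRecord(
            input_file, input_directory, output_file, output_directory,
            add_snapshot_as, output_units, input_units,
            calculation_output_file, snapshot_type))

    @property
    def nr_snapshots(self):
        return len(self.snapshots)


pipeline = MiniPipeline()
# input_file / output_file are file names; the directories are passed separately.
pipeline.add_snapshot("Be_snapshot0.in.npy", "data/",
                      "Be_snapshot0.out.npy", "data/", "tr")
pipeline.add_snapshot("Be_snapshot1.in.npy", "data/",
                      "Be_snapshot1.out.npy", "data/", "va")
print(pipeline.nr_snapshots)  # prints 2
```

Keeping file name and directory as separate arguments, as the documented signature does, lets several snapshots share one directory string.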

clear_data()

Reset the entire data pipeline.

Useful when running multiple investigations in the same Python script.
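A minimal sketch of the reuse pattern described above, assuming that "resetting the pipeline" at least means emptying the snapshot list and zeroing `nr_snapshots`. `MiniPipeline` and its two-argument `add_snapshot` are hypothetical simplifications; MALA's `clear_data` may reset more state than shown here.

```python
class MiniPipeline:
    """Hypothetical stand-in showing one way to reset a data pipeline."""

    def __init__(self):
        # Reuse the reset logic so a fresh object and a cleared one match.
        self.clear_data()

    def add_snapshot(self, name, add_snapshot_as):
        # Simplified signature, for illustration only.
        self.snapshots.append((name, add_snapshot_as))
        self.nr_snapshots += 1

    def clear_data(self):
        # Drop every registered snapshot and reset the counter so the
        # object can be reused for a new investigation.
        self.snapshots = []
        self.nr_snapshots = 0


pipeline = MiniPipeline()
pipeline.add_snapshot("snapshot0", "tr")
pipeline.add_snapshot("snapshot1", "va")
print(pipeline.nr_snapshots)  # prints 2

pipeline.clear_data()
print(pipeline.nr_snapshots, pipeline.snapshots)  # prints "0 []"
```

Calling the same reset routine from `__init__` guarantees that a cleared object is indistinguishable from a freshly constructed one.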

property input_dimension

Feature dimension of input data.

property output_dimension

Feature dimension of output data.
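The two read-only properties above can be sketched as follows, assuming each snapshot is stored as an array whose last axis holds the features at each grid point (shape `(x, y, z, features)`). `MiniDimensions` and the array sizes are hypothetical; MALA may determine these dimensions differently, for example from its descriptor and target calculators.

```python
import numpy as np


class MiniDimensions:
    """Hypothetical stand-in: feature dimension taken from the last array axis."""

    def __init__(self, input_array, output_array):
        # Assumed layout per snapshot: (x, y, z, features).
        self._input_dimension = input_array.shape[-1]
        self._output_dimension = output_array.shape[-1]

    @property
    def input_dimension(self):
        """Feature dimension of input data."""
        return self._input_dimension

    @property
    def output_dimension(self):
        """Feature dimension of output data."""
        return self._output_dimension


# Hypothetical snapshot: an 18x18x27 grid with 91 input features
# and 11 output features per grid point.
inputs = np.zeros((18, 18, 27, 91))
outputs = np.zeros((18, 18, 27, 11))
dims = MiniDimensions(inputs, outputs)
print(dims.input_dimension, dims.output_dimension)  # prints "91 11"
```

Exposing the values through `property` without a setter keeps them read-only: assigning to `dims.input_dimension` raises `AttributeError`.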