data_handler

DataHandler class that loads and scales data.

class DataHandler(parameters: Parameters, target_calculator=None, descriptor_calculator=None, input_data_scaler=None, output_data_scaler=None, clear_data=True)[source]

Bases: DataHandlerBase

Loads and scales data. Can only process numpy arrays at the moment.

Data that is not in a numpy array can be converted using the DataConverter class.

Parameters:
  • parameters (Parameters) – Parameters used to create the data handling object.

  • target_calculator (Target) – Used to do unit conversion on output data. If None, a target calculator is created.

  • descriptor_calculator (Descriptor) – Used to do unit conversion on input data. If None, a descriptor calculator is created.

  • input_data_scaler (DataScaler) – Used to scale the input data.

  • output_data_scaler (DataScaler) – Used to scale the output data.

  • clear_data (bool) – If True (default), the data list is cleared upon creation of the object.
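
A minimal usage sketch, assuming snapshot data already stored as numpy files; the file names and data path are placeholders, and add_snapshot() is inherited from DataHandlerBase rather than documented on this page:

    import mala

    parameters = mala.Parameters()
    data_handler = mala.DataHandler(parameters)

    # Register snapshots; "tr" and "va" mark training and validation data.
    data_path = "/path/to/snapshots/"
    data_handler.add_snapshot("snapshot0.in.npy", data_path,
                              "snapshot0.out.npy", data_path, "tr")
    data_handler.add_snapshot("snapshot1.in.npy", data_path,
                              "snapshot1.out.npy", data_path, "va")

    # Check consistency, parametrize the DataScalers and build DataSet objects.
    data_handler.prepare_data()
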
clear_data()[source]

Reset the entire data pipeline.

Useful when doing multiple investigations in the same python file.

get_snapshot_calculation_output(snapshot_number)[source]

Get the path to the output file for a specific snapshot.

Parameters:

snapshot_number (int) – Snapshot for which the calculation output should be returned.

Returns:

calculation_output – Path to the calculation output for this snapshot.

Return type:

string

get_test_input_gradient(snapshot_number)[source]

Get the gradient of the test inputs for an entire snapshot.

This gradient is returned as a scaled tensor. The reason the gradient is returned (rather than the inputs themselves) is that slicing a variable causes PyTorch to no longer consider it a “leaf” variable, so it stops tracking and evaluating its gradient. It is therefore easier to obtain the gradient first and then slice it.

Parameters:

snapshot_number (int) – Number of the snapshot for which the gradient of the test inputs should be returned.

Returns:

Tensor holding the gradient.

Return type:

torch.Tensor
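
The “leaf” behaviour described above can be reproduced in plain PyTorch; the sketch below is a self-contained illustration and not part of the MALA API:

    import torch

    x = torch.randn(4, 3, requires_grad=True)   # leaf tensor
    y = x[:2]                                    # slicing yields a non-leaf view
    print(x.is_leaf, y.is_leaf)                  # True False

    y.sum().backward()
    print(x.grad.shape)   # gradient is accumulated on the leaf tensor
    print(y.grad)         # None: non-leaf gradients are not retained by default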

mix_datasets()[source]

For lazily-loaded data sets, the snapshot ordering is (re-)mixed.

This applies only to the training data set; for the validation and test sets the ordering does not matter.

prepare_data(reparametrize_scaler=True)[source]

Prepare the data to be used in a training process.

This includes:

  • Checking snapshots for consistency

  • Parametrizing the DataScalers (if desired)

  • Building DataSet objects.

Parameters:

reparametrize_scaler (bool) – If True (default), the DataScalers are parametrized based on the training data.
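
Both call patterns in a short sketch; the “already parametrized scalers” situation is an assumed workflow (e.g. reusing scalers from an earlier run), not something enforced by this method:

    # Fresh training run: fit the DataScalers on the training data.
    data_handler.prepare_data()

    # Scalers already parametrized (e.g. loaded from an earlier run):
    # skip re-fitting and only check snapshots and build the DataSet objects.
    data_handler.prepare_data(reparametrize_scaler=False)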

prepare_for_testing()[source]

Prepare DataHandler for usage within Tester class.

Ensures that lazily-loaded data sets do not perform unnecessary I/O operations. Only needed when using the Tester class.

raw_numpy_to_converted_scaled_tensor(numpy_array, data_type, units, convert3Dto1D=False)[source]

Transform a raw numpy array into a scaled torch tensor.

This tensor will also be in the right units, i.e. a tensor that can simply be put into a MALA network.

Parameters:
  • numpy_array (np.array) – Array that is to be converted.

  • data_type (string) – Either “in” or “out”, depending if input or output data is processed.

  • units (string) – Units of the data that is processed.

  • convert3Dto1D (bool) – If True (default: False), then a (x,y,z,dim) array is transformed into a (x*y*z,dim) array.

Returns:

converted_tensor – The fully converted and scaled tensor.

Return type:

torch.Tensor
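
A short sketch of a typical call; the file name, the grid shape and the units string are placeholders for whatever the raw data actually uses:

    import numpy as np

    # Hypothetical raw input grid with shape (x, y, z, feature_dim).
    raw_inputs = np.load("snapshot0.in.npy")

    # Flatten to (x*y*z, feature_dim), convert units and scale, yielding a
    # tensor that can be fed directly into a MALA network.
    inputs = data_handler.raw_numpy_to_converted_scaled_tensor(
        raw_inputs, "in", "None", convert3Dto1D=True)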

resize_snapshots_for_debugging(directory='./', naming_scheme_input='test_Al_debug_2k_nr*.in', naming_scheme_output='test_Al_debug_2k_nr*.out')[source]

Resize all snapshots in the list.

Parameters:
  • directory (string) – Directory to which the resized snapshots should be saved.

  • naming_scheme_input (string) – Naming scheme for the resulting input numpy files.

  • naming_scheme_output (string) – Naming scheme for the resulting output numpy files.