data_handler
DataHandler class that loads and scales data.
- class DataHandler(parameters: Parameters, target_calculator=None, descriptor_calculator=None, input_data_scaler=None, output_data_scaler=None, clear_data=True)[source]
Bases: DataHandlerBase
Loads and scales data. Can only process numpy arrays at the moment.
Data that is not in a numpy array can be converted using the DataConverter class.
- Parameters:
parameters (mala.common.parameters.Parameters) – Parameters used to create the data handling object.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data. If None, then one will be created by this class.
target_calculator (mala.targets.target.Target) – Used to do unit conversion on output data. If None, then one will be created by this class.
input_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the input data. If None, then one will be created by this class.
output_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the output data. If None, then one will be created by this class.
clear_data (bool) – If True (default), the data list will be cleared upon creation of the object.
- clear_data()[source]
Reset the entire data pipeline.
Useful when doing multiple investigations in the same Python file.
- get_snapshot_calculation_output(snapshot_number)[source]
Get the path to the output file for a specific snapshot.
- Parameters:
snapshot_number (int) – Snapshot for which the calculation output should be returned.
- Returns:
calculation_output – Path to the calculation output for this snapshot.
- Return type:
string
- get_test_input_gradient(snapshot_number)[source]
Get the gradient of the test inputs for an entire snapshot.
This gradient will be returned as a scaled Tensor. The gradient is returned (rather than the entire inputs themselves) because once a variable is sliced, pytorch no longer considers it a “leaf” variable and stops tracking and evaluating its gradient. Thus, it is easier to obtain the gradient first and then slice it.
- Parameters:
snapshot_number (int) – Number of the snapshot for which the gradient of the entire test inputs should be returned.
- Returns:
Tensor holding the gradient.
- Return type:
torch.Tensor
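The leaf-variable behavior described for get_test_input_gradient can be illustrated with a minimal PyTorch sketch. The tensor `inputs` and the toy `loss` are stand-ins invented for illustration and are not MALA objects; the sketch only shows that the gradient is populated on the full tensor and can be sliced afterwards.

```python
import torch

# Hypothetical stand-in for the scaled test inputs of one snapshot:
# 6 grid points with 2 features each.
inputs = torch.arange(12, dtype=torch.float32).reshape(6, 2)
inputs.requires_grad_(True)

# A slice of a tracked tensor is the result of an operation, so it is
# not a leaf and backward() will not populate its .grad attribute.
batch = inputs[0:3]
print(inputs.is_leaf, batch.is_leaf)

# Toy scalar loss computed from the full tensor
# (d loss / d inputs = 2 * inputs).
loss = (inputs ** 2).sum()
loss.backward()

# Obtain the gradient of the full tensor first, then slice it.
full_gradient = inputs.grad
batch_gradient = full_gradient[0:3]
```

Slicing the gradient this way yields the same values a per-batch gradient would hold, without losing gradient tracking on the inputs.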
- mix_datasets()[source]
For lazily-loaded data sets, the snapshot ordering is (re-)mixed.
This applies only to the training data set. For the validation and test set it does not matter.
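As a rough illustration of what re-mixing a snapshot ordering means, the sketch below shuffles a list of snapshot identifiers. The list `snapshot_order` and the seeded generator are assumptions made for this example; how MALA stores and shuffles its lazily-loaded snapshots internally is not shown here.

```python
import random

# Hypothetical identifiers of the snapshots in a lazily-loaded training set.
snapshot_order = ["snapshot0", "snapshot1", "snapshot2", "snapshot3"]

# Seeded only so that the sketch is reproducible.
rng = random.Random(42)

# Shuffle a copy so the original ordering stays available for comparison.
mixed_order = snapshot_order.copy()
rng.shuffle(mixed_order)

# Every snapshot is still visited exactly once; only the order changes.
print(mixed_order)
```

The order in which training data is visited influences the sequence of gradient updates, whereas validation and test results are aggregated over the whole set, which is consistent with mixing only the training data.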
- prepare_data(reparametrize_scaler=True)[source]
Prepare the data to be used in a training process.
This includes:
Checking snapshots for consistency
Parametrizing the DataScalers (if desired)
Building DataSet objects.
- Parameters:
reparametrize_scaler (bool) – If True (default), the DataScalers are parametrized based on the training data.
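The three steps listed for prepare_data can be sketched with plain NumPy. This is not MALA's implementation: the snapshot arrays, the feature-wise standardization, and the helper `scale` are assumptions chosen for illustration. The sketch shows why the scaler is parametrized from the training data only and then reused for other data.

```python
import numpy as np

# Hypothetical snapshots: each is (number of grid points, feature dimension).
rng = np.random.default_rng(0)
training_snapshots = [rng.normal(size=(8, 3)), rng.normal(size=(8, 3))]
validation_snapshot = rng.normal(size=(8, 3))

# 1. Consistency check: all snapshots must share the same feature dimension.
feature_dims = {s.shape[-1] for s in training_snapshots + [validation_snapshot]}
if len(feature_dims) != 1:
    raise ValueError("Snapshots have inconsistent feature dimensions.")

# 2. Parametrize the scaler from the training data only
#    (here: feature-wise standardization, chosen for illustration).
training_data = np.concatenate(training_snapshots, axis=0)
mean = training_data.mean(axis=0)
std = training_data.std(axis=0)


def scale(array):
    """Apply the training-set statistics to any snapshot."""
    return (array - mean) / std


# 3. Build the scaled data sets that would be handed to training.
scaled_training = scale(training_data)
scaled_validation = scale(validation_snapshot)
```

Skipping step 2 while keeping previously computed `mean` and `std` corresponds, in this sketch, to calling prepare_data with reparametrize_scaler=False.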
- prepare_for_testing()[source]
Prepare DataHandler for usage within Tester class.
Ensures that lazily-loaded data sets do not perform unnecessary I/O operations. Only needed in Tester class.
- raw_numpy_to_converted_scaled_tensor(numpy_array, data_type, units, convert3Dto1D=False)[source]
Transform a raw numpy array into a scaled torch tensor.
This tensor will also be in the right units, i.e. a tensor that can simply be put into a MALA network.
- Parameters:
numpy_array (np.array) – Array that is to be converted.
data_type (string) – Either “in” or “out”, depending on whether input or output data is processed.
units (string) – Units of the data that is processed.
convert3Dto1D (bool) – If True (default: False), then a (x,y,z,dim) array is transformed into a (x*y*z,dim) array.
- Returns:
converted_tensor – The fully converted and scaled tensor.
- Return type:
torch.Tensor
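The (x,y,z,dim) to (x*y*z,dim) transformation controlled by convert3Dto1D can be sketched with NumPy as follows. The array `raw` and its shape are made up for this example, and the row ordering shown is NumPy's default C ordering, which is an assumption about the layout rather than a statement about MALA's internals.

```python
import numpy as np

# Hypothetical raw data on a 2 x 3 x 4 grid with 5 features per grid point.
raw = np.arange(2 * 3 * 4 * 5, dtype=np.float64).reshape(2, 3, 4, 5)

# Collapse the three spatial axes into one, keeping the feature axis.
feature_dim = raw.shape[-1]
flattened = raw.reshape(-1, feature_dim)

# Each row of the result is the feature vector of one grid point;
# with C ordering, row index = (ix * ny + iy) * nz + iz.
ix, iy, iz = 1, 2, 3
row = (ix * 3 + iy) * 4 + iz

print(flattened.shape)
```

Unit conversion and scaling would then act on the flattened array before it is converted to a tensor.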
- resize_snapshots_for_debugging(directory='./', naming_scheme_input='test_Al_debug_2k_nr*.in', naming_scheme_output='test_Al_debug_2k_nr*.out')[source]
Resize all snapshots in the list.
- Parameters:
directory (string) – Directory to which the resized snapshots should be saved.
naming_scheme_input (string) – Naming scheme for the resulting input numpy files.
naming_scheme_output (string) – Naming scheme for the resulting output numpy files.