lazy_load_dataset

DataSet for lazy-loading.

class LazyLoadDataset(*args: Any, **kwargs: Any)

Bases: Dataset

DataSet class for lazy loading.

Only loads into memory the snapshots that are currently being processed. Uses a “caching” approach: the most recently used snapshot is kept in memory until values from a new one are requested. Therefore, shuffling at the DataSampler / DataLoader level is discouraged to the point that it has been disabled. Instead, the snapshot load order is mixed here so that at least some degree of mixing takes place.
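The following sketch illustrates why sample-level shuffling defeats the caching approach. It is not MALA's implementation: `CachedSnapshotReader` and `count_loads` are hypothetical names, and each "file" is simulated as an in-memory list. Sequential access reads each snapshot once, whereas a shuffled access order forces a reload whenever consecutive samples come from different snapshots.

```python
import random


class CachedSnapshotReader:
    """Hypothetical stand-in: keeps only the last used snapshot in memory."""

    def __init__(self, snapshots):
        # Each inner list plays the role of one snapshot file on disk.
        self._snapshots = snapshots
        self.currently_loaded_file = None
        self._data = None
        self.load_count = 0  # number of simulated file reads

    def _load(self, file_index):
        # Simulates reading a snapshot into RAM; only happens on a cache miss.
        self._data = list(self._snapshots[file_index])
        self.currently_loaded_file = file_index
        self.load_count += 1

    def get(self, file_index, local_index):
        if file_index != self.currently_loaded_file:
            self._load(file_index)
        return self._data[local_index]


def count_loads(access_order, snapshots):
    """Return how many simulated file reads a given access order triggers."""
    reader = CachedSnapshotReader(snapshots)
    for file_index, local_index in access_order:
        reader.get(file_index, local_index)
    return reader.load_count


snapshots = [[f"s{s}p{p}" for p in range(100)] for s in range(3)]
sequential = [(s, p) for s in range(3) for p in range(100)]
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

print(count_loads(sequential, snapshots))  # 3: each snapshot is read once
print(count_loads(shuffled, snapshots))    # far more reads than 3
```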

Parameters:

input_dimension (int) – Dimension of an input vector.
output_dimension (int) – Dimension of an output vector.
input_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the input data.
output_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the output data.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data.
target_calculator (mala.targets.target.Target or derivative) – Used to do unit conversion on output data.
use_ddp (bool) – If True, it is assumed that DDP (distributed data parallel) is used.
input_requires_grad (bool) – If True, then the gradient is stored for the inputs.

currently_loaded_file

Index of currently loaded file.

Type: int

input_data

Input data tensor.

Type: torch.Tensor

output_data

Output data tensor.

Type: torch.Tensor

add_snapshot_to_dataset(snapshot: Snapshot)

Add a snapshot to a DataSet.

Afterwards, the DataSet can and will load this snapshot as needed.

Parameters: snapshot (mala.datahandling.snapshot.Snapshot) – Snapshot that is to be added to this DataSet.
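Because adding a snapshot does not load it, a lazy dataset only needs to record how many samples each snapshot contributes. The sketch below shows one common way to do this, assuming each snapshot is described by its sample count: cumulative end offsets plus a binary search map a global sample index to a (snapshot, local index) pair. `SnapshotIndex` is a hypothetical helper, not part of the MALA API.

```python
import bisect


class SnapshotIndex:
    """Hypothetical helper: maps a global sample index to (snapshot, local index)."""

    def __init__(self):
        self._ends = []  # cumulative end offsets, one entry per added snapshot

    def add_snapshot(self, grid_size):
        # Register a snapshot by its number of samples; nothing is loaded here.
        previous_end = self._ends[-1] if self._ends else 0
        self._ends.append(previous_end + grid_size)

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def locate(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First snapshot whose end offset lies strictly beyond the index.
        file_index = bisect.bisect_right(self._ends, index)
        start = self._ends[file_index - 1] if file_index > 0 else 0
        return file_index, index - start


idx = SnapshotIndex()
for size in (100, 150, 50):
    idx.add_snapshot(size)

print(len(idx))          # 300
print(idx.locate(120))   # (1, 20)
```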

get_new_data(file_index)

Read a new snapshot into RAM.

Parameters: file_index (int) – Index of the file to be read.

mix_datasets()

Mix the order of the snapshots.

As a result, runs can differ slightly from one another.
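A minimal sketch of mixing at the snapshot level, assuming snapshots are identified by their position in the dataset: only the order in which snapshots are visited is shuffled, while samples inside a snapshot stay in sequence, so each snapshot is still read into memory once per pass. The function names are hypothetical. Passing `seed=None` gives a different order on each run, which is the source of the run-to-run variance mentioned above.

```python
import random


def mix_snapshot_order(n_snapshots, seed=None):
    """Hypothetical sketch: shuffle which snapshot is visited first, second, ..."""
    order = list(range(n_snapshots))
    random.Random(seed).shuffle(order)
    return order


def global_access_order(snapshot_sizes, order):
    # Samples inside a snapshot stay in sequence, so each snapshot is loaded once.
    return [(s, p) for s in order for p in range(snapshot_sizes[s])]


order = mix_snapshot_order(4, seed=42)
access = global_access_order([3, 2, 4, 1], order)
print(order)        # some permutation of [0, 1, 2, 3]
print(len(access))  # 10
```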

property return_outputs_directly

Control whether outputs are actually transformed.

Has to be False for training. For testing, numerical errors are smaller if set to True.
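The sketch below shows how such a flag can gate an output transformation. It assumes that "transformed" refers to the scaling applied to output data (in MALA, via the output_data_scaler); the plain mean/standard-deviation standardization and the class `OutputScalingDemo` are hypothetical stand-ins, not the library's DataScaler.

```python
class OutputScalingDemo:
    """Hypothetical sketch of a flag that bypasses output scaling."""

    def __init__(self, raw_outputs, mean, std):
        self._raw = raw_outputs
        self._mean = mean
        self._std = std
        self._return_outputs_directly = False  # default: transform (training)

    @property
    def return_outputs_directly(self):
        """Control whether outputs are transformed before being returned."""
        return self._return_outputs_directly

    @return_outputs_directly.setter
    def return_outputs_directly(self, value):
        self._return_outputs_directly = bool(value)

    def output(self, index):
        value = self._raw[index]
        if self._return_outputs_directly:
            return value  # untransformed value, e.g. for testing
        return (value - self._mean) / self._std  # scaled value used in training


demo = OutputScalingDemo([2.0, 4.0], mean=3.0, std=1.0)
print(demo.output(1))  # 1.0
demo.return_outputs_directly = True
print(demo.output(1))  # 4.0
```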