lazy_load_dataset
DataSet for lazy-loading.
- class LazyLoadDataset(*args: Any, **kwargs: Any)[source]
Bases:
Dataset
DataSet class for lazy loading.
Only snapshots that are currently being processed are loaded into memory. A “caching” approach keeps the most recently used snapshot in memory until values from a new one are requested. Therefore, shuffling at the DataSampler / DataLoader level is discouraged to the point that it has been disabled. Instead, the snapshot load order is mixed here to provide at least some degree of mixing.
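The caching behaviour described above can be illustrated with a minimal, self-contained sketch. This is not MALA's implementation: the class `ToyLazyDataset`, its constructor arguments, and the `load_count` counter are hypothetical names introduced purely for illustration, and the "disk read" is simulated with an in-memory list. The sketch only aims to show why per-sample shuffling works against a single-snapshot cache.

```python
import random


class ToyLazyDataset:
    """Hypothetical stand-in for a lazy-loading dataset (not MALA's class)."""

    def __init__(self, snapshot_sizes, seed=0):
        self.snapshot_sizes = list(snapshot_sizes)
        # Order in which snapshots are visited; may be mixed between epochs.
        self.order = list(range(len(self.snapshot_sizes)))
        self.rng = random.Random(seed)
        self.currently_loaded_file = None
        self.data = None
        self.load_count = 0  # counts simulated disk reads

    def __len__(self):
        return sum(self.snapshot_sizes)

    def mix_datasets(self):
        # Shuffle whole snapshots, not individual samples.
        self.rng.shuffle(self.order)

    def get_new_data(self, file_index):
        # Simulated (expensive) read of one snapshot into memory.
        size = self.snapshot_sizes[file_index]
        self.data = [(file_index, i) for i in range(size)]
        self.currently_loaded_file = file_index
        self.load_count += 1

    def __getitem__(self, idx):
        # Walk the snapshots in the current order to find the one owning idx.
        for file_index in self.order:
            size = self.snapshot_sizes[file_index]
            if idx < size:
                # Reload only if the requested snapshot is not the cached one.
                if self.currently_loaded_file != file_index:
                    self.get_new_data(file_index)
                return self.data[idx]
            idx -= size
        raise IndexError("sample index out of range")


# Sequential access: each snapshot is read exactly once.
sequential = ToyLazyDataset([4, 4, 4])
sequential.mix_datasets()
for i in range(len(sequential)):
    sequential[i]
sequential_loads = sequential.load_count

# Per-sample shuffling: the cache is typically invalidated many times.
shuffled = ToyLazyDataset([4, 4, 4])
indices = list(range(len(shuffled)))
random.Random(1).shuffle(indices)
for i in indices:
    shuffled[i]
shuffled_loads = shuffled.load_count
```

With sequential access the number of simulated reads equals the number of snapshots. With shuffled sample indices it can never be lower and is usually much higher, which is the cost that disabling sampler-level shuffling avoids.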
- Parameters:
input_dimension (int) – Dimension of an input vector.
output_dimension (int) – Dimension of an output vector.
input_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the input data.
output_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the output data.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data.
target_calculator (mala.targets.target.Target or derivative) – Used to do unit conversion on output data.
use_ddp (bool) – If True, it is assumed that DDP (DistributedDataParallel) is used.
input_requires_grad (bool) – If True, then the gradient is stored for the inputs.
- currently_loaded_file
Index of currently loaded file.
- Type:
int
- input_data
Input data tensor.
- Type:
torch.Tensor
- output_data
Output data tensor.
- Type:
torch.Tensor
- add_snapshot_to_dataset(snapshot: Snapshot)[source]
Add a snapshot to a DataSet.
Afterwards, the DataSet can and will load this snapshot as needed.
- Parameters:
snapshot (mala.datahandling.snapshot.Snapshot) – Snapshot that is to be added to this DataSet.
- get_new_data(file_index)[source]
Read a new snapshot into RAM.
- Parameters:
file_index (int) – Index of the file to be read.
- mix_datasets()[source]
Mix the order of the snapshots.
With this, there can be some variance between runs.
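As a rough illustration of what mixing at the snapshot level means, the following sketch shuffles only the order of whole snapshots while leaving the samples inside each snapshot contiguous. The helper `mixed_sample_order` and its arguments are hypothetical and not part of MALA; the actual method may differ in detail.

```python
import random


def mixed_sample_order(snapshot_sizes, rng):
    """Return (snapshot, local_index) pairs with only the snapshot order shuffled."""
    order = list(range(len(snapshot_sizes)))
    rng.shuffle(order)  # permute whole snapshots
    # Samples within one snapshot stay together and in their original order.
    return [(s, i) for s in order for i in range(snapshot_sizes[s])]


samples = mixed_sample_order([3, 2, 4], random.Random(42))

# Count how often consecutive samples come from different snapshots.
switches = sum(1 for a, b in zip(samples, samples[1:]) if a[0] != b[0])
```

Because samples of one snapshot remain adjacent, the number of snapshot switches stays at the number of snapshots minus one, regardless of the permutation drawn.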
- property return_outputs_directly
Control whether outputs are actually transformed.
Has to be False for training. In the testing case, numerical errors are smaller if set to True.