lazy_load_dataset_single
DataSet for lazy-loading.
- class LazyLoadDatasetSingle(*args: Any, **kwargs: Any)[source]
Bases:
Dataset
DataSet class for lazy loading.
Only loads the snapshots currently being processed into memory. Uses a “caching” approach: the last used snapshot is kept in memory until values from a new one are needed. Shuffling at the DataSampler / DataLoader level is therefore discouraged to the point that it has been disabled. Instead, the snapshot load order is mixed here to provide at least some degree of shuffling.
- Parameters:
input_dimension (int) – Dimension of an input vector.
output_dimension (int) – Dimension of an output vector.
input_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the input data.
output_data_scaler (mala.datahandling.data_scaler.DataScaler) – Used to scale the output data.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data.
target_calculator (mala.targets.target.Target or derivative) – Used to do unit conversion on output data.
use_ddp (bool) – If True, it is assumed that ddp (DistributedDataParallel) is used.
input_requires_grad (bool) – If True, then the gradient is stored for the inputs.
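The caching behaviour described above can be sketched as follows. This is a toy stand-in, not MALA's implementation: the class name, attributes, and constructor arguments here are hypothetical, and the real class subclasses `torch.utils.data.Dataset`.

```python
import numpy as np


class SnapshotCachingDataset:
    """Toy illustration of snapshot caching (hypothetical, not MALA's class).

    Keeps only the most recently used snapshot in memory; a new snapshot
    is loaded only when an index falls outside the cached one.
    """

    def __init__(self, snapshot_paths, snapshot_size):
        self.paths = list(snapshot_paths)
        self.snapshot_size = snapshot_size
        self._cached_snapshot = None  # index of snapshot currently in memory
        self._cached_data = None
        self.loads = 0  # counts actual file loads, for illustration only

    def __len__(self):
        return len(self.paths) * self.snapshot_size

    def __getitem__(self, idx):
        snap, offset = divmod(idx, self.snapshot_size)
        if snap != self._cached_snapshot:
            # Cache miss: the previously held snapshot is replaced.
            self._cached_data = np.load(self.paths[snap])
            self._cached_snapshot = snap
            self.loads += 1
        return self._cached_data[offset]
```

Because a cache miss evicts the previous snapshot, sequential access within one snapshot is cheap, while random access across snapshots would trigger a reload on almost every item, which is why shuffling at the DataLoader level is disabled for this class.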
- allocate_shared_mem()[source]
Allocate the shared memory buffer for use by the prefetching process.
The buffer is sized via numpy metadata.
- deallocate_shared_mem()[source]
Deallocate the shared memory buffer used by the prefetching process.
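A minimal sketch of this allocation pattern, assuming standard-library `multiprocessing.shared_memory` and a `.npy` snapshot file; the function name is hypothetical. A memory-mapped `np.load` reads only the file header, so the buffer can be sized from the array's shape and dtype without loading the data.

```python
import numpy as np
from multiprocessing import shared_memory


def allocate_buffer_for_npy(path):
    """Allocate a shared-memory buffer sized from a .npy file's metadata.

    mmap_mode="r" parses only the header (shape, dtype), not the data,
    so sizing the buffer is cheap even for large snapshots.
    """
    meta = np.load(path, mmap_mode="r")
    shm = shared_memory.SharedMemory(create=True, size=meta.nbytes)
    return shm, meta.shape, meta.dtype
```

A prefetching process would then wrap the buffer in an ndarray view via `np.ndarray(shape, dtype=dtype, buffer=shm.buf)` and fill it; deallocation corresponds to `shm.close()` followed by `shm.unlink()`.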
- mix_datasets()[source]
Shuffle the data in this data set.
For this class, instead of mixing the datasets themselves, only the indices are shuffled and the dataset order is left unchanged. NOTE: shuffled access to the shared memory appears to reduce performance considerably (relative to the FastTensorDataset). To regain performance, this could be rewritten to shuffle the datasets as in the existing LazyLoadDataset. Another option would be to load the numpy file in permuted order to avoid the shuffled reads; however, this would require some care to avoid erroneously overwriting shared memory data in cases where a single dataset object is used back to back.
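The idea of mixing at the snapshot level rather than the sample level can be sketched as follows (a hypothetical helper, not MALA's code): the order in which snapshots are visited is permuted, while indices within each snapshot stay contiguous, so each snapshot is still read sequentially.

```python
import numpy as np


def mixed_index_order(n_snapshots, snapshot_size, seed=0):
    """Return a global index order that permutes snapshots, not samples.

    Each snapshot contributes a contiguous, ascending run of indices,
    preserving sequential reads within a snapshot.
    """
    rng = np.random.default_rng(seed)
    snapshot_order = rng.permutation(n_snapshots)
    indices = []
    for snap in snapshot_order:
        start = int(snap) * snapshot_size
        indices.extend(range(start, start + snapshot_size))
    return indices
```

This trades full sample-level shuffling for I/O locality: batches drawn in this order still see snapshots in a randomized sequence, but never jump around inside a snapshot.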