lazy_load_dataset_single

DataSet for lazy-loading.

class LazyLoadDatasetSingle(*args: Any, **kwargs: Any)[source]

Bases: Dataset

DataSet class for lazy loading.

Only loads snapshots into memory that are currently being processed. Uses a "caching" approach of keeping the last used snapshot in memory until values from a new one are requested. Therefore, shuffling at the DataSampler / DataLoader level is discouraged to the point that it has been disabled. Instead, we mix the snapshot load order here to have some sort of mixing at all.
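
The keep-last-snapshot caching described above can be sketched as follows. This is an illustrative reduction, not MALA's actual implementation: the class name and the `load_snapshot` stand-in (which here just fabricates random rows) are hypothetical; the real class reads snapshot files into shared memory instead.

```python
import numpy as np


class SnapshotCacheSketch:
    """Sketch of keep-last-snapshot caching (names are illustrative)."""

    def __init__(self, snapshot_files, points_per_snapshot):
        self.snapshot_files = snapshot_files
        self.points_per_snapshot = points_per_snapshot
        self.currently_loaded_file = None
        self.cached_data = None

    def load_snapshot(self, file_index):
        # Stand-in for the real file read: one row per grid point.
        rng = np.random.default_rng(file_index)
        return rng.random((self.points_per_snapshot, 4))

    def __len__(self):
        return len(self.snapshot_files) * self.points_per_snapshot

    def __getitem__(self, idx):
        file_index, local_index = divmod(idx, self.points_per_snapshot)
        # Only reload when the request crosses into a new snapshot, so
        # sequential access touches each file once. Random access at the
        # DataLoader level would force a reload on almost every item,
        # which is why that kind of shuffling is disabled.
        if file_index != self.currently_loaded_file:
            self.cached_data = self.load_snapshot(file_index)
            self.currently_loaded_file = file_index
        return self.cached_data[local_index]
```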

Attributes:
allocated

True if dataset is allocated.

Type:

bool

currently_loaded_file

Index of currently loaded file.

Type:

int

descriptor_calculator

Used to do unit conversion on input data.

Type:

mala.descriptors.descriptor.Descriptor

input_data

Input data tensor.

Type:

torch.Tensor

input_dtype

Input data type.

Type:

numpy.dtype

input_shape

Input data dimensions.

Type:

list

input_shm_name

Name of shared memory allocated for input data.

Type:

str

loaded

True if data has been loaded to shared memory.

Type:

bool

output_data

Output data tensor.

Type:

torch.Tensor

output_dtype

Output data dtype.

Type:

numpy.dtype

output_shape

Output data dimensions.

Type:

list

output_shm_name

Name of shared memory allocated for output data.

Type:

str

return_outputs_directly

Control whether outputs are actually transformed. Has to be False for training. In the testing case, numerical errors are smaller if set to True.

Type:

bool

snapshot

Currently loaded snapshot object.

Type:

mala.datahandling.snapshot.Snapshot

target_calculator

Used to do unit conversion on output data.

Type:

mala.targets.target.Target or derivative

allocate_shared_mem()[source]

Allocate the shared memory buffer for use by the prefetching process.

Buffer is sized via numpy metadata.
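
Sizing a shared memory block from numpy metadata can be done by reading only the `.npy` header (shape and dtype), without loading the array data. The sketch below illustrates the idea; `allocate_shm_for_npy` is a hypothetical helper, not MALA's API, and it assumes the default `.npy` format version 1.0 written by `np.save`.

```python
import numpy as np
from multiprocessing import shared_memory


def allocate_shm_for_npy(path, name=None):
    """Allocate a SharedMemory block sized from a .npy header (sketch)."""
    # Read only the header (magic + shape/dtype metadata), not the data.
    with open(path, "rb") as f:
        version = np.lib.format.read_magic(f)
        assert version == (1, 0)  # np.save default for plain arrays
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # A prefetching process can attach to this block by its name.
    shm = shared_memory.SharedMemory(create=True, size=nbytes, name=name)
    return shm, shape, dtype
```

Once the buffer is no longer needed, it must be released with `shm.close()` and `shm.unlink()`, which corresponds to the deallocation step below.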

deallocate_shared_mem()[source]

Deallocate the shared memory buffer used by the prefetching process.

delete_data()[source]

Free the shared memory buffers.

mix_datasets()[source]

Shuffle the data in this data set.

For this class, instead of mixing the datasets, we just shuffle the indices and leave the dataset order unchanged. NOTE: Shuffled access to the shared memory appears to reduce performance considerably (relative to the FastTensorDataset). To regain performance, this could be rewritten to shuffle the datasets as in the existing LazyLoadDataset. Another option might be to load the numpy file in permuted order to avoid the shuffled reads; however, this would require some care to avoid erroneously overwriting shared memory data in cases where a single dataset object is used back to back.
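
A minimal sketch of this kind of index shuffling, consistent with the caching constraint above: the snapshot (file) order and the point order within each snapshot are permuted, but all points of one snapshot stay contiguous, so each file is still loaded exactly once per epoch. The helper name is hypothetical, not MALA's API.

```python
import numpy as np


def mixed_indices(n_snapshots, points_per_snapshot, seed=0):
    """Sketch: permute snapshot order and intra-snapshot point order,
    keeping each snapshot's points contiguous (illustrative helper)."""
    rng = np.random.default_rng(seed)
    indices = []
    # Shuffle which snapshot is loaded first, second, ...
    for file_index in rng.permutation(n_snapshots):
        # Shuffle the grid points within that snapshot.
        local = rng.permutation(points_per_snapshot)
        indices.extend(file_index * points_per_snapshot + local)
    return np.array(indices)
```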