lazy_load_dataset_single

DataSet for lazy-loading.

class LazyLoadDatasetSingle(*args: Any, **kwargs: Any)

Bases: Dataset

DataSet class for lazy loading.

Only the snapshots currently being processed are loaded into memory. The class uses a "caching" approach: the most recently used snapshot is kept in memory until values from a new one are requested. Because shuffling at the DataSampler / DataLoader level would defeat this cache, it is discouraged to the point of being disabled. Instead, the snapshot load order is mixed here to provide at least some degree of mixing.
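The caching behaviour described above can be illustrated with a small, framework-free sketch. It is not the library's implementation: `CachingSnapshotLoader` and its `load_snapshot` callback are hypothetical names, snapshots are plain Python lists, and the snapshot order is mixed once with a seeded `random.Random`. It shows why sequential access loads each snapshot once, while sample-level shuffling would force repeated reloads.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


class CachingSnapshotLoader:
    """Hypothetical sketch: keep only the most recently used snapshot in memory."""

    def __init__(self, snapshot_sizes: Sequence[int],
                 load_snapshot: Callable[[int], List[int]],
                 seed: int = 0) -> None:
        self.snapshot_sizes = list(snapshot_sizes)
        self.load_snapshot = load_snapshot  # hypothetical loader callback
        # Mix the snapshot *load order* instead of shuffling individual samples.
        self.order = list(range(len(self.snapshot_sizes)))
        random.Random(seed).shuffle(self.order)
        self._cached_id: Optional[int] = None
        self._cached_data: Optional[List[int]] = None
        self.loads = 0  # how many times a snapshot had to be (re)loaded

    def __len__(self) -> int:
        return sum(self.snapshot_sizes)

    def _locate(self, index: int) -> Tuple[int, int]:
        # Map a global index to (snapshot id, local index) following the mixed order.
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        for snap_id in self.order:
            size = self.snapshot_sizes[snap_id]
            if index < size:
                return snap_id, index
            index -= size
        raise IndexError("index out of range")

    def __getitem__(self, index: int) -> int:
        snap_id, local = self._locate(index)
        if snap_id != self._cached_id:
            # Replace the cached snapshot; the old one is no longer referenced here.
            self._cached_data = self.load_snapshot(snap_id)
            self._cached_id = snap_id
            self.loads += 1
        return self._cached_data[local]


sizes = [3, 2, 4]
loader = CachingSnapshotLoader(sizes, lambda s: [s * 100 + i for i in range(sizes[s])])
values = [loader[i] for i in range(len(loader))]
sequential_loads = loader.loads  # each snapshot is loaded exactly once
_ = [loader[0], loader[len(loader) - 1], loader[0]]  # jumping between snapshots
alternating_loads = loader.loads - sequential_loads  # every jump triggers a reload
```

Sequential iteration costs one load per snapshot, whereas jumping between snapshots reloads on every access, which is why sample-level shuffling is disabled in favour of mixing the snapshot order.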

Parameters:
allocate_shared_mem()

Allocate the shared memory buffer for use by prefetching process.

Buffer is sized via numpy metadata.
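One way to size a buffer "via numpy metadata" is to read only the `.npy` header (shape and dtype) and allocate a `multiprocessing.shared_memory.SharedMemory` block of the matching byte size. The sketch below assumes that approach; `allocate_from_npy` is a hypothetical helper, not part of this class, and opening the file with `mmap_mode="r"` is used here simply as a way to get the header without reading the array data.

```python
import os
import tempfile
from multiprocessing import shared_memory

import numpy as np


def allocate_from_npy(path: str):
    """Hypothetical helper: size a shared-memory buffer from a .npy header."""
    # mmap_mode="r" parses the header (shape, dtype) without reading the data.
    meta = np.load(path, mmap_mode="r")
    shape, dtype = meta.shape, meta.dtype
    del meta  # only the metadata was needed; drop the memory map
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # SharedMemory rejects size 0 when creating, so request at least one byte.
    shm = shared_memory.SharedMemory(create=True, size=max(nbytes, 1))
    # NumPy view onto the shared block that a prefetching process could fill.
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, view


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot0.npy")
    np.save(path, np.arange(12, dtype=np.float32).reshape(3, 4))
    shm, view = allocate_from_npy(path)
    view[...] = np.load(path)  # stand-in for the prefetcher's load
    big_enough = shm.size >= view.nbytes
    result = view.copy()
    del view  # release the NumPy view before closing the shared buffer
    shm.close()
    shm.unlink()
```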

deallocate_shared_mem()

Deallocate the shared memory buffer used by prefetching process.
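With Python's `multiprocessing.shared_memory`, releasing a buffer shared with another process involves two distinct steps: every handle calls `close()` to unmap its own view, and the block itself is destroyed by calling `unlink()` once. The sketch below assumes that this is the mechanism behind deallocation; the helper `deallocate` is hypothetical, and the "prefetcher" is simulated by attaching to the block by name from the same process.

```python
from multiprocessing import shared_memory

import numpy as np


def deallocate(owner: shared_memory.SharedMemory,
               *attached: shared_memory.SharedMemory) -> None:
    """Hypothetical helper: release a shared block that other handles attached to."""
    # Every handle (consumers and owner) unmaps its own view with close().
    for handle in attached:
        handle.close()
    owner.close()
    # unlink() destroys the block itself; call it exactly once, from the owner.
    owner.unlink()


owner = shared_memory.SharedMemory(create=True, size=64)
# Stand-in for a prefetching process: attach to the same block by name.
prefetcher = shared_memory.SharedMemory(name=owner.name)
arr = np.ndarray((8,), dtype=np.float64, buffer=prefetcher.buf)
arr[:] = 1.0  # the "prefetcher" writes into shared memory
seen_by_owner = float(np.ndarray((8,), dtype=np.float64, buffer=owner.buf).sum())
del arr  # views must be released before close(), otherwise BufferError is raised
deallocate(owner, prefetcher)

try:
    shared_memory.SharedMemory(name=owner.name)
    still_attachable = True
except FileNotFoundError:
    still_attachable = False
```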

delete_data()

Free the shared memory buffers.

mix_datasets()

Shuffle the data in this data set.

For this class, instead of mixing the datasets, only the indices are shuffled and the dataset order is left unchanged. NOTE: Shuffled access to the shared memory appears to be much slower than with the FastTensorDataset. To regain performance, this method could be rewritten to shuffle the datasets as the existing LazyLoadDataset does. Another option might be to load the numpy file in permuted order to avoid the shuffled reads; however, this may require some care to avoid erroneously overwriting shared memory data when a single dataset object is used back to back.
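The two strategies mentioned above can be contrasted in a few lines of NumPy. This is an illustrative sketch that uses an ordinary in-memory array in place of the shared buffer: shuffling only the indices yields scattered row reads, while copying the data once in permuted order lets later reads proceed sequentially and produces the same sample sequence.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)  # 10 samples, 3 features

# Index shuffling: the data stays in place and only the index order changes.
perm = rng.permutation(len(data))
shuffled_reads = [data[i] for i in perm]  # scattered, non-contiguous row reads

# Alternative: materialize the data once in permuted order, then read sequentially.
permuted_copy = np.empty_like(data)
np.take(data, perm, axis=0, out=permuted_copy)
sequential_reads = [permuted_copy[i] for i in range(len(permuted_copy))]
```

In the shared-memory setting, the permuted copy would be written into the shared buffer itself, so it must not overwrite data that is still being read, which is the back-to-back caveat in the note above.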