data_shuffler
Mixes data between snapshots for improved lazy-loading training.
- class DataShuffler(parameters: Parameters, target_calculator=None, descriptor_calculator=None)
Bases: DataHandlerBase
Mixes data between snapshots for improved lazy-loading training.
This is a DISK operation - new, shuffled snapshots will be created on disk.
- Parameters:
parameters (mala.common.parameters.Parameters) – Parameters used to create the data handling object.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data. If None, then one will be created by this class.
target_calculator (mala.targets.target.Target) – Used to do unit conversion on output data. If None, then one will be created by this class.
- add_snapshot(input_file, input_directory, output_file, output_directory, snapshot_type='numpy')
Add a snapshot to the data pipeline.
- Parameters:
input_file (string) – File with saved numpy input array.
input_directory (string) – Directory containing input_file.
output_file (string) – File with saved numpy output array.
output_directory (string) – Directory containing output_file.
snapshot_type (string) – Either “numpy” or “openpmd” based on what kind of files you want to operate on.
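The relationship between the file and directory arguments can be sketched in plain Python. This is an illustration only: SnapshotEntry, ALLOWED_SNAPSHOT_TYPES and the file names are hypothetical and not part of MALA. The sketch assumes that each file name is joined onto its directory and that snapshot_type accepts only the two documented values.

```python
import os
from dataclasses import dataclass

# Hypothetical illustration; none of these names belong to MALA.
ALLOWED_SNAPSHOT_TYPES = ("numpy", "openpmd")


@dataclass(frozen=True)
class SnapshotEntry:
    """One registered snapshot: an input file and an output file."""

    input_file: str
    input_directory: str
    output_file: str
    output_directory: str
    snapshot_type: str = "numpy"

    def __post_init__(self):
        # Reject anything other than the two documented snapshot types.
        if self.snapshot_type not in ALLOWED_SNAPSHOT_TYPES:
            raise ValueError(
                f"snapshot_type must be one of {ALLOWED_SNAPSHOT_TYPES}, "
                f"got {self.snapshot_type!r}"
            )

    @property
    def input_path(self):
        # Assumption: the file name is resolved relative to its directory.
        return os.path.join(self.input_directory, self.input_file)

    @property
    def output_path(self):
        return os.path.join(self.output_directory, self.output_file)


# Placeholder file names, chosen only for this example.
entry = SnapshotEntry("snapshot0.in.npy", "data", "snapshot0.out.npy", "data")
print(entry.input_path, entry.output_path)
```

Keeping file and directory separate, as the signature does, lets several snapshots share one directory while differing only in their file names.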
- shuffle_snapshots(complete_save_path=None, descriptor_save_path=None, target_save_path=None, save_name='mala_shuffled_snapshot*', number_of_shuffled_snapshots=None)
Shuffle the snapshots into new snapshots.
The new, shuffled snapshots are saved to disk.
- Parameters:
complete_save_path (string) – If not None: the directory in which all snapshots will be saved. Overrides descriptor_save_path and target_save_path if set.
descriptor_save_path (string) – Directory in which to save descriptor data.
target_save_path (string) – Directory in which to save target data.
save_name (string) – Name pattern under which the new, shuffled snapshots are saved.
number_of_shuffled_snapshots (int) – If not None, this class will attempt to redistribute the data across this number of snapshots. If None, the number of shuffled snapshots equals the number of snapshots provided.
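The idea of mixing data between snapshots can be sketched in memory with the standard library. This is a conceptual illustration, not MALA's implementation, which works on files on disk. The functions shuffle_between_snapshots and shuffled_names are hypothetical. The sketch assumes that the pooled data must divide evenly among the new snapshots, and that the "*" in save_name is replaced by the index of each shuffled snapshot.

```python
import random


def shuffle_between_snapshots(snapshots, number_of_shuffled_snapshots=None, seed=0):
    """Redistribute (input, output) pairs across new snapshots.

    snapshots is a list of snapshots, each a list of (input, output) pairs.
    Pairs stay together, so every input keeps its matching output.
    """
    if number_of_shuffled_snapshots is None:
        # Default: as many shuffled snapshots as were provided.
        number_of_shuffled_snapshots = len(snapshots)

    # Pool the data points of all snapshots.
    pool = [pair for snapshot in snapshots for pair in snapshot]
    if len(pool) % number_of_shuffled_snapshots != 0:
        # Simplification of this sketch: require equally sized snapshots.
        raise ValueError("data does not divide evenly among the new snapshots")

    # Permute the pool with a seeded generator for reproducibility.
    random.Random(seed).shuffle(pool)

    size = len(pool) // number_of_shuffled_snapshots
    return [pool[i * size:(i + 1) * size] for i in range(number_of_shuffled_snapshots)]


def shuffled_names(save_name, count):
    # Assumption: "*" is a placeholder for the shuffled snapshot index.
    return [save_name.replace("*", str(i)) for i in range(count)]


# Two toy snapshots with four data points each.
snapshots = [[(f"in{s}_{p}", f"out{s}_{p}") for p in range(4)] for s in range(2)]
shuffled = shuffle_between_snapshots(snapshots, seed=42)
names = shuffled_names("mala_shuffled_snapshot*", len(shuffled))
print(names)
```

After shuffling, each new snapshot draws points from the whole pool rather than from a single original snapshot, so a lazy-loading training loop that reads one snapshot at a time sees a more representative mix of the data.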