data_shuffler

Mixes data between snapshots for improved lazy-loading training.

class DataShuffler(parameters: Parameters, target_calculator=None, descriptor_calculator=None)

Bases: DataHandlerBase

Mixes data between snapshots for improved lazy-loading training.

This is a DISK operation - new, shuffled snapshots will be created on disk.

Parameters:

parameters (mala.common.parameters.Parameters) – Parameters used to create the data handling object.
descriptor_calculator (mala.descriptors.descriptor.Descriptor) – Used to do unit conversion on input data. If None, then one will be created by this class.
target_calculator (mala.targets.target.Target) – Used to do unit conversion on output data. If None, then one will be created by this class.
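The idea of mixing data between snapshots can be illustrated with a small in-memory sketch. This is not the implementation used by this class, which operates on files on disk; the function name shuffle_across_snapshots, the seed argument and the requirement that the data divides evenly are assumptions made for illustration only.

```python
import numpy as np


def shuffle_across_snapshots(inputs, outputs, number_of_shuffled_snapshots=None, seed=0):
    """Pool all data points, permute them, and split them into new snapshots.

    Hypothetical helper: `inputs` and `outputs` are lists of arrays whose
    first axis enumerates data points, one array per snapshot.
    """
    if len(inputs) != len(outputs):
        raise ValueError("Need one output array per input array.")
    if number_of_shuffled_snapshots is None:
        # Default: create as many shuffled snapshots as were provided.
        number_of_shuffled_snapshots = len(inputs)

    pooled_inputs = np.concatenate(inputs, axis=0)
    pooled_outputs = np.concatenate(outputs, axis=0)
    if pooled_inputs.shape[0] != pooled_outputs.shape[0]:
        raise ValueError("Inputs and outputs hold different numbers of points.")

    n_points = pooled_inputs.shape[0]
    if n_points % number_of_shuffled_snapshots != 0:
        raise ValueError("Data cannot be split evenly into the requested snapshots.")

    # The same permutation is applied to inputs and outputs so pairs stay aligned.
    permutation = np.random.default_rng(seed).permutation(n_points)
    new_inputs = np.split(pooled_inputs[permutation], number_of_shuffled_snapshots)
    new_outputs = np.split(pooled_outputs[permutation], number_of_shuffled_snapshots)
    return new_inputs, new_outputs


# Two toy snapshots with 4 data points each; every new snapshot again holds 4 points.
snapshot_inputs = [np.zeros((4, 2)), np.ones((4, 2))]
snapshot_outputs = [np.zeros((4, 1)), np.ones((4, 1))]
mixed_inputs, mixed_outputs = shuffle_across_snapshots(snapshot_inputs, snapshot_outputs)
```

After such a mix, each new snapshot draws its points from all original snapshots, so loading one snapshot at a time during lazy-loading training no longer exposes the model to a single configuration at once.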

temporary_shuffled_snapshots

A list of snapshot objects describing temporary, snapshot-like shuffled data files. By default, this list is empty. If shuffle_snapshots is called with shuffle_to_temporary=True, it is populated with temporary files saved to disk, which can be deleted after model training. Please note that the “snapshot_function”, “input_units”, “output_units” and “calculation_output” fields of the snapshots within this list

Type: list

add_snapshot(input_file, input_directory, output_file, output_directory, snapshot_type=None)

Add a snapshot to the data pipeline.

Parameters:

input_file (string) – File with saved numpy input array.
input_directory (string) – Directory containing input_file.
output_file (string) – File with saved numpy output array.
output_directory (string) – Directory containing output_file.
snapshot_type (string) – Either “numpy” or “openpmd” based on what kind of files you want to operate on.

delete_temporary_shuffled_snapshots()

Delete temporary files created during shuffling of data.

If shuffling has been done with the option “shuffle_to_temporary”, shuffled data will be saved to temporary files which can safely be deleted with this function.
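One possible pattern for the temporary workflow is sketched below: shuffle to temporary files, train on them, and clean up even if training fails. The helper run_with_temporary_shuffle and the train callback are hypothetical; only shuffle_snapshots, its shuffle_to_temporary option, temporary_shuffled_snapshots and delete_temporary_shuffled_snapshots are taken from this page.

```python
def run_with_temporary_shuffle(shuffler, train, **shuffle_kwargs):
    """Shuffle to temporary files, run `train`, then delete the files.

    `shuffler` is expected to behave like a DataShuffler to which snapshots
    have already been added. `train` is a hypothetical callable that receives
    the list of temporary shuffled snapshots.
    """
    shuffler.shuffle_snapshots(shuffle_to_temporary=True, **shuffle_kwargs)
    try:
        return train(shuffler.temporary_shuffled_snapshots)
    finally:
        # Runs on success and on error, so no temporary files are left behind.
        shuffler.delete_temporary_shuffled_snapshots()
```

The try/finally block is a design choice of this sketch, not something the class enforces; calling delete_temporary_shuffled_snapshots manually after training works as well.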

shuffle_snapshots(complete_save_path=None, descriptor_save_path=None, target_save_path=None, save_name='mala_shuffled_snapshot*', number_of_shuffled_snapshots=None, shuffle_to_temporary=False)

Shuffle the snapshots into new snapshots.

The new snapshots are written to disk.

Parameters:

complete_save_path (string) – If not None: the directory in which all snapshots will be saved. Overrides descriptor_save_path and target_save_path if set.
descriptor_save_path (string) – Directory in which to save descriptor data.
target_save_path (string) – Directory in which to save target data.
save_name (string) – Name under which the shuffled snapshots are saved.
number_of_shuffled_snapshots (int) – If not None, this class will attempt to redistribute the data to this number of snapshots. If None, as many shuffled snapshots are created as snapshots were provided.
shuffle_to_temporary (bool) – If True, shuffled files will be written to temporary data files. The save paths are chosen in the same way as for non-temporary usage of this class. The paths and names of these temporary files can then be found in the class attribute temporary_shuffled_snapshots.
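A typical call sequence might look like the following sketch, which registers snapshots with add_snapshot and then writes shuffled copies with shuffle_snapshots. The helper shuffle_to_directory and the file names in the commented usage are hypothetical, and constructing the shuffler as mala.DataShuffler(mala.Parameters()) assumes that both classes are available from the top-level mala package.

```python
def shuffle_to_directory(shuffler, snapshots, save_directory,
                         save_name="mala_shuffled_snapshot*",
                         number_of_shuffled_snapshots=None):
    """Add snapshots to a DataShuffler-like object and shuffle them to disk.

    Hypothetical helper. `snapshots` is a list of tuples
    (input_file, input_directory, output_file, output_directory).
    """
    for input_file, input_directory, output_file, output_directory in snapshots:
        shuffler.add_snapshot(
            input_file, input_directory, output_file, output_directory
        )
    # complete_save_path places descriptor and target data in one directory.
    shuffler.shuffle_snapshots(
        complete_save_path=save_directory,
        save_name=save_name,
        number_of_shuffled_snapshots=number_of_shuffled_snapshots,
    )


# Usage sketch (assumed names, requires MALA and existing data files):
# import mala
# shuffler = mala.DataShuffler(mala.Parameters())
# shuffle_to_directory(
#     shuffler,
#     [("snapshot0.in.npy", "./data", "snapshot0.out.npy", "./data"),
#      ("snapshot1.in.npy", "./data", "snapshot1.out.npy", "./data")],
#     "./shuffled",
# )
```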