parameters
Collection of all parameter-related classes and functions.
- class Parameters
Bases:
object
All parameters MALA needs to perform its various tasks.
- comment
Characterizes a set of parameters (e.g. “experiment_ddmmyy”).
- Type:
string
- network
Contains all parameters necessary for constructing a neural network.
- Type:
- descriptors
Contains all parameters necessary for calculating/parsing descriptors.
- Type:
- targets
Contains all parameters necessary for calculating/parsing output quantities.
- Type:
- data
Contains all parameters necessary for loading and preprocessing data.
- Type:
- running
Contains parameters needed for network runs (train, test or inference).
- Type:
- hyperparameters
Parameters used for hyperparameter optimization.
- manual_seed
If not None, this value is used as the manual seed for the neural networks. This can be used to make experiments comparable. Default: None.
- Type:
int
- classmethod load_from_file(file, save_format='json', no_snapshots=False, force_no_ddp=False)
Load a Parameters object from a file.
- Parameters:
file (string or ZipExtFile) – File from which the parameters will be loaded.
save_format (string) – File format in which the parameters were saved. Supported formats are “json” (default) and “pickle”.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.
- Returns:
loaded_parameters – The loaded Parameters object.
- Return type:
- classmethod load_from_json(file, no_snapshots=False, force_no_ddp=False)
Load a Parameters object from a JSON file.
- Parameters:
file (string or ZipExtFile) – File from which the parameters will be loaded.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.
- Returns:
loaded_parameters – The loaded Parameters object.
- Return type:
- classmethod load_from_pickle(file, no_snapshots=False)
Load a Parameters object from a pickle file.
- Parameters:
file (string or ZipExtFile) – File from which the parameters will be loaded.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.
- Returns:
loaded_parameters – The loaded Parameters object.
- Return type:
- optuna_singlenode_setup(wait_time=0)
Set up device and parallelization parameters for Optuna+MPI.
This only needs to be called if multiple MPI ranks are used on one node to run Optuna. Optuna itself does NOT communicate via MPI. Thus, if we allocate e.g. one node with 4 GPUs and start 4 jobs, 3 of those jobs will fail, because currently, we instantiate the CUDA devices based on MPI ranks. This function sets everything up properly. This of course requires MPI. This may be a bit hacky, but it lets us use one script and one MPI command to launch x GPU-backed jobs on any node with x GPUs.
- Parameters:
wait_time (int) – If larger than 0, then all processes will wait this many seconds times their rank number after this routine before proceeding. This can be useful when using a file based distribution algorithm.
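A minimal sketch of the rank-dependent behavior described above, written in plain Python so that it runs without MPI or a GPU. The helper `single_node_setup_sketch` is hypothetical, and the round-robin mapping of ranks to the GPUs of one node is an assumption for illustration; only the wait of `wait_time` seconds times the rank number is taken from the description of `wait_time`.

```python
import time


def single_node_setup_sketch(rank, n_gpus_on_node, wait_time=0, sleep=time.sleep):
    """Hypothetical helper: pick a GPU per rank and stagger process start-up.

    Assumption: ranks are mapped round-robin onto the GPUs of one node, so
    4 ranks on a 4-GPU node each get their own device.
    """
    device = f"cuda:{rank % n_gpus_on_node}"
    delay = wait_time * rank if wait_time > 0 else 0
    if delay > 0:
        # Each rank waits wait_time * rank seconds before proceeding.
        sleep(delay)
    return device, delay


# Rank 3 on a 4-GPU node with wait_time=2 (sleep disabled for the demo).
device, delay = single_node_setup_sketch(3, 4, wait_time=2, sleep=lambda s: None)
print(device, delay)  # cuda:3 6
```

Injecting `sleep` keeps the sketch testable; staggering by rank spreads out accesses, which is why it helps with file-based distribution.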
- save(filename, save_format='json')
Save the Parameters object to a file.
- Parameters:
filename (string) – File to which the parameters will be saved.
save_format (string) – File format which is used for parameter saving. Supported formats are “json” (default) and “pickle”.
- save_as_json(filename)
Save the Parameters object to a JSON file.
- Parameters:
filename (string) – File to which the parameters will be saved.
- save_as_pickle(filename)
Save the Parameters object to a pickle file.
- Parameters:
filename (string) – File to which the parameters will be saved.
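The JSON round trip and the effect of `no_snapshots` can be sketched with the standard `json` module. The functions below are hypothetical stand-ins operating on a plain nested dictionary, not on MALA's `Parameters` class, and the snapshot entries are simplified to strings for illustration.

```python
import json
import os
import tempfile


def save_as_json_sketch(params, filename):
    """Hypothetical stand-in for save_as_json: dump a nested dict to JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2)


def load_from_json_sketch(filename, no_snapshots=False):
    """Hypothetical stand-in for load_from_json on a plain dict."""
    with open(filename, encoding="utf-8") as f:
        params = json.load(f)
    if no_snapshots:
        # Mirrors the documented behavior: the snapshot list is emptied.
        params["data"]["snapshot_directories_list"] = []
    return params


params = {
    "comment": "experiment_010124",
    "manual_seed": 1234,
    "data": {"snapshot_directories_list": ["snap0", "snap1"]},
}
path = os.path.join(tempfile.mkdtemp(), "params.json")
save_as_json_sketch(params, path)
loaded = load_from_json_sketch(path, no_snapshots=True)
print(loaded["data"]["snapshot_directories_list"])  # []
```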
- property device
Get the device used by MALA. Read-only.
- property openpmd_configuration
Provide a .toml or .json formatted string to configure OpenPMD.
To load a configuration from a file, add an “@” in front of the file name and put the resulting string here. OpenPMD will then load the file. For further details, see the OpenPMD documentation.
- property openpmd_granularity
Adjust the memory overhead of the OpenPMD interface.
The smallest possible value is 1, meaning the smallest memory footprint and the slowest I/O. Higher values introduce some memory penalty but offer greater speed. The maximum level is the feature dimension of your data set; if you choose a larger value, it will automatically be set to the feature dimension upon loading.
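The documented bounds on `openpmd_granularity` amount to a simple clamp. The helper below is a hypothetical illustration; raising an error for values below 1 is an assumption, and the feature dimension of 91 in the usage line is an arbitrary example value.

```python
def effective_granularity(requested, feature_dimension):
    """Hypothetical helper mirroring the documented bounds of openpmd_granularity.

    1 gives the smallest memory footprint (slowest I/O); values above the
    feature dimension are reduced to the feature dimension.
    """
    if requested < 1:
        # Assumption: values below the documented minimum are rejected.
        raise ValueError("openpmd_granularity must be at least 1")
    return min(requested, feature_dimension)


print(effective_granularity(1000, 91))  # 91
```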
- property use_atomic_density_formula
Control whether to use the atomic density formula.
This formula uses a Gaussian representation of the atomic density to calculate the structure factor and, with it, the Ewald energy and parts of the exchange-correlation energy. Using it reduces the scaling from N^2 to N log N and offloads most of the computational overhead of the energy calculation from QE to LAMMPS. This is beneficial since LAMMPS can benefit from GPU acceleration (QE GPU acceleration is not used in the portion of the QE code MALA employs). If set to True, MALA will perform an additional LAMMPS calculation during inference. The hyperparameters for this atomic density calculation are set via the parameters.descriptors object. Default is False, except when both use_gpu and use_lammps are True, in which case this value is set to True as well.
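The default rule for `use_atomic_density_formula` can be written as a small function. The helper and its `user_value` argument are hypothetical; letting an explicitly set value override the default is an assumption, since the text above only states the default.

```python
def resolve_use_atomic_density_formula(use_gpu, use_lammps, user_value=None):
    """Hypothetical helper for the documented default.

    user_value=None stands for "not set by the user" (assumption). In that
    case the formula is enabled only if both use_gpu and use_lammps are True.
    """
    if user_value is not None:
        return bool(user_value)
    return bool(use_gpu and use_lammps)


print(resolve_use_atomic_density_formula(use_gpu=True, use_lammps=True))   # True
print(resolve_use_atomic_density_formula(use_gpu=True, use_lammps=False))  # False
```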
- property use_ddp
Control whether DDP (distributed data parallel) is used for parallel training.
- property use_gpu
Control whether a GPU is used (provided there is one).
- property use_lammps
Control whether to use LAMMPS for descriptor calculation.
- property use_mpi
Control whether MPI is used for parallel inference.
- property verbosity
Control the level of output for MALA.
The following options are available:
0: “low”, only essential output will be printed
1: “medium”, most diagnostic output will be printed. (Default)
2: “high”, all information will be printed.
- class ParametersBase
Bases:
JSONSerializable
Base parameter class for MALA.
- classmethod from_json(json_dict)
Read this object from a dictionary saved in a JSON file.
- Parameters:
json_dict (dict) – A dictionary containing all attributes, properties, etc. as saved in the json file.
- Returns:
deserialized_object – The object as read from the JSON file.
- Return type:
- class ParametersData
Bases:
ParametersBase
Parameters necessary for loading and preprocessing data.
- snapshot_directories_list
A list of all added snapshots.
- Type:
list
- data_splitting_type
Specifies how the data for validation, test and training is split. Currently, the only supported option is by_snapshot, which splits the data along snapshot boundaries. It is also the default.
- Type:
string
- input_rescaling_type
Specifies how input quantities are normalized. Options:
“None”: No normalization is applied.
“standard”: Standardization (Scale to mean 0, standard deviation 1)
“normal”: Min-Max scaling (Scale to be in range 0…1)
“feature-wise-standard”: Row Standardization (Scale to mean 0, standard deviation 1)
“feature-wise-normal”: Row Min-Max scaling (Scale to be in range 0…1)
- Type:
string
- output_rescaling_type
Specifies how output quantities are normalized. Options:
“None”: No normalization is applied.
“standard”: Standardization (Scale to mean 0, standard deviation 1)
“normal”: Min-Max scaling (Scale to be in range 0…1)
“feature-wise-standard”: Row Standardization (Scale to mean 0, standard deviation 1)
“feature-wise-normal”: Row Min-Max scaling (Scale to be in range 0…1)
- Type:
string
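The four rescaling options can be illustrated in plain Python. This is a sketch, not MALA's implementation: it assumes the data is a list of rows (one per grid point) with one column per feature, that the non-feature-wise options use a single set of statistics for the whole array, and that the feature-wise options compute statistics per column. The use of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def _make_scaler(values, kind):
    """Return a function that scales one value using statistics of `values`."""
    if kind == "standard":
        mu, sigma = mean(values), pstdev(values)
        return lambda x: (x - mu) / sigma
    if kind == "normal":
        lo, hi = min(values), max(values)
        return lambda x: (x - lo) / (hi - lo)
    raise ValueError(f"Unknown scaling kind: {kind}")


def rescale(data, rescaling_type):
    """Sketch of the documented rescaling options (no guard for constant data)."""
    if rescaling_type == "None":
        return [list(row) for row in data]
    n_features = len(data[0])
    prefix = "feature-wise-"
    if rescaling_type.startswith(prefix):
        kind = rescaling_type[len(prefix):]
        # Separate statistics for every feature (column).
        scalers = [_make_scaler([row[j] for row in data], kind)
                   for j in range(n_features)]
    else:
        # One set of statistics for all entries of the array.
        scaler = _make_scaler([x for row in data for x in row], rescaling_type)
        scalers = [scaler] * n_features
    return [[scalers[j](x) for j, x in enumerate(row)] for row in data]


data = [[0.0, 10.0], [2.0, 30.0]]
print(rescale(data, "feature-wise-normal"))  # [[0.0, 0.0], [1.0, 1.0]]
```

Feature-wise scaling keeps features with very different magnitudes from dominating each other, at the cost of storing one set of statistics per feature.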
- use_lazy_loading
If True, data is lazily loaded, i.e. only the snapshots that are currently needed will be kept in memory. This greatly reduces memory demands but increases computation time.
- Type:
bool
- use_lazy_loading_prefetch
If True, an alternative lazy loading path with prefetching is used for higher performance.
- Type:
bool
- use_fast_tensor_data_set
If True, then the new, fast TensorDataSet implemented by Josh Romero will be used.
- Type:
bool
- shuffling_seed
If not None, a seed that will be used to make the shuffling of the data in the DataShuffler class deterministic.
- Type:
int
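A seeded shuffle makes the data order reproducible, which is the purpose of `shuffling_seed`. The sketch below uses Python's `random.Random` for illustration; the random number generator actually used by the DataShuffler class is not specified here, and the helper name is hypothetical.

```python
import random


def shuffled_order(n_samples, shuffling_seed=None):
    """Sketch: return a permutation of sample indices.

    With a seed, a dedicated random.Random instance makes the order
    reproducible; with None, it is seeded from system entropy and the
    order changes from run to run.
    """
    order = list(range(n_samples))
    rng = random.Random(shuffling_seed)
    rng.shuffle(order)
    return order


print(shuffled_order(10, shuffling_seed=42) == shuffled_order(10, shuffling_seed=42))  # True
```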
- class ParametersDataGeneration
Bases:
ParametersBase
All parameters to help with data generation.
- trajectory_analysis_denoising_width
The distance metric is denoised prior to analysis using a certain width. This should be adjusted if there is reason to believe the trajectory will be noisy.
- Type:
int
- trajectory_analysis_below_average_counter
Number of time steps that have to be consecutively below the average of the distance metric curve before we consider the trajectory to be equilibrated. Usually does not have to be changed.
- Type:
int
- trajectory_analysis_estimated_equilibrium
The analysis of the trajectory builds on the assumption that at some point of the trajectory, the system is equilibrated. For this, we need to provide the fraction of the trajectory (counted from the end). Usually, 10% is a fine assumption. This value usually does not need to be changed.
- Type:
float
- trajectory_analysis_correlation_metric_cutoff
Cutoff value to be used when sampling uncorrelated snapshots during trajectory analysis. If negative, a value will be determined numerically. This value is a cutoff for the minimum Euclidean distance between any two ions in two subsequent ionic configurations.
- Type:
float
- trajectory_analysis_temperature_tolerance_percent
Maximum deviation (in percent) between the temperature of a snapshot and the desired temperature for the snapshot to be considered for DFT calculation.
- Type:
float
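The temperature criterion can be expressed as a relative deviation check. The helper below is hypothetical; treating a deviation exactly equal to the tolerance as acceptable is an assumption.

```python
def within_temperature_tolerance(snapshot_temperature, target_temperature,
                                 tolerance_percent):
    """Hypothetical check for trajectory_analysis_temperature_tolerance_percent."""
    deviation_percent = (abs(snapshot_temperature - target_temperature)
                         / target_temperature * 100.0)
    # Assumption: a deviation equal to the tolerance is still accepted.
    return deviation_percent <= tolerance_percent


print(within_temperature_tolerance(1005.0, 1000.0, 1.0))  # True  (0.5 % deviation)
print(within_temperature_tolerance(1020.0, 1000.0, 1.0))  # False (2 % deviation)
```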
- local_psp_path
Path to where the local pseudopotential is stored (for OF-DFT-MD).
- Type:
string
- local_psp_name
Name of the local pseudopotential (for OF-DFT-MD).
- Type:
string
- ofdft_timestep
Timestep of the OF-DFT-MD simulation.
- Type:
int
- ofdft_number_of_timesteps
Number of timesteps for the OF-DFT-MD simulation.
- Type:
int
- ofdft_temperature
Temperature at which to perform the OF-DFT-MD simulation.
- Type:
float
- ofdft_kedf
Kinetic energy functional to be used for the OF-DFT-MD simulation.
- Type:
string
- ofdft_friction
Friction to be added for the Langevin dynamics in the OF-DFT-MD run.
- Type:
float
- class ParametersDescriptors
Bases:
ParametersBase
Parameters necessary for calculating/parsing input descriptors.
- descriptor_type
Type of descriptors that is used to represent the atomic fingerprint. Supported:
‘Bispectrum’: Bispectrum descriptors (formerly called ‘SNAP’).
‘Atomic Density’: Atomic density, calculated via Gaussian descriptors.
- Type:
string
- bispectrum_twojmax
Bispectrum calculation: 2*jmax-parameter used for calculation of bispectrum descriptors. Default value for jmax is 5, so default value for twojmax is 10.
- Type:
int
- lammps_compute_file
Bispectrum calculation: LAMMPS input file that is used to calculate the Bispectrum descriptors. If this string is empty, the standard LAMMPS input file found in this repository will be used (recommended).
- Type:
string
- descriptors_contain_xyz
Legacy option. If True, it is assumed that the first three entries of the descriptor vector are the xyz coordinates and they are cut from the descriptor vector. If False, no such cutting is performed.
- Type:
bool
- atomic_density_sigma
Sigma used for the calculation of the Gaussian descriptors.
- Type:
float
- property bispectrum_cutoff
Cutoff radius for bispectrum calculation.
- property bispectrum_switchflag
Switchflag for the bispectrum calculation.
Can only be 1 or 0. If 1 (default), a switching function will be used to ensure that atomic contributions smoothly go to zero after a certain cutoff. If 0 (old default, which can be problematic in some instances), this is not done, which can lead to discontinuities.
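To illustrate what the switching function changes, the sketch below uses the cosine switching function from the LAMMPS documentation of `compute sna/atom`, assuming its inner radius `rmin0` is 0. It only shows the weight given to one neighbor at distance `r`, not the full bispectrum calculation, and the helper name is hypothetical.

```python
import math


def switching_weight(r, cutoff, switchflag=1):
    """Weight of a neighbor at distance r (sketch, assuming rmin0 = 0)."""
    if r > cutoff:
        return 0.0
    if switchflag == 0:
        # No switching: full weight up to the cutoff, then a jump to 0.
        return 1.0
    # Cosine switching: smoothly decays from 1 at r = 0 to 0 at r = cutoff.
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


for r in (0.0, 2.0, 3.9, 4.1):
    print(r, switching_weight(r, 4.0), switching_weight(r, 4.0, switchflag=0))
```

With `switchflag=0`, the weight drops from 1 to 0 as a neighbor crosses the cutoff, which is the discontinuity mentioned above.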
- property use_y_splitting
Control whether a splitting in y-axis is used.
This can only be used in conjunction with z-splitting, and the option will be ignored if z-splitting is disabled. Only has an effect for values larger than 1.
- property use_z_splitting
Control whether splitting across the z-axis is used.
Default is True, since this gives descriptors compatible with QE, for total energy evaluation. However, setting this value to False can, e.g. in the LAMMPS case, improve performance. This is relevant for e.g. preprocessing.
- class ParametersHyperparameterOptimization
Bases:
ParametersBase
Hyperparameter optimization parameters.
- direction
Controls whether to minimize or maximize the loss function. Arguments are “minimize” and “maximize” respectively.
- Type:
string
- n_trials
Controls how many trials are performed (when using optuna). Default: 100.
- Type:
int
- hlist
List containing hyperparameters that are then passed to optuna. Supported options so far include:
learning_rate (float): learning rate of the training algorithm
layer_activation_xxx (categorical): Activation function used for the feed forward network (see Network parameters for supported activation functions). Note that _xxx is only so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything. Please note further that you need to request either only one activation function (for all layers) or one specifically for each layer.
ff_neurons_layer_xxx (int): Number of neurons per layer. Note that _xxx is only so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything.
Users normally don’t have to fill this list by hand; the hyperparameter optimizers provide interfaces for this task.
- Type:
list
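The naming rules for `hlist` entries can be checked before starting a study. The function below is a hypothetical validation sketch that works on plain name strings; it is not part of MALA's hyperparameter optimizer interface. It assumes that `n_activation_layers` is the number of layers that receive an activation function.

```python
def check_activation_entries(hlist_names, n_activation_layers):
    """Hypothetical check of the documented hlist naming rules.

    hlist_names: hyperparameter names in the order they were added. The
    order is kept as-is, since no reordering is performed.
    """
    if len(set(hlist_names)) != len(hlist_names):
        raise ValueError("names must be unique, e.g. via the _xxx suffix")
    activations = [n for n in hlist_names if n.startswith("layer_activation")]
    # Either no activation entry, one for all layers, or one per layer.
    if len(activations) not in (0, 1, n_activation_layers):
        raise ValueError(
            f"request 1 or {n_activation_layers} activation functions, "
            f"got {len(activations)}"
        )
    return activations


names = ["learning_rate", "ff_neurons_layer_00", "ff_neurons_layer_01",
         "layer_activation_00", "layer_activation_01", "layer_activation_02"]
print(check_activation_entries(names, 3))
```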
- hyper_opt_method: string
Method used for hyperparameter optimization. Currently supported:
“optuna” : Use optuna for the hyperparameter optimization.
“oat” : Use orthogonal array tuning (currently limited to categorical hyperparameters). Range analysis is currently done by simply choosing the lowest loss.
“naswot” : Use neural architecture search without training (NASWOT), based on Jacobians.
- checkpoints_each_trial: int
If not 0, checkpoint files will be saved after each checkpoints_each_trial trials. Currently, this only works with optuna.
- checkpoint_name: string
Name used for the checkpoints. Using this, multiple runs can be performed in the same directory. Currently, this only works with optuna.
- study_name: string
Name used for this study (in optuna's storage). Necessary when operating with an RDB storage.
- rdb_storage: string
Address of the RDB storage to be used by optuna.
- rdb_storage_heartbeat: int
Heartbeat interval for optuna (in seconds). Default is None. If not None and above 0, optuna will record a heartbeat for running trials at this interval. If no action on a RUNNING trial is recognized for longer than this interval, this trial will be moved to FAILED. In distributed training, setting a heartbeat is currently the only way to achieve a precise number of trials:
https://github.com/optuna/optuna/issues/1883
For optuna versions below 2.8.0, larger heartbeat intervals are detrimental to performance and should be avoided:
https://github.com/optuna/optuna/issues/2685
For MALA, no evidence for decreased performance using smaller heartbeat values could be found. So if this is used, 1s is a reasonable value.
- number_training_per_trial: int
Number of network trainings performed per trial. Default is 1, but it makes sense to choose a higher number to exclude networks that performed well by chance (good initialization). Naturally, this increases the computational cost.
- trial_ensemble_evaluation: string
Control how multiple trainings performed during a trial are evaluated. By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.
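How the two documented `trial_ensemble_evaluation` options combine the losses of several trainings can be sketched as follows. The helper is hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def evaluate_trial(losses, trial_ensemble_evaluation="mean"):
    """Hypothetical helper combining the losses of all trainings in one trial."""
    if trial_ensemble_evaluation == "mean":
        return mean(losses)
    if trial_ensemble_evaluation == "mean_std":
        # Mean plus spread: a pessimistic estimate of the expected accuracy.
        return mean(losses) + pstdev(losses)
    raise ValueError(f"Unknown option: {trial_ensemble_evaluation}")


losses = [1.0, 3.0]
print(evaluate_trial(losses, "mean"))      # 2.0
print(evaluate_trial(losses, "mean_std"))  # 3.0
```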
- use_multivariate: bool
If True, the optuna multivariate sampler is used. It has been experimental since v2.2.0, but is reported to perform very well. http://proceedings.mlr.press/v80/falkner18a.html
- naswot_pruner_cutoff: float
If the surrogate loss algorithm is used as a pruner during a study, this cutoff determines which trials are neglected.
- pruner: string
Pruner type to be used by optuna. Currently supported:
“multi_training”: If multiple trainings are performed per trial, and one returns “inf” for the loss, no further training will be performed. Especially useful if used in conjunction with the band_energy metric.
“naswot”: Use the NASWOT algorithm as a pruner.
- naswot_pruner_batch_size: int
Batch size for the NASWOT pruner.
- number_bad_trials_before_stopping: int
Only applies to optuna studies. If set to an integer above 0, the study will be stopped if no new best trial is found within number_bad_trials_before_stopping trials after the last one.
- sqlite_timeout: int
Timeout for the SQLite backend of Optuna. This backend is officially not recommended because it is file-based and can lead to errors; with a suitable timeout, however, it can be used fairly stably and can help in HPC settings.
- show(indent='')
Print name and values of all attributes of this object.
- Parameters:
indent (string) – The indent used in the list with which the parameter shows itself.
- property number_training_per_trial
Control how many trainings are run per optuna trial.
- property rdb_storage_heartbeat
Control whether a heartbeat is used for distributed optuna runs.
- property trial_ensemble_evaluation
Control how multiple trainings performed during a trial are evaluated.
By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.
- class ParametersNetwork
Bases:
ParametersBase
Parameters necessary for constructing a neural network.
- nn_type
Type of the neural network that will be used. Currently supported are
“feed_forward” (default)
“transformer”
“lstm”
“gru”
- Type:
string
- layer_sizes
A list of integers detailing the sizes of the layers of the neural network. Please note that the input layer is included therein. Default: [10,10,0]
- Type:
list
- layer_activations
A list of strings detailing the activation functions to be used by the neural network. If the length of layer_activations is smaller than the length of layer_sizes minus 1, then the first entry is used for all layers. Currently supported activation functions are:
Sigmoid (default)
ReLU
LeakyReLU
- Type:
list
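Because `layer_sizes` includes the input layer, the reference count for activations is `len(layer_sizes) - 1`. The fallback rule above can be sketched as follows; the helper is hypothetical, and ignoring surplus entries in a longer list is an assumption.

```python
def resolve_activations(layer_sizes, layer_activations):
    """Hypothetical helper for the documented layer_activations fallback rule."""
    n_layers = len(layer_sizes) - 1
    if len(layer_activations) < n_layers:
        # Too few entries: the first activation is used for all layers.
        return [layer_activations[0]] * n_layers
    # Assumption: surplus entries beyond n_layers are ignored.
    return list(layer_activations[:n_layers])


print(resolve_activations([91, 100, 100, 11], ["ReLU"]))
# ['ReLU', 'ReLU', 'ReLU']
```

Note that a list with too few entries is not padded: per the rule above, even a two-entry list for three layers falls back to the first entry everywhere.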
- loss_function_type
Loss function for the neural network. Currently supported loss functions include:
mse (Mean squared error; default)
- Type:
string
If True, the hidden and cell states are set to zero for LSTM networks. If False, the hidden state is kept active. Default: False.
- Type:
bool
- bidirection
Sets the LSTM network size based on whether the network is bidirectional or unidirectional. Default: False.
- Type:
bool
Number of hidden layers to be used in LSTM, GRU or transformer networks. Default: None.
- Type:
int
- num_heads
Number of heads to be used in the multi-head attention network. This should be a divisor of the input dimension. Default: None.
- Type:
int
- class ParametersRunning
Bases:
ParametersBase
Parameters needed for network runs (train, test or inference).
Some of these parameters only apply to either the train or test or inference case.
- optimizer
Optimizer to be used. Supported options at the moment:
SGD: Stochastic gradient descent.
Adam: Adam optimization algorithm.
- Type:
string
- learning_rate
Learning rate for chosen optimization algorithm. Default: 0.5.
- Type:
float
- max_number_epochs
Maximum number of epochs to train for. Default: 100.
- Type:
int
- mini_batch_size
Size of the mini batch for the optimization algorithm. Default: 10.
- Type:
int
- early_stopping_epochs
Number of epochs the validation accuracy is allowed to not improve by at least early_stopping_threshold before training is terminated. If 0, no early stopping is performed. Default: 0.
- Type:
int
- early_stopping_threshold
Minimum fractional reduction in validation loss required to avoid early stopping, e.g. a value of 0.05 means that validation loss must decrease by 5% within early_stopping_epochs epochs or the training will be stopped early. More explicitly, validation_loss < validation_loss_old * (1-early_stopping_threshold) or the patience counter goes up. Default: 0. Numbers bigger than 0 can make early stopping very aggressive, while numbers less than 0 make the trainer very forgiving of loss increase.
- Type:
float
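The patience rule quoted above can be sketched as a loop over per-epoch validation losses. The helper and its `initial_loss` argument are hypothetical; that the reference loss `validation_loss_old` is only updated when the improvement criterion is met is an assumption about how the rule is applied.

```python
def early_stopping_epoch(initial_loss, validation_losses, early_stopping_epochs,
                         early_stopping_threshold=0.0):
    """Sketch of the documented patience rule.

    Returns the 1-based epoch after which training stops, or None if it
    never stops early.
    """
    if early_stopping_epochs == 0:
        return None  # early stopping disabled
    loss_old = initial_loss
    patience_counter = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < loss_old * (1.0 - early_stopping_threshold):
            # Sufficient improvement: new reference loss, patience reset.
            loss_old = loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= early_stopping_epochs:
                return epoch
    return None


losses = [0.9, 0.89, 0.88, 0.87]
print(early_stopping_epoch(1.0, losses, 2, 0.05))  # 3
print(early_stopping_epoch(1.0, losses, 2, 0.0))   # None
```

With a 5% threshold, the small improvements after the first epoch do not count, so the run stops after two epochs without sufficient progress.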
- learning_rate_scheduler
Learning rate scheduler to be used. If not None, an instance of the corresponding PyTorch class will be used to manage the learning rate schedule. Options:
None: No learning rate schedule will be used.
“ReduceLROnPlateau”: The learning rate will be reduced when the validation loss is plateauing.
- Type:
string
- learning_rate_decay
Decay rate to be used in the learning rate (if the chosen scheduler supports that). Default: 0.1
- Type:
float
- learning_rate_patience
Patience parameter used in the learning rate schedule (how long the validation loss has to plateau before the schedule takes effect). Default: 0.
- Type:
int
- num_workers
Number of workers to be used for data loading.
- Type:
int
- use_shuffling_for_samplers
If True, the training data will be shuffled in between epochs. If lazy loading is selected, then this shuffling will be done on a “by snapshot” basis.
- checkpoints_each_epoch
If not 0, checkpoint files will be saved after each checkpoints_each_epoch epochs.
- Type:
int
- checkpoint_name
Name used for the checkpoints. Using this, multiple runs can be performed in the same directory.
- Type:
string
- logging_dir
Name of the folder that logging files will be saved to.
- Type:
string
- logging_dir_append_date
If True, then upon creating logging files, these will be saved in a subfolder of logging_dir labelled with the starting date of the logging, to avoid having to change input scripts often.
- Type:
bool
- inference_data_grid
List holding the grid to be used for inference in the form of [x,y,z].
- Type:
list
- use_mixed_precision
If True, mixed precision computation (via AMP) will be used.
- Type:
bool
- training_log_interval
Determines how often detailed performance info is printed during training (only has an effect if the verbosity is high enough).
- Type:
int
- profiler_range
List with two entries determining at which batch/iteration number the CUDA profiler will start and stop profiling. Please note that this option only holds significance if the nsys profiler is used.
- Type:
list
- property after_training_metric
Get the metric used for evaluation before and after training.
Metric evaluated on the validation and test sets before and after training. Default is “LDOS”, meaning that the regular loss on the LDOS will be used as a metric. Possible options are “band_energy” and “total_energy”. For these, the band or total energy of the validation snapshots will be calculated and compared to the provided DFT results, and the mean absolute error in eV/atom will be calculated.
- property during_training_metric
Control the metric used during training.
Metric evaluated on the validation set during training. Default is “ldos”, meaning that the regular loss on the LDOS will be used as a metric. Possible options are “band_energy” and “total_energy”. For these, the band or total energy of the validation snapshots will be calculated and compared to the provided DFT results, and the mean absolute error in eV/atom will be calculated.
- property use_graphs
Decide whether CUDA graphs are used during training.
Doing so can improve performance, but CUDA graphs are only available from CUDA 11.0 onwards.
- class ParametersTargets
Bases:
ParametersBase
Parameters necessary for calculating/parsing output quantities.
- target_type
Type of the target quantity, e.g. “LDOS”.
- Type:
string
- ldos_gridsize
Number of points in the energy grid that is used to calculate the (L)DOS.
- Type:
int
- ldos_gridspacing_ev
Gridspacing of the energy grid the (L)DOS is evaluated on [eV].
- Type:
float
- ldos_gridoffset_ev
Lowest energy value on the (L)DOS energy grid [eV].
- Type:
float
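The three LDOS grid parameters define an evenly spaced energy grid that starts at `ldos_gridoffset_ev` (the lowest energy) and has `ldos_gridsize` points spaced by `ldos_gridspacing_ev`. A sketch of generating the grid energies; the helper name and the example values are illustrative only.

```python
def ldos_energy_grid(ldos_gridsize, ldos_gridspacing_ev, ldos_gridoffset_ev):
    """Energies (in eV) of the (L)DOS grid: offset + i * spacing."""
    return [ldos_gridoffset_ev + i * ldos_gridspacing_ev
            for i in range(ldos_gridsize)]


print(ldos_energy_grid(4, 0.5, -1.0))  # [-1.0, -0.5, 0.0, 0.5]
```

The highest energy on the grid is therefore `ldos_gridoffset_ev + (ldos_gridsize - 1) * ldos_gridspacing_ev`.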
- pseudopotential_path
Path at which pseudopotentials are located (for TEM).
- Type:
string
- rdf_parameters
Parameters for calculating the radial distribution function (RDF). The RDF can directly be calculated via a function call, but if it is calculated e.g. during an MD or MC run, these parameters will control how. The following keywords are recognized:
- number_of_bins: int
Number of bins used to create the histogram.
- rMax: float
Radius up to which to calculate the RDF. None by default; this is the suggested behavior, as MALA will then calculate on its own the maximum radius up to which the RDF is indisputably physically meaningful. Larger radii may be specified, e.g. for a Fourier transformation to calculate the static structure factor.
- Type:
dict
- tpcf_parameters
Parameters for calculating the three particle correlation function (TPCF). The TPCF can directly be calculated via a function call, but if it is calculated e.g. during an MD or MC run, these parameters will control how. The following keywords are recognized:
- number_of_bins: int
Number of bins used to create the histogram.
- rMax: float
Radius up to which to calculate the TPCF. If None, MALA will determine the maximum radius for which the TPCF is indisputably defined. Be advised that this may come at increased computational cost.
- Type:
dict
- ssf_parameters
Parameters for calculating the static structure factor (SSF). The SSF can directly be calculated via a function call, but if it is calculated e.g. during an MD or MC run, these parameters will control how. The following keywords are recognized:
- number_of_bins: int
Number of bins used to create the histogram.
- kMax: float
Maximum wave vector up to which to calculate the SSF.
- Type:
dict
- assume_two_dimensional
If True, the total energy calculations will be performed without periodic boundary conditions in z-direction, i.e., the cell will be truncated in the z-direction. NOTE: This parameter may be moved up to a global parameter, depending on whether descriptor calculation may benefit from it.
- Type:
bool
- property restrict_targets
Control if and how targets are restricted to physical values.
Can be “zero_out_negative”, i.e. all negative values are set to zero or “absolute_values”, i.e. all negative values are multiplied by -1.
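The two documented options of `restrict_targets` differ only in how negative values are treated. A plain-Python sketch on a list of values; the helper name is hypothetical and does not correspond to a MALA function.

```python
def apply_target_restriction(values, restrict_targets_mode):
    """Hypothetical helper applying the documented restrict_targets options."""
    if restrict_targets_mode == "zero_out_negative":
        # Negative values are set to zero.
        return [max(v, 0.0) for v in values]
    if restrict_targets_mode == "absolute_values":
        # Negative values are multiplied by -1, i.e. the absolute value is taken.
        return [abs(v) for v in values]
    raise ValueError(f"Unknown option: {restrict_targets_mode}")


ldos_values = [-0.2, 0.0, 1.5]
print(apply_target_restriction(ldos_values, "zero_out_negative"))  # [0.0, 0.0, 1.5]
print(apply_target_restriction(ldos_values, "absolute_values"))    # [0.2, 0.0, 1.5]
```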