parameters

Collection of all parameter related classes and functions.

class Parameters[source]

Bases: object

All parameters MALA needs to perform its various tasks.

comment

Characterizes a set of parameters (e.g. “experiment_ddmmyy”).

Type:

string

network

Contains all parameters necessary for constructing a neural network.

Type:

ParametersNetwork

descriptors

Contains all parameters necessary for calculating/parsing descriptors.

Type:

ParametersDescriptors

targets

Contains all parameters necessary for calculating/parsing output quantities.

Type:

ParametersTargets

data

Contains all parameters necessary for loading and preprocessing data.

Type:

ParametersData

running

Contains parameters needed for network runs (train, test or inference).

Type:

ParametersRunning

hyperparameters

Parameters used for hyperparameter optimization.

Type:

ParametersHyperparameterOptimization

manual_seed

If not None, this value is used as the manual seed for the neural networks. Can be used to make experiments comparable. Default: None.

Type:

int

classmethod load_from_file(file, save_format='json', no_snapshots=False, force_no_ddp=False)[source]

Load a Parameters object from a file.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be loaded.

  • save_format (string) – File format of the parameter file. Supported file formats are “json” (default) and “pickle”.

  • no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

classmethod load_from_json(file, no_snapshots=False, force_no_ddp=False)[source]

Load a Parameters object from a json file.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be loaded.

  • no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

classmethod load_from_pickle(file, no_snapshots=False)[source]

Load a Parameters object from a pickle file.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be loaded.

  • no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

optuna_singlenode_setup(wait_time=0)[source]

Set up device and parallelization parameters for Optuna+MPI.

This only needs to be called if multiple MPI ranks are used on one node to run Optuna. Optuna itself does NOT communicate via MPI. Thus, if we allocate e.g. one node with 4 GPUs and start 4 jobs, 3 of those jobs will fail, because currently, we instantiate the CUDA devices based on MPI ranks. This function sets everything up properly. This of course requires MPI. This may be a bit hacky, but it lets us use one script and one MPI command to launch x GPU-backed jobs on any node with x GPUs.

Parameters:

wait_time (int) – If larger than 0, then all processes will wait this many seconds times their rank number after this routine before proceeding. This can be useful when using a file based distribution algorithm.

save(filename, save_format='json')[source]

Save the Parameters object to a file.

Parameters:
  • filename (string) – File to which the parameters will be saved.

  • save_format (string) – File format which is used for parameter saving. Supported file formats are “json” (default) and “pickle”.

save_as_json(filename)[source]

Save the Parameters object to a json file.

Parameters:

filename (string) – File to which the parameters will be saved.

save_as_pickle(filename)[source]

Save the Parameters object to a pickle file.

Parameters:

filename (string) – File to which the parameters will be saved.

show()[source]

Print name and values of all attributes of this object.

property device

Get the device used by MALA. Read-only.

property openpmd_configuration

Provide a .toml or .json formatted string to configure OpenPMD.

To load a configuration from a file, add an “@” in front of the file name and put the resulting string here. OpenPMD will then load the file. For further details, see the OpenPMD documentation.

property openpmd_granularity

Adjust the memory overhead of the OpenPMD interface.

Smallest possible value is 1, meaning smallest memory footprint and slowest I/O. Higher values will introduce some memory penalty, but offer greater speed. The maximum level is the feature dimension of your data set; if you choose a value larger than this feature dimension, it will automatically be set to the feature dimension upon loading.

property use_atomic_density_formula

Control whether to use the atomic density formula.

This formula uses a Gaussian representation of the atomic density to calculate the structure factor and, with it, the Ewald energy and parts of the exchange-correlation energy. By using it, one can go from N^2 to NlogN scaling, and offload most of the computational overhead of the energy calculation from QE to LAMMPS. This is beneficial since LAMMPS can benefit from GPU acceleration (QE GPU acceleration is not used in the portion of the QE code MALA employs). If set to True, MALA will perform another LAMMPS calculation during inference. The hyperparameters for this atomic density calculation are set via the parameters.descriptors object. Default is False, except when both use_gpu and use_lammps are True, in which case this value will be set to True as well.
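
The default rule stated above can be written down as a small truth table. This is only a sketch of the documented behaviour; the function name is hypothetical and is not part of MALA.

```python
def default_use_atomic_density_formula(use_gpu: bool, use_lammps: bool) -> bool:
    """Hypothetical helper mirroring the documented default rule."""
    # Documented rule: False, unless both the GPU and LAMMPS are used.
    return use_gpu and use_lammps


# Print the resulting default for every combination of the two switches.
for use_gpu in (False, True):
    for use_lammps in (False, True):
        print(use_gpu, use_lammps,
              default_use_atomic_density_formula(use_gpu, use_lammps))
```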

property use_ddp

Control whether ddp is used for parallel training.

property use_gpu

Control whether a GPU is used (provided there is one).

property use_lammps

Control whether to use LAMMPS for descriptor calculation.

property use_mpi

Control whether MPI is used for parallel inference.

property verbosity

Control the level of output for MALA.

The following options are available:

  • 0: “low”, only essential output will be printed

  • 1: “medium”, most diagnostic output will be printed. (Default)

  • 2: “high”, all information will be printed.
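
One way to picture the three levels is as a threshold that each message has to pass. The sketch below assumes that every message carries a minimum level; the names VERBOSITY_NAMES and should_print are hypothetical and do not describe how MALA implements its output routines.

```python
# Level names as listed above.
VERBOSITY_NAMES = {0: "low", 1: "medium", 2: "high"}


def should_print(message_level: int, verbosity: int = 1) -> bool:
    """Hypothetical filter: print a message only if verbosity is high enough."""
    if verbosity not in VERBOSITY_NAMES:
        raise ValueError(f"verbosity must be one of {sorted(VERBOSITY_NAMES)}")
    # A message tagged with level 0 is essential and is always printed.
    return message_level <= verbosity


print(should_print(0, verbosity=0))  # essential output at "low"
print(should_print(2, verbosity=1))  # detailed output at "medium"
```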

class ParametersBase[source]

Bases: JSONSerializable

Base parameter class for MALA.

classmethod from_json(json_dict)[source]

Read this object from a dictionary saved in a JSON file.

Parameters:

json_dict (dict) – A dictionary containing all attributes, properties, etc. as saved in the json file.

Returns:

deserialized_object – The object as read from the JSON file.

Return type:

JSONSerializable

show(indent='')[source]

Print name and values of all attributes of this object.

Parameters:

indent (string) – The indent used in the list with which the parameter shows itself.

to_json()[source]

Convert this object to a dictionary that can be saved in a JSON file.

Returns:

json_dict – The object as dictionary for export to JSON.

Return type:

dict
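
The to_json/from_json pair describes a round trip through a plain dictionary. The following sketch illustrates that pattern with a stand-in class; ToyParameters and its attributes are hypothetical and deliberately much simpler than ParametersBase.

```python
import json


class ToyParameters:
    """Hypothetical stand-in for a parameter class; not MALA's ParametersBase."""

    def __init__(self, learning_rate=0.5, comment=""):
        self.learning_rate = learning_rate
        self.comment = comment

    def to_json(self):
        # Export all attributes as a plain dictionary.
        return dict(vars(self))

    @classmethod
    def from_json(cls, json_dict):
        # Rebuild the object from the dictionary read from a JSON file.
        obj = cls()
        for name, value in json_dict.items():
            setattr(obj, name, value)
        return obj


original = ToyParameters(learning_rate=0.01, comment="experiment_ddmmyy")
text = json.dumps(original.to_json())  # what would be written to disk
restored = ToyParameters.from_json(json.loads(text))
print(restored.learning_rate, restored.comment)
```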

class ParametersData[source]

Bases: ParametersBase

Parameters necessary for loading and preprocessing data.

snapshot_directories_list

A list of all added snapshots.

Type:

list

data_splitting_type

Specify how the data for validation, test and training is split. Currently the only supported option is by_snapshot, which splits the data by snapshot boundaries. It is also the default.

Type:

string

input_rescaling_type

Specifies how input quantities are normalized. Options:

  • “None”: No normalization is applied.

  • “standard”: Standardization (Scale to mean 0, standard deviation 1)

  • “normal”: Min-Max scaling (Scale to be in range 0…1)

  • “feature-wise-standard”: Row Standardization (Scale to mean 0, standard deviation 1)

  • “feature-wise-normal”: Row Min-Max scaling (Scale to be in range 0…1)

Type:

string
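
The difference between the plain and the feature-wise options can be illustrated with a few lines of standard-library Python. This is a sketch under assumptions: it takes "feature-wise" to mean that each feature (column) is scaled with its own statistics, while the plain variants use statistics over all values, and it uses the population standard deviation. All function names are hypothetical; MALA's own scaler is not reproduced here.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1 (assumes non-constant input)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Scale values to the range 0...1 (assumes non-constant input)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def global_scale(scale, rows):
    """Apply `scale` using statistics computed over all values at once."""
    flat = scale([v for row in rows for v in row])
    n = len(rows[0])
    return [flat[i:i + n] for i in range(0, len(flat), n)]


def feature_wise(scale, rows):
    """Apply `scale` to each feature (column) separately."""
    columns = [scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Three samples with two features of very different magnitude.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
print(global_scale(min_max, data))
print(feature_wise(min_max, data))
```

With the global variant the small first feature is squeezed close to 0, whereas the feature-wise variant maps both features onto the full 0…1 range.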

output_rescaling_type

Specifies how output quantities are normalized. Options:

  • “None”: No normalization is applied.

  • “standard”: Standardization (Scale to mean 0, standard deviation 1)

  • “normal”: Min-Max scaling (Scale to be in range 0…1)

  • “feature-wise-standard”: Row Standardization (Scale to mean 0, standard deviation 1)

  • “feature-wise-normal”: Row Min-Max scaling (Scale to be in range 0…1)

Type:

string

use_lazy_loading

If True, data is lazily loaded, i.e. only the snapshots that are currently needed will be kept in memory. This greatly reduces memory demands, but adds additional computational time.

Type:

bool

use_lazy_loading_prefetch

If True, an alternative lazy loading path with prefetching will be used for higher performance.

Type:

bool

use_fast_tensor_data_set

If True, then the new, fast TensorDataSet implemented by Josh Romero will be used.

Type:

bool

shuffling_seed

If not None, a seed that will be used to make the shuffling of the data in the DataShuffler class deterministic.

Type:

int

class ParametersDataGeneration[source]

Bases: ParametersBase

All parameters to help with data generation.

trajectory_analysis_denoising_width

The distance metric is denoised prior to analysis using a certain width. This should be adjusted if there is reason to believe the trajectory will be noisy.

Type:

int

trajectory_analysis_below_average_counter

Number of time steps that have to be consecutively below the average of the distance metric curve before we consider the trajectory to be equilibrated. Usually does not have to be changed.

Type:

int

trajectory_analysis_estimated_equilibrium

The analysis of the trajectory builds on the assumption that at some point of the trajectory, the system is equilibrated. For this, we need to provide the fraction of the trajectory (counted from the end). Usually, 10% is a fine assumption. This value usually does not need to be changed.

Type:

float
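
Taken together, trajectory_analysis_below_average_counter and trajectory_analysis_estimated_equilibrium suggest a simple equilibration check, sketched below. This is an assumption-laden illustration, not MALA's implementation: the function name, the default values and the example data are hypothetical, and the denoising step is omitted.

```python
def first_equilibrated_step(distance_metric, estimated_equilibrium=0.1,
                            below_average_counter=3):
    """Hypothetical sketch of the equilibration check described above."""
    n = len(distance_metric)
    # Assume the last `estimated_equilibrium` fraction of the trajectory
    # is equilibrated and use its average as the reference value.
    tail_length = max(1, int(n * estimated_equilibrium))
    tail = distance_metric[n - tail_length:]
    average = sum(tail) / len(tail)

    consecutive = 0
    for step, value in enumerate(distance_metric):
        # Count consecutive steps strictly below the reference average.
        consecutive = consecutive + 1 if value < average else 0
        if consecutive == below_average_counter:
            # Report the first step of this run below the average.
            return step - below_average_counter + 1
    return None


metric = [5.0, 4.0, 3.0, 0.5, 0.4, 2.0, 1.0, 1.0, 1.0, 1.0]
print(first_equilibrated_step(metric, estimated_equilibrium=0.2,
                              below_average_counter=2))
```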

trajectory_analysis_correlation_metric_cutoff

Cutoff value to be used when sampling uncorrelated snapshots during trajectory analysis. If negative, a value will be determined numerically. This value is a cutoff for the minimum euclidean distance between any two ions in two subsequent ionic configurations.

Type:

float

trajectory_analysis_temperature_tolerance_percent

Maximum deviation of temperature between snapshot and desired temperature for the snapshot to be considered for DFT calculation (in percent).

Type:

float

local_psp_path

Path to where the local pseudopotential is stored (for OF-DFT-MD).

Type:

string

local_psp_name

Name of the local pseudopotential (for OF-DFT-MD).

Type:

string

ofdft_timestep

Timestep of the OF-DFT-MD simulation.

Type:

int

ofdft_number_of_timesteps

Number of timesteps for the OF-DFT-MD simulation.

Type:

int

ofdft_temperature

Temperature at which to perform the OF-DFT-MD simulation.

Type:

float

ofdft_kedf

Kinetic energy functional to be used for the OF-DFT-MD simulation.

Type:

string

ofdft_friction

Friction to be added for the Langevin dynamics in the OF-DFT-MD run.

Type:

float

class ParametersDescriptors[source]

Bases: ParametersBase

Parameters necessary for calculating/parsing input descriptors.

descriptor_type

Type of descriptors that is used to represent the atomic fingerprint. Supported:

  • ‘Bispectrum’: Bispectrum descriptors (formerly called ‘SNAP’).

  • ‘Atomic Density’: Atomic density, calculated via Gaussian descriptors.

Type:

string

bispectrum_twojmax

Bispectrum calculation: 2*jmax-parameter used for calculation of bispectrum descriptors. Default value for jmax is 5, so default value for twojmax is 10.

Type:

int

lammps_compute_file

Bispectrum calculation: LAMMPS input file that is used to calculate the Bispectrum descriptors. If this string is empty, the standard LAMMPS input file found in this repository will be used (recommended).

Type:

string

descriptors_contain_xyz

Legacy option. If True, it is assumed that the first three entries of the descriptor vector are the xyz coordinates and they are cut from the descriptor vector. If False, no such cutting is performed.

Type:

bool

atomic_density_sigma

Sigma used for the calculation of the Gaussian descriptors.

Type:

float

property bispectrum_cutoff

Cut off radius for bispectrum calculation.

property bispectrum_switchflag

Switchflag for the bispectrum calculation.

Can only be 1 or 0. If 1 (default), a switching function will be used to ensure that atomic contributions smoothly go to zero after a certain cutoff. If 0 (old default, which can be problematic in some instances), this is not done, which can lead to discontinuities.
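
To show why a switching function avoids discontinuities, the sketch below compares a smooth cosine cutoff with a hard cutoff. The cosine form is a common choice and is used here purely for illustration; whether it matches the function used in the bispectrum calculation is an assumption, and both function names are hypothetical.

```python
import math


def cosine_switch(r, cutoff):
    """Illustrative cosine switching function; not taken from MALA or LAMMPS code."""
    if r >= cutoff:
        return 0.0
    # Goes smoothly from 1 at r = 0 to 0 at r = cutoff.
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


def weight(r, cutoff, switchflag=1):
    """Weight of an atomic contribution at distance r."""
    if switchflag == 1:
        return cosine_switch(r, cutoff)
    # Without switching: full weight inside the cutoff, abrupt drop outside.
    return 1.0 if r < cutoff else 0.0


for r in (0.0, 2.0, 3.999, 4.0):
    print(r, weight(r, 4.0, switchflag=1), weight(r, 4.0, switchflag=0))
```

Just below the cutoff the switched weight is already close to zero, while the unswitched weight jumps from 1 to 0.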

property use_y_splitting

Control whether a splitting in y-axis is used.

This can only be used in conjunction with a z-splitting, and the option will be ignored if z-splitting is disabled. Only has an effect for values larger than 1.

property use_z_splitting

Control whether splitting across the z-axis is used.

Default is True, since this gives descriptors compatible with QE, for total energy evaluation. However, setting this value to False can, e.g. in the LAMMPS case, improve performance. This is relevant for e.g. preprocessing.

class ParametersHyperparameterOptimization[source]

Bases: ParametersBase

Hyperparameter optimization parameters.

direction

Controls whether to minimize or maximize the loss function. Arguments are “minimize” and “maximize” respectively.

Type:

string

n_trials

Controls how many trials are performed (when using optuna). Default: 100.

Type:

int

hlist

List containing hyperparameters, that are then passed to optuna. Supported options so far include:

  • learning_rate (float): learning rate of the training algorithm

  • layer_activation_xxx (categorical): Activation function used for the feed forward network (see Network parameters for supported activation functions). Note that _xxx is only there so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything. Please note further that you need to either request only one activation function (for all layers) or one specifically for each layer.

  • ff_neurons_layer_xxx (int): Number of neurons per layer. Note that _xxx is only there so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything.

Users normally don’t have to fill this list by hand; the hyperparameter optimizers provide interfaces for this task.

Type:

list

hyper_opt_method

Method used for hyperparameter optimization. Currently supported:

  • “optuna” : Use optuna for the hyperparameter optimization.

  • “oat” : Use orthogonal array tuning (currently limited to categorical hyperparameters). Range analysis is currently done by simply choosing the lowest loss.

  • “naswot” : Use NAS without training, based on Jacobians.

Type:

string

checkpoints_each_trial

If not 0, checkpoint files will be saved after each checkpoints_each_trial trials. Currently, this only works with optuna.

Type:

int

checkpoint_name

Name used for the checkpoints. Using this, multiple runs can be performed in the same directory. Currently, this only works with optuna.

Type:

string

study_name

Name used for this study (in optuna’s storage). Necessary when operating with an RDB storage.

Type:

string

rdb_storage

Address of the RDB storage to be used by optuna.

Type:

string

rdb_storage_heartbeat

Heartbeat interval for optuna (in seconds). Default is None. If not None and above 0, optuna will record a heartbeat at this interval. If no action on a RUNNING trial is recognized for longer than this interval, then this trial will be moved to FAILED. In distributed training, setting a heartbeat is currently the only way to achieve a precise number of trials:

https://github.com/optuna/optuna/issues/1883

For optuna versions below 2.8.0, larger heartbeat intervals are detrimental to performance and should be avoided:

https://github.com/optuna/optuna/issues/2685

For MALA, no evidence for decreased performance using smaller heartbeat values could be found. So if this is used, 1s is a reasonable value.

Type:

int

number_training_per_trial

Number of network trainings performed per trial. Default is 1, but it makes sense to choose a higher number, to exclude networks that performed well by chance (good initialization). Naturally, this impedes performance.

Type:

int

trial_ensemble_evaluation

Control how multiple trainings performed during a trial are evaluated. By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.

Type:

string

use_multivariate

If True, the optuna multivariate sampler is used. It is experimental since v2.2.0, but reported to perform very well. http://proceedings.mlr.press/v80/falkner18a.html

Type:

bool

naswot_pruner_cutoff

If the surrogate loss algorithm is used as a pruner during a study, this cutoff determines which trials are neglected.

Type:

float

pruner

Pruner type to be used by optuna. Currently supported:

  • “multi_training”: If multiple trainings are performed per trial, and one returns “inf” for the loss, no further training will be performed. Especially useful if used in conjunction with the band_energy metric.

  • “naswot”: Use the NASWOT algorithm as pruner.

Type:

string

naswot_pruner_batch_size

Batch size for the NASWOT pruner.

Type:

int

number_bad_trials_before_stopping

Only applies to optuna studies. If set to any integer above 0, the study will be stopped if no new best trial is found within number_bad_trials_before_stopping trials after the last one.

Type:

int
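
The stopping rule for number_bad_trials_before_stopping can be sketched as a counter that is reset whenever a new best trial appears. The function below is a hypothetical illustration that assumes the loss is minimized; it does not use optuna and is not MALA's implementation.

```python
def trials_until_stop(losses, number_bad_trials_before_stopping):
    """Return how many trials run before the study would be stopped (sketch)."""
    best = float("inf")
    trials_since_best = 0
    for index, loss in enumerate(losses, start=1):
        if loss < best:
            # New best trial: reset the counter.
            best, trials_since_best = loss, 0
        else:
            trials_since_best += 1
        # A value of 0 disables the criterion.
        if 0 < number_bad_trials_before_stopping <= trials_since_best:
            return index
    return len(losses)


losses = [0.9, 0.5, 0.6, 0.7, 0.55, 0.4]
print(trials_until_stop(losses, 3))  # stops before the last trial is reached
print(trials_until_stop(losses, 0))  # criterion disabled, all trials run
```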

sqlite_timeout

Timeout for the SQLite backend of Optuna. This backend is officially not recommended because it is file based and can lead to errors; with a suitable timeout it can be used in a reasonably stable manner, though, and can help in HPC settings.

Type:

int

show(indent='')[source]

Print name and values of all attributes of this object.

Parameters:

indent (string) – The indent used in the list with which the parameter shows itself.

property number_training_per_trial

Control how many trainings are run per optuna trial.

property rdb_storage_heartbeat

Control whether a heartbeat is used for distributed optuna runs.

property trial_ensemble_evaluation

Control how multiple trainings performed during a trial are evaluated.

By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.
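
The two evaluation modes differ only in whether the spread of the individual trainings is added to the mean. A minimal sketch, assuming the population standard deviation is meant (the documentation does not say which estimator is used) and using a hypothetical function name:

```python
from statistics import mean, pstdev


def evaluate_trial(losses, trial_ensemble_evaluation="mean"):
    """Combine the losses of several trainings of one trial (sketch)."""
    if trial_ensemble_evaluation == "mean":
        return mean(losses)
    if trial_ensemble_evaluation == "mean_std":
        # Mean plus spread: a pessimistic estimate of the expected accuracy.
        return mean(losses) + pstdev(losses)
    raise ValueError("Only 'mean' and 'mean_std' are allowed.")


trial_losses = [1.0, 3.0]
print(evaluate_trial(trial_losses, "mean"))
print(evaluate_trial(trial_losses, "mean_std"))
```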

class ParametersNetwork[source]

Bases: ParametersBase

Parameters necessary for constructing a neural network.

nn_type

Type of the neural network that will be used. Currently supported are

  • “feed_forward” (default)

  • “transformer”

  • “lstm”

  • “gru”

Type:

string

layer_sizes

A list of integers detailing the sizes of the layers of the neural network. Please note that the input layer is included therein. Default: [10,10,0]

Type:

list

layer_activations

A list of strings detailing the activation functions to be used by the neural network. If the dimension of layer_activations is smaller than the dimension of layer_sizes-1, then the first entry is used for all layers. Currently supported activation functions are:

  • Sigmoid (default)

  • ReLU

  • LeakyReLU

Type:

list
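
The rule for short layer_activations lists can be sketched as follows. The helper name and the example layer sizes are hypothetical; the sketch only mirrors the behaviour described above and is not MALA's network code.

```python
def expand_activations(layer_sizes, layer_activations):
    """Return one activation name per layer transition (sketch)."""
    # layer_sizes includes the input layer, so there are len - 1 transitions.
    number_of_layers = len(layer_sizes) - 1
    if len(layer_activations) < number_of_layers:
        # Too few entries: the first entry is used for all layers.
        return [layer_activations[0]] * number_of_layers
    return list(layer_activations)


print(expand_activations([91, 100, 100, 11], ["ReLU"]))
print(expand_activations([91, 100, 100, 11], ["ReLU", "Sigmoid", "ReLU"]))
```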

loss_function_type

Loss function for the neural network. Currently supported loss functions include:

  • mse (Mean squared error; default)

Type:

string

no_hidden_state

If True, the hidden and cell states are set to zero for the LSTM network. If False, the hidden state is kept active. Default: False

Type:

bool

bidirection

Sets the LSTM network size depending on whether the network is bidirectional or uses just one direction. Default: False

Type:

bool

num_hidden_layers

Number of hidden layers to be used in LSTM, GRU or transformer networks. Default: None

Type:

int

num_heads

Number of heads to be used in the multi-head attention network. This should be a divisor of the input dimension. Default: None

Type:

int

class ParametersRunning[source]

Bases: ParametersBase

Parameters needed for network runs (train, test or inference).

Some of these parameters only apply to either the train or test or inference case.

optimizer

Optimizer to be used. Supported options at the moment:

  • SGD: Stochastic gradient descent.

  • Adam: Adam optimization algorithm.

Type:

string

learning_rate

Learning rate for chosen optimization algorithm. Default: 0.5.

Type:

float

max_number_epochs

Maximum number of epochs to train for. Default: 100.

Type:

int

mini_batch_size

Size of the mini batch for the optimization algorithm. Default: 10.

Type:

int

early_stopping_epochs

Number of epochs the validation accuracy is allowed to not improve by at least early_stopping_threshold, before we terminate. If 0, no early stopping is performed. Default: 0.

Type:

int

early_stopping_threshold

Minimum fractional reduction in validation loss required to avoid early stopping, e.g. a value of 0.05 means that validation loss must decrease by 5% within early_stopping_epochs epochs or the training will be stopped early. More explicitly, validation_loss < validation_loss_old * (1-early_stopping_threshold) or the patience counter goes up. Default: 0. Numbers bigger than 0 can make early stopping very aggressive, while numbers less than 0 make the trainer very forgiving of loss increases.

Type:

float
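
The interplay of early_stopping_epochs and early_stopping_threshold can be sketched with the inequality quoted above. The sketch assumes that the reference loss is only updated when the inequality holds and that the patience counter is reset at that point; the function name and the example losses are hypothetical.

```python
def stopping_epoch(initial_loss, validation_losses,
                   early_stopping_epochs, early_stopping_threshold=0.0):
    """Return the 1-based epoch at which training stops early, or None (sketch)."""
    reference_loss = initial_loss
    patience_counter = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < reference_loss * (1.0 - early_stopping_threshold):
            # Sufficient improvement: accept and reset the patience counter.
            reference_loss = loss
            patience_counter = 0
        else:
            patience_counter += 1
            # early_stopping_epochs == 0 disables early stopping.
            if 0 < early_stopping_epochs <= patience_counter:
                return epoch
    return None


validation_losses = [0.9, 0.89, 0.88, 0.5]
print(stopping_epoch(1.0, validation_losses, 2, early_stopping_threshold=0.05))
print(stopping_epoch(1.0, validation_losses, 2, early_stopping_threshold=0.0))
```

With a threshold of 0.05 the small improvements in epochs 2 and 3 do not count, so training stops; with a threshold of 0 every decrease counts and training runs to the end.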

learning_rate_scheduler

Learning rate scheduler to be used. If not None, an instance of the corresponding pytorch class will be used to manage the learning rate schedule. Options:

  • None: No learning rate schedule will be used.

  • “ReduceLROnPlateau”: The learning rate will be reduced when the validation loss is plateauing.

Type:

string

learning_rate_decay

Decay rate to be used in the learning rate (if the chosen scheduler supports that). Default: 0.1

Type:

float

learning_rate_patience

Patience parameter used in the learning rate schedule (how long the validation loss has to plateau before the schedule takes effect). Default: 0.

Type:

int

num_workers

Number of workers to be used for data loading.

Type:

int

use_shuffling_for_samplers

If True, the training data will be shuffled in between epochs. If lazy loading is selected, then this shuffling will be done on a “by snapshot” basis.

checkpoints_each_epoch

If not 0, checkpoint files will be saved every checkpoints_each_epoch epochs.

Type:

int

checkpoint_name

Name used for the checkpoints. Using this, multiple runs can be performed in the same directory.

Type:

string

logging_dir

Name of the folder that logging files will be saved to.

Type:

string

logging_dir_append_date

If True, then upon creating logging files, these will be saved in a subfolder of logging_dir labelled with the starting date of the logging, to avoid having to change input scripts often.

Type:

bool

inference_data_grid

List holding the grid to be used for inference in the form of [x,y,z].

Type:

list

use_mixed_precision

If True, mixed precision computation (via AMP) will be used.

Type:

bool

training_log_interval

Determines how often detailed performance info is printed during training (only has an effect if the verbosity is high enough).

Type:

int

profiler_range

List with two entries determining with which batch/iteration number the CUDA profiler will start and stop profiling. Please note that this option only holds significance if the nsys profiler is used.

Type:

list

property after_training_metric

Get the metric evaluated before and after training.

Metric evaluated on the validation and test set before and after training. Default is “LDOS”, meaning that the regular loss on the LDOS will be used as a metric. Possible options are “band_energy” and “total_energy”. For these, the band or total energy, respectively, of the validation snapshots will be calculated and compared to the provided DFT results. Of these, the mean average error in eV/atom will be calculated.

property during_training_metric

Control the metric used during training.

Metric evaluated on the validation set during training. Default is “ldos”, meaning that the regular loss on the LDOS will be used as a metric. Possible options are “band_energy” and “total_energy”. For these, the band or total energy, respectively, of the validation snapshots will be calculated and compared to the provided DFT results. Of these, the mean average error in eV/atom will be calculated.

property use_graphs

Decide whether CUDA graphs are used during training.

Doing so will improve performance, but CUDA graphs are only available from CUDA 11.0 upwards.

class ParametersTargets[source]

Bases: ParametersBase

Parameters necessary for calculating/parsing output quantities.

target_type

Type of the target (output) quantity that is calculated/parsed, e.g. the LDOS.

Type:

string

ldos_gridsize

Gridsize of the LDOS.

Type:

int

ldos_gridspacing_ev

Gridspacing of the energy grid the (L)DOS is evaluated on [eV].

Type:

float

ldos_gridoffset_ev

Lowest energy value on the (L)DOS energy grid [eV].

Type:

float
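
The three ldos_grid* parameters together define an energy grid. A minimal sketch, assuming the grid points lie at ldos_gridoffset_ev + i * ldos_gridspacing_ev for i = 0 … ldos_gridsize - 1; the function name and the numbers are hypothetical.

```python
def ldos_energy_grid(ldos_gridsize, ldos_gridspacing_ev, ldos_gridoffset_ev):
    """Energy values [eV] of an evenly spaced grid (sketch)."""
    # The offset is the lowest energy; each further point adds one spacing.
    return [ldos_gridoffset_ev + i * ldos_gridspacing_ev
            for i in range(ldos_gridsize)]


energy_grid = ldos_energy_grid(ldos_gridsize=11,
                               ldos_gridspacing_ev=2.5,
                               ldos_gridoffset_ev=-5.0)
print(energy_grid[0], energy_grid[-1], len(energy_grid))
```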

pseudopotential_path

Path at which pseudopotentials are located (for TEM).

Type:

string

rdf_parameters

Parameters for calculating the radial distribution function (RDF). The RDF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

  • number_of_bins (int) – Number of bins used to create the histogram.

  • rMax (float) – Radius up to which to calculate the RDF. None by default; this is the suggested behavior, as MALA will then on its own calculate the maximum radius up until which the calculation of the RDF is indisputably physically meaningful. Larger radii may be specified, e.g. for a Fourier transformation to calculate the static structure factor.

Type:

dict

tpcf_parameters

Parameters for calculating the three particle correlation function (TPCF). The TPCF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

  • number_of_bins (int) – Number of bins used to create the histogram.

  • rMax (float) – Radius up to which to calculate the TPCF. If None, MALA will determine the maximum radius for which the TPCF is indisputably defined. Be advised: this may come at increased computational cost.

Type:

dict

ssf_parameters

Parameters for calculating the static structure factor (SSF). The SSF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

  • number_of_bins (int) – Number of bins used to create the histogram.

  • kMax (float) – Maximum wave vector up to which to calculate the SSF.

Type:

dict

assume_two_dimensional

If True, the total energy calculations will be performed without periodic boundary conditions in z-direction, i.e., the cell will be truncated in the z-direction. NOTE: This parameter may be moved up to a global parameter, depending on whether descriptor calculation may benefit from it.

Type:

bool

property restrict_targets

Control if and how targets are restricted to physical values.

Can be “zero_out_negative”, i.e. all negative values are set to zero, or “absolute_values”, i.e. all negative values are multiplied by -1.
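
The effect of the two options on a list of target values can be sketched as follows. The function name is hypothetical, and treating None as "no restriction" is an assumption made for this illustration.

```python
def restrict(values, restrict_targets=None):
    """Restrict target values to physical (non-negative) values (sketch)."""
    if restrict_targets is None:
        # Assumption: None means that no restriction is applied.
        return list(values)
    if restrict_targets == "zero_out_negative":
        # All negative values are set to zero.
        return [max(v, 0.0) for v in values]
    if restrict_targets == "absolute_values":
        # All negative values are multiplied by -1.
        return [abs(v) for v in values]
    raise ValueError(f"Unknown option: {restrict_targets}")


targets = [-1.0, 0.5, -0.25]
print(restrict(targets, "zero_out_negative"))
print(restrict(targets, "absolute_values"))
```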