parameters

file (string or ZipExtFile) – File from which the parameters will be loaded.
save_format (string) – File format of the parameter file. Loaders are provided for “json” and “pickle” files.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

classmethod load_from_json(file, no_snapshots=False, force_no_ddp=False)[source]

Load a Parameters object from a json file.

Parameters:

file (string or ZipExtFile) – File from which the parameters will be loaded.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

classmethod load_from_pickle(file, no_snapshots=False)[source]

Load a Parameters object from a pickle file.

Parameters:

file (string or ZipExtFile) – File from which the parameters will be loaded.
no_snapshots (bool) – If True, then the snapshot list will be emptied. Useful when performing inference/testing after training a network.

Returns:

loaded_parameters – The loaded Parameters object.

Return type:

Parameters

optuna_singlenode_setup(wait_time=0)[source]

Set up device and parallelization parameters for Optuna+MPI.

This only needs to be called if multiple MPI ranks are used on one node to run Optuna. Optuna itself does NOT communicate via MPI. Thus, if we allocate e.g. one node with 4 GPUs and start 4 jobs, 3 of those jobs will fail, because currently, we instantiate the CUDA devices based on MPI ranks. This function sets everything up properly, which of course requires MPI. This may be a bit hacky, but it lets us use one script and one MPI command to launch x GPU-backed jobs on any node with x GPUs.

Parameters:: wait_time (int) – If larger than 0, then all processes will wait this many seconds times their rank number after this routine before proceeding. This can be useful when using a file based distribution algorithm.
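
The staggered start controlled by wait_time can be pictured as follows. This is a minimal sketch, not MALA's implementation: staggered_delay is a hypothetical helper, and the rank is passed in explicitly instead of being queried from MPI.

```python
def staggered_delay(rank: int, wait_time: int) -> int:
    """Seconds a process waits: wait_time times its rank (0 if disabled)."""
    if wait_time <= 0:
        # wait_time=0 (the default) disables the staggered start.
        return 0
    return wait_time * rank


# Four ranks on one node with wait_time=5: rank 0 proceeds immediately,
# every further rank waits 5 seconds longer than the previous one.
delays = [staggered_delay(rank, wait_time=5) for rank in range(4)]
print(delays)
```

Spacing out the ranks this way keeps them from accessing a shared file at the same moment, which is the file-based distribution use case mentioned above.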

save(filename, save_format='json')[source]

Save the Parameters object to a file.

Parameters:

filename (string) – File to which the parameters will be saved.
save_format (string) – File format which is used for parameter saving. Supported file formats are “json” (default) and “pickle”.

save_as_json(filename)[source]

Save the Parameters object to a json file.

Parameters:: filename (string) – File to which the parameters will be saved.

save_as_pickle(filename)[source]

Save the Parameters object to a pickle file.

Parameters:: filename (string) – File to which the parameters will be saved.

show()[source]: Print name and values of all attributes of this object.

property device: Get the device used by MALA. Read-only.

property manual_seed

If not None, this value is used as manual seed for the neural networks.

Can be used to make experiments comparable. Default: None.

property openpmd_configuration

Provide a .toml or .json formatted string to configure OpenPMD.

To load a configuration from a file, add an “@” in front of the file name and put the resulting string here. OpenPMD will then load the file. For further details, see the OpenPMD documentation.

property openpmd_granularity

Adjust the memory overhead of the OpenPMD interface.

The smallest possible value is 1, meaning the smallest memory footprint and the slowest I/O. Higher values introduce some memory penalty but offer greater speed. The maximum level is the feature dimension of your data set; if you choose a value larger than this feature dimension, it will automatically be set to the feature dimension upon loading.
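
The clamping behavior described here can be sketched in a few lines. The helper name effective_granularity is hypothetical, and rejecting values below 1 with an exception is an assumption; the text only states that 1 is the smallest possible value.

```python
def effective_granularity(requested: int, feature_dimension: int) -> int:
    """Clamp a requested granularity to at most the feature dimension."""
    if requested < 1:
        # Assumption: values below the documented minimum of 1 are rejected.
        raise ValueError("openpmd_granularity must be at least 1")
    return min(requested, feature_dimension)


# A value within the valid range is kept as is.
print(effective_granularity(4, feature_dimension=91))
# A value above the feature dimension is reduced to the feature dimension.
print(effective_granularity(200, feature_dimension=91))
```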

property use_atomic_density_formula

Control whether to use the atomic density formula.

This formula uses a Gaussian representation of the atomic density to calculate the structure factor and, with it, the Ewald energy and parts of the exchange-correlation energy. By using it, one can go from N^2 to N log N scaling and offload most of the computational overhead of the energy calculation from QE to LAMMPS. This is beneficial since LAMMPS can benefit from GPU acceleration (QE GPU acceleration is not used in the portion of the QE code MALA employs). If set to True, MALA will perform another LAMMPS calculation during inference. The hyperparameters for this atomic density calculation are set via the parameters.descriptors object. Default is False, except when both use_gpu and use_lammps are True, in which case this value will be set to True as well.

property use_ddp: Control whether ddp is used for parallel training.

property use_gpu: Control whether a GPU is used (provided there is one).

property use_lammps: Control whether to use LAMMPS for descriptor calculation.

property use_mpi: Control whether MPI is used for parallel inference.

property verbosity

Control the level of output for MALA.

The following options are available:

0: “low”, only essential output will be printed

1: “medium”, most diagnostic output will be printed. (Default)

2: “high”, all information will be printed.

class ParametersBase[source]

Bases: JSONSerializable

Base parameter class for MALA.

classmethod from_json(json_dict)[source]

Read parameters from a dictionary saved in a JSON file.

Parameters:: json_dict (dict) – A dictionary containing all attributes, properties, etc. as saved in the json file.
Returns:: deserialized_object – The object as read from the JSON file.
Return type:: JSONSerializable

show(indent='')[source]

Print name and values of all attributes of this object.

Parameters:: indent (string) – The indent used in the list with which the parameter shows itself.

to_json()[source]

Convert this object to a dictionary that can be saved in a JSON file.

Returns:: json_dict – The object as dictionary for export to JSON.
Return type:: dict

class ParametersData[source]

Bases: ParametersBase

Parameters necessary for loading and preprocessing data.

snapshot_directories_list

A list of all added snapshots.

Type:: list

data_splitting_type

Specifies how the data is split into training, validation and test sets. Currently the only supported option is by_snapshot, which splits the data by snapshot boundaries. It is also the default.

Type:: string

input_rescaling_type

Specifies how input quantities are normalized. Options:

“None”: No scaling is applied.

“standard”: Standardization (Scale to mean 0, standard deviation 1) is applied to the entire array.

“minmax”: Min-Max scaling (Scale to be in range 0…1) is applied to the entire array.

“feature-wise-standard”: Standardization (Scale to mean 0, standard deviation 1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

“feature-wise-minmax”: Min-Max scaling (Scale to be in range 0…1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

“normal”: (DEPRECATED) Old name for “minmax”.

“feature-wise-normal”: (DEPRECATED) Old name for “feature-wise-minmax”

Type:: string
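
The difference between whole-array and feature-wise scaling is easiest to see on a tiny data set. The following is a pure-Python sketch, not MALA's scaler: rescale is a hypothetical function, the population standard deviation is an assumption, the deprecated option names are not handled, and constant data (zero spread) is not guarded against.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def minmax(values):
    """Scale values into the range 0...1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


SCALERS = {"standard": standardize, "minmax": minmax}


def rescale(data, rescaling_type):
    """Rescale data given as a list of d rows with f features each."""
    if rescaling_type == "None":
        return [list(row) for row in data]
    prefix = "feature-wise-"
    feature_wise = rescaling_type.startswith(prefix)
    name = rescaling_type[len(prefix):] if feature_wise else rescaling_type
    scaler = SCALERS[name]  # raises KeyError for unsupported names
    d, f = len(data), len(data[0])
    if feature_wise:
        # Each of the f columns (with d entries) is scaled on its own.
        columns = [scaler([row[j] for row in data]) for j in range(f)]
        return [[columns[j][i] for j in range(f)] for i in range(d)]
    # Otherwise one set of statistics is used for the entire array.
    flat = scaler([v for row in data for v in row])
    return [flat[i * f:(i + 1) * f] for i in range(d)]


data = [[0.0, 10.0], [2.0, 30.0]]
print(rescale(data, "minmax"))
print(rescale(data, "feature-wise-minmax"))
```

With "minmax" the second column dominates the range, so the first column is squeezed close to 0; with "feature-wise-minmax" both columns span the full range 0…1.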

output_rescaling_type

Specifies how output quantities are normalized. Options:

“None”: No scaling is applied.

“standard”: Standardization (Scale to mean 0, standard deviation 1) is applied to the entire array.

“minmax”: Min-Max scaling (Scale to be in range 0…1) is applied to the entire array.

“feature-wise-standard”: Standardization (Scale to mean 0, standard deviation 1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

“feature-wise-minmax”: Min-Max scaling (Scale to be in range 0…1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

“normal”: (DEPRECATED) Old name for “minmax”.

“feature-wise-normal”: (DEPRECATED) Old name for “feature-wise-minmax”

Type:: string

use_lazy_loading

If True, data is lazily loaded, i.e. only the snapshots that are currently needed will be kept in memory. This greatly reduces memory demands, but adds additional computational time.

Type:: bool

use_lazy_loading_prefetch

If True, an alternative lazy loading path with prefetching is used for higher performance.

Type:: bool

use_fast_tensor_data_set

If True, then the new, fast TensorDataSet implemented by Josh Romero will be used.

Type:: bool

shuffling_seed

If not None, a seed that will be used to make the shuffling of the data in the DataShuffler class deterministic.

Type:: int

class ParametersDataGeneration[source]

Bases: ParametersBase

All parameters to help with data generation.

trajectory_analysis_denoising_width

The distance metric is denoised prior to analysis using a certain width. This should be adjusted if there is reason to believe the trajectory will be noisy for some reason.

Type:: int

trajectory_analysis_below_average_counter

Number of time steps that have to be consecutively below the average of the distance metric curve before we consider the trajectory to be equilibrated. Usually does not have to be changed.

Type:: int

trajectory_analysis_estimated_equilibrium

The analysis of the trajectory builds on the assumption that at some point of the trajectory, the system is equilibrated. For this, we need to provide the fraction of the trajectory (counted from the end). Usually, 10% is a fine assumption. This value usually does not need to be changed.

Type:: float
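
One way to read the interplay of trajectory_analysis_below_average_counter and trajectory_analysis_estimated_equilibrium is sketched below. This is an illustrative interpretation, not MALA's algorithm: first_equilibrated_step is a hypothetical function, the average is assumed to be taken over the trailing fraction of the trajectory, and returning the first step of the qualifying run is an arbitrary choice.

```python
def first_equilibrated_step(metric, below_average_counter, estimated_equilibrium):
    """Return the first step of a run of consecutive below-average values.

    metric: distance metric per time step (assumed already denoised).
    estimated_equilibrium: fraction of the trajectory, counted from the
        end, that is assumed to be equilibrated (e.g. 0.1 for 10%).
    """
    n_tail = max(1, int(len(metric) * estimated_equilibrium))
    # Average over the part of the trajectory assumed to be equilibrated.
    average = sum(metric[-n_tail:]) / n_tail
    run = 0
    for step, value in enumerate(metric):
        run = run + 1 if value < average else 0
        if run == below_average_counter:
            return step - below_average_counter + 1
    return None  # no sufficiently long run was found


metric = [5.0, 4.0, 3.0, 1.0, 0.9, 0.8, 1.2, 0.7, 0.9, 1.0]
print(first_equilibrated_step(metric, below_average_counter=2,
                              estimated_equilibrium=0.2))
```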

trajectory_analysis_correlation_metric_cutoff

Cutoff value to be used when sampling uncorrelated snapshots during trajectory analysis. If negative, a value will be determined numerically. This value is a cutoff for the minimum euclidean distance between any two ions in two subsequent ionic configurations.

Type:: float

trajectory_analysis_temperature_tolerance_percent

Maximum deviation of temperature between snapshot and desired temperature for snapshot to be considered for DFT calculation (in percent)

Type:: float

local_psp_path

Path to where the local pseudopotential is stored (for OF-DFT-MD).

Type:: string

local_psp_name

Name of the local pseudopotential (for OF-DFT-MD).

Type:: string

ofdft_timestep

Timestep of the OF-DFT-MD simulation.

Type:: int

ofdft_number_of_timesteps

Number of timesteps for the OF-DFT-MD simulation.

Type:: int

ofdft_temperature

Temperature at which to perform the OF-DFT-MD simulation.

Type:: float

ofdft_kedf

Kinetic energy functional to be used for the OF-DFT-MD simulation.

Type:: string

ofdft_friction

Friction to be added for the Langevin dynamics in the OF-DFT-MD run.

Type:: float

class ParametersDescriptors[source]

Bases: ParametersBase

Parameters necessary for calculating/parsing input descriptors.

descriptor_type

Type of descriptors that is used to represent the atomic fingerprint. Supported:

‘Bispectrum’: Bispectrum descriptors (formerly called ‘SNAP’).

‘Atomic Density’: Atomic density, calculated via Gaussian descriptors.

Type:: string

bispectrum_twojmax

Bispectrum calculation: 2*jmax-parameter used for calculation of bispectrum descriptors. Default value for jmax is 5, so default value for twojmax is 10.

Type:: int

descriptors_contain_xyz

Legacy option. If True, it is assumed that the first three entries of the descriptor vector are the xyz coordinates and they are cut from the descriptor vector. If False, no such cutting is performed.

Type:: bool

atomic_density_sigma

Sigma (=width) used for the calculation of the Gaussian descriptors. Explicitly setting this value is discouraged if the atomic density is used only during the total energy calculation and, e.g., bispectrum descriptors are used for models. In this case, the width will automatically be set correctly during inference based on model parameters. This parameter mainly exists for debugging purposes. If the atomic density is instead used for model training itself, this parameter needs to be set.

Type:: float

atomic_density_cutoff

Cutoff radius used for atomic density calculation. Explicitly setting this value is discouraged if the atomic density is used only during the total energy calculation and, e.g., bispectrum descriptors are used for models. In this case, the cutoff will automatically be set correctly during inference based on model parameters. This parameter mainly exists for debugging purposes. If the atomic density is instead used for model training itself, this parameter needs to be set.

Type:: float

custom_lammps_compute_file

Path to a LAMMPS compute file for the descriptor calculation. MALA has its own collection of compute files which are used by default, i.e., when this string is empty. Setting this parameter is thus not necessary for model training and inference, and it exists mainly for debugging purposes.

Type:: str

minterpy_cutoff_cube_size

WILL BE DEPRECATED IN MALA v1.4.0 - size of cube for minterpy descriptor calculation.

Type:: float

minterpy_lp_norm

WILL BE DEPRECATED IN MALA v1.4.0 - LP norm for minterpy descriptor calculation.

Type:: int

minterpy_point_list

WILL BE DEPRECATED IN MALA v1.4.0 - list of points for minterpy descriptor calculation.

Type:: list

minterpy_polynomial_degree

WILL BE DEPRECATED IN MALA v1.4.0 - polynomial degree for minterpy descriptor calculation.

Type:: int

ace_included_expansion_ranks

List of all included expansion ranks for the ACE descriptors. These expansion ranks correspond to the many-body order in the expansion of the atomic energy in many-body terms. The list can exclude terms, i.e., [1,2,4] is a valid option. Lengths have to be consistent between ace_included_expansion_ranks, ace_maximum_n_per_rank, ace_maximum_l_per_rank and ace_minimum_l_per_rank.

Type:: list

ace_maximum_n_per_rank

Maximum n for each expansion rank in the ACE descriptors. These n correspond to the n starting from equation 27 in the original ACE paper (doi.org/10.1103/PhysRevB.99.014104). Lengths have to be consistent between ace_included_expansion_ranks, ace_maximum_n_per_rank, ace_maximum_l_per_rank and ace_minimum_l_per_rank.

Type:: list

ace_maximum_l_per_rank

Maximum l for each expansion rank in the ACE descriptors. These l correspond to the l starting from equation 27 in the original ACE paper (doi.org/10.1103/PhysRevB.99.014104). Lengths have to be consistent between ace_included_expansion_ranks, ace_maximum_n_per_rank, ace_maximum_l_per_rank and ace_minimum_l_per_rank.

Type:: list

ace_minimum_l_per_rank

Minimum l for each expansion rank in the ACE descriptors. These l correspond to the l starting from equation 27 in the original ACE paper (doi.org/10.1103/PhysRevB.99.014104). Lengths have to be consistent between ace_included_expansion_ranks, ace_maximum_n_per_rank, ace_maximum_l_per_rank and ace_minimum_l_per_rank.

Type:: list
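
The length requirement shared by the four per-rank lists can be checked up front. The sketch below uses a hypothetical helper, check_ace_rank_lists, and made-up n and l values purely for illustration; it is not part of MALA.

```python
def check_ace_rank_lists(included_ranks, maximum_n, maximum_l, minimum_l):
    """Return the common length of the per-rank lists or raise ValueError."""
    lengths = {
        "ace_included_expansion_ranks": len(included_ranks),
        "ace_maximum_n_per_rank": len(maximum_n),
        "ace_maximum_l_per_rank": len(maximum_l),
        "ace_minimum_l_per_rank": len(minimum_l),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Inconsistent ACE list lengths: {lengths}")
    return len(included_ranks)


# Rank 3 is skipped, which is allowed; every list still has three entries.
# The n and l values here are illustrative only.
print(check_ace_rank_lists([1, 2, 4], [6, 2, 1], [0, 2, 1], [0, 0, 0]))
```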

ace_balance_cutoff_radii_for_elements

If True, cutoff radii will be balanced between element types. This is helpful when dealing with elements varying drastically in size.

Type:: bool

ace_larger_cutoff_for_metals

If True (default) a slightly larger cutoff is used for metals. This is recommended.

Type:: bool

ace_use_maximum_cutoff_per_element

If True, the maximum chemically reasonable cutoff will be used for all bonds. These maximum cutoff radii are based on the Van-der-Waals radii. Note that this may increase computation time!

Type:: bool

ace_coupling_coefficients_type

Coupling type used for reduction of spherical harmonic products. These come into play starting from equation 28 in the original ACE paper (doi.org/10.1103/PhysRevB.99.014104). Can be “clebsch_gordan” or “wigner_3j”. This parameter usually does not have to be changed. The default is “clebsch_gordan”.

Type:: str

ace_coupling_coefficients_maximum_l

The maximum l up to which to precompute the Clebsch-Gordan/Wigner 3j symbols. These are precomputed within MALA to reduce overall computation time, but to save on storage space, precomputation is only done to a certain l (for the meaning of l, refer to the original ACE paper, doi.org/10.1103/PhysRevB.99.014104, page 5). MALA automatically recomputes the coefficients if ace_coupling_coefficients_maximum_l is increased.

Type:: int

property ace_cutoff_factor

Cutoff radius factor for ACE descriptor calculation.

This is NOT a cutoff radius itself. Rather, ACE computes one cutoff radius for every bond between element types (with grid points counting as an element type). These cutoff radii are then multiplied by this factor to get the actual cutoff radii. This factor is a global factor, and by default 2.0. Change it carefully, since changing it may lead to an increase in computation time.

property bispectrum_cutoff: Cut off radius for bispectrum calculation.

property bispectrum_element_weights

Element species weights for the bispectrum calculation.

They are provided as an ordered list, and will be assigned to the elements alphabetically, i.e., the first entry will go to the element coming first in the alphabet and so on. Weights are always relative, so the list will be rescaled such that the largest value is 1 and all the other ones are scaled accordingly.
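
The alphabetical assignment and the relative rescaling can be illustrated as follows. This is a sketch under stated assumptions: assign_element_weights is a hypothetical helper, and elements are identified by their chemical symbols.

```python
def assign_element_weights(elements, weights):
    """Map weights to alphabetically sorted elements, largest weight = 1."""
    if len(elements) != len(weights):
        raise ValueError("Need exactly one weight per element")
    largest = max(weights)
    rescaled = [w / largest for w in weights]
    # The first weight goes to the element that comes first alphabetically.
    return dict(zip(sorted(elements), rescaled))


# The order of the element list does not matter, only the alphabet does:
# "H" receives the first weight, "O" the second.
print(assign_element_weights(["O", "H"], [1.0, 4.0]))
```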

property bispectrum_switchflag

Switchflag for the bispectrum calculation.

Can only be 1 or 0. If 1 (default), a switching function will be used to ensure that atomic contributions smoothly go to zero after a certain cutoff. If 0 (old default, which can be problematic in some instances), this is not done, which can lead to discontinuities.

property use_y_splitting

Control whether a splitting in y-axis is used.

This can only be used in conjunction with a z-splitting, and the option will be ignored if z-splitting is disabled. Only has an effect for values larger than 1.

property use_z_splitting

Control whether splitting across the z-axis is used.

Default is True, since this gives descriptors compatible with QE, for total energy evaluation. However, setting this value to False can, e.g. in the LAMMPS case, improve performance. This is relevant for e.g. preprocessing.

class ParametersHyperparameterOptimization[source]

Bases: ParametersBase

Hyperparameter optimization parameters.

direction

Controls whether to minimize or maximize the loss function. Arguments are “minimize” and “maximize” respectively.

Type:: string

n_trials

Controls how many trials are performed (when using optuna). Default: 100.

Type:: int

hlist

List containing hyperparameters, that are then passed to optuna. Supported options so far include:

learning_rate (float): learning rate of the training algorithm

layer_activation_xxx (categorical): Activation function used for the feed forward network (see Network parameters for supported activation functions). Note that _xxx is only there so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything. Please note further that you need to request either only one activation function (for all layers) or one specifically for each layer.

ff_neurons_layer_xxx (int): Number of neurons per layer. Note that _xxx is only there so that optuna will differentiate between variables. No reordering is performed by MALA; the order depends on the order in the list. _xxx can be essentially anything.

Users normally don’t have to fill this list by hand; the hyperparameter optimizer provides interfaces for this task.

Type:: list

hyper_opt_method: string

Method used for hyperparameter optimization. Currently supported:

“optuna” : Use optuna for the hyperparameter optimization.

“oat” : Use orthogonal array tuning (currently limited to categorical hyperparameters). Range analysis is currently done by simply choosing the lowest loss.

“naswot” : Using a NAS without training, based on jacobians.

checkpoints_each_trial: int

If not 0, checkpoint files will be saved after each checkpoints_each_trial trials. Currently, this only works with optuna.

checkpoint_name: string

Name used for the checkpoints. Using this, multiple runs can be performed in the same directory. Currently, this only works with optuna.

study_name: string

Name used for this study (in optuna’s storage). Necessary when operating with an RDB storage.

rdb_storage: string

Address of the RDB storage to be used by optuna.

rdb_storage_heartbeat: int

Heartbeat interval for optuna (in seconds). Default is None. If not None and above 0, optuna will record the heartbeat of trials at this interval. If no action on a RUNNING trial is recognized for longer than this interval, then this trial will be moved to FAILED. In distributed training, setting a heartbeat is currently the only way to achieve a precise number of trials:

https://github.com/optuna/optuna/issues/1883

For optuna versions below 2.8.0, larger heartbeat intervals are detrimental to performance and should be avoided:

https://github.com/optuna/optuna/issues/2685

For MALA, no evidence for decreased performance using smaller heartbeat values could be found. So if this is used, 1s is a reasonable value.

number_training_per_trial: int

Number of network trainings performed per trial. Default is 1, but it makes sense to choose a higher number to exclude networks that performed well only by chance (good initialization). Naturally, this comes at the cost of longer run times.

trial_ensemble_evaluation: string

Control how multiple trainings performed during a trial are evaluated. By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.

use_multivariate: bool

If True, the optuna multivariate sampler is used. It is experimental since v2.2.0, but reported to perform very well. http://proceedings.mlr.press/v80/falkner18a.html

naswot_pruner_cutoff: float

If the surrogate loss algorithm is used as a pruner during a study, this cutoff determines which trials are neglected.

pruner: string

Pruner type to be used by optuna. Currently supported:

“multi_training”: If multiple trainings are performed per trial, and one returns “inf” for the loss, no further training will be performed. Especially useful if used in conjunction with the band_energy metric.

“naswot”: use the NASWOT algorithm as pruner

naswot_pruner_batch_size: int

Batch size for the NASWOT pruner

number_bad_trials_before_stopping: int

Only applies to optuna studies. If set to an integer above 0, the study will be stopped if no new best trial is found within number_bad_trials_before_stopping trials after the last one.
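
The stopping rule for number_bad_trials_before_stopping can be sketched without optuna. The function should_stop is hypothetical and assumes that the study minimizes the loss and that a tie with the current best does not count as a new best trial.

```python
def should_stop(trial_losses, number_bad_trials_before_stopping):
    """Decide whether a study should stop, given the losses seen so far."""
    if number_bad_trials_before_stopping <= 0 or not trial_losses:
        return False  # the criterion is disabled or nothing has run yet
    # Index of the best (lowest) loss; ties resolve to the earliest trial.
    best_index = min(range(len(trial_losses)), key=trial_losses.__getitem__)
    trials_since_best = len(trial_losses) - 1 - best_index
    return trials_since_best >= number_bad_trials_before_stopping


losses = [3.0, 2.0, 2.5, 2.1]  # best trial is the second one
print(should_stop(losses, number_bad_trials_before_stopping=2))
print(should_stop(losses, number_bad_trials_before_stopping=3))
```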

sqlite_timeout: int

Timeout for the SQLite backend of Optuna. This backend is officially not recommended because it is file based and can lead to errors; with a suitable timeout, however, it can be used somewhat stably and can help in HPC settings.

acsd_points: int

Parameter of the ACSD hyperparameter optimization scheme. Controls the number of point-pairs which are used to compute the ACSD. An array of acsd_points*acsd_points will be computed, i.e., if acsd_points=100, 100 points will be drawn at random, and thereafter each of these 100 points will be compared with a new, random set of 100 points, leading to 10000 points in total for the calculation of the ACSD.
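
The quadratic growth in the number of compared pairs follows directly from the sampling scheme. The sketch below only draws index pairs and is not MALA's ACSD code: draw_point_pairs is a hypothetical function, and sampling with replacement from a fixed-seed generator is an assumption.

```python
import random


def draw_point_pairs(n_grid_points, acsd_points, seed=0):
    """Draw acsd_points * acsd_points index pairs for comparison."""
    rng = random.Random(seed)
    first_set = [rng.randrange(n_grid_points) for _ in range(acsd_points)]
    pairs = []
    for i in first_set:
        # Each point is compared with a new, random set of points.
        partners = [rng.randrange(n_grid_points) for _ in range(acsd_points)]
        pairs.extend((i, j) for j in partners)
    return pairs


pairs = draw_point_pairs(n_grid_points=1000, acsd_points=100)
print(len(pairs))
```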

show(indent='')[source]

Print name and values of all attributes of this object.

Parameters:: indent (string) – The indent used in the list with which the parameter shows itself.

property number_training_per_trial: Control how many trainings are run per optuna trial.

property rdb_storage_heartbeat: Control whether a heartbeat is used for distributed optuna runs.

property trial_ensemble_evaluation

Control how multiple trainings performed during a trial are evaluated.

By default, simply “mean” is used. For smaller numbers of training per trial it might make sense to use “mean_std”, which means that the mean of all metrics plus the standard deviation is used, as an estimate of the minimal accuracy to be expected. Currently, “mean” and “mean_std” are allowed.
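
The two evaluation modes differ only in whether the spread of the individual trainings is added to the mean. A minimal sketch, assuming the population standard deviation and a hypothetical helper name evaluate_trial:

```python
from statistics import fmean, pstdev


def evaluate_trial(losses, trial_ensemble_evaluation="mean"):
    """Combine the losses of several trainings into one trial value."""
    if trial_ensemble_evaluation == "mean":
        return fmean(losses)
    if trial_ensemble_evaluation == "mean_std":
        # Mean plus standard deviation: a more pessimistic estimate.
        return fmean(losses) + pstdev(losses)
    raise ValueError(f"Unsupported option: {trial_ensemble_evaluation}")


losses = [1.0, 3.0]
print(evaluate_trial(losses, "mean"))
print(evaluate_trial(losses, "mean_std"))
```

With only a few trainings per trial, "mean_std" penalizes configurations whose results scatter widely, which is the motivation given above.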

class ParametersNetwork[source]

Bases: ParametersBase

Parameters necessary for constructing a neural network.

nn_type

Type of the neural network that will be used. Currently supported are

“feed_forward” (default)

“transformer”

“lstm”

“gru”

Type:: string

layer_sizes

A list of integers detailing the sizes of the layers of the neural network. Please note that the input layer is included therein. Default: [10,10,0]

Type:: list

layer_activations

Detailing the activation functions to be used by the neural network. If a single object is supplied, then this activation function is used for all layers (whether this applies to the output layer is controlled by layer_activations_include_output_layer). Otherwise, the activation functions are added layer by layer. Note that no activation function is applied between input layer and first hidden layer! The items in the list can either be strings (=names of torch.nn.Module activation functions), which MALA will map to the correct activation functions, torch.nn.Module objects, torch.nn.Module classes (which MALA will instantiate) OR None, in which case no activation function is used. The None can be omitted at the end, but is useful when layers without activation functions are to be skipped in the middle. Note that output from the output layer is by default restricted to only have positive values via restrict_targets in the ParametersTargets subclass. This is similar to having a ReLU function as a final activation function and ensures the physicality of the outputs (since the (L)DOS can never be negative).

Type:: list or str or class or nn.Module

layer_activations_include_output_layer

If False, no activation function is added to the output layer. This can of course also be done by supplying just the right number of activation functions; this parameter mainly exists to control the activation function of the last layer when layer_activations is given as a single object.

Type:: bool
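
How a layer_activations value might be expanded into one entry per layer is sketched below, using plain strings instead of torch objects. This is an interpretation of the description, not MALA's network construction code: expand_activations is hypothetical, and it assumes one activation slot per entry in layer_sizes after the input layer.

```python
def expand_activations(layer_sizes, layer_activations,
                       include_output_layer=True):
    """Return one activation (or None) per layer after the input layer."""
    n_layers = len(layer_sizes) - 1  # the input layer gets no activation
    if n_layers < 1:
        raise ValueError("Need at least an input and an output layer")
    if isinstance(layer_activations, str):
        # A single object is used for all layers ...
        expanded = [layer_activations] * n_layers
        if not include_output_layer:
            # ... optionally leaving out the output layer.
            expanded[-1] = None
        return expanded
    expanded = list(layer_activations)
    if len(expanded) > n_layers:
        raise ValueError("More activations than layers")
    # Trailing None entries may be omitted, so pad them back in.
    return expanded + [None] * (n_layers - len(expanded))


print(expand_activations([10, 20, 20, 5], "ReLU", include_output_layer=False))
print(expand_activations([10, 20, 20, 5], ["Sigmoid", None, "ReLU"]))
```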

loss_function_type

Loss function for the neural network. Currently supported loss functions include:

mse (Mean squared error; default)

Type:: string

no_hidden_state

If True, the hidden and cell states are set to zero for the LSTM network. If False, the hidden state is kept active. Default: False.

Type:: bool

bidirection

Sets the LSTM network size depending on whether the network is bidirectional or uses just one direction. Default: False.

Type:: bool

num_hidden_layers

Number of hidden layers to be used in LSTM, GRU or transformer nets. Default: None.

Type:: int

num_heads

Number of heads to be used in the multi-head attention network. This should be a divisor of the input dimension. Default: None.

Type:: int

dropout

Dropout rate for positional encoding in transformer. Default: 0.1

Type:: float

class ParametersRunning[source]

Bases: ParametersBase

Parameters needed for network runs (train, test or inference).

Some of these parameters only apply to either the train or test or inference case.

optimizer

Optimizer to be used. Supported options at the moment:

SGD: Stochastic gradient descent.
Adam: Adam Optimization Algorithm

Type:: string

learning_rate

Learning rate for chosen optimization algorithm. Default: 0.5.

Type:: float

max_number_epochs

Maximum number of epochs to train for. Default: 100.

Type:: int

mini_batch_size

Size of the mini batch for the optimization algorithm. Default: 10.

Type:: int

early_stopping_epochs

Number of epochs the validation accuracy is allowed to not improve by at least early_stopping_threshold before training is terminated. If 0, no early stopping is performed. Default: 0.

Type:: int

early_stopping_threshold

Minimum fractional reduction in validation loss required to avoid early stopping, e.g. a value of 0.05 means that validation loss must decrease by 5% within early_stopping_epochs epochs or the training will be stopped early. More explicitly, validation_loss < validation_loss_old * (1-early_stopping_threshold) or the patience counter goes up. Default: 0. Numbers bigger than 0 can make early stopping very aggressive, while numbers less than 0 make the trainer very forgiving of loss increase.

Type:: float
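
The interaction of early_stopping_epochs and early_stopping_threshold can be traced on a list of validation losses. The sketch below is not MALA's trainer: epoch_of_early_stop is a hypothetical function, validation_loss_old is assumed to be the last loss that passed the improvement check, and training is assumed to stop as soon as the patience counter reaches early_stopping_epochs.

```python
def epoch_of_early_stop(validation_losses, early_stopping_epochs,
                        early_stopping_threshold=0.0):
    """Return the epoch index at which training stops, or None."""
    if early_stopping_epochs <= 0 or not validation_losses:
        return None  # early stopping is disabled
    reference = validation_losses[0]
    patience = 0
    for epoch, loss in enumerate(validation_losses[1:], start=1):
        if loss < reference * (1 - early_stopping_threshold):
            # Sufficient improvement: update the reference, reset patience.
            reference = loss
            patience = 0
        else:
            patience += 1
            if patience >= early_stopping_epochs:
                return epoch
    return None


losses = [1.0, 0.9, 0.89, 0.88, 0.87]
# With a 5% threshold, the small improvements after epoch 1 do not count.
print(epoch_of_early_stop(losses, early_stopping_epochs=2,
                          early_stopping_threshold=0.05))
# With the default threshold of 0, every decrease counts as an improvement.
print(epoch_of_early_stop(losses, early_stopping_epochs=2))
```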

learning_rate_scheduler

Learning rate scheduler to be used. If not None, an instance of the corresponding pytorch class will be used to manage the learning rate schedule. Options:

None: No learning rate schedule will be used.

“ReduceLROnPlateau”: The learning rate will be reduced when the validation loss is plateauing.

Type:: string

learning_rate_decay

Decay rate to be used in the learning rate (if the chosen scheduler supports that). Default: 0.1

Type:: float

learning_rate_patience

Patience parameter used in the learning rate schedule (how long the validation loss has to plateau before the schedule takes effect). Default: 0.

Type:: int

num_workers

Number of workers to be used for data loading.

Type:: int

use_shuffling_for_samplers: If True, the training data will be shuffled in between epochs. If lazy loading is selected, then this shuffling will be done on a “by snapshot” basis.

checkpoints_each_epoch

If not 0, checkpoint files will be saved after each checkpoints_each_epoch epochs.

Type:: int

checkpoint_name

Name used for the checkpoints. Using this, multiple runs can be performed in the same directory.

Type:: string

checkpoint_path

Path where the checkpoints will be saved (and loaded from)

Type:: string

run_name

Name of the run used for logging.

Type:: string

logging_dir

Name of the folder that logging files will be saved to.

Type:: string

logging_dir_append_date

If True, then upon creating logging files, these will be saved in a subfolder of logging_dir labelled with the starting date of the logging, to avoid having to change input scripts often.

Type:: bool

logger

Name of the logger to be used. Currently supported are:

“tensorboard”: Tensorboard logger.

“wandb”: Weights and Biases logger.

Type:: string

logging_metrics

List of metrics to be used for logging. Default is [“ldos”]. Possible options are:

“ldos”: MSE of the LDOS.

“band_energy”: Band energy.

“band_energy_actual_fe”: Band energy computed with ground truth Fermi energy.

“total_energy”: Total energy.

“total_energy_actual_fe”: Total energy computed with ground truth Fermi energy.

“fermi_energy”: Fermi energy.

“density”: Electron density.

“density_relative”: Electron density (MAPE).

“dos”: Density of states.

“dos_relative”: Density of states (MAPE).

The units for energy metrics are meV/atom. Selected metrics are evaluated every logging_metrics_interval (see below) epochs. To use the energy metrics the validation snapshots need not be shuffled. Note that evaluating the energy metrics takes considerably longer than just the LDOS and is therefore discouraged.

Type:: list

log_metrics_on_train_set

Whether to also log metrics evaluated on the training set. Default is False.

Type:: bool

logging_metrics_interval

Determines how often (in the unit of epochs) metrics are logged. Default is 1.

Type:: int

training_log_interval

Determines how often detailed performance info is printed during training (only has an effect if the verbosity is high enough).

Type:: int

profiler_range

List with two entries determining at which batch/iteration numbers the CUDA profiler will start and stop profiling. Please note that this option only has an effect if the nsys profiler is used.

Type:: list

inference_data_grid

Grid dimensions used during inference. Typically, these are automatically determined by DFT reference data, and this parameter does not need to be set. Thus, this parameter mainly exists for debugging purposes.

Type:: list

use_mixed_precision

If True, mixed precision computation (via AMP) will be used.

Type:: bool

l2_regularization

Weight decay rate for NN optimizer.

Type:: float

dropout

Dropout rate for positional encoding in transformer net.

Type:: float

property final_validation_metric

Metric for final model evaluation.

This metric is evaluated on the validation set after training. Available options are the same as for validation_metric. Default is “ldos”, meaning that the MSE of the LDOS will be used as a metric. The final validation metric is used as a target for hyperparameter optimization.

property use_graphs

Decide whether CUDA graphs are used during training.

Doing so will improve performance, but CUDA graphs are only available from CUDA 11.0 upwards.

property validation_metric

Control the metric used for validation.

Metric to be evaluated on the validation set during training. Default is “ldos”, meaning that the regular loss on the LDOS will be used as a metric.

Possible options are:

“ldos”: MSE of the LDOS.

“band_energy”: Band energy.

“band_energy_actual_fe”: Band energy computed with ground truth Fermi energy.

“total_energy”: Total energy.

“total_energy_actual_fe”: Total energy computed with ground truth Fermi energy.

“fermi_energy”: Fermi energy.

“density”: Electron density.

“density_relative”: Electron density (MAPE).

“dos”: Density of states.

“dos_relative”: Density of states (MAPE).

The units for energy metrics are meV/atom. The selected metric is evaluated on the validation set after every epoch. The validation metric is used as a criterion for early stopping and for checkpointing the best model. Note that evaluating the energy metrics takes considerably longer than evaluating the LDOS, so their use is discouraged.
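
The role of the validation metric in early stopping and best-model checkpointing can be sketched generically as follows. This is not the library's training loop: the patience parameter and the function name are hypothetical, and the sketch assumes that a lower metric value is better, which holds for error-type metrics such as the ones listed above.

```python
def run_validation_loop(validation_values, patience=2):
    """Track the best (lowest) metric and stop after `patience` epochs
    without improvement.

    Returns (best_epoch, stopped_epoch).
    """
    best_value = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(validation_values):
        if value < best_value:
            # New best model: a checkpoint would be written here.
            best_value, best_epoch = value, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(validation_values) - 1


print(run_validation_loop([0.9, 0.5, 0.6, 0.55, 0.4]))  # (1, 3)
```

In this example the run stops at epoch 3, before the later improvement at epoch 4 is seen, which illustrates the trade-off involved in choosing a small patience.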

class ParametersTargets[source]

Bases: ParametersBase

Parameters necessary for calculating/parsing output quantities.

target_type

Type of the target quantity to be calculated/parsed, e.g. “LDOS”.

Type:: string

ldos_gridsize

Gridsize of the LDOS. Can either be an int or a list of ints, in which case splitting of the (L)DOS along the energy axis is assumed. Note that this splitting feature is currently experimental and the interface may change in the future. Further, if this type of splitting is used, please make sure that ldos_gridsize, ldos_gridspacing_ev and ldos_gridoffset_ev are lists of the same length.

Type:: int or list

ldos_gridspacing_ev

Gridspacing of the energy grid the (L)DOS is evaluated on [eV]. Can either be a float or a list of floats, in which case splitting of the (L)DOS along the energy axis is assumed. Note that this splitting feature is currently experimental and the interface may change in the future. Further, if this type of splitting is used, please make sure that ldos_gridsize, ldos_gridspacing_ev and ldos_gridoffset_ev are lists of the same length.

Type:: float or list

ldos_gridoffset_ev

Lowest energy value on the (L)DOS energy grid [eV]. Can either be a float or a list of floats, in which case splitting of the (L)DOS along the energy axis is assumed. Note that this splitting feature is currently experimental and the interface may change in the future. Further, if this type of splitting is used, please make sure that ldos_gridsize, ldos_gridspacing_ev and ldos_gridoffset_ev are lists of the same length.

Type:: float or list
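
The three ldos_grid* parameters together define the energy grid. The sketch below constructs it under the assumption of a uniform grid, E_i = offset + i * spacing, and checks the equal-length requirement for the split case. The function ldos_energy_grid is a hypothetical helper written for this illustration, not part of the library.

```python
def _as_list(value):
    """Promote a scalar to a one-element list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def ldos_energy_grid(ldos_gridsize, ldos_gridspacing_ev, ldos_gridoffset_ev):
    """Return the energy grid(s) in eV as a list of lists, one per split.

    Assumption: each grid is uniform, E_i = offset + i * spacing.
    """
    sizes = _as_list(ldos_gridsize)
    spacings = _as_list(ldos_gridspacing_ev)
    offsets = _as_list(ldos_gridoffset_ev)
    if not (len(sizes) == len(spacings) == len(offsets)):
        raise ValueError(
            "ldos_gridsize, ldos_gridspacing_ev and ldos_gridoffset_ev "
            "must have the same length"
        )
    return [
        [offset + i * spacing for i in range(size)]
        for size, spacing, offset in zip(sizes, spacings, offsets)
    ]


# Single grid: 5 points, 2.5 eV apart, starting at -5 eV.
print(ldos_energy_grid(5, 2.5, -5.0))
# Split grid: two segments along the energy axis.
print(ldos_energy_grid([3, 2], [1.0, 0.5], [-1.0, 2.0]))
```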

pseudopotential_path

Path at which pseudopotentials are located (for TEM).

Type:: string

rdf_parameters

Parameters for calculating the radial distribution function (RDF). The RDF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

number_of_bins (int): Number of bins used to create the histogram.
rMax (float): Radius up to which to calculate the RDF. None by default; this is the suggested behavior, as MALA will then on its own calculate the maximum radius up to which the calculation of the RDF is indisputably physically meaningful. Larger radii may be specified, e.g. for a Fourier transformation to calculate the static structure factor.

Type:: dict
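
The following sketch shows how a dictionary with these two keywords might be turned into histogram bin edges. The fallback used when rMax is None, half the shortest edge of an orthorhombic cell (a common choice under the minimum image convention), is an assumption for illustration and not necessarily the rule MALA applies. The function rdf_bin_edges is hypothetical.

```python
def rdf_bin_edges(cell_lengths, rdf_parameters):
    """Return histogram bin edges for an RDF in an orthorhombic cell."""
    r_max = rdf_parameters.get("rMax")
    if r_max is None:
        # Assumed fallback: half the shortest cell edge.
        r_max = 0.5 * min(cell_lengths)
    bins = rdf_parameters["number_of_bins"]
    return [r_max * i / bins for i in range(bins + 1)]


edges = rdf_bin_edges([8.0, 10.0, 12.0], {"number_of_bins": 4, "rMax": None})
print(edges)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```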

tpcf_parameters

Parameters for calculating the three particle correlation function (TPCF). The TPCF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

number_of_bins (int): Number of bins used to create the histogram.
rMax (float): Radius up to which to calculate the TPCF. If None, MALA will determine the maximum radius for which the TPCF is indisputably defined. Be advised: this may come at increased computational cost.

Type:: dict

ssf_parameters

Parameters for calculating the static structure factor (SSF). The SSF can directly be calculated via a function call, but if it is calculated e.g. during a MD or MC run, these parameters will control how. The following keywords are recognized:

number_of_bins (int): Number of bins used to create the histogram.
kMax (float): Maximum wave vector up to which to calculate the SSF.

Type:: dict

assume_two_dimensional

If True, the total energy calculations will be performed without periodic boundary conditions in the z-direction, i.e., the cell will be truncated in the z-direction. NOTE: This parameter may be moved up to a global parameter, depending on whether descriptor calculation may benefit from it.

Type:: bool

property restrict_targets

Control if and how targets are restricted to physical values.

Can be “zero_out_negative”, i.e. all negative values are set to zero, or “absolute_values”, i.e. all negative values are multiplied by -1.
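
The two documented options can be illustrated on a plain list of numbers. This is a minimal sketch of the described behavior, not the library's implementation; the function apply_restriction is a hypothetical name, and only the two options named above are handled.

```python
def apply_restriction(values, restrict_targets):
    """Restrict target values to physical (non-negative) values."""
    if restrict_targets == "zero_out_negative":
        # Negative values are set to zero; others are unchanged.
        return [v if v >= 0 else 0.0 for v in values]
    if restrict_targets == "absolute_values":
        # Negative values are multiplied by -1; others are unchanged.
        return [-v if v < 0 else v for v in values]
    raise ValueError(f"Unknown restriction: {restrict_targets!r}")


targets = [0.3, -0.2, 1.5, -0.01]
print(apply_restriction(targets, "zero_out_negative"))  # [0.3, 0.0, 1.5, 0.0]
print(apply_restriction(targets, "absolute_values"))    # [0.3, 0.2, 1.5, 0.01]
```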