Improved hyperparameter optimization

Advanced optimization algorithms

As discussed in the MALA publication on hyperparameter optimization, advanced hyperparameter optimization strategies have been evaluated for ML-DFT models with MALA, namely:

  • NASWOT (Neural architecture search without training): A training-free hyperparameter optimization technique. It works by correlating a network's ability to distinguish between data points at initialization with its performance after training; a conceptual sketch of this scoring idea is shown below the list.

  • OAT (Orthogonal array tuning): This technique requires network training, but constructs an optimal set of trials based on orthogonal arrays (a concept from optimal design theory), so that a maximum of information is extracted from a limited number of training runs; a small illustration of such an array follows below the list.
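
The scoring idea behind NASWOT can be sketched in a few lines. The following is a conceptual illustration in the spirit of the original NASWOT method (scoring an untrained network by how well its ReLU activation patterns separate a minibatch), not MALA's internal implementation; all names are purely illustrative.

import torch


def naswot_score(network, minibatch):
    """Conceptual NASWOT score: how well does an untrained network
    distinguish the samples in a minibatch?"""
    binary_codes = []

    def record_code(module, inputs, output):
        # Record which ReLU units fire for each sample (one binary code per sample).
        binary_codes.append((output > 0).flatten(1).float())

    hooks = [module.register_forward_hook(record_code)
             for module in network.modules()
             if isinstance(module, torch.nn.ReLU)]
    with torch.no_grad():
        network(minibatch)
    for hook in hooks:
        hook.remove()

    # Similarity kernel between the binary codes of all samples; its
    # log-determinant is large when the samples are well separated,
    # which correlates with accuracy after training.
    codes = torch.cat(binary_codes, dim=1)
    kernel = codes @ codes.T + (1 - codes) @ (1 - codes).T
    return torch.slogdet(kernel / codes.shape[1])[1].item()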
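
The orthogonal-array idea behind OAT can be illustrated with three hyperparameters of two choices each: an exhaustive grid would require 2^3 = 8 trainings, whereas the standard L4 orthogonal array covers every pairwise combination of choices equally often with only 4 trials. The hyperparameter names and choices below are made up for illustration and are not MALA's.

# L4(2^3) orthogonal array: 4 rows (trials), 3 columns (hyperparameters);
# every pair of columns contains each combination of levels exactly once.
L4 = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

# Purely illustrative two-level hyperparameters.
choices = {
    "activation": ["ReLU", "Sigmoid"],
    "neurons_per_layer": [32, 64],
    "number_of_layers": [2, 3],
}

for trial, row in enumerate(L4):
    config = {name: options[level]
              for (name, options), level in zip(choices.items(), row)}
    print(f"Trial {trial}: {config}")
    # ... train a network with this configuration and record its accuracy ...

# OAT then compares the mean accuracy per level of each hyperparameter
# to pick the best level, without ever training all 8 combinations.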

Both methods can easily be enabled without changing the familiar hyperparameter optimization workflow, as shown in the file advanced/ex07_advanced_hyperparameter_optimization.

These optimization algorithms are activated via the Parameters object:

# Use NASWOT
parameters.hyperparameters.hyper_opt_method = "naswot"
# Use OAT
parameters.hyperparameters.hyper_opt_method = "oat"

Both techniques are largely compatible with other MALA capabilities, with a few exceptions:

  • NASWOT: Can only be used with hyperparameters related to the network architecture (layer sizes, activation functions, etc.); training-related hyperparameters will be ignored, and a warning to this effect will be printed. Only "categorical" hyperparameters are supported. NASWOT can be run in parallel by setting parameters.use_mpi=True (see the snippet after this list).

  • OAT: Currently cannot be run in parallel. Only "categorical" hyperparameters are supported.
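
For the parallel NASWOT case, enabling MPI is a one-line change in the optimization script; how the script is then launched depends on your MPI setup, and the mpirun command in the comment below is only one common option, not something prescribed by MALA.

# Enable MPI-parallel NASWOT evaluation in the hyperparameter optimization script:
parameters.use_mpi = True
# ...then launch the script through an MPI runner of your choice, e.g.
# (adapt to your cluster):
#     mpirun -n 4 python3 <your_hyperparameter_optimization_script>.py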

For more details on the mathematical background of these methods, please refer to the aforementioned publication.