data_scaler

DataScaler class for scaling DFT data.

class DataScaler(typestring, use_ddp=False)[source]

Bases: object

Scales input and output data.

Loosely emulates the functionality of the scikit-learn scalers, but implementing the class ourselves gives us more freedom.

Parameters:
  • typestring (string) –

    Specifies how scaling should be performed. Options:

    • "None": No normalization is applied.

    • "standard": Standardization (Scale to mean 0, standard deviation 1)

    • "normal": Min-Max scaling (Scale to be in range 0…1)

    • "feature-wise-standard": Row Standardization (Scale to mean 0, standard deviation 1)

    • "feature-wise-normal": Row Min-Max scaling (Scale to be in range 0…1)

  • use_ddp (bool) – If True, the DataScaler will use DDP (Distributed Data Parallel) to ensure that data is only saved on the root process during parallel execution.
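
A minimal construction sketch, assuming DataScaler is importable from the package top level (the import path is an assumption, adjust it to your installation); the typestring values are the options listed above:

    from mala import DataScaler  # assumed import path, adjust as needed

    # Scale the entire array to mean 0 and standard deviation 1.
    input_scaler = DataScaler("standard")

    # Feature-wise Min-Max scaling (see the options above); use_ddp only
    # matters when running with distributed data parallel training.
    output_scaler = DataScaler("feature-wise-normal", use_ddp=False)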

finish_incremental_fitting()[source]

Indicate that all data has been added to the incremental calculation.

This is necessary for lazy loading.

fit(unscaled)[source]

Compute the quantities necessary for scaling.

Parameters:

unscaled (torch.Tensor) – Data on which the scaling will be calculated.
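
A short sketch of fitting a scaler; the tensor shape (samples × features) is only illustrative, not prescribed by the API:

    import torch
    from mala import DataScaler  # assumed import path

    unscaled = torch.rand(1000, 91)               # illustrative: 1000 samples, 91 features
    scaler = DataScaler("feature-wise-standard")
    scaler.fit(unscaled)                          # computes the quantities needed for scaling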

incremental_fit(unscaled)[source]

Add data to the incremental calculation of scaling parameters.

This is necessary for lazy loading.

Parameters:

unscaled (torch.Tensor) – Data that is to be added to the fit.
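
For lazily loaded data, the scaling parameters can be accumulated chunk by chunk with start_incremental_fitting(), incremental_fit() and finish_incremental_fitting(); a sketch of that workflow (chunk shapes are illustrative assumptions):

    import torch
    from mala import DataScaler  # assumed import path

    scaler = DataScaler("feature-wise-standard")

    scaler.start_incremental_fitting()
    for _ in range(5):                    # e.g. one iteration per lazily loaded snapshot
        chunk = torch.rand(200, 91)       # illustrative chunk of unscaled data
        scaler.incremental_fit(chunk)
    scaler.finish_incremental_fitting()

    # The scaler can now be used like one fitted with fit().
    scaled = scaler.transform(torch.rand(10, 91))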

inverse_transform(scaled, as_numpy=False)[source]

Transform data from scaled to unscaled.

Unscaled means real-world data; scaled means data as it is used in the network.

Parameters:
  • scaled (torch.Tensor) – Scaled data.

  • as_numpy (bool) – If True, a numpy array is returned; otherwise, a torch.Tensor.

Returns:

unscaled – Real world data.

Return type:

torch.Tensor
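
A round-trip sketch, assuming a scaler fitted as above; as_numpy=True is used here to get a numpy array back:

    import torch
    from mala import DataScaler  # assumed import path

    unscaled = torch.rand(100, 91)
    scaler = DataScaler("normal")
    scaler.fit(unscaled)

    scaled = scaler.transform(unscaled.clone())   # clone: transform works in place
    recovered = scaler.inverse_transform(scaled, as_numpy=True)
    # Up to floating point error, `recovered` should match the original data.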

classmethod load_from_file(file, save_format='pickle')[source]

Load a saved Scaler object.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be read.

  • save_format – File format which was used for saving.

Returns:

data_scaler – DataScaler which was read from the file.

Return type:

DataScaler

save(filename, save_format='pickle')[source]

Save the Scaler object so that it can be accessed again later.

Parameters:
  • filename (string) – File in which the parameters will be saved.

  • save_format – File format which will be used for saving.
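
A save/load round trip; the filename is a placeholder and "pickle" is the default save_format shown in the signatures above:

    import torch
    from mala import DataScaler  # assumed import path

    scaler = DataScaler("standard")
    scaler.fit(torch.rand(100, 91))

    scaler.save("my_scaler.pkl")                            # default save_format="pickle"
    restored = DataScaler.load_from_file("my_scaler.pkl")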

start_incremental_fitting()[source]

Start the incremental calculation of scaling parameters.

This is necessary for lazy loading.

transform(unscaled)[source]

Transform data from unscaled to scaled.

Unscaled means real-world data; scaled means data as it is used in the network. Data is transformed in-place.

Parameters:

unscaled (torch.Tensor) – Real world data.

Returns:

scaled – Scaled data.

Return type:

torch.Tensor
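
Because transform() operates in place, clone the tensor first if the original, real-world values are still needed; a minimal sketch:

    import torch
    from mala import DataScaler  # assumed import path

    scaler = DataScaler("standard")
    data = torch.rand(10, 91)
    scaler.fit(data)

    original = data.clone()            # keep an untouched copy of the real-world data
    scaled = scaler.transform(data)    # `data` itself now holds the scaled values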