data_scaler

DataScaler class for scaling DFT data.

class DataScaler(typestring, use_ddp=False)[source]

Bases: object

Scales input and output data.

Loosely emulates the functionality of the scikit-learn scalers, but by implementing the class ourselves we have more freedom. Specifically assumes data of the form (d,f), where d=x*y*z, i.e., the product of the spatial dimensions, and f is the feature dimension.

Parameters:
  • typestring (string) –

    Specifies how scaling should be performed. Options:

    • "None": No scaling is applied.

    • "standard": Standardization (scale to mean 0, standard deviation 1) is applied to the entire array.

    • "minmax": Min-max scaling (scale to the range 0…1) is applied to the entire array.

    • "feature-wise-standard": Standardization (scale to mean 0, standard deviation 1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

    • "feature-wise-minmax": Min-max scaling (scale to the range 0…1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

    • "normal": (DEPRECATED) Old name for "minmax".

    • "feature-wise-normal": (DEPRECATED) Old name for "feature-wise-minmax".

  • use_ddp (bool) – If True, the DataScaler will use DDP to check that data is only saved on the root process in parallel execution.
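For illustration, the statistics behind the four active scaling modes can be sketched in plain PyTorch. This is a hedged sketch of the underlying math only, not MALA's actual implementation:

```python
import torch

# Toy data of shape (d, f): d grid points, f features.
data = torch.tensor([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])

# "standard": one mean/std for the entire array.
total_mean, total_std = data.mean(), data.std()
standard = (data - total_mean) / total_std

# "feature-wise-standard": one mean/std per feature column.
means, stds = data.mean(dim=0), data.std(dim=0)
fw_standard = (data - means) / stds

# "minmax": one min/max for the entire array.
total_min, total_max = data.min(), data.max()
minmax = (data - total_min) / (total_max - total_min)

# "feature-wise-minmax": one min/max per feature column.
mins, maxs = data.min(dim=0).values, data.max(dim=0).values
fw_minmax = (data - mins) / (maxs - mins)
```

The feature-wise variants are usually preferable when the f columns live on very different numerical scales, as is common for DFT descriptors.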

cantransform

If True, this scaler is set up to perform scaling.

Type:

bool

feature_wise

(Managed internally, not set to private due to legacy issues)

Type:

bool

maxs

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

means

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

mins

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

scale_minmax

(Managed internally, not set to private due to legacy issues)

Type:

bool

scale_standard

(Managed internally, not set to private due to legacy issues)

Type:

bool

stds

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

total_data_count

(Managed internally, not set to private due to legacy issues)

Type:

int

total_max

(Managed internally, not set to private due to legacy issues)

Type:

float

total_mean

(Managed internally, not set to private due to legacy issues)

Type:

float

total_min

(Managed internally, not set to private due to legacy issues)

Type:

float

total_std

(Managed internally, not set to private due to legacy issues)

Type:

float

typestring

(Managed internally, not set to private due to legacy issues)

Type:

str

use_ddp

(Managed internally, not set to private due to legacy issues)

Type:

bool

fit(unscaled)[source]

Compute the quantities necessary for scaling.

Parameters:

unscaled (torch.Tensor) – Data on which the scaling parameters will be calculated.

inverse_transform(scaled, copy=False, as_numpy=False)[source]

Transform data from scaled to unscaled.

Unscaled means real-world data, scaled means data as used in the network.

Parameters:
  • scaled (torch.Tensor) – Scaled data.

  • as_numpy (bool) – If True, a numpy array is returned, otherwise a torch tensor.

  • copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.

Returns:

unscaled – Real world data.

Return type:

torch.Tensor
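Conceptually, inverse_transform undoes transform. A minimal sketch of this round trip for min-max scaling in plain PyTorch (an illustration of the idea, not MALA's actual code):

```python
import torch

data = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# "fit": record per-feature min/max of the training data.
mins = data.min(dim=0).values
maxs = data.max(dim=0).values

# "transform": map into [0, 1] per feature.
scaled = (data - mins) / (maxs - mins)

# "inverse_transform": map back to real-world values.
unscaled = scaled * (maxs - mins) + mins
```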

classmethod load_from_file(file, save_format='pickle')[source]

Load a saved Scaler object.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be read.

  • save_format – File format which was used for saving.

Returns:

data_scaler – DataScaler which was read from the file.

Return type:

DataScaler

partial_fit(unscaled)[source]

Add data to the incremental calculation of scaling parameters.

This is necessary for lazy loading.

Parameters:

unscaled (torch.Tensor) – Data that is to be added to the fit.
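With lazy loading, the statistics must be accumulated chunk by chunk rather than computed from one array. A sketch of how running feature-wise mean/std can be combined incrementally (an illustration of the general technique, not MALA's exact update formula):

```python
import torch

chunks = [torch.randn(100, 4), torch.randn(50, 4), torch.randn(75, 4)]

# Running accumulators, initialized as a reset() call would.
total_count = 0
running_sum = torch.zeros(4)
running_sq_sum = torch.zeros(4)

for chunk in chunks:  # one partial_fit call per chunk
    total_count += chunk.shape[0]
    running_sum += chunk.sum(dim=0)
    running_sq_sum += (chunk ** 2).sum(dim=0)

# Final feature-wise statistics after all chunks were added.
means = running_sum / total_count
stds = torch.sqrt(running_sq_sum / total_count - means ** 2)
```

The result matches fitting on the concatenated data, which is why a reset() / partial_fit() loop can replace a single fit() call when the full dataset does not fit into memory.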

reset()[source]

Start the incremental calculation of scaling parameters.

This is necessary for lazy loading.

save(filename, save_format='pickle')[source]

Save the Scaler object so that it can be accessed again later.

Parameters:
  • filename (string) – File in which the parameters will be saved.

  • save_format – File format which will be used for saving.
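A sketch of the pickle-based save/load round trip these two methods perform. The state dictionary here is an illustrative stand-in; the attribute names are assumptions, and the real DataScaler serializes the object itself:

```python
import os
import pickle
import tempfile

# Illustrative stand-in for the scaler's state (assumed names).
state = {"typestring": "feature-wise-standard", "total_data_count": 300}

path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")

# save(filename, save_format="pickle")
with open(path, "wb") as f:
    pickle.dump(state, f)

# load_from_file(file, save_format="pickle")
with open(path, "rb") as f:
    restored = pickle.load(f)
```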

transform(unscaled, copy=False)[source]

Transform data from unscaled to scaled.

Unscaled means real-world data, scaled means data as used in the network. Unless copy=True, data is transformed in-place.

Parameters:
  • unscaled (torch.Tensor) – Real world data.

  • copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.

Returns:

scaled – Scaled data.

Return type:

torch.Tensor
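The copy flag's semantics can be sketched as follows, using a plain PyTorch stand-in for the standardization case (an illustration of in-place versus copied transformation, not MALA's code):

```python
import torch

def transform(unscaled, mean, std, copy=False):
    # With copy=False, the caller's tensor is modified in-place.
    out = unscaled.clone() if copy else unscaled
    out -= mean
    out /= std
    return out

x = torch.tensor([1.0, 2.0, 3.0])

# copy=True leaves x untouched and returns a new tensor.
y = transform(x, x.mean(), x.std(), copy=True)

# copy=False modifies x itself; z shares x's storage.
z = transform(x, torch.tensor(2.0), torch.tensor(1.0), copy=False)
```

In-place operation avoids allocating a second copy of what, for DFT grids, can be a very large array; request a copy only when the original data is still needed afterwards.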