data_scaler

DataScaler class for scaling DFT data.

class DataScaler(typestring, use_ddp=False)[source]

Bases: object

Scales input and output data.

Loosely emulates the functionality of the scikit-learn scalers, but by implementing the class ourselves we have more freedom. Specifically assumes data of the form (d,f), where d=x*y*z, i.e., the product of the spatial dimensions, and f is the feature dimension.

Parameters:
  • typestring (string) –

    Specifies how scaling should be performed. Options:

    • "None": No scaling is applied.

    • "standard": Standardization (scale to mean 0, standard deviation 1) is applied to the entire array.

    • "minmax": Min-max scaling (scale to the range 0…1) is applied to the entire array.

    • "feature-wise-standard": Standardization (scale to mean 0, standard deviation 1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

    • "feature-wise-minmax": Min-max scaling (scale to the range 0…1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.

    • "normal": (DEPRECATED) Old name for "minmax".

    • "feature-wise-normal": (DEPRECATED) Old name for "feature-wise-minmax".

  • use_ddp (bool) – If True, the DataScaler will use DDP to check that data is only saved on the root process in parallel execution.
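For illustration, the statistics behind the four active scaling modes can be sketched in plain PyTorch. This is a hedged sketch of the underlying math only, not MALA's actual implementation:

```python
import torch

# Toy data of shape (d, f): d grid points, f features.
data = torch.tensor([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])

# "standard": one mean/std for the entire array.
total_mean, total_std = data.mean(), data.std()
standard = (data - total_mean) / total_std

# "feature-wise-standard": one mean/std per feature column.
means, stds = data.mean(dim=0), data.std(dim=0)
fw_standard = (data - means) / stds

# "minmax": one min/max for the entire array.
total_min, total_max = data.min(), data.max()
minmax = (data - total_min) / (total_max - total_min)

# "feature-wise-minmax": one min/max per feature column.
mins, maxs = data.min(dim=0).values, data.max(dim=0).values
fw_minmax = (data - mins) / (maxs - mins)
```

The feature-wise variants are usually preferable when the f columns live on very different numerical scales, as is common for DFT descriptors.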

cantransform

If True, this scaler is set up to perform scaling.

Type:

bool

feature_wise

(Managed internally, not set to private due to legacy issues)

Type:

bool

maxs

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

means

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

mins

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

scale_minmax

(Managed internally, not set to private due to legacy issues)

Type:

bool

scale_standard

(Managed internally, not set to private due to legacy issues)

Type:

bool

stds

(Managed internally, not set to private due to legacy issues)

Type:

torch.Tensor

total_data_count

(Managed internally, not set to private due to legacy issues)

Type:

int

total_max

(Managed internally, not set to private due to legacy issues)

Type:

float

total_mean

(Managed internally, not set to private due to legacy issues)

Type:

float

total_min

(Managed internally, not set to private due to legacy issues)

Type:

float

total_std

(Managed internally, not set to private due to legacy issues)

Type:

float

typestring

(Managed internally, not set to private due to legacy issues)

Type:

str

use_ddp

(Managed internally, not set to private due to legacy issues)

Type:

bool

fit(unscaled)[source]

Compute the quantities necessary for scaling.

Parameters:

unscaled (torch.Tensor) – Data on which the scaling parameters will be calculated.

inverse_transform(scaled, copy=False, as_numpy=False)[source]

Transform data from scaled to unscaled.

Unscaled means real-world data, scaled means data as used in the network.

Parameters:
  • scaled (torch.Tensor) – Scaled data.

  • as_numpy (bool) – If True, a numpy array is returned, otherwise a torch tensor.

  • copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.

Returns:

unscaled – Real world data.

Return type:

torch.Tensor
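Conceptually, inverse_transform undoes transform. A minimal sketch of this round trip for min-max scaling in plain PyTorch (an illustration of the idea, not MALA's actual code):

```python
import torch

data = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# "fit": record per-feature min/max of the training data.
mins = data.min(dim=0).values
maxs = data.max(dim=0).values

# "transform": map into [0, 1] per feature.
scaled = (data - mins) / (maxs - mins)

# "inverse_transform": map back to real-world values.
unscaled = scaled * (maxs - mins) + mins
```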

classmethod load_from_file(file, save_format='pickle')[source]

Load a saved Scaler object.

Parameters:
  • file (string or ZipExtFile) – File from which the parameters will be read.

  • save_format – File format which was used for saving.

Returns:

data_scaler – DataScaler which was read from the file.

Return type:

DataScaler

partial_fit(unscaled)[source]

Add data to the incremental calculation of scaling parameters.

This is necessary for lazy loading.

Parameters:

unscaled (torch.Tensor) – Data that is to be added to the fit.
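With lazy loading, the statistics must be accumulated chunk by chunk rather than computed from one array. A sketch of how running feature-wise mean/std can be combined incrementally (an illustration of the general technique, not MALA's exact update formula):

```python
import torch

chunks = [torch.randn(100, 4), torch.randn(50, 4), torch.randn(75, 4)]

# Running accumulators, initialized as a reset() call would.
total_count = 0
running_sum = torch.zeros(4)
running_sq_sum = torch.zeros(4)

for chunk in chunks:  # one partial_fit call per chunk
    total_count += chunk.shape[0]
    running_sum += chunk.sum(dim=0)
    running_sq_sum += (chunk ** 2).sum(dim=0)

# Final feature-wise statistics after all chunks were added.
means = running_sum / total_count
stds = torch.sqrt(running_sq_sum / total_count - means ** 2)
```

The result matches fitting on the concatenated data, which is why a reset() / partial_fit() loop can replace a single fit() call when the full dataset does not fit into memory.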

reset()[source]

Start the incremental calculation of scaling parameters.

This is necessary for lazy loading.

save(filename, save_format='pickle')[source]

Save the Scaler object so that it can be accessed again later.

Parameters:
  • filename (string) – File in which the parameters will be saved.

  • save_format – File format which will be used for saving.
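A sketch of the pickle-based save/load round trip these two methods perform. The state dictionary here is an illustrative stand-in; the attribute names are assumptions, and the real DataScaler serializes the object itself:

```python
import os
import pickle
import tempfile

# Illustrative stand-in for the scaler's state (assumed names).
state = {"typestring": "feature-wise-standard", "total_data_count": 300}

path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")

# save(filename, save_format="pickle")
with open(path, "wb") as f:
    pickle.dump(state, f)

# load_from_file(file, save_format="pickle")
with open(path, "rb") as f:
    restored = pickle.load(f)
```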

transform(unscaled, copy=False)[source]

Transform data from unscaled to scaled.

Unscaled means real-world data, scaled means data as used in the network. Unless copy=True, data is transformed in-place.

Parameters:
  • unscaled (torch.Tensor) – Real world data.

  • copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.

Returns:

scaled – Scaled data.

Return type:

torch.Tensor
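The copy flag's semantics can be sketched as follows, using a plain PyTorch stand-in for the standardization case (an illustration of in-place versus copied transformation, not MALA's code):

```python
import torch

def transform(unscaled, mean, std, copy=False):
    # With copy=False, the caller's tensor is modified in-place.
    out = unscaled.clone() if copy else unscaled
    out -= mean
    out /= std
    return out

x = torch.tensor([1.0, 2.0, 3.0])

# copy=True leaves x untouched and returns a new tensor.
y = transform(x, x.mean(), x.std(), copy=True)

# copy=False modifies x itself; z shares x's storage.
z = transform(x, torch.tensor(2.0), torch.tensor(1.0), copy=False)
```

In-place operation avoids allocating a second copy of what, for DFT grids, can be a very large array; request a copy only when the original data is still needed afterwards.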