data_scaler
DataScaler class for scaling DFT data.
- class DataScaler(typestring, use_ddp=False)
Bases: object
Scales input and output data.
Emulates part of the functionality of the scikit-learn scalers, but implementing the class ourselves gives us more freedom. The class assumes data of shape (d,f), where d=x*y*z is the product of the spatial dimensions and f is the feature dimension.
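A minimal sketch of the (d,f) layout, assuming the raw data arrives as a nested x-y-z grid with f features per grid point. The helper flatten_grid is hypothetical and uses plain Python lists; for a torch tensor of shape (x, y, z, f), reshape(-1, f) collapses the spatial axes in the same row-major order.

```python
def flatten_grid(grid):
    """Flatten a nested [x][y][z][f] list into d = x*y*z rows of f features each."""
    rows = []
    for plane in grid:          # loop over x
        for line in plane:      # loop over y
            for point in line:  # loop over z; each point is a list of f features
                rows.append(list(point))
    return rows


# A 2 x 2 x 1 grid with f = 3 features per point.
grid = [
    [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]],
    [[[7.0, 8.0, 9.0]], [[10.0, 11.0, 12.0]]],
]
data = flatten_grid(grid)
print(len(data), len(data[0]))  # 4 3
```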
- Parameters:
typestring (string) –
Specifies how scaling should be performed. Options:
"None": No scaling is applied.
"standard": Standardization (scale to mean 0, standard deviation 1) is applied to the entire array.
"minmax": Min-max scaling (scale to the range 0…1) is applied to the entire array.
"feature-wise-standard": Standardization (scale to mean 0, standard deviation 1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.
"feature-wise-minmax": Min-max scaling (scale to the range 0…1) is applied to each feature dimension individually. I.e., if your training data has dimensions (d,f), then each of the f columns with d entries is scaled individually.
"normal": (DEPRECATED) Old name for "minmax".
"feature-wise-normal": (DEPRECATED) Old name for "feature-wise-minmax".
use_ddp (bool) – If True, the DataScaler uses DDP (distributed data parallel) to check that data is only saved on the root process during parallel execution.
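The scaling options correspond to the following standard transformations, written for an entry x_ij of a (d,f) array (row i, feature column j). This is a sketch of the textbook definitions; details such as whether the population or the sample standard deviation is used are not specified by this documentation.

```latex
\begin{aligned}
\text{standard:} \quad & x'_{ij} = \frac{x_{ij} - \mu}{\sigma}, &
\text{feature-wise-standard:} \quad & x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \\
\text{minmax:} \quad & x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}}, &
\text{feature-wise-minmax:} \quad & x'_{ij} = \frac{x_{ij} - x_{\min,j}}{x_{\max,j} - x_{\min,j}}.
\end{aligned}
```

Here mu, sigma, x_min and x_max are taken over all d·f entries, while mu_j, sigma_j, x_min,j and x_max,j are taken over the d entries of column j. Presumably these are what the total_* and the per-feature means/stds/mins/maxs attributes hold, respectively.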
- cantransform
If True, this scaler is set up to perform scaling.
- Type:
bool
- feature_wise
(Managed internally, not set to private due to legacy issues)
- Type:
bool
- maxs
(Managed internally, not set to private due to legacy issues)
- Type:
torch.Tensor
- means
(Managed internally, not set to private due to legacy issues)
- Type:
torch.Tensor
- mins
(Managed internally, not set to private due to legacy issues)
- Type:
torch.Tensor
- scale_minmax
(Managed internally, not set to private due to legacy issues)
- Type:
bool
- scale_standard
(Managed internally, not set to private due to legacy issues)
- Type:
bool
- stds
(Managed internally, not set to private due to legacy issues)
- Type:
torch.Tensor
- total_data_count
(Managed internally, not set to private due to legacy issues)
- Type:
int
- total_max
(Managed internally, not set to private due to legacy issues)
- Type:
float
- total_mean
(Managed internally, not set to private due to legacy issues)
- Type:
float
- total_min
(Managed internally, not set to private due to legacy issues)
- Type:
float
- total_std
(Managed internally, not set to private due to legacy issues)
- Type:
float
- typestring
(Managed internally, not set to private due to legacy issues)
- Type:
str
- use_ddp
(Managed internally, not set to private due to legacy issues)
- Type:
bool
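The boolean attributes feature_wise, scale_standard and scale_minmax appear to encode the typestring. A minimal sketch of that mapping, including the deprecated aliases, is shown below; parse_typestring is a hypothetical helper, and the exact way DataScaler derives these flags internally is an assumption.

```python
def parse_typestring(typestring):
    """Map a typestring to (scale_standard, scale_minmax, feature_wise).

    Hypothetical helper: illustrates the documented options, not the actual code.
    """
    aliases = {  # deprecated names listed in the documentation
        "normal": "minmax",
        "feature-wise-normal": "feature-wise-minmax",
    }
    name = aliases.get(typestring, typestring)
    if name == "None":
        return False, False, False
    prefix = "feature-wise-"
    feature_wise = name.startswith(prefix)
    base = name[len(prefix):] if feature_wise else name
    if base == "standard":
        return True, False, feature_wise
    if base == "minmax":
        return False, True, feature_wise
    raise ValueError(f"Unknown scaling type: {typestring!r}")


print(parse_typestring("feature-wise-normal"))  # (False, True, True)
```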
- fit(unscaled)
Compute the quantities necessary for scaling.
- Parameters:
unscaled (torch.Tensor) – Data on which the scaling parameters will be computed.
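A minimal sketch of what fit computes in the feature-wise modes, assuming plain Python lists of rows instead of torch tensors and the population standard deviation (the documentation does not say which variant is used). fit_feature_wise is a hypothetical helper, not part of the DataScaler API.

```python
import statistics


def fit_feature_wise(data, mode="standard"):
    """Compute per-column scaling parameters for a (d, f) list of rows.

    Assumption: population standard deviation (statistics.pstdev).
    """
    columns = list(zip(*data))  # f tuples, each holding the d entries of one feature
    if mode == "standard":
        means = [statistics.fmean(col) for col in columns]
        stds = [statistics.pstdev(col) for col in columns]
        return means, stds
    if mode == "minmax":
        return [min(col) for col in columns], [max(col) for col in columns]
    raise ValueError(f"Unknown mode: {mode!r}")


data = [[1.0, 10.0], [3.0, 30.0]]
print(fit_feature_wise(data))            # ([2.0, 20.0], [1.0, 10.0])
print(fit_feature_wise(data, "minmax"))  # ([1.0, 10.0], [3.0, 30.0])
```

For the non-feature-wise modes, the same statistics would be taken over all d·f entries at once instead of per column.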
- inverse_transform(scaled, copy=False, as_numpy=False)
Transform data from scaled to unscaled.
Unscaled means real-world data; scaled means data as used by the network.
- Parameters:
scaled (torch.Tensor) – Scaled data.
copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.
as_numpy (bool) – If True, a numpy array is returned, otherwise a torch tensor.
- Returns:
unscaled – Real world data.
- Return type:
torch.Tensor
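Inverting the scaling just solves the forward formula for x: x = x' * sigma_j + mu_j for standardization and x = x' * (x_max,j - x_min,j) + x_min,j for min-max scaling. A minimal round-trip sketch for the feature-wise standard case, using plain lists and hypothetical helper names:

```python
def transform_standard(data, means, stds):
    """Feature-wise standardization of a (d, f) list of rows (returns a new list)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]


def inverse_standard(scaled, means, stds):
    """Undo feature-wise standardization: x = x' * std + mean."""
    return [[xs * s + m for xs, m, s in zip(row, means, stds)] for row in scaled]


means, stds = [2.0, 20.0], [1.0, 10.0]
data = [[1.0, 10.0], [3.0, 30.0]]
scaled = transform_standard(data, means, stds)
print(scaled)                                 # [[-1.0, -1.0], [1.0, 1.0]]
print(inverse_standard(scaled, means, stds))  # [[1.0, 10.0], [3.0, 30.0]]
```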
- classmethod load_from_file(file, save_format='pickle')
Load a saved Scaler object.
- Parameters:
file (string or ZipExtFile) – File from which the parameters will be read.
save_format – File format which was used for saving.
- Returns:
data_scaler – DataScaler which was read from the file.
- Return type:
DataScaler
- partial_fit(unscaled)
Add data to the incremental calculation of scaling parameters.
This is necessary for lazy loading.
- Parameters:
unscaled (torch.Tensor) – Data that is to be added to the fit.
- reset()
Start the incremental calculation of scaling parameters.
This is necessary for lazy loading.
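With lazy loading, the data never sits in memory all at once, so the scaling parameters have to be accumulated batch by batch. The sketch below shows one common way to do this for feature-wise means and standard deviations, merging per-batch statistics with the parallel-variance update of Chan et al. The class RunningFeatureStats is hypothetical; whether DataScaler uses this exact update is an assumption. Minima and maxima are simpler: they merge by taking the running min and max.

```python
import math


class RunningFeatureStats:
    """Hypothetical incremental per-feature mean/std (Chan et al. batch merge)."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Start a new incremental calculation."""
        self.count = 0
        self.means = None  # per-feature running means
        self.m2 = None     # per-feature sums of squared deviations from the mean

    def partial_fit(self, batch):
        """Merge a (d, f) list of rows into the running statistics."""
        n_b = len(batch)
        if n_b == 0:
            return
        columns = list(zip(*batch))
        batch_means = [sum(col) / n_b for col in columns]
        batch_m2 = [sum((x - m) ** 2 for x in col)
                    for col, m in zip(columns, batch_means)]
        if self.count == 0:
            self.count, self.means, self.m2 = n_b, batch_means, batch_m2
            return
        n_a = self.count
        n = n_a + n_b
        for j, (mb, m2b) in enumerate(zip(batch_means, batch_m2)):
            delta = mb - self.means[j]
            self.means[j] += delta * n_b / n
            self.m2[j] += m2b + delta ** 2 * n_a * n_b / n
        self.count = n

    def stds(self):
        """Population standard deviation per feature."""
        return [math.sqrt(m2 / self.count) for m2 in self.m2]


stats = RunningFeatureStats()
stats.partial_fit([[1.0, 10.0]])
stats.partial_fit([[3.0, 30.0]])
print(stats.means, stats.stds())  # [2.0, 20.0] [1.0, 10.0]
```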
- save(filename, save_format='pickle')
Save the Scaler object so that it can be accessed again later.
- Parameters:
filename (string) – File in which the parameters will be saved.
save_format – File format which will be used for saving.
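A minimal save/load round trip with the standard pickle module, assuming that the 'pickle' format simply serializes the scaler object (the actual on-disk layout is not described here). TinyScaler, save and load_from_file below are hypothetical stand-ins; an in-memory io.BytesIO buffer plays the role of a file on disk or a ZipExtFile.

```python
import io
import pickle


class TinyScaler:
    """Hypothetical stand-in holding fitted scaling parameters."""

    def __init__(self, typestring):
        self.typestring = typestring
        self.means = None
        self.stds = None


def save(scaler, fileobj):
    """Serialize the whole scaler object into a binary file-like object."""
    pickle.dump(scaler, fileobj)


def load_from_file(fileobj):
    """Restore a scaler object from a binary file-like object."""
    return pickle.load(fileobj)


buffer = io.BytesIO()
scaler = TinyScaler("feature-wise-standard")
scaler.means, scaler.stds = [2.0, 20.0], [1.0, 10.0]
save(scaler, buffer)
buffer.seek(0)  # rewind before reading back
restored = load_from_file(buffer)
print(restored.typestring, restored.means)  # feature-wise-standard [2.0, 20.0]
```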
- transform(unscaled, copy=False)
Transform data from unscaled to scaled.
Unscaled means real-world data; scaled means data as used by the network. By default (copy=False), data is transformed in-place.
- Parameters:
unscaled (torch.Tensor) – Real world data.
copy (bool) – If False, data is modified in-place. If True, a copy of the data is modified. Default is False.
- Returns:
scaled – Scaled data.
- Return type:
torch.Tensor
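The copy flag decides whether the caller's data is overwritten. A minimal sketch of the two behaviours for feature-wise min-max scaling, using plain lists and a hypothetical helper; with torch tensors, the copying path would presumably clone the tensor first (tensor.clone()) and the in-place path could use in-place operations such as sub_ and div_.

```python
import copy as copy_module  # aliased because the parameter below is named `copy`


def transform_minmax(data, mins, maxs, copy=False):
    """Feature-wise min-max scaling; modifies `data` in place unless copy=True."""
    target = copy_module.deepcopy(data) if copy else data
    for row in target:
        for j, x in enumerate(row):
            row[j] = (x - mins[j]) / (maxs[j] - mins[j])
    return target


data = [[1.0, 10.0], [3.0, 30.0]]
scaled = transform_minmax(data, [1.0, 10.0], [3.0, 30.0], copy=True)
print(data)    # [[1.0, 10.0], [3.0, 30.0]]  (unchanged)
print(scaled)  # [[0.0, 0.0], [1.0, 1.0]]
transform_minmax(data, [1.0, 10.0], [3.0, 30.0])
print(data)    # [[0.0, 0.0], [1.0, 1.0]]  (modified in place)
```

The sketch does not guard against a column with max equal to min, which would divide by zero.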