pcntoolkit.math_functions.scaler#
Data scaling and normalization module for PCNToolkit.
This module provides various scaling implementations for data preprocessing, including standardization, min-max scaling, robust scaling, and identity scaling. All scalers implement a common interface defined by the abstract base class Scaler.
The module supports the following scaling operations:
- Standardization (zero mean, unit variance)
- Min-max scaling (to the [0, 1] range)
- Robust min-max scaling (using percentiles)
- Identity scaling (no transformation)
Each scaler supports:
- Fitting to training data
- Transforming new data
- Inverse transforming scaled data
- Serialization to/from dictionaries
- Optional outlier adjustment
Available Classes#
- Scaler
Abstract base class defining the scaler interface
- StandardScaler
Standardizes features to zero mean and unit variance
- MinMaxScaler
Scales features to a fixed range [0, 1]
- RobustMinMaxScaler
Scales features using robust statistics based on percentiles
- IdentityScaler
Passes data through unchanged
Examples
>>> from pcntoolkit.math_functions.scaler import StandardScaler
>>> import numpy as np
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = StandardScaler()
>>> X_scaled = scaler.fit_transform(X)
Notes
All scalers support both fitting to the entire dataset and transforming specific indices of features, allowing for flexible scaling strategies. The scalers can be serialized to dictionaries for saving/loading trained parameters.
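The index-based behaviour described in this note can be illustrated with plain NumPy. This is a minimal sketch, not PCNToolkit's implementation: the helper `standardize` is hypothetical, and it assumes that when `index` is given, `X` contains only the selected feature columns, in the same order as `index`.

```python
import numpy as np


def standardize(X, mean, std, index=None):
    """Hypothetical helper: scale X with per-feature statistics.

    If `index` is given, X is assumed to hold only those features,
    so the matching subset of the stored statistics is used.
    """
    if index is None:
        return (X - mean) / std
    return (X - mean[index]) / std[index]


X_train = np.array([[1.0, 10.0, 100.0],
                    [3.0, 30.0, 300.0],
                    [5.0, 50.0, 500.0]])

# Statistics are fitted once, on all features.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

# Scale every feature, then only features 0 and 2.
Z_all = standardize(X_train, mean, std)
index = np.array([0, 2])
Z_sub = standardize(X_train[:, index], mean, std, index=index)
```

Scaling the subset gives the same values as scaling everything and then selecting the columns, which is what makes it safe to transform one response variable at a time.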
See also
pcntoolkit.normative_model – Uses scalers for data preprocessing
pcntoolkit.dataio.basis_expansions – Complementary data transformations
Classes#
| IdentityScaler | A scaler that returns the input unchanged. |
| MinMaxScaler | Scale features to a fixed range [0, 1]. |
| RobustMinMaxScaler | Scale features using robust statistics based on percentiles. |
| Scaler | Abstract base class for data scaling operations. |
| StandardScaler | Standardize features by removing the mean and scaling to unit variance. |
Module Contents#
- class IdentityScaler(adjust_outliers: bool = False)#
Bases: Scaler
A scaler that returns the input unchanged.
This scaler is useful as a placeholder when no scaling is desired but a scaler object is required by the API.
- Parameters:
adjust_outliers (bool, optional) – Has no effect for this scaler; included for API compatibility
Examples
>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import IdentityScaler
>>> X = np.array([[1, 2], [3, 4]])
>>> scaler = IdentityScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> np.array_equal(X, X_scaled)
True
- fit(X: numpy.typing.NDArray) None#
Compute the parameters needed for scaling.
- Parameters:
X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)
- classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) IdentityScaler#
Create a scaler instance from a dictionary.
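- Parameters:
my_dict (Dict[str, bool | str | float | List[float]]) – Dictionary to create the scaler from
version (str | None, optional) – By default None
- Returns:
Instance of the appropriate scaler subclass
- Return type:
IdentityScaler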
- Raises:
ValueError – If scaler_type is missing or invalid
- inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Inverse transform scaled data back to original scale.
- Parameters:
X (NDArray) – Data to inverse transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (all features)
- Returns:
Inverse transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Transform the data using the fitted scaler.
- Parameters:
X (NDArray) – Data to transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to transform, by default None (all features)
- Returns:
Transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- class MinMaxScaler(adjust_outliers: bool = False)#
Bases: Scaler
Scale features to a fixed range [0, 1].
Transforms each feature to the range [0, 1] using the minimum and maximum observed in the training data:
X_scaled = (X - X_min) / (X_max - X_min)
- Parameters:
adjust_outliers (bool, optional) – Whether to clip transformed values to [0, 1], by default False
- min#
Minimum value for each feature from training data
- Type:
Optional[NDArray]
- max#
Maximum value for each feature from training data
- Type:
Optional[NDArray]
Examples
>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import MinMaxScaler
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = MinMaxScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> print(X_scaled.min(axis=0))  # [0, 0]
>>> print(X_scaled.max(axis=0))  # [1, 1]
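To make the formula and the effect of `adjust_outliers` concrete, here is a minimal NumPy sketch. The functions `minmax_fit`, `minmax_transform`, and `minmax_inverse` are hypothetical helpers written for illustration; they follow the formula above and assume that clipping is applied after scaling, which may differ in detail from the class itself.

```python
import numpy as np


def minmax_fit(X):
    """Return the per-feature minimum and maximum of the training data."""
    return X.min(axis=0), X.max(axis=0)


def minmax_transform(X, x_min, x_max, adjust_outliers=False):
    """Apply X_scaled = (X - X_min) / (X_max - X_min), optionally clipped."""
    X_scaled = (X - x_min) / (x_max - x_min)
    if adjust_outliers:
        # Assumption: values outside the training range are clipped to [0, 1].
        X_scaled = np.clip(X_scaled, 0.0, 1.0)
    return X_scaled


def minmax_inverse(X_scaled, x_min, x_max):
    """Map scaled values back to the original scale."""
    return X_scaled * (x_max - x_min) + x_min


X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
x_min, x_max = minmax_fit(X_train)

# New data may fall outside the training range.
X_new = np.array([[0.0, 3.0], [7.0, 5.0]])
unclipped = minmax_transform(X_new, x_min, x_max)
clipped = minmax_transform(X_new, x_min, x_max, adjust_outliers=True)
```

Without clipping, new values below the training minimum or above the training maximum map outside [0, 1]. With clipping they are pinned to the boundary, so the inverse transform can no longer recover them exactly.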
- fit(X: numpy.typing.NDArray) None#
Compute the parameters needed for scaling.
- Parameters:
X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)
- classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) MinMaxScaler#
Create a scaler instance from a dictionary.
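- Parameters:
my_dict (Dict[str, bool | str | float | List[float]]) – Dictionary to create the scaler from
version (str | None, optional) – By default None
- Returns:
Instance of the appropriate scaler subclass
- Return type:
MinMaxScaler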
- Raises:
ValueError – If scaler_type is missing or invalid
- inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Inverse transform scaled data back to original scale.
- Parameters:
X (NDArray) – Data to inverse transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (all features)
- Returns:
Inverse transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Transform the data using the fitted scaler.
- Parameters:
X (NDArray) – Data to transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to transform, by default None (all features)
- Returns:
Transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- class RobustMinMaxScaler(adjust_outliers: bool = False, tail: float = 0.05)#
Bases: MinMaxScaler
Scale features using robust statistics based on percentiles.
Similar to MinMaxScaler but uses percentile-based statistics to be robust to outliers.
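- Parameters:
adjust_outliers (bool, optional) – Whether to clip transformed values to [0, 1], by default False
tail (float, optional) – Tail fraction used for the percentile-based minimum and maximum, by default 0.05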
- min#
Robust minimum for each feature from training data
- Type:
Optional[NDArray]
- max#
Robust maximum for each feature from training data
- Type:
Optional[NDArray]
Examples
>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import RobustMinMaxScaler
>>> X = np.array([[1, 2], [3, 4], [100, 6]])  # with outlier
>>> scaler = RobustMinMaxScaler(tail=0.1)
>>> X_scaled = scaler.fit_transform(X)
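The idea behind percentile-based bounds can be sketched in NumPy as follows. This is an illustration under a stated assumption: the robust minimum and maximum are taken as the `tail` and `1 - tail` quantiles via `np.quantile`. The exact statistic used by `RobustMinMaxScaler` is not specified here and may differ, and `robust_minmax_fit` is a hypothetical helper.

```python
import numpy as np


def robust_minmax_fit(X, tail=0.05):
    """Hypothetical helper: percentile-based bounds per feature.

    Assumption: the bounds are the `tail` and `1 - tail` quantiles.
    """
    x_min = np.quantile(X, tail, axis=0)
    x_max = np.quantile(X, 1.0 - tail, axis=0)
    return x_min, x_max


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0, 0] = 1000.0  # a single extreme outlier in feature 0

# The plain maximum is dominated by the outlier; the robust one is not.
plain_max = X.max(axis=0)
robust_min, robust_max = robust_minmax_fit(X, tail=0.05)

# Scale with the robust bounds and clip values in the tails to [0, 1].
X_scaled = np.clip((X - robust_min) / (robust_max - robust_min), 0.0, 1.0)
```

With plain min-max scaling, the single outlier would compress the other 199 values of feature 0 into a tiny fraction of the [0, 1] range. The percentile-based bounds keep the bulk of the data spread across the range.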
- fit(X: numpy.typing.NDArray) None#
Compute the parameters needed for scaling.
- Parameters:
X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)
- classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) RobustMinMaxScaler#
Create a scaler instance from a dictionary.
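- Parameters:
my_dict (Dict[str, bool | str | float | List[float]]) – Dictionary to create the scaler from
version (str | None, optional) – By default None
- Returns:
Instance of the appropriate scaler subclass
- Return type:
RobustMinMaxScaler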
- Raises:
ValueError – If scaler_type is missing or invalid
- tail = 0.05#
- class Scaler(adjust_outliers: bool = False)#
Bases: abc.ABC
Abstract base class for data scaling operations.
This class defines the interface for all scaling operations in PCNToolkit. Concrete implementations must implement fit, transform, inverse_transform, and to_dict methods.
- Parameters:
adjust_outliers (bool, optional) – Whether to clip values to valid ranges, by default False
Notes
All scaling operations support both fitting to the entire dataset and transforming specific indices of features, allowing for flexible scaling strategies.
- abstractmethod fit(X: numpy.typing.NDArray) None#
Compute the parameters needed for scaling.
- Parameters:
X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)
- fit_transform(X: numpy.typing.NDArray) numpy.typing.NDArray#
Fit the scaler and transform the data in one step.
- Parameters:
X (NDArray) – Data to fit and transform, shape (n_samples, n_features)
- Returns:
Transformed data
- Return type:
NDArray
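The relationship between the abstract methods and `fit_transform` can be sketched with Python's `abc` module. The classes `SketchScaler` and `CenteringScaler` below are hypothetical stand-ins, not PCNToolkit classes, and the sketch assumes `fit_transform` simply calls `fit` followed by `transform`.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchScaler(ABC):
    """Hypothetical stand-in for an abstract scaler interface."""

    @abstractmethod
    def fit(self, X):
        """Compute the parameters needed for scaling."""

    @abstractmethod
    def transform(self, X):
        """Transform the data using the fitted parameters."""

    def fit_transform(self, X):
        # Assumption: fit on X, then transform the same X.
        self.fit(X)
        return self.transform(X)


class CenteringScaler(SketchScaler):
    """Toy concrete scaler that only removes the per-feature mean."""

    def __init__(self):
        self.m = None

    def fit(self, X):
        self.m = X.mean(axis=0)

    def transform(self, X):
        if self.m is None:
            raise ValueError("Scaler must be fitted before transforming")
        return X - self.m


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_centered = CenteringScaler().fit_transform(X)
```

Because `fit_transform` lives on the base class, a concrete scaler only has to supply the abstract methods to get the one-step variant for free.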
- classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) Scaler#
Create a scaler instance from a dictionary.
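- Parameters:
my_dict (Dict[str, bool | str | float | List[float]]) – Dictionary to create the scaler from
version (str | None, optional) – By default None
- Returns:
Instance of the appropriate scaler subclass
- Return type:
Scaler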
- Raises:
ValueError – If scaler_type is missing or invalid
- static from_string(scaler_type: str, **kwargs: Any) Scaler#
Create a scaler instance from a string identifier.
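- Parameters:
scaler_type (str) – String identifier of the scaler to create
**kwargs (Any) – Additional keyword arguments for the scaler
- Returns:
Instance of the appropriate scaler class
- Return type:
Scaler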
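One common way to build an instance from a string identifier is a registry lookup. The sketch below is illustrative only: the class names, the identifiers `"identity"` and `"robust_minmax"`, and the function `scaler_from_string` are all hypothetical placeholders, and the identifiers accepted by `Scaler.from_string` are not listed here.

```python
from typing import Any, Dict, Type


class SketchScaler:
    """Hypothetical base class carrying the shared option."""

    def __init__(self, adjust_outliers: bool = False) -> None:
        self.adjust_outliers = adjust_outliers


class SketchIdentity(SketchScaler):
    pass


class SketchRobustMinMax(SketchScaler):
    def __init__(self, adjust_outliers: bool = False, tail: float = 0.05) -> None:
        super().__init__(adjust_outliers)
        self.tail = tail


# Placeholder identifiers; the real ones may differ.
_REGISTRY: Dict[str, Type[SketchScaler]] = {
    "identity": SketchIdentity,
    "robust_minmax": SketchRobustMinMax,
}


def scaler_from_string(scaler_type: str, **kwargs: Any) -> SketchScaler:
    """Look up the class for `scaler_type` and forward keyword arguments."""
    try:
        cls = _REGISTRY[scaler_type]
    except KeyError:
        raise ValueError(f"Unknown scaler type: {scaler_type!r}") from None
    return cls(**kwargs)


scaler = scaler_from_string("robust_minmax", tail=0.1)
```

Forwarding `**kwargs` lets scaler-specific options such as `tail` pass through without the factory needing to know about them.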
- abstractmethod inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Inverse transform scaled data back to original scale.
- Parameters:
X (NDArray) – Data to inverse transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (all features)
- Returns:
Inverse transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- abstractmethod to_dict() Dict[str, bool | str | float | List[float]]#
Convert scaler instance to dictionary.
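A dictionary round trip for fitted parameters might look like the sketch below. The key `scaler_type` is taken from the `from_dict` error description. The other keys (`m`, `s`, `adjust_outliers`), the value `"standardize"`, and both helper functions are assumptions made for illustration, not the documented dictionary layout.

```python
import json

import numpy as np


def standard_to_dict(m, s, adjust_outliers=False):
    """Hypothetical serializer: arrays become plain lists for JSON."""
    return {
        "scaler_type": "standardize",  # placeholder identifier
        "adjust_outliers": adjust_outliers,
        "m": m.tolist(),
        "s": s.tolist(),
    }


def standard_from_dict(d):
    """Hypothetical deserializer: validate the type, rebuild the arrays."""
    if d.get("scaler_type") != "standardize":
        raise ValueError("scaler_type is missing or invalid")
    return np.array(d["m"]), np.array(d["s"]), d["adjust_outliers"]


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
m, s = X.mean(axis=0), X.std(axis=0)

# Serialize to JSON text and restore the parameters from it.
payload = json.dumps(standard_to_dict(m, s))
m_loaded, s_loaded, adjust_loaded = standard_from_dict(json.loads(payload))
```

Storing lists rather than arrays keeps the dictionary JSON-compatible, which matches the `List[float]` value type in the `to_dict` signature.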
- abstractmethod transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Transform the data using the fitted scaler.
- Parameters:
X (NDArray) – Data to transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to transform, by default None (all features)
- Returns:
Transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- adjust_outliers = False#
- class StandardScaler(adjust_outliers: bool = False)#
Bases: Scaler
Standardize features by removing the mean and scaling to unit variance.
This scaler transforms the data to have zero mean and unit variance: z = (x - μ) / σ
- Parameters:
adjust_outliers (bool, optional) – Whether to clip extreme values, by default False
- m#
Mean of the training data
- Type:
Optional[NDArray]
- s#
Standard deviation of the training data
- Type:
Optional[NDArray]
Examples
>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import StandardScaler
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = StandardScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> print(X_scaled.mean(axis=0))  # approximately [0, 0]
>>> print(X_scaled.std(axis=0))  # approximately [1, 1]
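The formula z = (x - μ) / σ and its inverse x = z · σ + μ can be checked directly with NumPy. This sketch assumes the population standard deviation (NumPy's default, `ddof=0`); whether `StandardScaler` uses the same convention is not stated here.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Per-feature statistics, named after the m and s attributes.
m = X.mean(axis=0)
s = X.std(axis=0)  # assumption: population standard deviation (ddof=0)

# Forward transform: z = (x - m) / s
Z = (X - m) / s

# Inverse transform: x = z * s + m
X_back = Z * s + m
```

Applying the inverse with the same `m` and `s` recovers the original data up to floating-point error, which is why the fitted statistics need to be stored alongside the model.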
- fit(X: numpy.typing.NDArray) None#
Compute the parameters needed for scaling.
- Parameters:
X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)
- classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) StandardScaler#
Create a scaler instance from a dictionary.
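- Parameters:
my_dict (Dict[str, bool | str | float | List[float]]) – Dictionary to create the scaler from
version (str | None, optional) – By default None
- Returns:
Instance of the appropriate scaler subclass
- Return type:
StandardScaler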
- Raises:
ValueError – If scaler_type is missing or invalid
- inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Inverse transform scaled data back to original scale.
- Parameters:
X (NDArray) – Data to inverse transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (all features)
- Returns:
Inverse transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted
- transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#
Transform the data using the fitted scaler.
- Parameters:
X (NDArray) – Data to transform, shape (n_samples, n_features)
index (Optional[NDArray], optional) – Indices of features to transform, by default None (all features)
- Returns:
Transformed data
- Return type:
NDArray
- Raises:
ValueError – If the scaler has not been fitted