pcntoolkit.math_functions.scaler#

Data scaling and normalization module for PCNToolkit.

This module provides various scaling implementations for data preprocessing, including standardization, min-max scaling, robust scaling, and identity scaling. All scalers implement a common interface defined by the abstract base class Scaler.

The module supports the following scaling operations:
  • Standardization (zero mean, unit variance)

  • Min-max scaling (to [0,1] range)

  • Robust min-max scaling (using percentiles)

  • Identity scaling (no transformation)

Each scaler supports:
  • Fitting to training data

  • Transforming new data

  • Inverse transforming scaled data

  • Serialization to/from dictionaries

  • Optional outlier adjustment

Available Classes#

Scaler

Abstract base class defining the scaler interface

StandardScaler

Standardizes features to zero mean and unit variance

MinMaxScaler

Scales features to a fixed range [0, 1]

RobustMinMaxScaler

Scales features using robust statistics based on percentiles

IdentityScaler

Passes data through unchanged

Examples

>>> from pcntoolkit.math_functions.scaler import StandardScaler
>>> import numpy as np
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = StandardScaler()
>>> X_scaled = scaler.fit_transform(X)

Notes

All scalers support both fitting to the entire dataset and transforming specific indices of features, allowing for flexible scaling strategies. The scalers can be serialized to dictionaries for saving/loading trained parameters.
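As a rough illustration of the index-based transformation mentioned above, the following NumPy sketch standardizes data that contains only a subset of the fitted features. It does not call PCNToolkit. The `transform` helper, and the reading of `index` as the positions of the fitted features that the columns of the input correspond to, are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for a fitted standardizing scaler: per-feature
# mean and standard deviation learned from three training features.
X_train = np.array([[1.0, 10.0, 100.0],
                    [3.0, 20.0, 300.0],
                    [5.0, 30.0, 500.0]])
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)


def transform(X, index=None):
    """Standardize X; `index` (assumed semantics) names the fitted
    features that the columns of X correspond to."""
    if index is None:
        return (X - mean) / std
    return (X - mean[index]) / std[index]


# New data holding only features 0 and 2 of the training layout.
X_new = np.array([[3.0, 300.0],
                  [5.0, 500.0]])
X_new_scaled = transform(X_new, index=np.array([0, 2]))
```

Under this reading, the first row of `X_new` sits exactly at the training means and maps to zeros, while the parameters of the unused feature 1 are never touched.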

See also

pcntoolkit.normative_model

Uses scalers for data preprocessing

pcntoolkit.dataio.basis_expansions

Complementary data transformations

Classes#

IdentityScaler

A scaler that returns the input unchanged.

MinMaxScaler

Scale features to a fixed range [0, 1].

RobustMinMaxScaler

Scale features using robust statistics based on percentiles.

Scaler

Abstract base class for data scaling operations.

StandardScaler

Standardize features by removing the mean and scaling to unit variance.

Module Contents#

class IdentityScaler(adjust_outliers: bool = False)#

Bases: Scaler

A scaler that returns the input unchanged.

This scaler is useful as a placeholder when no scaling is desired but a scaler object is required by the API.

Parameters:

adjust_outliers (bool, optional) – Has no effect for this scaler, included for API compatibility

Examples

>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import IdentityScaler
>>> X = np.array([[1, 2], [3, 4]])
>>> scaler = IdentityScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> np.array_equal(X, X_scaled)
True

fit(X: numpy.typing.NDArray) None#

Compute the parameters needed for scaling.

Parameters:

X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)

classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) IdentityScaler#

Create a scaler instance from a dictionary.

Parameters:
  • my_dict (Dict[str, Union[bool, str, float, List[float]]]) – Dictionary containing scaler parameters. Must include ‘scaler_type’ key.

  • version (str | None, optional) – The ptk_version of the saved model, by default None. Used to apply any registered Scaler migrations.

Returns:

Instance of the appropriate scaler subclass

Return type:

Scaler

Raises:

ValueError – If scaler_type is missing or invalid

inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Inverse transform scaled data back to original scale.

Parameters:
  • X (NDArray) – Data to inverse transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (transform all)

Returns:

Inverse transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

to_dict() Dict[str, bool | str | float | List[float]]#

Convert scaler instance to dictionary.

transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Transform the data using the fitted scaler.

Parameters:
  • X (NDArray) – Data to transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to transform, by default None (transform all)

Returns:

Transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

class MinMaxScaler(adjust_outliers: bool = False)#

Bases: Scaler

Scale features to a fixed range [0, 1].

Transforms each feature to the range [0, 1] using the minimum and maximum observed in the training data: X_scaled = (X - X_min) / (X_max - X_min)
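The min-max formula above can be checked by hand with plain NumPy. The sketch below is not the library's implementation. The `minmax` helper and its `clip` flag are hypothetical names that mimic what `adjust_outliers` is described as doing.

```python
import numpy as np

# Training data: per-feature minimum and maximum are the fitted parameters.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
x_min = X.min(axis=0)  # [1, 2]
x_max = X.max(axis=0)  # [5, 6]


def minmax(X_new, clip=False):
    """Apply (X - X_min) / (X_max - X_min), optionally clipping to [0, 1]."""
    scaled = (X_new - x_min) / (x_max - x_min)
    return np.clip(scaled, 0.0, 1.0) if clip else scaled


# New data that falls partly outside the training range.
X_new = np.array([[0.0, 4.0], [7.0, 8.0]])
unclipped = minmax(X_new)             # [[-0.25, 0.5], [1.5, 1.5]]
clipped = minmax(X_new, clip=True)    # [[0.0, 0.5], [1.0, 1.0]]

# Inverting the unclipped values recovers the original data.
restored = unclipped * (x_max - x_min) + x_min
```

Note that clipping discards information: values clipped to 0 or 1 can no longer be mapped back to their original magnitudes.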

Parameters:

adjust_outliers (bool, optional) – Whether to clip transformed values to [0, 1], by default False

min#

Minimum value for each feature from training data

Type:

Optional[NDArray]

max#

Maximum value for each feature from training data

Type:

Optional[NDArray]

Examples

>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import MinMaxScaler
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = MinMaxScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> print(X_scaled.min(axis=0))  # [0, 0]
>>> print(X_scaled.max(axis=0))  # [1, 1]

fit(X: numpy.typing.NDArray) None#

Compute the parameters needed for scaling.

Parameters:

X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)

classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) MinMaxScaler#

Create a scaler instance from a dictionary.

Parameters:
  • my_dict (Dict[str, Union[bool, str, float, List[float]]]) – Dictionary containing scaler parameters. Must include ‘scaler_type’ key.

  • version (str | None, optional) – The ptk_version of the saved model, by default None. Used to apply any registered Scaler migrations.

Returns:

Instance of the appropriate scaler subclass

Return type:

Scaler

Raises:

ValueError – If scaler_type is missing or invalid

inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Inverse transform scaled data back to original scale.

Parameters:
  • X (NDArray) – Data to inverse transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (transform all)

Returns:

Inverse transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

to_dict() Dict[str, bool | str | float | List[float]]#

Convert scaler instance to dictionary.

transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Transform the data using the fitted scaler.

Parameters:
  • X (NDArray) – Data to transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to transform, by default None (transform all)

Returns:

Transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

max: numpy.typing.NDArray | None = None#
min: numpy.typing.NDArray | None = None#

class RobustMinMaxScaler(adjust_outliers: bool = False, tail: float = 0.05)#

Bases: MinMaxScaler

Scale features using robust statistics based on percentiles.

Similar to MinMaxScaler but uses percentile-based statistics to be robust to outliers.
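To make the idea of percentile-based bounds concrete, here is a minimal NumPy sketch. It assumes the robust minimum and maximum are the `tail` and `1 - tail` quantiles of each feature; the library may compute them differently. `robust_bounds` is a hypothetical helper, not part of PCNToolkit.

```python
import numpy as np


def robust_bounds(X, tail=0.05):
    """Return per-feature lower/upper bounds taken at the `tail` and
    `1 - tail` quantiles (an assumed definition of "robust" min/max)."""
    lower = np.quantile(X, tail, axis=0)
    upper = np.quantile(X, 1.0 - tail, axis=0)
    return lower, upper


# The first feature contains one extreme value (100).
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 4.0],
              [4.0, 5.0],
              [100.0, 6.0]])

lo, hi = robust_bounds(X, tail=0.25)      # lo = [2, 3], hi = [4, 5]
X_scaled = (X - lo) / (hi - lo)           # the outlier maps to 49.0
X_clipped = np.clip(X_scaled, 0.0, 1.0)   # the outlier is clipped to 1.0
```

With plain min-max scaling the value 100 would define the upper bound and squeeze the other four samples into roughly [0, 0.03]. With quantile bounds the bulk of the data keeps its spread and only the outlier leaves the [0, 1] range.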

Parameters:
  • adjust_outliers (bool, optional) – Whether to clip transformed values to [0, 1], by default False

  • tail (float, optional) – Tail fraction used to compute the robust min/max, by default 0.05 (i.e. the 5th and 95th percentiles)

min#

Robust minimum for each feature from training data

Type:

Optional[NDArray]

max#

Robust maximum for each feature from training data

Type:

Optional[NDArray]

tail#

The tail fraction used for the robust statistics

Type:

float

Examples

>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import RobustMinMaxScaler
>>> X = np.array([[1, 2], [3, 4], [100, 6]])  # with outlier
>>> scaler = RobustMinMaxScaler(tail=0.1)
>>> X_scaled = scaler.fit_transform(X)

fit(X: numpy.typing.NDArray) None#

Compute the parameters needed for scaling.

Parameters:

X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)

classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) RobustMinMaxScaler#

Create a scaler instance from a dictionary.

Parameters:
  • my_dict (Dict[str, Union[bool, str, float, List[float]]]) – Dictionary containing scaler parameters. Must include ‘scaler_type’ key.

  • version (str | None, optional) – The ptk_version of the saved model, by default None. Used to apply any registered Scaler migrations.

Returns:

Instance of the appropriate scaler subclass

Return type:

Scaler

Raises:

ValueError – If scaler_type is missing or invalid

to_dict() Dict[str, bool | str | float | List[float]]#

Convert scaler instance to dictionary.

tail = 0.05#

class Scaler(adjust_outliers: bool = False)#

Bases: abc.ABC

Abstract base class for data scaling operations.

This class defines the interface for all scaling operations in PCNToolkit. Concrete implementations must implement fit, transform, inverse_transform, and to_dict methods.

Parameters:

adjust_outliers (bool, optional) – Whether to clip values to valid ranges, by default False

Notes

All scaling operations support both fitting to the entire dataset and transforming specific indices of features, allowing for flexible scaling strategies.
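The shape of such an interface can be sketched with the standard `abc` module. The classes below are hypothetical miniatures, not the PCNToolkit source. In particular, implementing `fit_transform` as `fit` followed by `transform` is an assumption based on its description.

```python
from abc import ABC, abstractmethod


class SketchScaler(ABC):
    """Hypothetical miniature of the scaler interface."""

    def __init__(self, adjust_outliers=False):
        self.adjust_outliers = adjust_outliers

    @abstractmethod
    def fit(self, X): ...

    @abstractmethod
    def transform(self, X, index=None): ...

    @abstractmethod
    def inverse_transform(self, X, index=None): ...

    @abstractmethod
    def to_dict(self): ...

    def fit_transform(self, X):
        # Concrete convenience method built on the abstract ones.
        self.fit(X)
        return self.transform(X)


class SketchIdentity(SketchScaler):
    """Smallest possible concrete scaler: passes data through unchanged."""

    def fit(self, X):
        pass

    def transform(self, X, index=None):
        return X

    def inverse_transform(self, X, index=None):
        return X

    def to_dict(self):
        return {"scaler_type": "id", "adjust_outliers": self.adjust_outliers}


# The abstract base cannot be instantiated directly.
try:
    SketchScaler()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False

data = [[1, 2], [3, 4]]
out = SketchIdentity().fit_transform(data)
```

A subclass that omits any of the four abstract methods fails at instantiation time with a `TypeError`, which is what makes the abstract base class an enforceable contract rather than a naming convention.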

abstractmethod fit(X: numpy.typing.NDArray) None#

Compute the parameters needed for scaling.

Parameters:

X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)

fit_transform(X: numpy.typing.NDArray) numpy.typing.NDArray#

Fit the scaler and transform the data in one step.

Parameters:

X (NDArray) – Data to fit and transform, shape (n_samples, n_features)

Returns:

Transformed data

Return type:

NDArray

classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) Scaler#

Create a scaler instance from a dictionary.

Parameters:
  • my_dict (Dict[str, Union[bool, str, float, List[float]]]) – Dictionary containing scaler parameters. Must include ‘scaler_type’ key.

  • version (str | None, optional) – The ptk_version of the saved model, by default None. Used to apply any registered Scaler migrations.

Returns:

Instance of the appropriate scaler subclass

Return type:

Scaler

Raises:

ValueError – If scaler_type is missing or invalid
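A serialization round trip of this kind might look like the sketch below. Only the `scaler_type` key and the `ValueError` on a missing or invalid type come from the description above. The remaining keys, the set of accepted type strings, the helper names, and the use of JSON as the storage format are assumptions for illustration.

```python
import json

# Assumed set of accepted type identifiers.
KNOWN_TYPES = {"standardize", "minmax", "robminmax", "id"}


def to_dict_sketch(scaler_type, adjust_outliers, m, s):
    """Collect (hypothetical) fitted parameters into a plain dictionary."""
    return {
        "scaler_type": scaler_type,
        "adjust_outliers": adjust_outliers,
        "m": list(m),
        "s": list(s),
    }


def from_dict_sketch(my_dict):
    """Validate the dictionary and return a copy of its parameters."""
    scaler_type = my_dict.get("scaler_type")
    if scaler_type not in KNOWN_TYPES:
        raise ValueError(f"Missing or invalid scaler_type: {scaler_type!r}")
    return dict(my_dict)


saved = to_dict_sketch("standardize", False, [3.0, 4.0], [1.6, 1.6])
# Plain lists, floats, strings, and booleans survive a JSON round trip.
restored = from_dict_sketch(json.loads(json.dumps(saved)))

# A dictionary without 'scaler_type' is rejected.
try:
    from_dict_sketch({"m": [0.0]})
    rejected = False
except ValueError:
    rejected = True
```

Restricting the values to booleans, strings, floats, and lists of floats, as the type annotation does, keeps the dictionary storable in text formats without custom encoders.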

static from_string(scaler_type: str, **kwargs: Any) Scaler#

Create a scaler instance from a string identifier.

Parameters:
  • scaler_type (str) – The type of scaling to apply. Options are:
      - “standardize”: zero mean, unit variance
      - “minmax”: scale to range [0,1]
      - “robminmax”: robust minmax scaling using percentiles
      - “id” or “none”: no scaling

  • **kwargs (dict) – Additional arguments to pass to the scaler constructor

Returns:

Instance of the appropriate scaler class

Return type:

Scaler
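One common way to implement a string-based factory like this is a lookup table from identifier to class. The sketch below uses the identifiers listed above, but the classes, the `REGISTRY` table, and the `ValueError` for unknown identifiers are hypothetical and not taken from the PCNToolkit source.

```python
class SketchBase:
    def __init__(self, adjust_outliers=False):
        self.adjust_outliers = adjust_outliers


class SketchStandard(SketchBase):
    pass


class SketchMinMax(SketchBase):
    pass


class SketchRobustMinMax(SketchMinMax):
    def __init__(self, adjust_outliers=False, tail=0.05):
        super().__init__(adjust_outliers)
        self.tail = tail


class SketchIdentity(SketchBase):
    pass


# Map each string identifier to the class it should construct.
REGISTRY = {
    "standardize": SketchStandard,
    "minmax": SketchMinMax,
    "robminmax": SketchRobustMinMax,
    "id": SketchIdentity,
    "none": SketchIdentity,
}


def from_string_sketch(scaler_type, **kwargs):
    """Look up the class for `scaler_type` and forward kwargs to it."""
    try:
        cls = REGISTRY[scaler_type]
    except KeyError:
        raise ValueError(f"Unknown scaler type: {scaler_type!r}") from None
    return cls(**kwargs)


robust = from_string_sketch("robminmax", tail=0.1)
identity = from_string_sketch("none")
```

Forwarding `**kwargs` unchanged lets constructor-specific options such as `tail` pass through the factory without the factory having to know about them.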

abstractmethod inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Inverse transform scaled data back to original scale.

Parameters:
  • X (NDArray) – Data to inverse transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (transform all)

Returns:

Inverse transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

abstractmethod to_dict() Dict[str, bool | str | float | List[float]]#

Convert scaler instance to dictionary.

abstractmethod transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Transform the data using the fitted scaler.

Parameters:
  • X (NDArray) – Data to transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to transform, by default None (transform all)

Returns:

Transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

adjust_outliers = False#

class StandardScaler(adjust_outliers: bool = False)#

Bases: Scaler

Standardize features by removing the mean and scaling to unit variance.

This scaler transforms the data to have zero mean and unit variance: z = (x - μ) / σ
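The formula above and its inverse, x = z · σ + μ, can be verified with a few lines of NumPy. This is a sketch of the arithmetic only. Using the population standard deviation (`ddof=0`, NumPy's default) is an assumption about how σ is estimated.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Fitted parameters: per-feature mean and standard deviation.
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population std (ddof=0); an assumption

# Forward transform: z = (x - mu) / sigma
Z = (X - mu) / sigma

# Inverse transform: x = z * sigma + mu
X_back = Z * sigma + mu
```

After the forward transform each column of `Z` has mean 0 and standard deviation 1, and the inverse transform recovers `X` up to floating-point error.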

Parameters:

adjust_outliers (bool, optional) – Whether to clip extreme values, by default False

m#

Mean of the training data

Type:

Optional[NDArray]

s#

Standard deviation of the training data

Type:

Optional[NDArray]

Examples

>>> import numpy as np
>>> from pcntoolkit.math_functions.scaler import StandardScaler
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = StandardScaler()
>>> X_scaled = scaler.fit_transform(X)
>>> print(X_scaled.mean(axis=0))  # approximately [0, 0]
>>> print(X_scaled.std(axis=0))  # approximately [1, 1]

fit(X: numpy.typing.NDArray) None#

Compute the parameters needed for scaling.

Parameters:

X (NDArray) – Training data to fit the scaler on, shape (n_samples, n_features)

classmethod from_dict(my_dict: Dict[str, bool | str | float | List[float]], version: str | None = None) StandardScaler#

Create a scaler instance from a dictionary.

Parameters:
  • my_dict (Dict[str, Union[bool, str, float, List[float]]]) – Dictionary containing scaler parameters. Must include ‘scaler_type’ key.

  • version (str | None, optional) – The ptk_version of the saved model, by default None. Used to apply any registered Scaler migrations.

Returns:

Instance of the appropriate scaler subclass

Return type:

Scaler

Raises:

ValueError – If scaler_type is missing or invalid

inverse_transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Inverse transform scaled data back to original scale.

Parameters:
  • X (NDArray) – Data to inverse transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to inverse transform, by default None (transform all)

Returns:

Inverse transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

to_dict() Dict[str, bool | str | float | List[float]]#

Convert scaler instance to dictionary.

transform(X: numpy.typing.NDArray, index: numpy.typing.NDArray | None = None) numpy.typing.NDArray#

Transform the data using the fitted scaler.

Parameters:
  • X (NDArray) – Data to transform, shape (n_samples, n_features)

  • index (Optional[NDArray], optional) – Indices of features to transform, by default None (transform all)

Returns:

Transformed data

Return type:

NDArray

Raises:

ValueError – If the scaler has not been fitted

m: numpy.typing.NDArray | None = None#
s: numpy.typing.NDArray | None = None#