pcntoolkit.regression_model.blr

Bayesian Linear Regression (BLR) implementation.

This module implements Bayesian Linear Regression with support for: - L1/L2 regularization - Automatic Relevance Determination (ARD) - Heteroskedastic noise modeling - Multiple optimization methods (CG, Powell, Nelder-Mead, L-BFGS-B)

The implementation follows standard Bayesian formulation with Gaussian priors and supports both homoskedastic and heteroskedastic noise models.

Classes

BLR

Bayesian Linear Regression model implementation.

Functions

create_design_matrix(...)

Create design matrix for the model.

Module Contents

class BLR(name: str = 'template', fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None, heteroskedastic: bool = False, fixed_effect_var: bool = False, fixed_effect_var_slope: bool = False, fixed_effect_var_slope_indices: list[int] | Literal['all'] = None, warp_name: pcntoolkit.math_functions.warp.Optional[str] = None, warp_reparam: bool = False, basis_function_mean: pcntoolkit.math_functions.basis_function.BasisFunction = None, basis_function_var: pcntoolkit.math_functions.basis_function.BasisFunction = None, n_iter: int = 100, tol: float = 0.001, ard: bool = False, optimizer: str = 'l-bfgs-b', l_bfgs_b_l: float = 0.1, l_bfgs_b_epsilon: float = 0.1, l_bfgs_b_norm: str = 'l2', hyp0: pcntoolkit.math_functions.warp.np.ndarray | None = None, is_fitted: bool = False, is_from_dict: bool = False)

Bases: pcntoolkit.regression_model.regression_model.RegressionModel

Bayesian Linear Regression model implementation.

This class implements Bayesian Linear Regression with various features including automatic relevance determination (ARD), heteroskedastic noise modeling, and multiple optimization methods.

This class implements Bayesian Linear Regression with various features including automatic relevance determination (ARD), heteroskedastic noise modeling, and multiple optimization methods.

Parameters:
  • name (str) – Unique identifier for the model instance

  • fixed_effect (bool, optional) – Whether to model a fixed effect in the intercept of the mean, by default False

  • fixed_effect_slope (bool, optional) – Whether to model a fixed effect in the slope of the mean, by default False

  • fixed_effect_slope_indices (list[int] | "all", optional) – If fixed_effect_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope ofthe mean. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to “all” to model a fixed effect on all columns of the design matrix.

  • heteroskedastic (bool, optional) – Whether to use heteroskedastic noise modeling, by default False

  • fixed_effect_var (bool, optional) – Whether to model a fixed effect in the intercept of the variance, by default False

  • fixed_effect_var_slope (bool, optional) – Whether to model a fixed effect in the slope of the variance, by default False

  • fixed_effect_var_slope_indices (list[int] | "all", optional) – If fixed_effect_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope of the variance. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to “all” to model a fixed effect on all columns of the design matrix.

  • warp_name (str, optional) – Name of the warp function to use, by default None. Can be one of “WarpSinhArcsinh”, “WarpLog”, “WarpBoxCox”, “WarpAffine”, “WarpCompose”

  • warp_reparam (bool, optional) – Whether to use a reparameterized warp function, by default False

  • basis_function_mean (BasisFunction, optional) – Basis function for the mean, by default None

  • basis_function_var (BasisFunction, optional) – Basis function for the variance, by default None

  • n_iter (int, optional) – Number of iterations for the optimization, by default 300

  • tol (float, optional) – Tolerance for the optimization, by default 1e-5

  • ard (bool, optional) – Whether to use automatic relevance determination, by default False

  • optimizer (str, optional) – Optimizer to use for the optimization, by default “l-bfgs-b”

  • l_bfgs_b_l (float, optional) – L-BFGS-B parameter, by default 0.1

  • l_bfgs_b_epsilon (float, optional) – L-BFGS-B parameter, by default 0.1

  • l_bfgs_b_norm (str, optional) – L-BFGS-B parameter, by default “l2”

  • hyp0 (np.ndarray, optional) – Initial hyperparameters, by default None

  • is_fitted (bool, optional) – Whether the model has been fitted, by default False

  • is_from_dict (bool, optional) – Whether the model was created from a dictionary, by default False

Phi_Phi_var(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]
backward(X: xarray.DataArray, be: xarray.DataArray, Z: xarray.DataArray) xarray.DataArray

Map Z values to Y space using BLR.

Parameters:
  • X (xr.DataArray) – Covariate data

  • be (xr.) – Batch effect data

  • Z (xr.DataArray) – Z-score data

Returns:

Z-values mapped to Y space

Return type:

xr.DataArray

be_idx_gen(be: xarray.DataArray, be_maps: dict[str, dict[str, int]]) Generator[tuple[dict[str, object], pcntoolkit.math_functions.warp.np.ndarray]]

Yield encoded batch-effect combinations and their masks.

Parameters:
  • be (xr.DataArray) – Encoded batch-effect values for each observation.

  • be_maps (dict[str, dict[str, int]]) – Mapping from batch-effect labels to encoded integer ids.

Yields:

tuple[dict[str, object], np.ndarray] – Encoded batch-effect combination together with its mask.

dloglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.np.ndarray) pcntoolkit.math_functions.warp.np.ndarray

Function to compute derivatives

elemwise_logp(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) xarray.DataArray

Compute log-probabilities for each observation in the data.

Parameters:
  • X (xr.DataArray) – Covariate data

  • be (xr.DataArray) – Batch effect data

  • be_maps (dict[str, dict[str, int]]) – Batch effect maps

  • Y (xr.DataArray) – Response variable data

Returns:

Log-probabilities of the data

Return type:

xr.DataArray

fit(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray) None

Fit the Bayesian Linear Regression model to the data.

Parameters:
  • X (xr.DataArray) – Covariate data

  • be (xr.DataArray) – Batch effect data

  • be_maps (dict[str, dict[str, int]]) – Batch effect maps

  • Y (xr.DataArray) – Response variable data

Return type:

None

forward(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) xarray.DataArray

Map Y values to Z space using BLR.

Parameters:
  • X (xr.DataArray) – Covariate data

  • be (xr.DataArray) – Batch effect data

  • Y (xr.DataArray) – Response variable data

Returns:

Z-values mapped to Y space

Return type:

xr.DataArray

classmethod from_args(name: str, args: dict) BLR

Creates a configuration from command line arguments

classmethod from_dict(my_dict: dict, path: str | None = None) BLR

Creates a configuration from a dictionary.

get_warp(warp: str | None) pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.WarpBase]
init_hyp() pcntoolkit.math_functions.warp.np.ndarray

Initialize model hyperparameters.

Parameters:

data (BLRData) – Training data containing features and targets

Returns:

Initialized hyperparameter vector

Return type:

np.ndarray

initialize_warp() None
loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) float

Compute the negative log likelihood.

Parameters:
  • hyp (np.ndarray) – Hyperparameter vector.

  • X (np.ndarray) – Covariates.

  • y (np.ndarray) – Responses.

  • var_X (np.ndarray) – Variance of covariates.

Returns:

Negative log likelihood.

Return type:

float

model_specific_evaluation(path: str) None

Save model-specific evaluation metrics.

parse_hyps(hyp: pcntoolkit.math_functions.warp.np.ndarray, Phi: pcntoolkit.math_functions.warp.np.ndarray, Phi_var: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]

Parse hyperparameters into model parameters.

Parameters:
  • hyp (np.ndarray) – Hyperparameter vector.

  • Phi (np.ndarray) – Covariates.

  • Phi_var (np.ndarray (Optional)) – Variance of covariates.

Returns:

Parsed alpha, beta and gamma parameters.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

penalized_loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None, regularizer_strength: float = 0.1, norm: Literal['L1', 'L2'] = 'L1') float

Compute the penalized log likelihood with L1 or L2 regularization.

Parameters:
  • hyp (np.ndarray) – Hyperparameter vector

  • X (np.ndarray) – Feature matrix

  • y (np.ndarray) – Target vector

  • var_X (np.ndarray) – Variance of features

  • regularizer_strength (float, optional) – Regularization strength, by default 0.1

  • norm ({"L1", "L2"}, optional) – Type of regularization norm, by default “L1”

Returns:

Penalized negative log likelihood value

Return type:

float

Raises:

ValueError – If norm is not “L1” or “L2”

post(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) None

Compute the posterior distribution.

Parameters:
  • hyp (np.ndarray) – Hyperparameter vector.

  • X (np.ndarray) – Covariates.

  • y (np.ndarray) – Responses.

  • var_X (np.ndarray) – Variance of covariates.

predict_and_adjust(hyp, X, y, Xs=None, ys=None, var_groups_test=None, var_groups_adapt=None, **kwargs)

Function to transfer the model to a new site. This is done by first making predictions on the adaptation data given by X, adjusting by the residuals with respect to y.

Parameters:
  • hyp – hyperparameter vector

  • X – covariates for adaptation (i.e. calibration) data

  • y – responses for adaptation data

  • Xs – covariate data (for which predictions should be adjusted)

  • ys – true response variables (to be adjusted)

  • var_groups_test – variance groups (e.g. sites) for test data

  • var_groups_adapt – variance groups for adaptation data

There are two possible ways of using this function, depending on whether ys or Xs is specified

If ys is specified, this is applied directly to the data, which is assumed to be in the input space (i.e. not warped). In this case the adjusted true data points are returned in the same space

Alternatively, Xs is specified, then the predictions are made and adjusted. In this case the predictive variance are returned in the warped (i.e. Gaussian) space.

This function needs to know which sites are associated with which data points, which provided by var_groups_xxx, which is a list or array of scalar ids .

to_dict(path: str | None = None) dict

Convert model instance to dictionary representation.

Used for saving models to disk.

Parameters:

path (str | None, optional) – Path to save any associated files, by default None

Returns:

Dictionary containing model parameters and configuration

Return type:

dict

transfer(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray, **kwargs) BLR

Transfer the model to a new dataset.

Parameters:
  • X (xr.DataArray containing covariates)

  • be (xr.DataArray containing batch effects)

  • be_maps (dictionary of dictionaries mapping batch effect to indices)

  • Y (xr.DataArray containing covariates)

Returns:

New instance of the regression model, transfered to the new dataset

Return type:

RegressionModel

ys_s2(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]
ard = False
basis_function_mean
basis_function_var
fixed_effect = False
fixed_effect_slope = False
fixed_effect_slope_indices = None
fixed_effect_var = False
fixed_effect_var_slope = False
fixed_effect_var_slope_indices = None
gamma: pcntoolkit.math_functions.warp.np.ndarray = None
property has_batch_effect: bool

Check if model includes batch effects.

Returns:

True if model includes batch effects, False otherwise

Return type:

bool

heteroskedastic = False
hyp: pcntoolkit.math_functions.warp.np.ndarray = None
hyp0 = None
l_bfgs_b_epsilon = 0.1
l_bfgs_b_l = 0.1
l_bfgs_b_norm = 'l2'
models_variance = False
n_iter = 100
optimizer = 'l-bfgs-b'
tol = 0.001
warp_name = None
warp_reparam = False
create_design_matrix(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray, be_maps: dict[str, dict[str, int]], linear: bool = False, intercept: bool = False, fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None) pcntoolkit.math_functions.warp.np.ndarray

Create design matrix for the model.

Parameters:
  • data (NormData) – Input data containing features and batch effects.

  • linear (bool, default False) – Include linear terms in the design matrix.

  • intercept (bool, default False) – Include intercept term in the design matrix.

  • fixed_effect (bool, default False) – Include fixed effect intercept for batch effects.

  • fixef_effect_slope (bool, default False) – Include fixed effect slope for batch effects.

Returns:

Design matrix combining all requested components.

Return type:

np.ndarray

Raises:

ValueError – If no components are selected for the design matrix.