pcntoolkit.regression_model.blr#

Bayesian Linear Regression (BLR) implementation.

This module implements Bayesian Linear Regression with support for:

- L1/L2 regularization
- Automatic Relevance Determination (ARD)
- Heteroskedastic noise modeling
- Multiple optimization methods (CG, Powell, Nelder-Mead, L-BFGS-B)

The implementation follows standard Bayesian formulation with Gaussian priors and supports both homoskedastic and heteroskedastic noise models.

Classes#

BLR

Bayesian Linear Regression model implementation.

Functions#

create_design_matrix(...)

Create design matrix for the model.

Module Contents#

class BLR(name: str = 'template', fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None, heteroskedastic: bool = False, fixed_effect_var: bool = False, fixed_effect_var_slope: bool = False, fixed_effect_var_slope_indices: list[int] | Literal['all'] = None, warp_name: pcntoolkit.math_functions.warp.Optional[str] = None, warp_reparam: bool = False, basis_function_mean: pcntoolkit.math_functions.basis_function.BasisFunction = None, basis_function_var: pcntoolkit.math_functions.basis_function.BasisFunction = None, n_iter: int = 100, tol: float = 0.001, ard: bool = False, optimizer: str = 'l-bfgs-b', l_bfgs_b_l: float = 0.1, l_bfgs_b_epsilon: float = 0.1, l_bfgs_b_norm: str = 'l2', hyp0: pcntoolkit.math_functions.warp.np.ndarray | None = None, is_fitted: bool = False, is_from_dict: bool = False)#

Bases: pcntoolkit.regression_model.regression_model.RegressionModel

Bayesian Linear Regression model implementation.

This class implements Bayesian Linear Regression with various features including automatic relevance determination (ARD), heteroskedastic noise modeling, and multiple optimization methods.

Parameters:

name (str) – Unique identifier for the model instance
fixed_effect (bool, optional) – Whether to model a fixed effect in the intercept of the mean, by default False
fixed_effect_slope (bool, optional) – Whether to model a fixed effect in the slope of the mean, by default False
fixed_effect_slope_indices (list[int] | "all", optional) – If fixed_effect_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope of the mean. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to “all” to model a fixed effect on all columns of the design matrix.
heteroskedastic (bool, optional) – Whether to use heteroskedastic noise modeling, by default False
fixed_effect_var (bool, optional) – Whether to model a fixed effect in the intercept of the variance, by default False
fixed_effect_var_slope (bool, optional) – Whether to model a fixed effect in the slope of the variance, by default False
fixed_effect_var_slope_indices (list[int] | "all", optional) – If fixed_effect_var_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope of the variance. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to “all” to model a fixed effect on all columns of the design matrix.
warp_name (str, optional) – Name of the warp function to use, by default None. Can be one of “WarpSinhArcsinh”, “WarpLog”, “WarpBoxCox”, “WarpAffine”, “WarpCompose”
warp_reparam (bool, optional) – Whether to use a reparameterized warp function, by default False
basis_function_mean (BasisFunction, optional) – Basis function for the mean, by default None
basis_function_var (BasisFunction, optional) – Basis function for the variance, by default None
n_iter (int, optional) – Number of iterations for the optimization, by default 100
tol (float, optional) – Tolerance for the optimization, by default 0.001
ard (bool, optional) – Whether to use automatic relevance determination, by default False
optimizer (str, optional) – Optimizer to use for the optimization, by default “l-bfgs-b”
l_bfgs_b_l (float, optional) – L-BFGS-B parameter, by default 0.1
l_bfgs_b_epsilon (float, optional) – L-BFGS-B parameter, by default 0.1
l_bfgs_b_norm (str, optional) – L-BFGS-B parameter, by default “l2”
hyp0 (np.ndarray, optional) – Initial hyperparameters, by default None
is_fitted (bool, optional) – Whether the model has been fitted, by default False
is_from_dict (bool, optional) – Whether the model was created from a dictionary, by default False
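
As a quick reference, the keyword defaults from the class signature above can be collected into a plain dictionary and selectively overridden. This is only a sketch: the names `blr_defaults` and `config` are illustrative, the values are copied from the signature, and the constructor call itself is left commented out so that the snippet runs without pcntoolkit installed.

```python
# Defaults copied from the BLR signature documented above.
blr_defaults = {
    "name": "template",
    "fixed_effect": False,
    "fixed_effect_slope": False,
    "fixed_effect_slope_indices": None,
    "heteroskedastic": False,
    "fixed_effect_var": False,
    "fixed_effect_var_slope": False,
    "fixed_effect_var_slope_indices": None,
    "warp_name": None,
    "warp_reparam": False,
    "basis_function_mean": None,
    "basis_function_var": None,
    "n_iter": 100,
    "tol": 0.001,
    "ard": False,
    "optimizer": "l-bfgs-b",
    "l_bfgs_b_l": 0.1,
    "l_bfgs_b_epsilon": 0.1,
    "l_bfgs_b_norm": "l2",
    "hyp0": None,
}

# Hypothetical configuration: heteroskedastic noise with ARD enabled.
config = {**blr_defaults, "name": "my_blr", "heteroskedastic": True, "ard": True}

# With pcntoolkit installed, the configuration could be passed on like this:
# from pcntoolkit.regression_model.blr import BLR
# model = BLR(**config)
print(config["optimizer"], config["n_iter"], config["tol"])
```

Keeping the overrides separate from the defaults makes it easy to see which options deviate from the documented behaviour.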

Phi_Phi_var(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) → tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#

backward(X: xarray.DataArray, be: xarray.DataArray, Z: xarray.DataArray) → xarray.DataArray#

Map Z values to Y space using BLR.

Parameters:

X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Z (xr.DataArray) – Z-score data

Returns:

Z-values mapped to Y space

Return type:

xr.DataArray

be_idx_gen(be: xarray.DataArray, be_maps: dict[str, dict[str, int]]) → Generator[tuple[dict[str, object], pcntoolkit.math_functions.warp.np.ndarray]]#

Yield encoded batch-effect combinations and their masks.

Parameters:

be (xr.DataArray) – Encoded batch-effect values for each observation.
be_maps (dict[str, dict[str, int]]) – Mapping from batch-effect labels to encoded integer ids.

Yields:

tuple[dict[str, object], np.ndarray] – Encoded batch-effect combination together with its mask.
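
The idea of pairing each batch-effect combination with a mask can be sketched in plain Python. This is not the library's implementation: the function name `be_combinations`, the dictionary-of-lists layout for `be`, and the choice to skip combinations that never occur in the data are all assumptions made for illustration (the real method works on an xr.DataArray).

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def be_combinations(
    be: Dict[str, List[int]],
    be_maps: Dict[str, Dict[str, int]],
) -> Iterator[Tuple[Dict[str, int], List[bool]]]:
    """Yield each encoded batch-effect combination with a per-row mask."""
    names = list(be_maps)
    n_obs = len(next(iter(be.values())))
    # All encoded ids per batch-effect dimension, e.g. [[0, 1], [0, 1]].
    id_lists = [sorted(be_maps[name].values()) for name in names]
    for ids in product(*id_lists):
        combo = dict(zip(names, ids))
        # A row belongs to the combination if it matches on every dimension.
        mask = [
            all(be[name][i] == combo[name] for name in names)
            for i in range(n_obs)
        ]
        if any(mask):  # assumption: skip combinations absent from the data
            yield combo, mask


be_maps = {"sex": {"F": 0, "M": 1}, "site": {"A": 0, "B": 1}}
be = {"sex": [0, 1, 0, 0], "site": [0, 0, 1, 0]}
groups = list(be_combinations(be, be_maps))
for combo, mask in groups:
    print(combo, mask)
```

Each mask can then be used to select the observations that share one set of batch-effect parameters.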

dloglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.np.ndarray) → pcntoolkit.math_functions.warp.np.ndarray#: Function to compute derivatives

elemwise_logp(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) → xarray.DataArray#

Compute log-probabilities for each observation in the data.

Parameters:

X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Y (xr.DataArray) – Response variable data

Returns:

Log-probabilities of the data

Return type:

xr.DataArray

fit(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray) → None#

Fit the Bayesian Linear Regression model to the data.

Parameters:

X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
be_maps (dict[str, dict[str, int]]) – Batch effect maps
Y (xr.DataArray) – Response variable data

Return type:

None

forward(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) → xarray.DataArray#

Map Y values to Z space using BLR.

Parameters:

X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Y (xr.DataArray) – Response variable data

Returns:

Y-values mapped to Z space

Return type:

xr.DataArray
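
For intuition, the relationship between forward and backward can be sketched for the simplest case: a Gaussian predictive distribution without warping. The helper names `forward_z` and `backward_y` are hypothetical, and treating `ys` and `s2` as a precomputed predictive mean and variance (names borrowed from the ys_s2 method) is an assumption. The actual methods also handle batch effects and warps.

```python
import math
from typing import List


def forward_z(y: List[float], ys: List[float], s2: List[float]) -> List[float]:
    """Map responses to z-scores: z = (y - ys) / sqrt(s2)."""
    return [(yi - mi) / math.sqrt(vi) for yi, mi, vi in zip(y, ys, s2)]


def backward_y(z: List[float], ys: List[float], s2: List[float]) -> List[float]:
    """Map z-scores back to the response space: y = ys + z * sqrt(s2)."""
    return [mi + zi * math.sqrt(vi) for zi, mi, vi in zip(z, ys, s2)]


y = [1.0, 4.0, 2.5]    # observed responses
ys = [2.0, 3.0, 2.5]   # assumed predictive means
s2 = [4.0, 0.25, 1.0]  # assumed predictive variances

z = forward_z(y, ys, s2)
y_back = backward_y(z, ys, s2)
print(z)       # [-0.5, 2.0, 0.0]
print(y_back)  # [1.0, 4.0, 2.5]
```

Under these assumptions the two mappings are exact inverses of each other.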

classmethod from_args(name: str, args: dict) → BLR#: Creates a configuration from command line arguments

classmethod from_dict(my_dict: dict, path: str | None = None) → BLR#: Creates a configuration from a dictionary.

get_warp(warp: str | None) → pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.WarpBase]#

init_hyp() → pcntoolkit.math_functions.warp.np.ndarray#

Initialize model hyperparameters.

Returns:: Initialized hyperparameter vector
Return type:: np.ndarray

initialize_warp() → None#

loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) → float#

Compute the negative log likelihood.

Parameters:

hyp (np.ndarray) – Hyperparameter vector.
X (np.ndarray) – Covariates.
y (np.ndarray) – Responses.
var_X (np.ndarray) – Variance of covariates.

Returns:

Negative log likelihood.

Return type:

float

model_specific_evaluation(path: str) → None#: Save model-specific evaluation metrics.

parse_hyps(hyp: pcntoolkit.math_functions.warp.np.ndarray, Phi: pcntoolkit.math_functions.warp.np.ndarray, Phi_var: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) → tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#

Parse hyperparameters into model parameters.

Parameters:

hyp (np.ndarray) – Hyperparameter vector.
Phi (np.ndarray) – Covariates.
Phi_var (np.ndarray (Optional)) – Variance of covariates.

Returns:

Parsed alpha, beta and gamma parameters.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

penalized_loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None, regularizer_strength: float = 0.1, norm: Literal['L1', 'L2'] = 'L1') → float#

Compute the penalized negative log likelihood with L1 or L2 regularization.

Parameters:

hyp (np.ndarray) – Hyperparameter vector
X (np.ndarray) – Feature matrix
y (np.ndarray) – Target vector
var_X (np.ndarray) – Variance of features
regularizer_strength (float, optional) – Regularization strength, by default 0.1
norm ({"L1", "L2"}, optional) – Type of regularization norm, by default “L1”

Returns:

Penalized negative log likelihood value

Return type:

float

Raises:

ValueError – If norm is not “L1” or “L2”
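
The penalty step can be illustrated in isolation. The sketch below takes an already computed negative log likelihood and adds a norm penalty on the hyperparameters. The function name `penalize` is hypothetical, and using the plain (unsquared) Euclidean norm for "L2" is an assumption, since the exact penalty form is not stated here.

```python
import math
from typing import Sequence


def penalize(
    nll: float,
    hyp: Sequence[float],
    regularizer_strength: float = 0.1,
    norm: str = "L1",
) -> float:
    """Add an L1 or L2 penalty on hyp to a negative log likelihood."""
    if norm == "L1":
        penalty = sum(abs(h) for h in hyp)
    elif norm == "L2":
        # Assumption: Euclidean norm, not its square.
        penalty = math.sqrt(sum(h * h for h in hyp))
    else:
        raise ValueError(f"norm must be 'L1' or 'L2', got {norm!r}")
    return nll + regularizer_strength * penalty


hyp = [3.0, -4.0]
print(penalize(10.0, hyp, regularizer_strength=0.5, norm="L1"))  # 13.5
print(penalize(10.0, hyp, regularizer_strength=0.5, norm="L2"))  # 12.5
```

Because the objective is minimized, a larger regularizer_strength pulls the hyperparameters more strongly towards zero.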

post(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) → None#

Compute the posterior distribution.

Parameters:

hyp (np.ndarray) – Hyperparameter vector.
X (np.ndarray) – Covariates.
y (np.ndarray) – Responses.
var_X (np.ndarray) – Variance of covariates.
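
As a reminder of what the posterior looks like in the standard Gaussian formulation, here is the scalar special case with a single basis function. It is a textbook sketch rather than the library's code: `scalar_posterior` is a hypothetical helper, and reading alpha as the prior precision and beta as the noise precision is an assumption based on the usual convention.

```python
from typing import Sequence, Tuple


def scalar_posterior(
    phi: Sequence[float],
    y: Sequence[float],
    alpha: float,
    beta: float,
) -> Tuple[float, float]:
    """Posterior precision and mean of one weight w with prior w ~ N(0, 1/alpha)."""
    # Posterior precision: A = alpha + beta * sum(phi_i^2)
    A = alpha + beta * sum(p * p for p in phi)
    # Posterior mean: m = beta * sum(phi_i * y_i) / A
    m = beta * sum(p * t for p, t in zip(phi, y)) / A
    return A, m


phi = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # generated by w = 2 without noise
A, m = scalar_posterior(phi, y, alpha=1.0, beta=1.0)
print(A, m)
```

With alpha = beta = 1 the prior shrinks the estimate below the true slope of 2. As beta grows, the data dominate and the posterior mean approaches 2.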

predict_and_adjust(hyp, X, y, Xs=None, ys=None, var_groups_test=None, var_groups_adapt=None, **kwargs)#

Transfer the model to a new site. Predictions are first made on the adaptation data given by X and are then adjusted by their residuals with respect to y.

Parameters:

hyp – hyperparameter vector
X – covariates for adaptation (i.e. calibration) data
y – responses for adaptation data
Xs – covariate data (for which predictions should be adjusted)
ys – true response variables (to be adjusted)
var_groups_test – variance groups (e.g. sites) for test data
var_groups_adapt – variance groups for adaptation data

There are two ways of using this function, depending on whether ys or Xs is specified.

If ys is specified, the adjustment is applied directly to the data, which is assumed to be in the input space (i.e. not warped). In this case the adjusted true data points are returned in the same space.

Alternatively, if Xs is specified, predictions are made and then adjusted. In this case the predictive variance is returned in the warped (i.e. Gaussian) space.

This function needs to know which sites are associated with which data points. This is provided by var_groups_xxx, which is a list or array of scalar ids.
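
One simple way to picture a residual-based adjustment is a per-site offset equal to the mean residual on the adaptation data. The sketch below shows only that idea. The helpers `site_offsets` and `adjust_predictions` are hypothetical, the predictions are assumed to be given, and the actual method may adjust more than the mean (for example the variance, or values in the warped space).

```python
from collections import defaultdict
from typing import Dict, List


def site_offsets(
    y_adapt: List[float],
    yhat_adapt: List[float],
    groups_adapt: List[int],
) -> Dict[int, float]:
    """Mean residual (y - prediction) per site on the adaptation data."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for yi, yh, g in zip(y_adapt, yhat_adapt, groups_adapt):
        sums[g] += yi - yh
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def adjust_predictions(
    yhat_test: List[float],
    groups_test: List[int],
    offsets: Dict[int, float],
) -> List[float]:
    """Shift each test prediction by the offset of its site."""
    return [yh + offsets[g] for yh, g in zip(yhat_test, groups_test)]


offsets = site_offsets(
    y_adapt=[10.0, 12.0, 20.0],
    yhat_adapt=[9.0, 11.0, 22.0],
    groups_adapt=[0, 0, 1],
)
adjusted = adjust_predictions([5.0, 6.0], [0, 1], offsets)
print(offsets)   # {0: 1.0, 1: -2.0}
print(adjusted)  # [6.0, 4.0]
```

The site ids play the role of var_groups_adapt and var_groups_test: they tie each data point to the offset learned for its site.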

to_dict(path: str | None = None) → dict#

Convert model instance to dictionary representation.

Used for saving models to disk.

Parameters:: path (str | None, optional) – Path to save any associated files, by default None
Returns:: Dictionary containing model parameters and configuration
Return type:: dict

transfer(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray, **kwargs) → BLR#

Transfer the model to a new dataset.

Parameters:

X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
be_maps (dict[str, dict[str, int]]) – Batch effect maps
Y (xr.DataArray) – Response variable data

Returns:

New instance of the regression model, transferred to the new dataset

Return type:

RegressionModel

ys_s2(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) → tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#

ard = False#

basis_function_mean#

basis_function_var#

fixed_effect = False#

fixed_effect_slope = False#

fixed_effect_slope_indices = None#

fixed_effect_var = False#

fixed_effect_var_slope = False#

fixed_effect_var_slope_indices = None#

gamma: pcntoolkit.math_functions.warp.np.ndarray = None#

property has_batch_effect: bool#

Check if model includes batch effects.

Returns:: True if model includes batch effects, False otherwise
Return type:: bool

heteroskedastic = False#

hyp: pcntoolkit.math_functions.warp.np.ndarray = None#

hyp0 = None#

l_bfgs_b_epsilon = 0.1#

l_bfgs_b_l = 0.1#

l_bfgs_b_norm = 'l2'#

models_variance = False#

n_iter = 100#

optimizer = 'l-bfgs-b'#

tol = 0.001#

warp_name = None#

warp_reparam = False#

create_design_matrix(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray, be_maps: dict[str, dict[str, int]], linear: bool = False, intercept: bool = False, fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None) → pcntoolkit.math_functions.warp.np.ndarray#

Create design matrix for the model.

Parameters:

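X (np.ndarray) – Covariate data.
be (np.ndarray) – Batch effect data.
be_maps (dict[str, dict[str, int]]) – Batch effect maps.
linear (bool, default False) – Include linear terms in the design matrix.
intercept (bool, default False) – Include intercept term in the design matrix.
fixed_effect (bool, default False) – Include fixed effect intercept for batch effects.
fixed_effect_slope (bool, default False) – Include fixed effect slope for batch effects.
fixed_effect_slope_indices (list[int] | "all", optional) – Indices of the design matrix columns for which to include a fixed effect slope; "all" selects every column.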

Returns:

Design matrix combining all requested components.

Return type:

np.ndarray

Raises:

ValueError – If no components are selected for the design matrix.
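
To make the components concrete, the following sketch assembles a small design matrix from nested lists. It is an illustration under stated assumptions, not the library's code: the name `design_matrix`, the column order (linear terms, intercept, then one-hot batch-effect indicators), and the one-hot encoding itself are assumed, and the fixed effect slope option is left out for brevity.

```python
from typing import Dict, List


def design_matrix(
    X: List[List[float]],
    be: List[List[int]],
    be_maps: Dict[str, Dict[str, int]],
    linear: bool = False,
    intercept: bool = False,
    fixed_effect: bool = False,
) -> List[List[float]]:
    """Build one design-matrix row per observation from the selected components."""
    if not (linear or intercept or fixed_effect):
        raise ValueError("At least one component must be selected")
    # Number of levels per batch-effect dimension, in be_maps order.
    n_levels = [len(levels) for levels in be_maps.values()]
    rows = []
    for x_row, be_row in zip(X, be):
        row: List[float] = []
        if linear:
            row.extend(x_row)
        if intercept:
            row.append(1.0)
        if fixed_effect:
            # Assumption: one indicator column per batch-effect level.
            for level, n in zip(be_row, n_levels):
                onehot = [0.0] * n
                onehot[level] = 1.0
                row.extend(onehot)
        rows.append(row)
    return rows


X = [[0.5], [1.5]]
be = [[0], [1]]
be_maps = {"site": {"A": 0, "B": 1}}
Phi = design_matrix(X, be, be_maps, linear=True, intercept=True, fixed_effect=True)
print(Phi)  # [[0.5, 1.0, 1.0, 0.0], [1.5, 1.0, 0.0, 1.0]]
```

Calling the function with every flag left at False raises ValueError, mirroring the documented behaviour when no components are selected.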