pcntoolkit.regression_model.blr#
Bayesian Linear Regression (BLR) implementation.
This module implements Bayesian Linear Regression with support for:
- L1/L2 regularization
- Automatic Relevance Determination (ARD)
- Heteroskedastic noise modeling
- Multiple optimization methods (CG, Powell, Nelder-Mead, L-BFGS-B)
The implementation follows standard Bayesian formulation with Gaussian priors and supports both homoskedastic and heteroskedastic noise models.
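As a rough illustration of the "standard Bayesian formulation with Gaussian priors" mentioned above, the following NumPy sketch computes the weight posterior and the predictive distribution for the homoskedastic case. It is not the module's code: the function names `blr_posterior` and `blr_predict`, the scalar prior precision `alpha`, and the scalar noise precision `beta` are assumptions made for this example.

```python
import numpy as np

def blr_posterior(Phi, y, alpha, beta):
    """Posterior mean and covariance of the regression weights.

    Assumes a zero-mean isotropic Gaussian prior with precision `alpha`
    and homoskedastic Gaussian noise with precision `beta`.
    """
    n_features = Phi.shape[1]
    # Posterior precision: A = alpha * I + beta * Phi^T Phi
    A = alpha * np.eye(n_features) + beta * Phi.T @ Phi
    # Posterior mean: m = beta * A^{-1} Phi^T y
    m = beta * np.linalg.solve(A, Phi.T @ y)
    return m, np.linalg.inv(A)

def blr_predict(Phi_new, m, S, beta):
    """Predictive mean and variance for new design-matrix rows."""
    mean = Phi_new @ m
    # Noise variance plus the variance induced by weight uncertainty
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S, Phi_new)
    return mean, var

# Synthetic data: y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
Phi = np.column_stack([np.ones_like(x), x])  # intercept + linear term
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

m, S = blr_posterior(Phi, y, alpha=1e-3, beta=100.0)
mean, var = blr_predict(Phi[:5], m, S, beta=100.0)
```

With ARD, the single `alpha` would be replaced by one precision per weight; with heteroskedastic noise, `beta` would vary per observation.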
Classes#
| BLR | Bayesian Linear Regression model implementation. |
Functions#
| create_design_matrix | Create design matrix for the model. |
Module Contents#
- class BLR(name: str = 'template', fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None, heteroskedastic: bool = False, fixed_effect_var: bool = False, fixed_effect_var_slope: bool = False, fixed_effect_var_slope_indices: list[int] | Literal['all'] = None, warp_name: pcntoolkit.math_functions.warp.Optional[str] = None, warp_reparam: bool = False, basis_function_mean: pcntoolkit.math_functions.basis_function.BasisFunction = None, basis_function_var: pcntoolkit.math_functions.basis_function.BasisFunction = None, n_iter: int = 100, tol: float = 0.001, ard: bool = False, optimizer: str = 'l-bfgs-b', l_bfgs_b_l: float = 0.1, l_bfgs_b_epsilon: float = 0.1, l_bfgs_b_norm: str = 'l2', hyp0: pcntoolkit.math_functions.warp.np.ndarray | None = None, is_fitted: bool = False, is_from_dict: bool = False)#
Bases: pcntoolkit.regression_model.regression_model.RegressionModel
Bayesian Linear Regression model implementation.
This class implements Bayesian Linear Regression with various features including automatic relevance determination (ARD), heteroskedastic noise modeling, and multiple optimization methods.
- Parameters:
name (str) – Unique identifier for the model instance
fixed_effect (bool, optional) – Whether to model a fixed effect in the intercept of the mean, by default False
fixed_effect_slope (bool, optional) – Whether to model a fixed effect in the slope of the mean, by default False
fixed_effect_slope_indices (list[int] | "all", optional) – If fixed_effect_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope of the mean. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to "all" to model a fixed effect on all columns of the design matrix.
heteroskedastic (bool, optional) – Whether to use heteroskedastic noise modeling, by default False
fixed_effect_var (bool, optional) – Whether to model a fixed effect in the intercept of the variance, by default False
fixed_effect_var_slope (bool, optional) – Whether to model a fixed effect in the slope of the variance, by default False
fixed_effect_var_slope_indices (list[int] | "all", optional) – If fixed_effect_var_slope is True, the indices of the columns in the design matrix for which to model a fixed effect in the slope of the variance. By default this is [0], so a fixed effect is learned on the first column of the design matrix. Set to "all" to model a fixed effect on all columns of the design matrix.
warp_name (str, optional) – Name of the warp function to use, by default None. Can be one of "WarpSinhArcsinh", "WarpLog", "WarpBoxCox", "WarpAffine", "WarpCompose"
warp_reparam (bool, optional) – Whether to use a reparameterized warp function, by default False
basis_function_mean (BasisFunction, optional) – Basis function for the mean, by default None
basis_function_var (BasisFunction, optional) – Basis function for the variance, by default None
n_iter (int, optional) – Number of iterations for the optimization, by default 100
tol (float, optional) – Tolerance for the optimization, by default 0.001
ard (bool, optional) – Whether to use automatic relevance determination, by default False
optimizer (str, optional) – Optimizer to use for the optimization, by default "l-bfgs-b"
l_bfgs_b_l (float, optional) – L-BFGS-B parameter, by default 0.1
l_bfgs_b_epsilon (float, optional) – L-BFGS-B parameter, by default 0.1
l_bfgs_b_norm (str, optional) – L-BFGS-B parameter, by default "l2"
hyp0 (np.ndarray, optional) – Initial hyperparameters, by default None
is_fitted (bool, optional) – Whether the model has been fitted, by default False
is_from_dict (bool, optional) – Whether the model was created from a dictionary, by default False
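A possible way to configure the class is sketched below. The keyword names and the module path are taken from the signature on this page; the chosen values and the helper `make_blr` are hypothetical, and whether a given combination of options is sensible depends on the data. The import is deferred inside the helper so the snippet loads even where pcntoolkit is not installed.

```python
# Keyword arguments named as in the documented BLR signature.
blr_kwargs = dict(
    name="blr_example",           # hypothetical model name
    fixed_effect=True,            # per-batch intercept in the mean
    warp_name="WarpSinhArcsinh",  # one of the documented warp names
    warp_reparam=True,
    n_iter=200,
    tol=1e-3,
    optimizer="l-bfgs-b",
    l_bfgs_b_norm="l2",
)

def make_blr(kwargs=blr_kwargs):
    """Build the model; pcntoolkit is only imported when this is called."""
    from pcntoolkit.regression_model.blr import BLR  # module path from this page
    return BLR(**kwargs)
```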
- Phi_Phi_var(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#
- backward(X: xarray.DataArray, be: xarray.DataArray, Z: xarray.DataArray) xarray.DataArray#
Map Z values to Y space using BLR.
- Parameters:
X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Z (xr.DataArray) – Z-score data
- Returns:
Z-values mapped to Y space
- Return type:
xr.DataArray
- be_idx_gen(be: xarray.DataArray, be_maps: dict[str, dict[str, int]]) Generator[tuple[dict[str, object], pcntoolkit.math_functions.warp.np.ndarray]]#
Yield encoded batch-effect combinations and their masks.
- Parameters:
be (xr.DataArray) – Encoded batch-effect values for each observation.
be_maps (dict[str, dict[str, int]]) – Mapping from batch-effect labels to encoded integer ids.
- Yields:
tuple[dict[str, object], np.ndarray] – Encoded batch-effect combination together with its mask.
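The idea of yielding each batch-effect combination with a boolean mask can be sketched with plain NumPy. This is a simplified stand-in, not the method itself: `iter_batch_combinations` and its inputs (an integer array instead of an xr.DataArray, a list of column names instead of be_maps) are assumptions for this example.

```python
import numpy as np

def iter_batch_combinations(be, be_names):
    """Yield (combination, mask) for every distinct row of `be`.

    `be` is an (n_obs, n_batch_effects) array of integer-encoded labels and
    `be_names` names each column. Both are illustrative inputs.
    """
    for combo in np.unique(be, axis=0):
        # True for the observations that carry exactly this combination
        mask = np.all(be == combo, axis=1)
        yield dict(zip(be_names, combo.tolist())), mask

be = np.array([[0, 0], [0, 1], [1, 1], [0, 1]])
combos = list(iter_batch_combinations(be, ["sex", "site"]))
```

Each mask can then be used to select the rows of the design matrix that belong to one combination.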
- dloglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.np.ndarray) pcntoolkit.math_functions.warp.np.ndarray#
Compute the derivatives of the (negative) log likelihood with respect to the hyperparameters.
- elemwise_logp(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) xarray.DataArray#
Compute log-probabilities for each observation in the data.
- Parameters:
X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Y (xr.DataArray) – Response variable data
- Returns:
Log-probabilities of the data
- Return type:
xr.DataArray
- fit(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray) None#
Fit the Bayesian Linear Regression model to the data.
- Parameters:
X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
be_maps (dict[str, dict[str, int]]) – Batch effect maps
Y (xr.DataArray) – Response variable data
- Return type:
None
- forward(X: xarray.DataArray, be: xarray.DataArray, Y: xarray.DataArray) xarray.DataArray#
Map Y values to Z space using BLR.
- Parameters:
X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
Y (xr.DataArray) – Response variable data
- Returns:
Y values mapped to Z space
- Return type:
xr.DataArray
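The relationship between the Y-to-Z mapping here and the Z-to-Y mapping of backward can be illustrated for the simplest case. The sketch assumes a Gaussian predictive distribution with no warp; `y_to_z`, `z_to_y`, and the numbers are hypothetical, and the methods on this class operate on xr.DataArray inputs rather than lists.

```python
def y_to_z(y, mean, std):
    """Standardize responses against the predictive mean and standard deviation."""
    return [(yi - mi) / si for yi, mi, si in zip(y, mean, std)]

def z_to_y(z, mean, std):
    """Invert y_to_z: place a z-score back on the response scale."""
    return [mi + zi * si for zi, mi, si in zip(z, mean, std)]

mean = [10.0, 12.0, 15.0]  # hypothetical predictive means
std = [2.0, 2.0, 3.0]      # hypothetical predictive standard deviations
y = [12.0, 11.0, 15.0]

z = y_to_z(y, mean, std)       # [1.0, -0.5, 0.0]
y_back = z_to_y(z, mean, std)  # recovers y
```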
- classmethod from_args(name: str, args: dict) BLR#
Create a BLR instance from command-line arguments.
- classmethod from_dict(my_dict: dict, path: str | None = None) BLR#
Create a BLR instance from a dictionary.
- get_warp(warp: str | None) pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.WarpBase]#
- init_hyp() pcntoolkit.math_functions.warp.np.ndarray#
Initialize model hyperparameters.
- Returns:
Initialized hyperparameter vector
- Return type:
np.ndarray
- loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) float#
Compute the negative log likelihood.
- Parameters:
hyp (np.ndarray) – Hyperparameter vector.
X (np.ndarray) – Covariates.
y (np.ndarray) – Responses.
var_X (np.ndarray, optional) – Variance of covariates.
- Returns:
Negative log likelihood.
- Return type:
float
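This page does not give the formula behind the negative log likelihood. For orientation only, the textbook negative log marginal likelihood of Bayesian linear regression with a scalar prior precision `alpha` and a scalar noise precision `beta` can be written as below. The function `neg_log_evidence` and its parameterization are assumptions for this sketch; the method here works on a hyperparameter vector and may differ in detail.

```python
import numpy as np

def neg_log_evidence(Phi, y, alpha, beta):
    """Negative log marginal likelihood for scalar alpha (prior) and beta (noise)."""
    n, d = Phi.shape
    A = alpha * np.eye(d) + beta * Phi.T @ Phi  # posterior precision
    m = beta * np.linalg.solve(A, Phi.T @ y)    # posterior mean
    resid = y - Phi @ m
    _, logdet_A = np.linalg.slogdet(A)
    log_ev = 0.5 * (
        d * np.log(alpha)
        + n * np.log(beta)
        - beta * resid @ resid
        - alpha * m @ m
        - logdet_A
        - n * np.log(2 * np.pi)
    )
    return -log_ev

rng = np.random.default_rng(1)
Phi = np.column_stack([np.ones(20), rng.normal(size=20)])
y = Phi @ np.array([0.5, -1.0]) + rng.normal(scale=0.3, size=20)
nll = neg_log_evidence(Phi, y, alpha=2.0, beta=10.0)
```

An optimizer such as L-BFGS-B would minimize a quantity of this kind over the hyperparameters.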
- parse_hyps(hyp: pcntoolkit.math_functions.warp.np.ndarray, Phi: pcntoolkit.math_functions.warp.np.ndarray, Phi_var: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#
Parse hyperparameters into model parameters.
- Parameters:
hyp (np.ndarray) – Hyperparameter vector.
Phi (np.ndarray) – Covariates.
Phi_var (np.ndarray, optional) – Variance of covariates.
- Returns:
Parsed alpha, beta and gamma parameters.
- Return type:
tuple[np.ndarray,np.ndarray,np.ndarray]
- penalized_loglik(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None, regularizer_strength: float = 0.1, norm: Literal['L1', 'L2'] = 'L1') float#
Compute the penalized negative log likelihood with L1 or L2 regularization.
- Parameters:
hyp (np.ndarray) – Hyperparameter vector
X (np.ndarray) – Feature matrix
y (np.ndarray) – Target vector
var_X (np.ndarray, optional) – Variance of features
regularizer_strength (float, optional) – Regularization strength, by default 0.1
norm ({"L1", "L2"}, optional) – Type of regularization norm, by default "L1"
- Returns:
Penalized negative log likelihood value
- Return type:
float
- Raises:
ValueError – If norm is not “L1” or “L2”
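The penalty logic described above can be sketched in a few lines. The helper `penalized_objective` is hypothetical: it takes an already computed negative log likelihood value, and it assumes the L2 penalty is the unsquared Euclidean norm of the hyperparameters, which this page does not confirm.

```python
def penalized_objective(nll, hyp, regularizer_strength=0.1, norm="L1"):
    """Add an L1 or L2 penalty on `hyp` to a negative log likelihood value."""
    if norm == "L1":
        penalty = sum(abs(h) for h in hyp)
    elif norm == "L2":
        # Assumption: Euclidean norm, not its square
        penalty = sum(h * h for h in hyp) ** 0.5
    else:
        raise ValueError(f"norm must be 'L1' or 'L2', got {norm!r}")
    return nll + regularizer_strength * penalty

l1_value = penalized_objective(1.0, [3.0, -4.0], norm="L1")  # 1.0 + 0.1 * 7
l2_value = penalized_objective(1.0, [3.0, -4.0], norm="L2")  # 1.0 + 0.1 * 5
```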
- post(hyp: pcntoolkit.math_functions.warp.np.ndarray, X: pcntoolkit.math_functions.warp.np.ndarray, y: pcntoolkit.math_functions.warp.np.ndarray, var_X: pcntoolkit.math_functions.warp.Optional[pcntoolkit.math_functions.warp.np.ndarray] = None) None#
Compute the posterior distribution.
- Parameters:
hyp (np.ndarray) – Hyperparameter vector.
X (np.ndarray) – Covariates.
y (np.ndarray) – Responses.
var_X (np.ndarray, optional) – Variance of covariates.
- predict_and_adjust(hyp, X, y, Xs=None, ys=None, var_groups_test=None, var_groups_adapt=None, **kwargs)#
Transfer the model to a new site. This is done by first making predictions on the adaptation data given by X, then adjusting them by the residuals with respect to y.
- Parameters:
hyp – hyperparameter vector
X – covariates for adaptation (i.e. calibration) data
y – responses for adaptation data
Xs – covariate data (for which predictions should be adjusted)
ys – true response variables (to be adjusted)
var_groups_test – variance groups (e.g. sites) for test data
var_groups_adapt – variance groups for adaptation data
There are two possible ways of using this function, depending on whether ys or Xs is specified.
If ys is specified, the adjustment is applied directly to the data, which is assumed to be in the input space (i.e. not warped). In this case the adjusted true data points are returned in the same space.
Alternatively, if Xs is specified, predictions are made and then adjusted. In this case the predictive variance is returned in the warped (i.e. Gaussian) space.
This function needs to know which sites are associated with which data points. This is provided by var_groups_xxx, which is a list or array of scalar ids.
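One simple reading of "adjusting by the residuals" is a per-site offset: the mean residual on the adaptation data of each site is added to the predictions for that site. The sketch below implements only that reading. The function `adjust_by_site_residuals` and the use of the mean are assumptions; the method itself may adjust differently and also handles warping and variances.

```python
import numpy as np

def adjust_by_site_residuals(pred_adapt, y_adapt, groups_adapt, pred_test, groups_test):
    """Shift test predictions by the mean adaptation residual of their site."""
    adjusted = np.array(pred_test, dtype=float)  # copy, so the input is untouched
    for site in np.unique(groups_test):
        in_adapt = groups_adapt == site
        if not in_adapt.any():
            continue  # no adaptation data for this site: leave predictions as is
        offset = np.mean(y_adapt[in_adapt] - pred_adapt[in_adapt])
        adjusted[groups_test == site] += offset
    return adjusted

pred_adapt = np.array([1.0, 1.0, 2.0, 2.0])
y_adapt = np.array([2.0, 2.0, 1.0, 1.0])
groups_adapt = np.array([0, 0, 1, 1])
pred_test = np.array([5.0, 5.0, 5.0])
groups_test = np.array([0, 1, 2])

adjusted = adjust_by_site_residuals(pred_adapt, y_adapt, groups_adapt, pred_test, groups_test)
```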
- to_dict(path: str | None = None) dict#
Convert model instance to dictionary representation.
Used for saving models to disk.
- Parameters:
path (str | None, optional) – Path to save any associated files, by default None
- Returns:
Dictionary containing model parameters and configuration
- Return type:
dict
- transfer(X: xarray.DataArray, be: xarray.DataArray, be_maps: dict[str, dict[str, int]], Y: xarray.DataArray, **kwargs) BLR#
Transfer the model to a new dataset.
- Parameters:
X (xr.DataArray) – Covariate data
be (xr.DataArray) – Batch effect data
be_maps (dict[str, dict[str, int]]) – Dictionary of dictionaries mapping batch effects to indices
Y (xr.DataArray) – Response variable data
- Returns:
New instance of the regression model, transferred to the new dataset
- Return type:
BLR
- ys_s2(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray) tuple[pcntoolkit.math_functions.warp.np.ndarray, pcntoolkit.math_functions.warp.np.ndarray]#
- ard = False#
- basis_function_mean#
- basis_function_var#
- fixed_effect = False#
- fixed_effect_slope = False#
- fixed_effect_slope_indices = None#
- fixed_effect_var = False#
- fixed_effect_var_slope = False#
- fixed_effect_var_slope_indices = None#
- gamma: pcntoolkit.math_functions.warp.np.ndarray = None#
- property has_batch_effect: bool#
Check if model includes batch effects.
- Returns:
True if model includes batch effects, False otherwise
- Return type:
bool
- heteroskedastic = False#
- hyp: pcntoolkit.math_functions.warp.np.ndarray = None#
- hyp0 = None#
- l_bfgs_b_epsilon = 0.1#
- l_bfgs_b_l = 0.1#
- l_bfgs_b_norm = 'l2'#
- models_variance = False#
- n_iter = 100#
- optimizer = 'l-bfgs-b'#
- tol = 0.001#
- warp_name = None#
- warp_reparam = False#
- create_design_matrix(X: pcntoolkit.math_functions.warp.np.ndarray, be: pcntoolkit.math_functions.warp.np.ndarray, be_maps: dict[str, dict[str, int]], linear: bool = False, intercept: bool = False, fixed_effect: bool = False, fixed_effect_slope: bool = False, fixed_effect_slope_indices: list[int] | Literal['all'] = None) pcntoolkit.math_functions.warp.np.ndarray#
Create design matrix for the model.
- Parameters:
X (np.ndarray) – Covariate data.
be (np.ndarray) – Batch effect data.
be_maps (dict[str, dict[str, int]]) – Batch effect maps.
linear (bool, default False) – Include linear terms in the design matrix.
intercept (bool, default False) – Include intercept term in the design matrix.
fixed_effect (bool, default False) – Include fixed effect intercept for batch effects.
fixed_effect_slope (bool, default False) – Include fixed effect slope for batch effects.
fixed_effect_slope_indices (list[int] | "all", optional) – Indices of the columns for which to include a fixed effect slope, by default None.
- Returns:
Design matrix combining all requested components.
- Return type:
np.ndarray
- Raises:
ValueError – If no components are selected for the design matrix.
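How such a design matrix might be assembled is sketched below for a single batch effect. This is an independent illustration, not the function documented here: `build_design_matrix`, the `n_levels` argument, and the one-hot encoding of the fixed effect are assumptions, and the slope-related options are left out.

```python
import numpy as np

def build_design_matrix(X, be, n_levels, linear=False, intercept=False, fixed_effect=False):
    """Stack the requested column blocks into one design matrix.

    X: (n_obs, n_covariates) floats; be: (n_obs,) integer batch ids;
    n_levels: number of distinct batch ids. All names are illustrative.
    """
    blocks = []
    if intercept:
        blocks.append(np.ones((X.shape[0], 1)))
    if linear:
        blocks.append(X)
    if fixed_effect:
        # One-hot encoding: one intercept column per batch level
        blocks.append(np.eye(n_levels)[be])
    if not blocks:
        raise ValueError("Select at least one design matrix component")
    return np.hstack(blocks)

X = np.array([[0.5], [1.5], [2.5]])
be = np.array([0, 1, 0])
Phi = build_design_matrix(X, be, n_levels=2, linear=True, fixed_effect=True)
```

Here `Phi` has one covariate column followed by one indicator column per batch level.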