pcntoolkit.normative_model

Module providing the NormativeModel class, which is the main class for building and using normative models.

Classes

NormativeModel

This class provides the foundation for building normative models, handling multiple response variables through separate regression models.

Module Contents

class NormativeModel(template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel, savemodel: bool = True, evaluate_model: bool = True, saveresults: bool = True, saveplots: bool = True, save_dir: str | None = None, inscaler: str = 'standardize', outscaler: str = 'standardize', y_transform: str | None = None, name: str | None = None)

This class provides the foundation for building normative models, handling multiple response variables through separate regression models. It manages data preprocessing, model fitting, prediction, and evaluation.

Parameters:
  • template_regression_model (RegressionModel) – Regression model used as a template to create the regression model for each response variable.

  • savemodel (bool) – Whether to save the model.

  • evaluate_model (bool) – Whether to evaluate the model.

  • saveresults (bool) – Whether to save the results.

  • saveplots (bool) – Whether to save the plots.

  • save_dir (str or None) – Directory to save the model, results, and plots.

  • inscaler (str) – Input (X/covariates) scaler to use.

  • outscaler (str) – Output (Y/response_vars) scaler to use.

  • y_transform (str or None) – Optional transform applied to Y before fitting and inverted after prediction. Currently supported:

      • "log1p" applies log(Y + 1)

      • "log" applies the natural logarithm, log(Y)

    This is useful for phenotypes that cannot be negative; note that "log" additionally requires strictly positive values. Default is None (no transform).

  • name (str or None) – Name of the model.
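A minimal sketch of what the two documented y_transform options do to Y and how they are inverted, assuming NumPy. The names Y_TRANSFORMS, apply_y_transform and invert_y_transform are hypothetical and not part of pcntoolkit; the domain checks are an added assumption reflecting where log and log1p are defined.

```python
import numpy as np

# Hypothetical helpers (not pcntoolkit API): forward/inverse pairs for the
# two documented y_transform options.
Y_TRANSFORMS = {
    "log1p": (np.log1p, np.expm1),  # log(Y + 1), inverted by exp(.) - 1
    "log": (np.log, np.exp),        # natural log(Y), inverted by exp(.)
}


def apply_y_transform(y, name):
    """Apply the forward transform; the domain checks are an added assumption."""
    y = np.asarray(y, dtype=float)
    if name is None:
        return y
    if name == "log" and np.any(y <= 0):
        raise ValueError("'log' requires strictly positive Y")
    if name == "log1p" and np.any(y <= -1):
        raise ValueError("'log1p' requires Y > -1")
    forward, _ = Y_TRANSFORMS[name]
    return forward(y)


def invert_y_transform(y_t, name):
    """Map transformed values back to the original scale of Y."""
    y_t = np.asarray(y_t, dtype=float)
    if name is None:
        return y_t
    _, inverse = Y_TRANSFORMS[name]
    return inverse(y_t)


y = np.array([0.0, 1.5, 10.0])
y_t = apply_y_transform(y, "log1p")
print(np.allclose(invert_y_transform(y_t, "log1p"), y))  # True
```

Zeros are valid input for "log1p" but rejected by "log", which is why "log1p" is the safer choice for non-negative phenotypes that can be exactly zero.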

__getitem__(key: str) → pcntoolkit.regression_model.regression_model.RegressionModel
__setitem__(key: str, value: pcntoolkit.regression_model.regression_model.RegressionModel) → None
check_compatibility(data: pcntoolkit.dataio.norm_data.NormData) → bool

Check if the data is compatible with the model.

Parameters:

data (NormData) – Data to check compatibility with.

Returns:

True if compatible, False otherwise.

Return type:

bool

compute_baseline_logp(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Computes the log-probability of the data under a simple Gaussian model.

The baseline model is a Gaussian whose mean and standard deviation are computed from the scaled Y data. It serves as a reference against which to evaluate the fitted model, for example when computing the MSLL (Mean Standardized Log Loss).

Parameters:

data (NormData) – Test data containing response variables (Y).

Returns:

Data with baseline_logp computed for each response variable.

Return type:

NormData
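Given per-observation log-probabilities from compute_logp and compute_baseline_logp, an MSLL can be computed as sketched below. This follows the usual definition (the model's negative log-probability minus the baseline's, averaged over observations); whether pcntoolkit computes it exactly this way is an assumption, and msll is a hypothetical helper.

```python
import numpy as np


def msll(model_logp, baseline_logp):
    """Mean standardized log loss (hypothetical helper, usual definition).

    Per observation, the standardized log loss is the negative log-probability
    under the fitted model minus the negative log-probability under the
    baseline Gaussian. Negative MSLL means the fitted model assigns higher
    probability to the data than the baseline does.
    """
    model_logp = np.asarray(model_logp, dtype=float)
    baseline_logp = np.asarray(baseline_logp, dtype=float)
    # -log p_model - (-log p_baseline) == -(log p_model - log p_baseline)
    return float(np.mean(-(model_logp - baseline_logp)))


print(msll([-1.0, -0.5], [-1.5, -1.5]))  # -0.75
```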

compute_centiles(data: pcntoolkit.dataio.norm_data.NormData, centiles: List[float] | numpy.ndarray | None = None, **kwargs) → pcntoolkit.dataio.norm_data.NormData

Computes the centiles for each response variable in the data.

Parameters:
  • data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

  • centiles (list[float] or np.ndarray, optional) – The centiles to compute. Defaults to [0.05, 0.25, 0.5, 0.75, 0.95].

Returns:

Prediction results containing:

  • Centiles: centiles of the response variables

Return type:

NormData
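For intuition, if the predictive distribution of an observation were Gaussian with mean mu and standard deviation sigma, the centile at level c would be mu plus sigma times the standard-normal quantile of c. The sketch below assumes exactly that Gaussian case (the fitted regression model may use a different likelihood) and uses a hypothetical helper, gaussian_centiles.

```python
from statistics import NormalDist

import numpy as np

DEFAULT_CENTILES = [0.05, 0.25, 0.5, 0.75, 0.95]  # the documented default levels


def gaussian_centiles(mu, sigma, centiles=None):
    """Centiles of a Gaussian predictive distribution N(mu, sigma^2) (hypothetical helper).

    Returns an array of shape (n_centiles, n_observations); row i holds the value
    below which a fraction centiles[i] of the predictive mass lies.
    """
    if centiles is None:
        centiles = DEFAULT_CENTILES
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Standard-normal quantile for each requested centile level
    z = np.array([NormalDist().inv_cdf(c) for c in centiles])
    return mu[None, :] + z[:, None] * sigma[None, :]


curves = gaussian_centiles(mu=[10.0, 12.0], sigma=[2.0, 1.0])
print(curves[2])  # median row equals mu: [10. 12.]
```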

compute_correlation_matrix(data, bandwidth=5, covariate='age')
compute_logp(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Computes the log-probability of the data under the fitted model.

Parameters:

data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

Returns:

Prediction results containing:

  • logp: log-probability of the response variables per datapoint under the fitted model

Return type:

NormData

compute_thrivelines(data: pcntoolkit.dataio.norm_data.NormData, span: int = 5, step: int = 1, z_thrive: float = 0.0, covariate='age', **kwargs) → pcntoolkit.dataio.norm_data.NormData

Computes the thrivelines for each response variable in the data.

compute_yhat(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Computes the predicted values for each response variable in the data.

compute_zscores(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Computes Z-scores for each response variable using fitted regression models.

Parameters:

data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

Returns:

Prediction results containing:

  • Zscores: z-scores of the response variables

Return type:

NormData
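Assuming a Gaussian predictive distribution per observation, a Z-score is the deviation of the observed value from the predicted mean, in units of the predicted standard deviation. A minimal sketch with a hypothetical helper, gaussian_zscores; it is not the library's implementation, which depends on the regression model and its likelihood.

```python
from statistics import NormalDist

import numpy as np


def gaussian_zscores(y, mu, sigma):
    """Z-score of each observation under its own N(mu, sigma^2) (hypothetical helper)."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return (y - mu) / sigma


z = gaussian_zscores(y=[14.0, 11.0], mu=[10.0, 12.0], sigma=[2.0, 1.0])
print(z)  # [ 2. -1.]
# Under the Gaussian assumption, the matching centile level is the standard-normal CDF of z
print([round(NormalDist().cdf(v), 3) for v in z])
```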

static elemwise_logp_baseline_model(y_scaled: numpy.ndarray) → numpy.ndarray

Compute log-probability for each observation under a baseline Gaussian model.

Parameters:

y_scaled (np.ndarray) – Scaled response variable values.

Returns:

Log-probability of each observation.

Return type:

np.ndarray
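A standalone sketch of the computation described above: a Gaussian is fitted to y_scaled and the log-density of each value is evaluated under it. Using the sample mean and the population standard deviation (ddof=0) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def elemwise_logp_gaussian_baseline(y_scaled):
    """Log-density of each value under a Gaussian fitted to y_scaled itself.

    Hypothetical helper; using the population standard deviation (ddof=0)
    is an assumption.
    """
    y = np.asarray(y_scaled, dtype=float)
    mu = y.mean()
    sigma = y.std()
    # log N(y | mu, sigma^2) = -0.5 * log(2 * pi * sigma^2) - (y - mu)^2 / (2 * sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


logp = elemwise_logp_gaussian_baseline([-1.0, 0.0, 1.0])
```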

evaluate(data: pcntoolkit.dataio.norm_data.NormData) → None

Evaluates the model performance on the data. This method performs the following steps:

  1. Preprocesses the data

  2. Evaluates the model performance

  3. Postprocesses the data

extend(data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) → NormativeModel

Extends the model to a new dataset.

extend_predict(extend_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) → NormativeModel

Extends the model to a new dataset and predicts the data.

extract_data(data: pcntoolkit.dataio.norm_data.NormData) → Tuple[xarray.DataArray, xarray.DataArray, dict[str, dict[str, int]], xarray.DataArray, xarray.DataArray]

Returns a 5-tuple of (covariates, batch effects, batch effect maps, response variables, Z-scores). Any variable that is not available is returned as None.

fit(data: pcntoolkit.dataio.norm_data.NormData) → None

Fits a regression model for each response variable in the data.

Parameters:

data (NormData) – Training data containing covariates (X), batch effects (batch_effects), and response variables (Y). Must be a valid NormData object with properly formatted dimensions:

  • X: (n_samples, n_covariates)

  • batch_effects: (n_samples, n_batch_effects)

  • Y: (n_samples, n_response_vars)
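fit creates one regression model per response variable from the template. The sketch below illustrates that pattern with a hypothetical least-squares class standing in for a RegressionModel; it ignores batch effects and scaling, and none of its names are pcntoolkit API. The result has the same shape as the regression_models attribute: a dict keyed by response variable.

```python
import copy

import numpy as np


class LeastSquaresModel:
    """Hypothetical stand-in for a RegressionModel: ordinary least squares."""

    def __init__(self):
        self.coef = None

    def fit(self, X, y):
        # Prepend an intercept column, then solve the least-squares problem
        X1 = np.column_stack([np.ones(len(X)), X])
        self.coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self


def fit_per_response(template, X, Y, response_vars):
    """Fit one deep copy of `template` per column of Y, keyed by response variable."""
    models = {}
    for j, name in enumerate(response_vars):
        models[name] = copy.deepcopy(template).fit(X, Y[:, j])
    return models


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))               # (n_samples, n_covariates)
Y = np.column_stack([2 * X[:, 0] + 1, -X[:, 0]])  # (n_samples, n_response_vars)
models = fit_per_response(LeastSquaresModel(), X, Y, ["roi_1", "roi_2"])
print(sorted(models))  # ['roi_1', 'roi_2']
```

Deep-copying the template keeps each response variable's fitted state independent, so refitting one entry never changes another.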

fit_predict(fit_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Combines fit and predict in a single operation: fits the model on fit_data, then predicts on predict_data.

classmethod from_args(**kwargs) → NormativeModel

Create a new normative model from command line arguments.

Parameters:

**kwargs (str) – Command line arguments, passed as keyword arguments.

Returns:

An instance of a normative model.

Return type:

NormativeModel

Raises:

ValueError – If the regression model specified in the arguments is unknown.

harmonize(data: pcntoolkit.dataio.norm_data.NormData, reference_batch_effect: dict[str, str] | None = None) → pcntoolkit.dataio.norm_data.NormData

Harmonizes the data to a reference batch effect. If reference_batch_effect is provided, the data are harmonized to it; otherwise, they are harmonized to the alphabetically first batch effect.

Parameters:
  • data (NormData) – Data to harmonize.

  • reference_batch_effect (dict[str, str], optional) – Reference batch effect. Defaults to None.
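A sketch of one common harmonization scheme, assuming Gaussian predictive distributions: an observation's Z-score under its own batch is kept fixed and re-expressed under the reference batch at the same covariates. It also shows the documented default of picking the alphabetically first level, assuming reference_batch_effect maps each batch-effect dimension to a level. Whether harmonize works exactly this way is an assumption, and both helpers are hypothetical.

```python
import numpy as np


def default_reference(levels_per_dim):
    """Pick the alphabetically first level of each batch-effect dimension (hypothetical helper)."""
    return {dim: sorted(levels)[0] for dim, levels in levels_per_dim.items()}


def harmonize_gaussian(y, mu_own, sigma_own, mu_ref, sigma_ref):
    """Keep each observation's Z-score fixed and re-express it under the reference batch.

    Assumes Gaussian predictive distributions at the observation's covariates:
    N(mu_own, sigma_own^2) for its own batch, N(mu_ref, sigma_ref^2) for the reference.
    """
    y, mu_own, sigma_own, mu_ref, sigma_ref = (
        np.asarray(a, dtype=float) for a in (y, mu_own, sigma_own, mu_ref, sigma_ref)
    )
    z = (y - mu_own) / sigma_own
    return mu_ref + sigma_ref * z


ref = default_reference({"site": ["site_B", "site_A"], "sex": ["M", "F"]})
print(ref)  # {'site': 'site_A', 'sex': 'F'}
y_h = harmonize_gaussian(y=[12.0], mu_own=[10.0], sigma_own=[2.0], mu_ref=[8.0], sigma_ref=[1.0])
print(y_h)  # [9.]
```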

classmethod load(path: str, into: NormativeModel | None = None) → NormativeModel

Load a normative model from a path.

Parameters:
  • path (str) – The path to the normative model.

  • into (NormativeModel, optional) – The normative model to load the data into. If None, a new normative model is created. This is useful for loading into an existing normative model, for example in the runner.

map_batch_effects(batch_effects: xarray.DataArray) → xarray.DataArray
classmethod merge(save_dir: str, models: list[NormativeModel | str]) → NormativeModel

Merges multiple models into a single model.

model_specific_evaluation() → None

Save model-specific evaluation metrics.

postprocess(data: pcntoolkit.dataio.norm_data.NormData) → None

Apply postprocessing to the data.

First unscales, then applies the inverse response transform (e.g. expm1).

Parameters:

data (NormData) – Data to postprocess.

predict(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData

Computes Z-scores, centiles, log-probabilities (logp), and predicted values (yhat) for each observation using the fitted regression models.

preprocess(data: pcntoolkit.dataio.norm_data.NormData) → None

Applies preprocessing transformations to the input data.

First applies an optional response transform (e.g. log1p), then scales.

Parameters:

data (NormData) – Data to preprocess.
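Preprocessing and postprocessing are mirror images: the forward path transforms and then scales, so the inverse path must unscale and then invert the transform. A minimal sketch assuming y_transform="log1p" and a standardize scaler; preprocess_y and postprocess_y are hypothetical helpers, not the library's methods.

```python
import numpy as np


def preprocess_y(y, mean, std):
    """Forward path (hypothetical helper): response transform first (log1p), then standardize."""
    return (np.log1p(y) - mean) / std


def postprocess_y(y_scaled, mean, std):
    """Inverse path (hypothetical helper): unscale first, then invert the transform (expm1)."""
    return np.expm1(y_scaled * std + mean)


y = np.array([0.0, 3.0, 20.0])
# The scaler statistics are fitted on the transformed values, because scaling comes second
t = np.log1p(y)
mean, std = t.mean(), t.std()
y_scaled = preprocess_y(y, mean, std)
print(np.allclose(postprocess_y(y_scaled, mean, std), y))  # True
```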

register_batch_effects(data: pcntoolkit.dataio.norm_data.NormData) → None
register_data_info(data: pcntoolkit.dataio.norm_data.NormData) → None
sample_batch_effects(n_samples: int) → xarray.DataArray

Sample the batch effects from the estimated distribution.

sample_covariates(bes: xarray.DataArray, covariate_range_per_batch_effect: bool = False) → xarray.DataArray

Sample the covariates from the estimated distribution.

Uses the ranges of observed covariates, matched with batch effects, to create a representative sample.
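The sampling distributions behind sample_batch_effects and sample_covariates are only loosely described above. The sketch below assumes batch-effect levels are drawn in proportion to their observed counts, and covariates uniformly within observed ranges, either per batch effect or over one shared range (mirroring covariate_range_per_batch_effect). Both distributional choices and all names, including the example counts and ranges, are assumptions for illustration.

```python
import numpy as np


def sample_batch_effects(counts, n_samples, rng):
    """Draw levels with probability proportional to observed counts (assumed distribution)."""
    levels = list(counts)
    p = np.array([counts[k] for k in levels], dtype=float)
    return rng.choice(levels, size=n_samples, p=p / p.sum())


def sample_covariates(bes, ranges, rng, per_batch_effect=False):
    """Draw one covariate value per sample, uniformly within an observed range (assumed)."""
    if per_batch_effect:
        # Each sample uses the observed range of its own batch effect
        lo = np.array([ranges[b][0] for b in bes])
        hi = np.array([ranges[b][1] for b in bes])
    else:
        # One shared range spanning all observed values
        lo = min(r[0] for r in ranges.values())
        hi = max(r[1] for r in ranges.values())
    return rng.uniform(lo, hi, size=len(bes))


rng = np.random.default_rng(42)
counts = {"site_A": 300, "site_B": 100}                    # hypothetical counts
ranges = {"site_A": (20.0, 40.0), "site_B": (60.0, 80.0)}  # hypothetical age ranges
bes = sample_batch_effects(counts, 1000, rng)
ages = sample_covariates(bes, ranges, rng, per_batch_effect=True)
```

With per_batch_effect=True, no synthetic site_B subject gets an age outside 60 to 80, which keeps synthetic samples inside the region each site actually covered.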

save(path: str | None = None) → None

Save the model to a file.

Parameters:

path (str, optional) – The path to save the model to. If None, the model is saved to the save_dir provided in the norm_conf.

scale_backward(data: pcntoolkit.dataio.norm_data.NormData) → None

Scales data back to its original scale using stored scalers.

Parameters:

data (NormData) – Data object containing arrays to be scaled back:

  • X : array-like, shape (n_samples, n_covariates)

    Covariate data to be scaled back.

  • y : array-like, shape (n_samples, n_response_vars), optional

    Response variable data to be scaled back.

scale_forward(data: pcntoolkit.dataio.norm_data.NormData, overwrite: bool = False) → None

Scales input data using the configured scalers (inscaler for X, outscaler for Y).

Parameters:
  • data (NormData) –

    Data object containing arrays to be scaled:

    • X : array-like, shape (n_samples, n_covariates)

      Covariate data to be scaled.

    • y : array-like, shape (n_samples, n_response_vars), optional

      Response variable data to be scaled.

  • overwrite (bool, default False) – If True, creates new scalers even if they already exist. If False, uses existing scalers when available.
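A minimal sketch of the overwrite semantics for a "standardize" scaler: a scaler is fitted only when none exists for that variable or overwrite=True; otherwise the stored one is reused, so test data are scaled with training statistics. StandardizeScaler and this scale_forward function are hypothetical stand-ins, not pcntoolkit's classes.

```python
import numpy as np


class StandardizeScaler:
    """Hypothetical 'standardize' scaler: zero mean, unit variance per column."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, a):
        self.mean = a.mean(axis=0)
        self.std = a.std(axis=0)
        return self

    def forward(self, a):
        return (a - self.mean) / self.std

    def backward(self, a):
        return a * self.std + self.mean


def scale_forward(arrays, scalers, overwrite=False):
    """Scale each named array, fitting a new scaler only if missing or overwrite=True."""
    out = {}
    for key, a in arrays.items():
        if overwrite or key not in scalers:
            scalers[key] = StandardizeScaler().fit(a)
        out[key] = scalers[key].forward(a)
    return out


scalers = {}
train = {"X": np.array([[20.0], [40.0], [60.0]])}
scaled = scale_forward(train, scalers)     # fits a new scaler for "X"
test = {"X": np.array([[40.0]])}
print(scale_forward(test, scalers)["X"])  # reuses the training scaler: [[0.]]
```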

set_ensure_save_dirs()

Creates the save directories for results and plots if they do not exist yet, so that saving does not fail with an error.

set_save_dir(save_dir: str) → None

Override the save_dir in the norm_conf.

Parameters:

save_dir (str) – New save directory.

synthesize(data: pcntoolkit.dataio.norm_data.NormData | None = None, n_samples: int | None = None, covariate_range_per_batch_effect=False) → pcntoolkit.dataio.norm_data.NormData

Synthesize data from the model.

Parameters:
  • data (NormData, optional) – A NormData object with X and batch_effects, used to generate the synthetic data. If the data has no batch_effects, batch_effects are sampled from the model. If the data has no X, X is sampled from the model using the provided or sampled batch_effects. If neither X nor batch_effects are provided, both are sampled from the model.

  • n_samples (int, optional) – Number of samples to synthesize. If None, the number of samples in the training data is used.

  • covariate_range_per_batch_effect (bool, optional) – If True, a separate covariate range is used for each batch effect.

to_dict()
transfer(transfer_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) → NormativeModel

Transfers the model to a new dataset.

transfer_predict(transfer_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) → NormativeModel

Transfers the model to a new dataset and predicts the data.

batch_effect_counts = None
batch_effect_covariate_ranges = None
property batch_effect_dims: list[str]

Returns the batch effect dimensions.

Returns:

list[str]: The batch effect dimensions.

batch_effects_maps = None
correlation_matrix = None
covariate_ranges = None
covariates = None
evaluate_model: bool = True
evaluator
property has_batch_effect: bool

Returns whether the model has a batch effect.

Returns:

bool: True if the model has a batch effect, False otherwise. This is currently determined from the configuration of the template regression model.

inscaler: str = 'standardize'
inscalers: dict
is_fitted: bool = False
property n_fit_observations: int

Returns the number of observations the model was fitted on.

Returns:

int: The number of observations used to fit the model.

name: str | None = None
outscaler: str = 'standardize'
outscalers: dict
regression_models: dict[str, pcntoolkit.regression_model.regression_model.RegressionModel]
response_vars: list[str] = None
property save_dir: str
savemodel: bool = True
saveplots: bool = True
saveresults: bool = True
template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel
thrive_covariate = None
unique_batch_effects = None
y_transform: str | None = None