pcntoolkit.normative_model

Module providing the NormativeModel class, which is the main class for building and using normative models.

Classes

NormativeModel

This class provides the foundation for building normative models, handling multiple response variables through separate regression models.

Module Contents

class NormativeModel(template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel, savemodel: bool = True, evaluate_model: bool = True, saveresults: bool = True, saveplots: bool = True, save_dir: str | None = None, inscaler: str = 'standardize', outscaler: str = 'standardize', y_transform: str | None = None, name: str | None = None)

This class provides the foundation for building normative models, handling multiple response variables through separate regression models. It manages data preprocessing, model fitting, prediction, and evaluation.

Parameters:
  • template_regression_model (RegressionModel) – Regression model used as a template to create all regression models.

  • savemodel (bool) – Whether to save the model.

  • evaluate_model (bool) – Whether to evaluate the model.

  • saveresults (bool) – Whether to save the results.

  • saveplots (bool) – Whether to save the plots.

  • save_dir (str or None) – Directory to save the model, results, and plots.

  • inscaler (str) – Input (X/covariates) scaler to use.

  • outscaler (str) – Output (Y/response_vars) scaler to use.

  • y_transform (str or None) – Optional transform applied to Y before fitting and inverted after prediction. This is useful for phenotypes that cannot be negative. Default is None (no transform). Currently supported:

    • "log1p" applies log(Y + 1)

    • "log" applies the natural logarithm, log(Y)

  • name (str or None) – Name of the model.
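The two y_transform options can be illustrated with the standard library alone. The following is a minimal sketch of the transforms themselves, not PCNtoolkit's implementation; the helper names forward and inverse are hypothetical.

```python
import math

# Hypothetical helpers mirroring the two documented y_transform options.
def forward(y, y_transform=None):
    if y_transform == "log1p":
        return math.log1p(y)  # log(Y + 1), defined for Y > -1
    if y_transform == "log":
        return math.log(y)    # natural log, defined only for Y > 0
    return y                  # None: no transform

def inverse(t, y_transform=None):
    if y_transform == "log1p":
        return math.expm1(t)  # exp(t) - 1 undoes log1p
    if y_transform == "log":
        return math.exp(t)    # exp undoes log
    return t

# A zero-valued phenotype survives "log1p" but is outside the domain of "log".
zero_roundtrip = inverse(forward(0.0, "log1p"), "log1p")
positive_roundtrip = inverse(forward(2.5, "log"), "log")
```

In this sketch, "log1p" is the safer choice when the response can be exactly zero, while "log" requires strictly positive values.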

__getitem__(key: str) pcntoolkit.regression_model.regression_model.RegressionModel
__setitem__(key: str, value: pcntoolkit.regression_model.regression_model.RegressionModel) None
check_compatibility(data: pcntoolkit.dataio.norm_data.NormData) bool

Check if the data is compatible with the model.

Parameters:

data (NormData) – Data to check compatibility with.

Returns:

True if compatible, False otherwise

Return type:

bool

compute_baseline_logp(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Computes the log-probability of the data under a simple Gaussian model.

The baseline model is a Gaussian with mean and standard deviation computed from the scaled Y data. It serves as a reference for evaluating, for example, the MSLL (Mean Standardized Log Loss) of the fitted model.

Parameters:

data (NormData) – Test data containing response variables (Y).

Returns:

Data with baseline_logp computed for each response variable.

Return type:

NormData
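To show how a baseline log-probability can feed into the MSLL, the sketch below uses one common convention: the mean, over observations, of the baseline log-density minus the model log-density, so that negative values indicate the fitted model beats the baseline. The function name msll and the numbers are hypothetical, and PCNtoolkit's own computation may differ in detail.

```python
def msll(model_logp, baseline_logp):
    """Mean standardized log loss under one common sign convention."""
    n = len(model_logp)
    # Each term is (-model_logp) - (-baseline_logp) for one observation.
    return sum(b - m for m, b in zip(model_logp, baseline_logp)) / n

# Hypothetical per-observation log-probabilities.
model_logp = [-0.9, -1.1, -1.0]
baseline_logp = [-1.4, -1.5, -1.3]

score = msll(model_logp, baseline_logp)  # negative: model beats the baseline
```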

compute_centiles(data: pcntoolkit.dataio.norm_data.NormData, centiles: List[float] | numpy.ndarray | None = None, **kwargs) pcntoolkit.dataio.norm_data.NormData

Computes the centiles for each response variable in the data.

Parameters:
  • data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

  • centiles (list[float] or np.ndarray, optional) – The centiles to compute. Defaults to [0.05, 0.25, 0.5, 0.75, 0.95].

Returns:

Prediction results containing:

  • Centiles: centiles of the response variables

Return type:

NormData
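As an illustration of what a centile means, the sketch below maps the default centiles to response values under the assumption of a Gaussian predictive distribution with known mean and standard deviation. That assumption, the function name gaussian_centiles, and the numbers are illustrative only; the fitted regression models may use other likelihoods.

```python
from statistics import NormalDist

def gaussian_centiles(mu, sigma, centiles=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Response value at each centile for a Gaussian with mean mu and std sigma."""
    standard_normal = NormalDist()
    # inv_cdf gives the z-score at which the cumulative probability equals q.
    return [mu + sigma * standard_normal.inv_cdf(q) for q in centiles]

# Hypothetical predictive mean and standard deviation for one covariate value.
values = gaussian_centiles(mu=100.0, sigma=15.0)
```

The 0.5 centile coincides with the predictive mean in this Gaussian sketch, and the remaining centiles are symmetric around it.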

compute_correlation_matrix(data, bandwidth=5, covariate='age')
compute_logp(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Computes the log-probability of the data under the fitted model.

Parameters:

data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

Returns:

Prediction results containing:

  • logp: log-probability of the response variables per datapoint under the fitted model

Return type:

NormData

compute_thrivelines(data: pcntoolkit.dataio.norm_data.NormData, span: int = 5, step: int = 1, z_thrive: float = 0.0, covariate='age', **kwargs) pcntoolkit.dataio.norm_data.NormData

Computes the thrivelines for each response variable in the data.

compute_yhat(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Computes the predicted values for each response variable in the data.

compute_zscores(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Computes Z-scores for each response variable using fitted regression models.

Parameters:

data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).

Returns:

Prediction results containing:

  • Zscores: z-scores of the response variables

Return type:

NormData
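For intuition, a z-score expresses how far an observation lies from the predicted value in units of predicted spread. The sketch below assumes a Gaussian predictive distribution, which is an illustrative simplification; gaussian_zscore and the numbers are hypothetical and not part of the library.

```python
from statistics import NormalDist

def gaussian_zscore(y, mu, sigma):
    """Deviation of y from the predicted mean, in predicted standard deviations."""
    return (y - mu) / sigma

# Hypothetical observation with predicted mean 100 and standard deviation 15.
z = gaussian_zscore(y=130.0, mu=100.0, sigma=15.0)

# The matching centile is the standard normal CDF evaluated at z.
centile = NormalDist().cdf(z)
```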

static elemwise_logp_baseline_model(y_scaled: numpy.ndarray) numpy.ndarray

Compute log-probability for each observation under a baseline Gaussian model.

Parameters:

y_scaled (np.ndarray) – Scaled response variable values.

Returns:

Log-probability

Return type:

np.ndarray
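A baseline of this kind can be sketched as follows: fit a single Gaussian to the scaled values and evaluate its log-density at every observation. This is a standard-library illustration, not the library's code; the function name is hypothetical and the use of the population standard deviation is an assumption.

```python
import math
from statistics import fmean, pstdev

def elemwise_gaussian_logp(y_scaled):
    """Log-density of each value under a Gaussian fitted to the same values."""
    mu = fmean(y_scaled)
    sigma = pstdev(y_scaled)  # population standard deviation (assumption)
    return [
        -0.5 * math.log(2 * math.pi * sigma**2) - (v - mu) ** 2 / (2 * sigma**2)
        for v in y_scaled
    ]

# Hypothetical scaled responses; the value at the mean gets the highest logp.
logp = elemwise_gaussian_logp([-1.0, 0.0, 1.0])
```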

evaluate(data: pcntoolkit.dataio.norm_data.NormData) None

Evaluates the model performance on the data. This method performs the following steps:

  1. Preprocesses the data

  2. Evaluates the model performance

  3. Postprocesses the data

extend(data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) NormativeModel

Extends the model to a new dataset.

extend_predict(extend_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) NormativeModel

Extends the model to a new dataset and predicts the data.

extract_data(data: pcntoolkit.dataio.norm_data.NormData) Tuple[xarray.DataArray, xarray.DataArray, dict[str, dict[str, int]], xarray.DataArray, xarray.DataArray]

Returns a 5-tuple of covariates, batch effects, batch effect maps, response variables, and Z-scores. Any variable that is not available is returned as None.

fit(data: pcntoolkit.dataio.norm_data.NormData) None

Fits a regression model for each response variable in the data.

Parameters:

data (NormData) – Training data containing covariates (X), batch effects (batch_effects), and response variables (Y). Must be a valid NormData object with properly formatted dimensions:

  • X: (n_samples, n_covariates)

  • batch_effects: (n_samples, n_batch_effects)

  • Y: (n_samples, n_response_vars)
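The dimension requirements amount to all three arrays sharing the same first axis. The sketch below checks this on plain nested lists; check_shapes and the example data are hypothetical, and a real NormData object performs its own validation.

```python
def check_shapes(X, batch_effects, Y):
    """Return (n_samples, n_covariates, n_batch_effects, n_response_vars)."""
    n_samples = len(X)
    if len(batch_effects) != n_samples or len(Y) != n_samples:
        raise ValueError("X, batch_effects and Y must share n_samples")
    return n_samples, len(X[0]), len(batch_effects[0]), len(Y[0])

# Hypothetical data: one covariate, two batch effects, two response variables.
X = [[25.0], [31.0], [47.0]]
batch_effects = [["F", "site_a"], ["M", "site_a"], ["F", "site_b"]]
Y = [[2.1, 0.4], [2.3, 0.5], [1.9, 0.3]]

shapes = check_shapes(X, batch_effects, Y)
```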

fit_predict(fit_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Combines model.fit and model.predict in a single operation.

classmethod from_args(**kwargs) NormativeModel

Create a new normative model from command line arguments.

Parameters:

**kwargs (dict[str, str]) – Command line arguments, passed as keyword arguments.

Returns:

An instance of a normative model.

Return type:

NormativeModel

Raises:

ValueError – If the regression model specified in the arguments is unknown.

harmonize(data: pcntoolkit.dataio.norm_data.NormData, reference_batch_effect: dict[str, str] | None = None) pcntoolkit.dataio.norm_data.NormData

Harmonizes the data to a reference batch effect. If reference_batch_effect is provided, the data are harmonized to it; otherwise, they are harmonized to the first batch effect in alphabetical order.

Parameters:
  • data (NormData) – Data to harmonize.

  • reference_batch_effect (dict[str, str], optional) – Reference batch effect. Defaults to None.
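The alphabetical default can be pictured as picking, for each batch effect dimension, the value that sorts first. The sketch below encodes that reading; it is an interpretation of the description above rather than the library's code, and default_reference and the example values are hypothetical.

```python
def default_reference(unique_batch_effects):
    """Pick the alphabetically first value of every batch effect dimension."""
    return {dim: sorted(values)[0] for dim, values in unique_batch_effects.items()}

# Hypothetical batch effect dimensions and their observed values.
reference = default_reference({
    "sex": ["M", "F"],
    "site": ["site_b", "site_a", "site_c"],
})
```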

classmethod load(path: str, into: NormativeModel | None = None) NormativeModel

Load a normative model from a path.

Parameters:
  • path (str) – The path to the normative model.

  • into (NormativeModel, optional) – The normative model to load the data into. If None, a new normative model is created. Loading into an existing normative model is useful, for example, in the runner.

map_batch_effects(batch_effects: xarray.DataArray) xarray.DataArray
classmethod merge(save_dir: str, models: list[NormativeModel | str]) NormativeModel

Merges multiple models into a single model.

model_specific_evaluation() None

Save model-specific evaluation metrics.

postprocess(data: pcntoolkit.dataio.norm_data.NormData) None

Apply postprocessing to the data.

First unscales, then applies the inverse response transform (e.g. expm1).

Parameters:

data (NormData) – Data to postprocess.

predict(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData

Computes Z-scores, centiles, logp, and yhat for each observation using fitted regression models.

preprocess(data: pcntoolkit.dataio.norm_data.NormData) None

Applies preprocessing transformations to the input data.

First applies an optional response transform (e.g. log1p), then scales.

Parameters:

data (NormData) – Data to preprocess.
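The ordering matters: preprocessing transforms and then scales, so postprocessing has to unscale first and only then invert the transform. The sketch below demonstrates the round trip for a single value using log1p and standardization; the helper names and the scaler statistics are hypothetical.

```python
import math

def preprocess_value(y, mean, std):
    """Transform first (log1p), then standardize."""
    return (math.log1p(y) - mean) / std

def postprocess_value(y_scaled, mean, std):
    """Undo the steps in reverse order: unscale, then expm1."""
    return math.expm1(y_scaled * std + mean)

# Hypothetical scaler statistics, estimated on the transformed scale.
mean, std = 1.2, 0.4

restored = postprocess_value(preprocess_value(3.0, mean, std), mean, std)
```

Applying the inverse steps in the wrong order would not recover the original value, because the scaler statistics live on the transformed scale.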

register_batch_effects(data: pcntoolkit.dataio.norm_data.NormData) None
register_data_info(data: pcntoolkit.dataio.norm_data.NormData) None
sample_batch_effects(n_samples: int) xarray.DataArray

Sample the batch effects from the estimated distribution.

sample_covariates(bes: xarray.DataArray, covariate_range_per_batch_effect: bool = False) xarray.DataArray

Sample the covariates from the estimated distribution.

Uses ranges of observed covariates matched with batch effects to create a representative sample.
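One way to picture range-matched sampling is to draw each covariate uniformly from the range observed for its batch effect. The sketch below does this with the standard library; the uniform distribution, the function name, and the ranges are assumptions made for illustration, not a description of the library's sampler.

```python
import random

def sample_covariates_sketch(batch_effects, ranges, seed=0):
    """Draw one covariate per sample from the range of its batch effect."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.uniform(*ranges[be]) for be in batch_effects]

# Hypothetical observed age ranges per site.
ranges = {"site_a": (18.0, 65.0), "site_b": (40.0, 90.0)}
sampled_sites = ["site_a", "site_b", "site_b"]

samples = sample_covariates_sketch(sampled_sites, ranges)
```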

save(path: str | None = None) None

Save the model to a file.

Parameters:

path (str, optional) – The path to save the model to. If None, the model is saved to the save_dir provided in the norm_conf.

scale_backward(data: pcntoolkit.dataio.norm_data.NormData) None

Scales data back to its original scale using stored scalers.

Parameters:

data (NormData) –

Data object containing arrays to be scaled back:

  • X : array-like, shape (n_samples, n_covariates)

    Covariate data to be scaled back

  • y : array-like, shape (n_samples, n_response_vars), optional

    Response variable data to be scaled back

scale_forward(data: pcntoolkit.dataio.norm_data.NormData, overwrite: bool = False) None

Scales input data to standardized form using configured scalers.

Parameters:
  • data (NormData) –

    Data object containing arrays to be scaled:

    • X : array-like, shape (n_samples, n_covariates)

      Covariate data to be scaled

    • y : array-like, shape (n_samples, n_response_vars), optional

      Response variable data to be scaled

  • overwrite (bool, default False) – If True, creates new scalers even if they already exist. If False, uses existing scalers when available.
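The overwrite flag governs whether scaler statistics are re-estimated or reused. The sketch below mimics that behaviour with a plain dictionary and standardization; the scalers cache and the function scale_forward_sketch are hypothetical stand-ins, not the library's scaler classes.

```python
from statistics import fmean, pstdev

scalers = {}  # hypothetical cache: variable name -> (mean, std)

def scale_forward_sketch(name, values, overwrite=False):
    """Standardize values, fitting a scaler only if missing or overwrite=True."""
    if overwrite or name not in scalers:
        scalers[name] = (fmean(values), pstdev(values))
    mean, std = scalers[name]
    return [(v - mean) / std for v in values]

# The first call fits the scaler on the training values.
train_scaled = scale_forward_sketch("age", [20.0, 40.0, 60.0])

# The second call reuses the training statistics, as is usual for test data.
test_scaled = scale_forward_sketch("age", [40.0, 50.0])
```

Reusing the existing scaler keeps test data on the same scale as the training data; passing overwrite=True would instead re-estimate the statistics from the new values.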

set_ensure_save_dirs()

Ensures that the save directories for results and plots exist, creating them if necessary. Missing directories would otherwise cause an error.

set_save_dir(save_dir: str) None

Override the save_dir in the norm_conf.

Parameters:

save_dir (str) – New save directory.

synthesize(data: pcntoolkit.dataio.norm_data.NormData | None = None, n_samples: int | None = None, covariate_range_per_batch_effect=False) pcntoolkit.dataio.norm_data.NormData

Synthesize data from the model.

Parameters:
  • data (NormData, optional) – A NormData object with X and batch_effects. If provided, used to generate the synthetic data. If the data has no batch_effects, batch_effects are sampled from the model. If the data has no X, X is sampled from the model, using the provided or sampled batch_effects. If neither X nor batch_effects are provided, the model is used to generate the synthetic data.

  • n_samples (int, optional) – Number of samples to synthesize. If this is None, the number of samples that were in the train data is used.

  • covariate_range_per_batch_effect (bool, optional) – If True, the covariate range is different for each batch effect.
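The fallback rules for the data argument can be summarised as a small decision function: anything missing is sampled from the model, and batch effects are resolved before covariates because X is sampled conditional on them. The sketch below only encodes that decision logic; plan_synthesis is a hypothetical name and no actual sampling is performed.

```python
def plan_synthesis(has_X, has_batch_effects):
    """List, in order, the inputs that would be sampled from the model."""
    to_sample = []
    if not has_batch_effects:
        to_sample.append("batch_effects")  # sampled first
    if not has_X:
        to_sample.append("X")  # sampled using provided or sampled batch effects
    return to_sample

nothing_provided = plan_synthesis(has_X=False, has_batch_effects=False)
only_batch_effects = plan_synthesis(has_X=False, has_batch_effects=True)
everything_provided = plan_synthesis(has_X=True, has_batch_effects=True)
```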

to_dict()
transfer(transfer_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) NormativeModel

Transfers the model to a new dataset.

transfer_predict(transfer_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) NormativeModel

Transfers the model to a new dataset and predicts the data.

batch_effect_counts = None
batch_effect_covariate_ranges = None
property batch_effect_dims: list[str]

Returns the batch effect dimensions.

Returns:

The batch effect dimensions.

Return type:

list[str]

batch_effects_maps = None
correlation_matrix = None
covariate_ranges = None
covariates = None
evaluate_model: bool = True
evaluator
property has_batch_effect: bool

Returns whether the model has a batch effect. This currently looks at the configuration of the template regression model.

Returns:

True if the model has a batch effect, False otherwise.

Return type:

bool

inscaler: str = 'standardize'
inscalers: dict
is_fitted: bool = False
property n_fit_observations: int

Returns the number of observations the model was fitted on.

Returns:

The number of fit observations.

Return type:

int

name: str | None = None
outscaler: str = 'standardize'
outscalers: dict
regression_models: dict[str, pcntoolkit.regression_model.regression_model.RegressionModel]
response_vars: list[str] = None
property save_dir: str
savemodel: bool = True
saveplots: bool = True
saveresults: bool = True
template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel
thrive_covariate = None
unique_batch_effects = None
y_transform: str | None = None