pcntoolkit.normative_model#

Module providing the NormativeModel class, which is the main class for building and using normative models.

Classes#

NormativeModel

This class provides the foundation for building normative models, handling multiple response variables through separate regression models.

Module Contents#

class NormativeModel(template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel, savemodel: bool = True, evaluate_model: bool = True, saveresults: bool = True, saveplots: bool = True, save_dir: str | None = None, inscaler: str = 'standardize', outscaler: str = 'standardize', y_transform: str | None = None, name: str | None = None)#

This class provides the foundation for building normative models, handling multiple response variables through separate regression models. It manages data preprocessing, model fitting, prediction, and evaluation.

Parameters:

template_regression_model (RegressionModel) – Regression model used as a template to create all regression models.
savemodel (bool) – Whether to save the model.
evaluate_model (bool) – Whether to evaluate the model.
saveresults (bool) – Whether to save the results.
saveplots (bool) – Whether to save the plots.
save_dir (str or None) – Directory to save the model, results, and plots.
inscaler (str) – Input (X/covariates) scaler to use.
outscaler (str) – Output (Y/response_vars) scaler to use.
y_transform (str or None) – Optional transform applied to Y before fitting and inverted after prediction. This is useful for phenotypes that cannot be negative. Default is None (no transform). Currently supported:
- "log1p" applies log(Y+1)
- "log" applies natural log(Y)
name (str) – Name of the model.
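
The two y_transform options differ in how they treat zeros. The following is a minimal sketch of the forward and inverse mappings using only the standard library. The helper names are hypothetical and are not part of pcntoolkit.

```python
import math

# Hypothetical helpers illustrating the two documented transforms;
# these are not pcntoolkit functions.
FORWARD = {"log1p": math.log1p, "log": math.log}
INVERSE = {"log1p": math.expm1, "log": math.exp}


def apply_transform(values, name):
    """Apply the named transform to each value; None means no transform."""
    if name is None:
        return list(values)
    return [FORWARD[name](v) for v in values]


def invert_transform(values, name):
    """Undo the named transform; None means no transform."""
    if name is None:
        return list(values)
    return [INVERSE[name](v) for v in values]


y = [0.0, 1.5, 20.0]
z = apply_transform(y, "log1p")        # log(Y + 1) is defined at Y = 0
y_back = invert_transform(z, "log1p")  # expm1 maps back to the original scale
```

Note that "log" requires strictly positive Y, since log(0) is undefined, whereas "log1p" also accepts zeros.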

__getitem__(key: str) → pcntoolkit.regression_model.regression_model.RegressionModel#

__setitem__(key: str, value: pcntoolkit.regression_model.regression_model.RegressionModel) → None#

check_compatibility(data: pcntoolkit.dataio.norm_data.NormData) → bool#

Check if the data is compatible with the model.

Parameters:: data (NormData) – Data to check compatibility with.
Returns:: True if compatible, False otherwise
Return type:: bool

compute_baseline_logp(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#

Computes the log-probability of the data under a simple Gaussian model.

The baseline model is a Gaussian with mean and standard deviation computed from the scaled Y data. It serves as a reference for metrics such as the MSLL (Mean Standardized Log Loss) of the fitted model.

Parameters:: data (NormData) – Test data containing response variables (Y).
Returns:: Data with baseline_logp computed for each response variable.
Return type:: NormData
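
As an illustration of the idea, the sketch below computes a per-observation Gaussian baseline log-probability and an MSLL from it. It assumes the population standard deviation (ddof=0) and the usual definition of MSLL as the mean difference between the model's and the baseline's negative log-probabilities. Both function names are hypothetical, and the details of pcntoolkit's own computation may differ.

```python
import math
import statistics


def baseline_logp(y_scaled):
    """Per-observation log-density under N(mean(y), std(y)^2).

    Assumption: population standard deviation (ddof=0).
    """
    mu = statistics.fmean(y_scaled)
    sigma = statistics.pstdev(y_scaled)
    return [
        -0.5 * math.log(2 * math.pi * sigma**2) - (v - mu) ** 2 / (2 * sigma**2)
        for v in y_scaled
    ]


def msll(model_logp, base_logp):
    """Mean of (-logp_model) - (-logp_baseline) over observations.

    Negative values mean the fitted model beats the baseline.
    """
    return statistics.fmean([b - m for m, b in zip(model_logp, base_logp)])


y = [-1.0, 0.0, 1.0]
base = baseline_logp(y)
# A made-up "model" that assigns each point a slightly higher log-probability.
better = [lp + 0.1 for lp in base]
score = msll(better, base)
```

Here `score` is negative because the made-up model assigns higher log-probability than the baseline at every point.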

compute_centiles(data: pcntoolkit.dataio.norm_data.NormData, centiles: List[float] | numpy.ndarray | None = None, **kwargs) → pcntoolkit.dataio.norm_data.NormData#

Computes the centiles for each response variable in the data.

Parameters:

data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
centiles (list[float] or np.ndarray, optional) – The centiles to compute. Defaults to [0.05, 0.25, 0.5, 0.75, 0.95].

Returns:

Prediction results containing: - Centiles: centiles of the response variables

Return type:

NormData
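
To make the notion of a centile concrete, the sketch below evaluates the default centiles for a single Gaussian predictive distribution. This is an assumption made for illustration: the regression models in pcntoolkit may use other likelihoods, and `gaussian_centiles` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

DEFAULT_CENTILES = [0.05, 0.25, 0.5, 0.75, 0.95]  # defaults listed above


def gaussian_centiles(mu, sigma, centiles=None):
    """Return the response value at each centile for N(mu, sigma^2)."""
    if centiles is None:
        centiles = DEFAULT_CENTILES
    dist = NormalDist(mu, sigma)
    return {q: dist.inv_cdf(q) for q in centiles}


# Predicted mean and standard deviation at one covariate value (made-up numbers).
curve = gaussian_centiles(mu=100.0, sigma=15.0)
```

Repeating this over a grid of covariate values would trace out one curve per centile.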

compute_correlation_matrix(data, bandwidth=5, covariate='age')#

compute_logp(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#

Computes the log-probability of the data under the fitted model.

Parameters:: data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
Returns:: Prediction results containing: - logp: log-probability of the response variables per datapoint under the fitted model
Return type:: NormData

compute_thrivelines(data: pcntoolkit.dataio.norm_data.NormData, span: int = 5, step: int = 1, z_thrive: float = 0.0, covariate='age', **kwargs) → pcntoolkit.dataio.norm_data.NormData#: Computes the thrivelines for each response variable in the data.

compute_yhat(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#: Computes the predicted values for each response variable in the data.

compute_zscores(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#

Computes Z-scores for each response variable using fitted regression models.

Parameters:: data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
Returns:: Prediction results containing: - Zscores: z-scores of the response variables
Return type:: NormData
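
For intuition, under a Gaussian predictive distribution a Z-score is the deviation of the observed value from the predicted mean in units of the predicted standard deviation. The sketch below assumes that Gaussian case only. Other likelihoods need a different mapping, and `gaussian_zscore` is a hypothetical helper.

```python
from statistics import NormalDist


def gaussian_zscore(y, mu, sigma):
    """Deviation of y from the predicted mean, in predicted standard deviations."""
    return (y - mu) / sigma


# Observed value with made-up predicted mean and standard deviation.
z = gaussian_zscore(y=130.0, mu=100.0, sigma=15.0)

# The standard normal CDF converts a Z-score into the matching centile.
centile = NormalDist().cdf(z)
```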

static elemwise_logp_baseline_model(y_scaled: numpy.ndarray) → numpy.ndarray#

Compute log-probability for each observation under a baseline Gaussian model.

Parameters:: y_scaled (np.ndarray) – Scaled response variable values.
Returns:: Log-probability
Return type:: np.ndarray

evaluate(data: pcntoolkit.dataio.norm_data.NormData) → None#

Evaluates the model performance on the data. This method performs the following steps:

1. Preprocesses the data
2. Evaluates the model performance
3. Postprocesses the data

extend(data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) → NormativeModel#: Extends the model to a new dataset.

extend_predict(extend_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) → NormativeModel#: Extends the model to a new dataset and predicts the data.

extract_data(data: pcntoolkit.dataio.norm_data.NormData) → Tuple[xarray.DataArray, xarray.DataArray, dict[str, dict[str, int]], xarray.DataArray, xarray.DataArray]#: Returns a 5-tuple of covariates, batch effects, batch effect maps, response vars, Z-scores. If the variable is not available, returns None instead of the variable.

fit(data: pcntoolkit.dataio.norm_data.NormData) → None#

Fits a regression model for each response variable in the data.

Parameters:: data (NormData) – Training data containing covariates (X), batch effects (batch_effects), and response variables (Y). Must be a valid NormData object with properly formatted dimensions: - X: (n_samples, n_covariates) - batch_effects: (n_samples, n_batch_effects) - Y: (n_samples, n_response_vars)
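
The shape requirements amount to all three arrays sharing the same first dimension. The sketch below checks this on plain nested lists, which stand in for the arrays held by a NormData object. The helper names and the example values are hypothetical.

```python
def shape(rows):
    """Return (n_rows, n_cols) of a rectangular list of lists."""
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("rows have differing lengths")
    return len(rows), n_cols


def same_sample_count(X, batch_effects, Y):
    """True if all three arrays have the same number of samples (rows)."""
    return shape(X)[0] == shape(batch_effects)[0] == shape(Y)[0]


X = [[20.0], [35.0], [50.0]]                    # (n_samples, n_covariates)
batch_effects = [["site_a"], ["site_b"], ["site_a"]]  # (n_samples, n_batch_effects)
Y = [[1.2, 3.4], [1.1, 3.9], [0.9, 4.2]]        # (n_samples, n_response_vars)

ok = same_sample_count(X, batch_effects, Y)
```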

fit_predict(fit_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#: Combines model.fit and model.predict in a single operation.

classmethod from_args(**kwargs) → NormativeModel#

Create a new normative model from command line arguments.

Parameters:: **kwargs (dict[str, str]) – Command line arguments, passed as keyword arguments.
Returns:: An instance of a normative model.
Return type:: NormativeModel
Raises:: ValueError – If the regression model specified in the arguments is unknown.

harmonize(data: pcntoolkit.dataio.norm_data.NormData, reference_batch_effect: dict[str, str] | None = None) → pcntoolkit.dataio.norm_data.NormData#

Harmonizes the data to a reference batch effect. If no reference batch effect is provided, harmonizes to the first batch effect alphabetically.

Parameters:

data (NormData) – Data to harmonize.
reference_batch_effect (dict[str, str]) – Reference batch effect.
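
The default can be pictured as follows. This sketch assumes that "first alphabetically" is applied separately to each batch effect dimension, which the description above does not spell out. `default_reference` and the example labels are hypothetical.

```python
def default_reference(unique_batch_effects):
    """Pick the alphabetically first value for each batch effect dimension.

    Assumption: the choice is made per dimension.
    """
    return {dim: sorted(values)[0] for dim, values in unique_batch_effects.items()}


# Made-up batch effect values observed during fitting.
observed = {"site": ["leiden", "amsterdam", "nijmegen"], "sex": ["M", "F"]}
reference = default_reference(observed)
```

The resulting dictionary has the same form as the reference_batch_effect argument: one value per batch effect dimension.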

classmethod load(path: str, into: NormativeModel | None = None) → NormativeModel#

Load a normative model from a path.

Parameters:

path (str) – The path to the normative model.
into (NormativeModel, optional) – The normative model to load the data into. If None, a new normative model is created. This is useful if you want to load a normative model into an existing normative model, for example in the runner.

map_batch_effects(batch_effects: xarray.DataArray) → xarray.DataArray#

classmethod merge(save_dir: str, models: list[NormativeModel | str]) → NormativeModel#: Merges multiple models into a single model.

model_specific_evaluation() → None#: Save model-specific evaluation metrics.

postprocess(data: pcntoolkit.dataio.norm_data.NormData) → None#

Apply postprocessing to the data.

First unscales, then applies the inverse response transform (e.g. expm1).

Args:: data (NormData): Data to postprocess.

predict(data: pcntoolkit.dataio.norm_data.NormData) → pcntoolkit.dataio.norm_data.NormData#: Computes Z-scores, centiles, logp, yhat for each observation using fitted regression models.

preprocess(data: pcntoolkit.dataio.norm_data.NormData) → None#

Applies preprocessing transformations to the input data.

First applies an optional response transform (e.g. log1p), then scales.

Args:: data (NormData): Data to preprocess.
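
The ordering matters: preprocess transforms and then scales, so postprocess has to unscale first and then invert the transform. The sketch below shows this round trip on a plain list, assuming a log1p transform and standardization with the population standard deviation. The function names are hypothetical and operate on lists rather than NormData objects.

```python
import math
import statistics


def sketch_preprocess(y):
    """Transform (log1p), then standardize. Returns scaled values and scaler stats."""
    transformed = [math.log1p(v) for v in y]
    mean = statistics.fmean(transformed)
    std = statistics.pstdev(transformed)  # assumption: population std
    scaled = [(v - mean) / std for v in transformed]
    return scaled, (mean, std)


def sketch_postprocess(scaled, scaler):
    """Unscale first, then apply the inverse transform (expm1)."""
    mean, std = scaler
    unscaled = [v * std + mean for v in scaled]
    return [math.expm1(v) for v in unscaled]


y = [0.0, 2.0, 10.0, 50.0]
scaled, scaler = sketch_preprocess(y)
recovered = sketch_postprocess(scaled, scaler)
```

Reversing the two steps in postprocess would apply expm1 to standardized values and would not recover the original scale.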

register_batch_effects(data: pcntoolkit.dataio.norm_data.NormData) → None#

register_data_info(data: pcntoolkit.dataio.norm_data.NormData) → None#

sample_batch_effects(n_samples: int) → xarray.DataArray#: Sample the batch effects from the estimated distribution.

sample_covariates(bes: xarray.DataArray, covariate_range_per_batch_effect: bool = False) → xarray.DataArray#

Sample the covariates from the estimated distribution.

Uses ranges of observed covariates matched with batch effects to create a representative sample.
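
One way to picture range-based sampling is to draw each covariate value from the range observed for that sample's batch effect. The sketch below assumes uniform sampling within each range, which is an illustrative choice and not necessarily what the method does. The helper name, labels, and ranges are hypothetical.

```python
import random


def sample_covariate(batch_labels, ranges, rng):
    """Draw one covariate value per sample from its batch effect's observed range.

    Assumption: values are drawn uniformly between the observed min and max.
    """
    return [rng.uniform(*ranges[label]) for label in batch_labels]


# Made-up observed age ranges per site.
ranges = {"site_a": (18.0, 65.0), "site_b": (8.0, 21.0)}
labels = ["site_a", "site_b", "site_b", "site_a"]
ages = sample_covariate(labels, ranges, random.Random(0))
```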

save(path: str | None = None) → None#

Save the model to a file.

Args:: path (str, optional): The path to save the model to. If None, the model is saved to the save_dir provided in the norm_conf.

scale_backward(data: pcntoolkit.dataio.norm_data.NormData) → None#

Scales data back to its original scale using stored scalers.

Parameters:

data (NormData) – Data object containing arrays to be scaled back:
- X : array-like, shape (n_samples, n_covariates)
  Covariate data to be scaled back
- y : array-like, shape (n_samples, n_response_vars), optional
  Response variable data to be scaled back

scale_forward(data: pcntoolkit.dataio.norm_data.NormData, overwrite: bool = False) → None#

Scales input data to standardized form using configured scalers.

Parameters:

data (NormData) – Data object containing arrays to be scaled:
- X : array-like, shape (n_samples, n_covariates)
  Covariate data to be scaled
- y : array-like, shape (n_samples, n_response_vars), optional
  Response variable data to be scaled
overwrite (bool, default False) – If True, creates new scalers even if they already exist. If False, uses existing scalers when available.
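
The effect of overwrite can be sketched with a small scaler cache: existing scalers are reused unless overwrite is True, so new data is scaled with the statistics of the data the scalers were created on. The class and function below are hypothetical stand-ins, assuming standardization with the population standard deviation.

```python
import statistics


class Standardizer:
    """Hypothetical scaler that stores the mean and std of the data it was built on."""

    def __init__(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


def scale_forward_sketch(columns, scalers, overwrite=False):
    """Scale each column, creating a scaler only if missing or overwrite is True."""
    scaled = {}
    for name, values in columns.items():
        if overwrite or name not in scalers:
            scalers[name] = Standardizer(values)
        scaled[name] = scalers[name].transform(values)
    return scaled


scalers = {}
train = {"age": [20.0, 40.0, 60.0]}
test = {"age": [30.0, 70.0]}

scale_forward_sketch(train, scalers)                  # creates the scaler (mean 40)
reused = scale_forward_sketch(test, scalers)          # reuses the training statistics
mean_after_reuse = scalers["age"].mean
scale_forward_sketch(test, scalers, overwrite=True)   # refits on the new data
mean_after_overwrite = scalers["age"].mean
```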

set_ensure_save_dirs()#: Ensures that the save directories for results and plots exist, creating them if they are missing. A missing directory previously resulted in an error.

set_save_dir(save_dir: str) → None#

Override the save_dir in the norm_conf.

Args:: save_dir (str): New save directory.

synthesize(data: pcntoolkit.dataio.norm_data.NormData | None = None, n_samples: int | None = None, covariate_range_per_batch_effect=False) → pcntoolkit.dataio.norm_data.NormData#

Synthesize data from the model.

Parameters:

data (NormData, optional) – A NormData object with X and batch_effects. If provided, used to generate the synthetic data. If the data has no batch_effects, batch_effects are sampled from the model. If the data has no X, X is sampled from the model, using the provided or sampled batch_effects. If neither X nor batch_effects are provided, the model is used to generate the synthetic data.
n_samples (int, optional) – Number of samples to synthesize. If this is None, the number of samples that were in the train data is used.
covariate_range_per_batch_effect (bool, optional) – If True, the covariate range is different for each batch effect.

to_dict()#

transfer(transfer_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) → NormativeModel#: Transfers the model to a new dataset.

transfer_predict(transfer_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) → NormativeModel#: Transfers the model to a new dataset and predicts the data.

batch_effect_counts = None#

batch_effect_covariate_ranges = None#

property batch_effect_dims: list[str]#

Returns the batch effect dimensions.

Returns:: The batch effect dimensions.
Return type:: list[str]

batch_effects_maps = None#

correlation_matrix = None#

covariate_ranges = None#

covariates = None#

evaluate_model: bool = True#

evaluator#

property has_batch_effect: bool#

Returns whether the model has a batch effect. This is currently determined from the template regression model.

Returns:: True if the model has a batch effect, False otherwise.
Return type:: bool

inscaler: str = 'standardize'#

inscalers: dict#

is_fitted: bool = False#

property n_fit_observations: int#

Returns the number of observations used to fit the model.

Returns:: The number of observations used to fit the model.
Return type:: int

name: str | None = None#

outscaler: str = 'standardize'#

outscalers: dict#

regression_models: dict[str, pcntoolkit.regression_model.regression_model.RegressionModel]#

response_vars: list[str] = None#

property save_dir: str#

savemodel: bool = True#

saveplots: bool = True#

saveresults: bool = True#

template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel#

thrive_covariate = None#

unique_batch_effects = None#

y_transform: str | None = None#