pcntoolkit.normative_model#
Module providing the NormativeModel class, which is the main class for building and using normative models.
Classes#
NormativeModel: This class provides the foundation for building normative models, handling multiple response variables through separate regression models.
Module Contents#
- class NormativeModel(template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel, savemodel: bool = True, evaluate_model: bool = True, saveresults: bool = True, saveplots: bool = True, save_dir: str | None = None, inscaler: str = 'standardize', outscaler: str = 'standardize', y_transform: str | None = None, name: str | None = None)#
This class provides the foundation for building normative models, handling multiple response variables through separate regression models. It manages data preprocessing, model fitting, prediction, and evaluation.
- Parameters:
template_regression_model (RegressionModel) – Regression model used as a template to create all regression models.
savemodel (bool) – Whether to save the model.
evaluate_model (bool) – Whether to evaluate the model.
saveresults (bool) – Whether to save the results.
saveplots (bool) – Whether to save the plots.
save_dir (str) – Directory to save the model, results, and plots.
inscaler (str) – Input (X/covariates) scaler to use.
outscaler (str) – Output (Y/response_vars) scaler to use.
y_transform (str or None) – Optional transform applied to Y before fitting and inverted after prediction. Currently supported: "log1p" applies log(Y+1); "log" applies the natural log(Y). This is useful for phenotypes that cannot be negative. Default is None (no transform).
name (str) – Name of the model.
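The two y_transform options can be illustrated with a minimal standard-library sketch. The helper names `forward_transform` and `inverse_transform` are hypothetical and are not part of pcntoolkit; the sketch only shows the arithmetic described above and why "log1p" tolerates zeros while "log" needs strictly positive values.

```python
import math

# Hypothetical helpers for illustration only; not the pcntoolkit implementation.
def forward_transform(y, y_transform=None):
    """Apply the named transform to a list of response values."""
    if y_transform is None:
        return list(y)
    if y_transform == "log1p":
        return [math.log1p(v) for v in y]  # log(Y + 1), defined at Y = 0
    if y_transform == "log":
        return [math.log(v) for v in y]    # natural log, requires Y > 0
    raise ValueError(f"Unknown y_transform: {y_transform!r}")

def inverse_transform(t, y_transform=None):
    """Invert forward_transform so predictions return to the original scale."""
    if y_transform is None:
        return list(t)
    if y_transform == "log1p":
        return [math.expm1(v) for v in t]  # exp(t) - 1
    if y_transform == "log":
        return [math.exp(v) for v in t]
    raise ValueError(f"Unknown y_transform: {y_transform!r}")

counts = [0.0, 1.5, 20.0]
restored = inverse_transform(forward_transform(counts, "log1p"), "log1p")
```

Because the inverse of either transform is an exponential, values mapped back to the original scale stay within the domain of the transform, which is why these options suit phenotypes that cannot be negative.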
- __getitem__(key: str) pcntoolkit.regression_model.regression_model.RegressionModel#
- __setitem__(key: str, value: pcntoolkit.regression_model.regression_model.RegressionModel) None#
- check_compatibility(data: pcntoolkit.dataio.norm_data.NormData) bool#
Check if the data is compatible with the model.
- Parameters:
data (NormData) – Data to check compatibility with.
- Returns:
True if compatible, False otherwise.
- Return type:
bool
- compute_baseline_logp(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Computes the log-probability of the data under a simple Gaussian model.
The baseline model is a Gaussian with mean and standard deviation computed from the scaled Y data. It serves as a reference against which to evaluate, for example, the MSLL (Mean Standardized Log Loss) of the fitted model.
- Parameters:
data (NormData) – Test data containing response variables (Y).
- Returns:
Data with baseline_logp computed for each response variable.
- Return type:
NormData
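To show how a baseline log-probability can feed into the MSLL, here is a small sketch using the common definition of standardized log loss: the negative log-probability under the model minus the negative log-probability under the baseline, averaged over observations. The function name is hypothetical, and whether pcntoolkit's evaluator uses exactly this sign convention is an assumption.

```python
def mean_standardized_log_loss(model_logp, baseline_logp):
    """Average of (baseline_logp - model_logp) over observations.

    Negative values mean the fitted model assigns higher probability
    to the data than the baseline Gaussian does.
    """
    if len(model_logp) != len(baseline_logp):
        raise ValueError("model_logp and baseline_logp must have equal length")
    diffs = [b - m for m, b in zip(model_logp, baseline_logp)]
    return sum(diffs) / len(diffs)

# Illustrative values only, not real results.
msll = mean_standardized_log_loss([-0.5, -0.7], [-1.0, -1.2])
```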
- compute_centiles(data: pcntoolkit.dataio.norm_data.NormData, centiles: List[float] | numpy.ndarray | None = None, **kwargs) pcntoolkit.dataio.norm_data.NormData#
Computes the centiles for each response variable in the data.
- Parameters:
data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
centiles (np.ndarray, optional) – The centiles to compute. Defaults to [0.05, 0.25, 0.5, 0.75, 0.95].
- Returns:
Prediction results containing centiles: the centiles of the response variables.
- Return type:
NormData
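As a rough illustration of what a centile is, the sketch below computes the default centiles for a single prediction under the assumption of a Gaussian predictive distribution with mean `mu` and standard deviation `sigma`. The function name and the Gaussian assumption are illustrative; the fitted regression model may use a different likelihood.

```python
from statistics import NormalDist

DEFAULT_CENTILES = [0.05, 0.25, 0.5, 0.75, 0.95]

def gaussian_centiles(mu, sigma, centiles=None):
    """Return {centile: value} for a Gaussian predictive distribution."""
    if centiles is None:
        centiles = DEFAULT_CENTILES
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Each centile q maps to the value below which a fraction q of the mass lies.
    return {q: mu + sigma * std_normal.inv_cdf(q) for q in centiles}

curve = gaussian_centiles(mu=100.0, sigma=15.0)
```

The 0.5 centile coincides with the predicted mean in the Gaussian case, and the outer centiles are symmetric around it.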
- compute_correlation_matrix(data, bandwidth=5, covariate='age')#
- compute_logp(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Computes the log-probability of the data under the fitted model.
- Parameters:
data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
- Returns:
Prediction results containing logp: the log-probability of the response variables per datapoint under the fitted model.
- Return type:
NormData
- compute_thrivelines(data: pcntoolkit.dataio.norm_data.NormData, span: int = 5, step: int = 1, z_thrive: float = 0.0, covariate='age', **kwargs) pcntoolkit.dataio.norm_data.NormData#
Computes the thrivelines for each response variable in the data.
- compute_yhat(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Computes the predicted values for each response variable in the data.
- compute_zscores(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Computes Z-scores for each response variable using fitted regression models.
- Parameters:
data (NormData) – Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
- Returns:
Prediction results containing Zscores: the z-scores of the response variables.
- Return type:
NormData
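For intuition, a z-score expresses how far an observation lies from the model's prediction in units of predicted standard deviation. The sketch below assumes a Gaussian predictive distribution; the function name is hypothetical and a non-Gaussian regression model would need a different mapping.

```python
def gaussian_zscores(y, mu, sigma):
    """Deviation of each observation from its predicted mean, in predicted SD units."""
    return [(yi - mi) / si for yi, mi, si in zip(y, mu, sigma)]

# Two observations with the same predicted mean and standard deviation.
z = gaussian_zscores([110.0, 85.0], [100.0, 100.0], [10.0, 10.0])
```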
- static elemwise_logp_baseline_model(y_scaled: numpy.ndarray) numpy.ndarray#
Compute log-probability for each observation under a baseline Gaussian model.
- Parameters:
y_scaled (np.ndarray) – Scaled response variable values.
- Returns:
Log-probability of each observation.
- Return type:
np.ndarray
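A baseline of this kind can be sketched with the standard library: estimate the mean and standard deviation from the scaled values, then evaluate the Gaussian log-density at each value. The function name is hypothetical, and the use of the population standard deviation is an assumption; the library may estimate it differently.

```python
import math
from statistics import fmean, pstdev

def elemwise_gaussian_logp(y_scaled):
    """Log-density of each value under a Gaussian fitted to the same values."""
    mu = fmean(y_scaled)
    sigma = pstdev(y_scaled)  # population standard deviation (assumption)
    norm_const = -0.5 * math.log(2.0 * math.pi * sigma**2)
    return [norm_const - (v - mu) ** 2 / (2.0 * sigma**2) for v in y_scaled]

logp = elemwise_gaussian_logp([-1.0, 0.0, 1.0])
```

The value closest to the mean receives the highest log-probability, and values equally far from the mean receive equal log-probabilities.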
- evaluate(data: pcntoolkit.dataio.norm_data.NormData) None#
Evaluates the model performance on the data. This method performs the following steps:
1. Preprocesses the data
2. Evaluates the model performance
3. Postprocesses the data
- extend(data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) NormativeModel#
Extends the model to a new dataset.
- extend_predict(extend_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) NormativeModel#
Extends the model to a new dataset and predicts the data.
- extract_data(data: pcntoolkit.dataio.norm_data.NormData) Tuple[xarray.DataArray, xarray.DataArray, dict[str, dict[str, int]], xarray.DataArray, xarray.DataArray]#
Returns a 5-tuple of covariates, batch effects, batch effect maps, response variables, and Z-scores. If a variable is not available, None is returned in its place.
- fit(data: pcntoolkit.dataio.norm_data.NormData) None#
Fits a regression model for each response variable in the data.
- Parameters:
data (NormData) – Training data containing covariates (X), batch effects (batch_effects), and response variables (Y). Must be a valid NormData object with properly formatted dimensions:
- X: (n_samples, n_covariates)
- batch_effects: (n_samples, n_batch_effects)
- Y: (n_samples, n_response_vars)
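The idea of fitting one regression model per response variable from a shared template can be sketched as follows. `MeanModel` and `fit_per_response` are hypothetical stand-ins written for this illustration; they are not pcntoolkit classes, and the real fitting logic is considerably richer.

```python
import copy

class MeanModel:
    """Hypothetical stand-in for a regression model: it only learns the mean."""
    def __init__(self):
        self.mean_ = None
        self.is_fitted = False

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        self.is_fitted = True

def fit_per_response(template, X, Y):
    """Fit an independent copy of the template for each response variable.

    X is a list of covariate rows; Y maps response-variable name -> values.
    """
    models = {}
    for name, y in Y.items():
        if len(y) != len(X):
            raise ValueError(f"{name}: expected {len(X)} samples, got {len(y)}")
        model = copy.deepcopy(template)  # the template itself is never fitted
        model.fit(X, y)
        models[name] = model
    return models

X = [[20.0], [30.0], [40.0]]
Y = {"thickness": [2.5, 2.4, 2.3], "volume": [10.0, 11.0, 12.0]}
template = MeanModel()
models = fit_per_response(template, X, Y)
```

Keeping the fitted models in a dictionary keyed by response-variable name mirrors the `regression_models` attribute and the `__getitem__` access by key listed on this page.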
- fit_predict(fit_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Combines model.fit and model.predict in a single operation.
- classmethod from_args(**kwargs) NormativeModel#
Create a new normative model from command line arguments.
- Parameters:
**kwargs (dict[str, str]) – Command line arguments, passed as keyword arguments.
- Returns:
An instance of a normative model.
- Return type:
NormativeModel
- Raises:
ValueError – If the regression model specified in the arguments is unknown.
- harmonize(data: pcntoolkit.dataio.norm_data.NormData, reference_batch_effect: dict[str, str] | None = None) pcntoolkit.dataio.norm_data.NormData#
Harmonizes the data to a reference batch effect. Harmonizes to the provided reference batch effect if provided, otherwise, harmonizes to the first batch effect alphabetically.
- Parameters:
data (NormData) – Data to harmonize.
reference_batch_effect (dict[str, str], optional) – Reference batch effect.
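The default choice of reference described above (the alphabetically first label) can be sketched in a few lines. The second helper shows one plausible way to harmonize a single value: keep its z-score fixed and re-express it under the reference batch's distribution. Both function names are hypothetical, and the z-preserving mapping under a Gaussian is an assumption about the approach, not a description of the library's code.

```python
def default_reference(batch_effects):
    """Pick the alphabetically first label in each batch-effect dimension.

    batch_effects maps dimension name -> list of observed labels.
    """
    return {dim: sorted(set(labels))[0] for dim, labels in batch_effects.items()}

def harmonize_value(y, own, ref):
    """Map y from its own batch to the reference batch, keeping its z-score.

    own and ref are (mean, standard deviation) pairs (Gaussian assumption).
    """
    z = (y - own[0]) / own[1]
    return ref[0] + ref[1] * z

reference = default_reference({"site": ["siteB", "siteA", "siteB"], "sex": ["M", "F"]})
harmonized = harmonize_value(12.0, own=(10.0, 2.0), ref=(20.0, 4.0))
```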
- classmethod load(path: str, into: NormativeModel | None = None) NormativeModel#
Load a normative model from a path.
- Parameters:
path (str) – The path to the normative model.
into (NormativeModel, optional) – The normative model to load the data into. If None, a new normative model is created. Loading into an existing normative model is useful, for example, in the runner.
- map_batch_effects(batch_effects: xarray.DataArray) xarray.DataArray#
- classmethod merge(save_dir: str, models: list[NormativeModel | str]) NormativeModel#
Merges multiple models into a single model.
- postprocess(data: pcntoolkit.dataio.norm_data.NormData) None#
Apply postprocessing to the data.
First unscales, then applies the inverse response transform (e.g. expm1).
- Parameters:
data (NormData) – Data to postprocess.
- predict(data: pcntoolkit.dataio.norm_data.NormData) pcntoolkit.dataio.norm_data.NormData#
Computes Z-scores, centiles, logp, yhat for each observation using fitted regression models.
- preprocess(data: pcntoolkit.dataio.norm_data.NormData) None#
Applies preprocessing transformations to the input data.
First applies an optional response transform (e.g. log1p), then scales.
- Parameters:
data (NormData) – Data to preprocess.
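The ordering of the two steps matters: preprocessing transforms first and scales second, so postprocessing must unscale first and invert the transform second. The following sketch demonstrates that round trip for a single response variable using log1p and standardization. The function names are hypothetical and the choice of population standard deviation is an assumption.

```python
import math
from statistics import fmean, pstdev

def preprocess_sketch(y):
    """Transform (log1p), then standardize. Returns scaled values and the scaler."""
    transformed = [math.log1p(v) for v in y]
    mean, std = fmean(transformed), pstdev(transformed)
    scaled = [(v - mean) / std for v in transformed]
    return scaled, (mean, std)

def postprocess_sketch(scaled, scaler):
    """Unscale first, then apply the inverse transform (expm1)."""
    mean, std = scaler
    transformed = [v * std + mean for v in scaled]
    return [math.expm1(v) for v in transformed]

y = [0.0, 3.0, 10.0, 25.0]
scaled, scaler = preprocess_sketch(y)
restored = postprocess_sketch(scaled, scaler)
```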
- register_batch_effects(data: pcntoolkit.dataio.norm_data.NormData) None#
- register_data_info(data: pcntoolkit.dataio.norm_data.NormData) None#
- sample_batch_effects(n_samples: int) xarray.DataArray#
Sample the batch effects from the estimated distribution.
- sample_covariates(bes: xarray.DataArray, covariate_range_per_batch_effect: bool = False) xarray.DataArray#
Sample the covariates from the estimated distribution.
Uses the ranges of observed covariates, matched with batch effects, to create a representative sample.
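One way to picture range-based sampling is shown below: each synthetic sample draws a covariate value from either a global range or the range observed for its own batch. The function name, the argument layout, and the use of a uniform distribution are all assumptions made for this illustration.

```python
import random

def sample_covariates_sketch(batch_labels, ranges_per_batch, global_range,
                             per_batch=False, seed=0):
    """Draw one covariate value per sample from an observed (min, max) range."""
    rng = random.Random(seed)  # seeded for reproducibility
    samples = []
    for label in batch_labels:
        low, high = ranges_per_batch[label] if per_batch else global_range
        samples.append(rng.uniform(low, high))
    return samples

labels = ["siteA", "siteB", "siteA"]
ranges = {"siteA": (18.0, 30.0), "siteB": (50.0, 80.0)}
ages = sample_covariates_sketch(labels, ranges, (18.0, 80.0), per_batch=True)
```

With per-batch ranges, a synthetic sample never receives a covariate value outside what was observed for its batch, which avoids extrapolating a site's effect to ages it never contained.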
- save(path: str | None = None) None#
Save the model to a file.
- Parameters:
path (str, optional) – The path to save the model to. If None, the model is saved to the save_dir provided in the norm_conf.
- scale_backward(data: pcntoolkit.dataio.norm_data.NormData) None#
Scales data back to its original scale using stored scalers.
- Parameters:
data (NormData) – Data object containing arrays to be scaled back:
- X : array-like, shape (n_samples, n_covariates). Covariate data to be scaled back.
- y : array-like, shape (n_samples, n_response_vars), optional. Response variable data to be scaled back.
- scale_forward(data: pcntoolkit.dataio.norm_data.NormData, overwrite: bool = False) None#
Scales input data to standardized form using configured scalers.
- Parameters:
data (NormData) – Data object containing arrays to be scaled:
- X : array-like, shape (n_samples, n_covariates). Covariate data to be scaled.
- y : array-like, shape (n_samples, n_response_vars), optional. Response variable data to be scaled.
overwrite (bool, default False) – If True, creates new scalers even if they already exist. If False, uses existing scalers when available.
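The effect of the overwrite flag can be illustrated with a toy standardizer. `SimpleStandardizer` and `scale_forward_sketch` are hypothetical names for this sketch; the point is only that with overwrite=False, new data is scaled with the statistics stored from earlier data, which is what keeps train and test data on the same scale.

```python
from statistics import fmean, pstdev

class SimpleStandardizer:
    """Hypothetical scaler that stores the mean and standard deviation it was built on."""
    def __init__(self, values):
        self.mean = fmean(values)
        self.std = pstdev(values)

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def inverse_transform(self, values):
        return [v * self.std + self.mean for v in values]

def scale_forward_sketch(values, scalers, key, overwrite=False):
    """Scale values, reusing the stored scaler for `key` unless overwrite is True."""
    if overwrite or key not in scalers:
        scalers[key] = SimpleStandardizer(values)
    return scalers[key].transform(values)

scalers = {}
train_scaled = scale_forward_sketch([1.0, 2.0, 3.0], scalers, "X")
test_scaled = scale_forward_sketch([4.0, 5.0, 6.0], scalers, "X")  # reuses train statistics
mean_after_reuse = scalers["X"].mean
refit_scaled = scale_forward_sketch([4.0, 5.0, 6.0], scalers, "X", overwrite=True)
mean_after_overwrite = scalers["X"].mean
```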
- set_ensure_save_dirs()#
Ensures that the save directories for results and plots exist, creating them if necessary. Missing directories would otherwise cause an error on saving.
- set_save_dir(save_dir: str) None#
Override the save_dir in the norm_conf.
- Parameters:
save_dir (str) – New save directory.
- synthesize(data: pcntoolkit.dataio.norm_data.NormData | None = None, n_samples: int | None = None, covariate_range_per_batch_effect=False) pcntoolkit.dataio.norm_data.NormData#
Synthesize data from the model
- Parameters:
data (NormData, optional) – A NormData object with X and batch_effects. If provided, used to generate the synthetic data. If the data has no batch_effects, batch_effects are sampled from the model. If the data has no X, X is sampled from the model, using the provided or sampled batch_effects. If neither X nor batch_effects are provided, the model is used to generate the synthetic data.
n_samples (int, optional) – Number of samples to synthesize. If this is None, the number of samples that were in the train data is used.
covariate_range_per_batch_effect (bool, optional) – If True, the covariate range is different for each batch effect.
- to_dict()#
- transfer(transfer_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) NormativeModel#
Transfers the model to a new dataset.
- transfer_predict(transfer_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) NormativeModel#
Transfers the model to a new dataset and predicts the data.
- batch_effect_counts = None#
- batch_effect_covariate_ranges = None#
- property batch_effect_dims: list[str]#
Returns the batch effect dimensions.
- Returns:
The batch effect dimensions.
- Return type:
list[str]
- batch_effects_maps = None#
- correlation_matrix = None#
- covariate_ranges = None#
- covariates = None#
- evaluator#
- property has_batch_effect: bool#
Returns whether the model has a batch effect. This is currently determined from the configuration of the template regression model.
- Returns:
True if the model has a batch effect, False otherwise.
- Return type:
bool
- property n_fit_observations: int#
Returns the number of observations used to fit the model.
- Returns:
The number of fit observations.
- Return type:
int
- regression_models: dict[str, pcntoolkit.regression_model.regression_model.RegressionModel]#
- template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel#
- thrive_covariate = None#
- unique_batch_effects = None#