pcntoolkit.normative_model
==========================

.. py:module:: pcntoolkit.normative_model

.. autoapi-nested-parse::

   Module providing the NormativeModel class, which is the main class for building and using normative models.


Classes
-------

.. autoapisummary::

   pcntoolkit.normative_model.NormativeModel


Module Contents
---------------

.. py:class:: NormativeModel(template_regression_model: pcntoolkit.regression_model.regression_model.RegressionModel, savemodel: bool = True, evaluate_model: bool = True, saveresults: bool = True, saveplots: bool = True, save_dir: Optional[str] = None, inscaler: str = 'standardize', outscaler: str = 'standardize', y_transform: Optional[str] = None, name: Optional[str] = None)

   This class provides the foundation for building normative models, handling multiple
   response variables through separate regression models. It manages data preprocessing,
   model fitting, prediction, and evaluation.

   :param template_reg_model: Regression model used as a template to create all regression models.
   :type template_reg_model: :py:class:`RegressionModel`
   :param savemodel: Whether to save the model.
   :type savemodel: :py:class:`bool`
   :param evaluate_model: Whether to evaluate the model.
   :type evaluate_model: :py:class:`bool`
   :param saveresults: Whether to save the results.
   :type saveresults: :py:class:`bool`
   :param saveplots: Whether to save the plots.
   :type saveplots: :py:class:`bool`
   :param save_dir: Directory to save the model, results, and plots.
   :type save_dir: :py:class:`str`
   :param inscaler: Input (X/covariates) scaler to use.
   :type inscaler: :py:class:`str`
   :param outscaler: Output (Y/response_vars) scaler to use.
   :type outscaler: :py:class:`str`
   :param y_transform: Optional transform applied to Y before fitting and inverted
                       after prediction. Currently supported:
                       - ``"log1p"`` applies log(Y+1)
                       - ``"log"`` applies natural log(Y)
                       This is useful for phenotypes that cannot be negative.
                       Default is ``None`` (no transform).
   :type y_transform: :py:class:`str` or :py:obj:`None`
   :param name: Name of the model
   :type name: :py:class:`str`


   .. py:method:: __getitem__(key: str) -> pcntoolkit.regression_model.regression_model.RegressionModel


   .. py:method:: __setitem__(key: str, value: pcntoolkit.regression_model.regression_model.RegressionModel) -> None


   .. py:method:: check_compatibility(data: pcntoolkit.dataio.norm_data.NormData) -> bool

      Check if the data is compatible with the model.

      :param data: Data to check compatibility with.
      :type data: :py:class:`NormData`

      :returns: True if compatible, False otherwise
      :rtype: :py:class:`bool`


   .. py:method:: compute_baseline_logp(data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Computes the log-probability of the data under a simple Gaussian model.

      The baseline model is a Gaussian with mean and standard deviation
      computed from the scaled Y data. This serves as a baseline model
      to evaluate for example the MSLL (Mean Standardized Log Loss) of our
      fitted model.

      :param data: Test data containing response variables (Y).
      :type data: :py:class:`NormData`

      :returns: Data with baseline_logp computed for each response variable.
      :rtype: :py:class:`NormData`


   .. py:method:: compute_centiles(data: pcntoolkit.dataio.norm_data.NormData, centiles: Optional[List[float] | numpy.ndarray] = None, **kwargs) -> pcntoolkit.dataio.norm_data.NormData

      Computes the centiles for each response variable in the data.

      :param data: Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
      :type data: :py:class:`NormData`
      :param centiles: The centiles to compute. Defaults to [0.05, 0.25, 0.5, 0.75, 0.95].
      :type centiles: :py:class:`np.ndarray`, *optional*

      :returns: Prediction results containing:
                - Centiles: centiles of the response variables
      :rtype: :py:class:`NormData`


   .. py:method:: compute_correlation_matrix(data, bandwidth=5, covariate='age')


   .. py:method:: compute_logp(data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Computes the log-probability of the data under the fitted model.

      :param data: Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
      :type data: :py:class:`NormData`

      :returns: Prediction results containing:
                - logp: log-probability of the response variables per datapoint under the fitted model
      :rtype: :py:class:`NormData`


   .. py:method:: compute_thrivelines(data: pcntoolkit.dataio.norm_data.NormData, span: int = 5, step: int = 1, z_thrive: float = 0.0, covariate='age', **kwargs) -> pcntoolkit.dataio.norm_data.NormData

      Computes the thrivelines for each responsevar in the data


   .. py:method:: compute_yhat(data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Computes the predicted values for each response variable in the data.


   .. py:method:: compute_zscores(data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Computes Z-scores for each response variable using fitted regression models.

      :param data: Test data containing covariates (X) for which to generate predictions, batch effects (batch_effects), and response variables (Y).
      :type data: :py:class:`NormData`

      :returns: Prediction results containing:
                - Zscores: z-scores of the response variables
      :rtype: :py:class:`NormData`


   .. py:method:: elemwise_logp_baseline_model(y_scaled: numpy.ndarray) -> numpy.ndarray
      :staticmethod:


      Compute log-probability for each observation under a baseline
      Gaussian model.

      :param y_scaled: Scaled response variable values.
      :type y_scaled: :py:class:`np.ndarray`

      :returns: Log-probability
      :rtype: :py:class:`np.ndarray`


   .. py:method:: evaluate(data: pcntoolkit.dataio.norm_data.NormData) -> None

      Evaluates the model performance on the data.
      This method performs the following steps:
      1. Preprocesses the data

      5. Evaluates the model performance
      6. Postprocesses the data


   .. py:method:: extend(data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) -> NormativeModel

      Extends the model to a new dataset.


   .. py:method:: extend_predict(extend_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, n_synth_samples: int | None = None) -> NormativeModel

      Extends the model to a new dataset and predicts the data.


   .. py:method:: extract_data(data: pcntoolkit.dataio.norm_data.NormData) -> Tuple[xarray.DataArray, xarray.DataArray, dict[str, dict[str, int]], xarray.DataArray, xarray.DataArray]

      Returns a 5-tuple of covariates, batch effects, batch effect maps, response vars, Z-scores.
      If the variable is not available, returns None instead of the variable.


   .. py:method:: fit(data: pcntoolkit.dataio.norm_data.NormData) -> None

      Fits a regression model for each response variable in the data.

      :param data: Training data containing covariates (X), batch effects (batch_effects), and response variables (Y).
                   Must be a valid NormData object with properly formatted dimensions:
                   - X: (n_samples, n_covariates)
                   - batch_effects: (n_samples, n_batch_effects)
                   - Y: (n_samples, n_response_vars)
      :type data: :py:class:`NormData`


   .. py:method:: fit_predict(fit_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Combines model.fit and model.predict in a single operation.


   .. py:method:: from_args(**kwargs) -> NormativeModel
      :classmethod:


      Create a new normative model from command line arguments.

      :param args: A dictionary of command line arguments.
      :type args: :py:class:`dict[str`, :py:class:`str]`

      :returns: An instance of a normative model.
      :rtype: :py:class:`NormBase`

      :raises ValueError: If the regression model specified in the arguments is unknown.


   .. py:method:: harmonize(data: pcntoolkit.dataio.norm_data.NormData, reference_batch_effect: dict[str, str] | None = None) -> pcntoolkit.dataio.norm_data.NormData

      Harmonizes the data to a reference batch effect. Harmonizes to the provided reference batch effect if provided,
      otherwise, harmonizes to the first batch effect alphabetically.

      :param data: Data to harmonize.
      :type data: :py:class:`NormData`
      :param reference_batch_effect: Reference batch effect.
      :type reference_batch_effect: :py:class:`dict[str`, :py:class:`str]`


   .. py:method:: load(path: str, into: NormativeModel | None = None) -> NormativeModel
      :classmethod:


      Load a normative model from a path.

      :param path: The path to the normative model.
      :type path: :py:class:`str`
      :param into: The normative model to load the data into. If None, a new normative model is created.
                   This is useful if you want to load a normative model into an existing normative model, for example in the runner.
      :type into: :py:class:`NormBase`, *optional*


   .. py:method:: map_batch_effects(batch_effects: xarray.DataArray) -> xarray.DataArray


   .. py:method:: merge(save_dir: str, models: list[Union[NormativeModel, str]]) -> NormativeModel
      :classmethod:


      Merges multiple models into a single model.


   .. py:method:: model_specific_evaluation() -> None

      Save model-specific evaluation metrics.


   .. py:method:: postprocess(data: pcntoolkit.dataio.norm_data.NormData) -> None

      Apply postprocessing to the data.

      First unscales, then applies the inverse response transform (e.g. expm1).

      Args:
          data (NormData): Data to postprocess.


   .. py:method:: predict(data: pcntoolkit.dataio.norm_data.NormData) -> pcntoolkit.dataio.norm_data.NormData

      Computes Z-scores, centiles, logp, yhat for each observation using fitted regression models.


   .. py:method:: preprocess(data: pcntoolkit.dataio.norm_data.NormData) -> None

      Applies preprocessing transformations to the input data.

      First applies an optional response transform (e.g. log1p), then scales.

      Args:
          data (NormData): Data to preprocess.


   .. py:method:: register_batch_effects(data: pcntoolkit.dataio.norm_data.NormData) -> None


   .. py:method:: register_data_info(data: pcntoolkit.dataio.norm_data.NormData) -> None


   .. py:method:: sample_batch_effects(n_samples: int) -> xarray.DataArray

      Sample the batch effects from the estimated distribution.


   .. py:method:: sample_covariates(bes: xarray.DataArray, covariate_range_per_batch_effect: bool = False) -> xarray.DataArray

      Sample the covariates from the estimated distribution.

      Uses ranges of observed covariates matched with batch effects to create a representative sample


   .. py:method:: save(path: Optional[str] = None) -> None

      Save the model to a file.

      Args:
          path (str, optional): The path to save the model to. If None, the model is saved to the save_dir provided in the norm_conf.


   .. py:method:: scale_backward(data: pcntoolkit.dataio.norm_data.NormData) -> None

      Scales data back to its original scale using stored scalers.

      :param data: Data object containing arrays to be scaled back:
                   - X : array-like, shape (n_samples, n_covariates)
                       Covariate data to be scaled back
                   - y : array-like, shape (n_samples, n_response_vars), optional
                       Response variable data to be scaled back
      :type data: :py:class:`NormData`


   .. py:method:: scale_forward(data: pcntoolkit.dataio.norm_data.NormData, overwrite: bool = False) -> None

      Scales input data to standardized form using configured scalers.

      :param data: Data object containing arrays to be scaled:
                   - X : array-like, shape (n_samples, n_covariates)
                       Covariate data to be scaled
                   - y : array-like, shape (n_samples, n_response_vars), optional
                       Response variable data to be scaled
      :type data: :py:class:`NormData`
      :param overwrite: If True, creates new scalers even if they already exist.
                        If False, uses existing scalers when available.
      :type overwrite: :py:class:`bool`, *default* :py:obj:`False`


   .. py:method:: set_ensure_save_dirs()

      Ensures that the save directories for results and plots are created when they are not there yet (otherwise resulted in an error)


   .. py:method:: set_save_dir(save_dir: str) -> None

      Override the save_dir in the norm_conf.

      Args:
          save_dir (str): New save directory.


   .. py:method:: synthesize(data: pcntoolkit.dataio.norm_data.NormData | None = None, n_samples: int | None = None, covariate_range_per_batch_effect=False) -> pcntoolkit.dataio.norm_data.NormData

      Synthesize data from the model

      :param data: A NormData object with X and batch_effects. If provided, used to generate the synthetic data.
                   If the data has no batch_effects, batch_effects are sampled from the model.
                   If the data has no X, X is sampled from the model, using the provided or sampled batch_effects.
                   If neither X nor batch_effects are provided, the model is used to generate the synthetic data.
      :type data: :py:class:`NormData`, *optional*
      :param n_samples: Number of samples to synthesize. If this is None, the number of samples that were in the train data is used.
      :type n_samples: :py:class:`int`, *optional*
      :param covariate_range_per_batch_effect: If True, the covariate range is different for each batch effect.
      :type covariate_range_per_batch_effect: :py:class:`bool`, *optional*


   .. py:method:: to_dict()


   .. py:method:: transfer(transfer_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) -> NormativeModel

      Transfers the model to a new dataset.


   .. py:method:: transfer_predict(transfer_data: pcntoolkit.dataio.norm_data.NormData, predict_data: pcntoolkit.dataio.norm_data.NormData, save_dir: str | None = None, **kwargs) -> NormativeModel

      Transfers the model to a new dataset and predicts the data.


   .. py:attribute:: batch_effect_counts
      :value: None


   .. py:attribute:: batch_effect_covariate_ranges
      :value: None


   .. py:property:: batch_effect_dims
      :type: list[str]


      Returns the batch effect dimensions.
      Returns:
          list[str]: The batch effect dimensions.


   .. py:attribute:: batch_effects_maps
      :value: None


   .. py:attribute:: correlation_matrix
      :value: None


   .. py:attribute:: covariate_ranges
      :value: None


   .. py:attribute:: covariates
      :value: None


   .. py:attribute:: evaluate_model
      :type:  bool
      :value: True


   .. py:attribute:: evaluator


   .. py:property:: has_batch_effect
      :type: bool


      Returns whether the model has a batch effect.
      Returns:
          bool: True if the model has a batch effect, False otherwise. This currently looks at the template reg conf


   .. py:attribute:: inscaler
      :type:  str
      :value: 'standardize'


   .. py:attribute:: inscalers
      :type:  dict


   .. py:attribute:: is_fitted
      :type:  bool
      :value: False


   .. py:property:: n_fit_observations
      :type: int


      Returns the number of batch effects.
      Returns:
          int: The number of batch effects.


   .. py:attribute:: name
      :type:  Optional[str]
      :value: None


   .. py:attribute:: outscaler
      :type:  str
      :value: 'standardize'


   .. py:attribute:: outscalers
      :type:  dict


   .. py:attribute:: regression_models
      :type:  dict[str, pcntoolkit.regression_model.regression_model.RegressionModel]


   .. py:attribute:: response_vars
      :type:  list[str]
      :value: None


   .. py:property:: save_dir
      :type: str


   .. py:attribute:: savemodel
      :type:  bool
      :value: True


   .. py:attribute:: saveplots
      :type:  bool
      :value: True


   .. py:attribute:: saveresults
      :type:  bool
      :value: True


   .. py:attribute:: template_regression_model
      :type:  pcntoolkit.regression_model.regression_model.RegressionModel


   .. py:attribute:: thrive_covariate
      :value: None


   .. py:attribute:: unique_batch_effects
      :value: None


   .. py:attribute:: y_transform
      :type:  Optional[str]
      :value: None