Getting started with normative modelling

Welcome to this tutorial notebook that will show you the very basics of normative modeling. It’s like the “Hello World” of normative modeling.

Let’s jump right in.

Imports

import warnings
import pandas as pd
import matplotlib.pyplot as plt
from pcntoolkit import (
    BLR,
    NormativeModel,
    NormData,
    load_fcon1000,
    plot_centiles,
    plot_qq,
)
import pcntoolkit.util.output
import seaborn as sns

sns.set_style("darkgrid")
warnings.simplefilter(action="ignore", category=FutureWarning)
pd.options.mode.chained_assignment = None  # default='warn'
pcntoolkit.util.output.Output.set_show_messages(False)

Load data

First we download a small example dataset from github.

# Download an example dataset
norm_data: NormData = load_fcon1000()
# Select only these three features to model for this example
norm_data = norm_data.sel({"response_vars": ["WM-hypointensities", "Left-Lateral-Ventricle", "Brain-Stem"]})
# Train-test split
train, test = norm_data.train_test_split()
# Inspect the data
df = train.to_dataframe()
fig, ax = plt.subplots(1, 2, figsize=(15, 5))

sns.countplot(data=df, y=("batch_effects", "site"), hue=("batch_effects", "sex"), ax=ax[0], orient="h")
ax[0].legend(title="Sex")
ax[0].set_title("Count of sites")
ax[0].set_xlabel("Site")
ax[0].set_ylabel("Count")

scatter_feature = "Left-Lateral-Ventricle"

sns.scatterplot(
    data=df,
    x=("X", "age"),
    y=("Y", scatter_feature),
    hue=("batch_effects", "site"),
    style=("batch_effects", "sex"),
    ax=ax[1],
)
ax[1].legend([], [])
ax[1].set_title(f"Scatter plot of age vs {scatter_feature}")
ax[1].set_xlabel("Age")
ax[1].set_ylabel(scatter_feature)

plt.show()
../../_images/00_getting_started_6_0.png

Creating a Normative model

save_dir = "/Users/stijndeboer/Projects/PCN/PCNtoolkit/examples/saves"
model = NormativeModel(BLR(), inscaler="standardize", outscaler="standardize")
model.has_batch_effect
False

Fit the model

With all that configured, we can fit the model.

The fit_predict function will fit the model, evaluate it, save the results and plots, and return the test data with all the predictions added.

After that, it will compute Z-scores and centiles for the test set.

All results can be found in the save directory.

model.fit_predict(train, test)
<xarray.NormData> Size: 87kB
Dimensions:            (observations: 216, response_vars: 3, covariates: 1,
                        batch_effect_dims: 2, centile: 5, statistic: 11)
Coordinates:
  * observations       (observations) int64 2kB 756 769 692 616 ... 751 470 1043
  * response_vars      (response_vars) <U22 264B 'WM-hypointensities' ... 'Br...
  * covariates         (covariates) <U3 12B 'age'
  * batch_effect_dims  (batch_effect_dims) <U4 32B 'sex' 'site'
  * centile            (centile) float64 40B 0.05 0.25 0.5 0.75 0.95
  * statistic          (statistic) <U8 352B 'EXPV' 'MACE' ... 'SMSE' 'ShapiroW'
Data variables:
    subject_ids        (observations) object 2kB 'Munchen_sub96752' ... 'Quee...
    Y                  (observations, response_vars) float64 5kB 2.721e+03 .....
    X                  (observations, covariates) float64 2kB 63.0 ... 23.0
    batch_effects      (observations, batch_effect_dims) <U17 29kB 'F' ... 'Q...
    Z                  (observations, response_vars) float64 5kB 0.8681 ... -...
    centiles           (centile, observations, response_vars) float64 26kB 75...
    logp               (observations, response_vars) float64 5kB -1.254 ... -...
    Yhat               (observations, response_vars) float64 5kB 2.041e+03 .....
    statistics         (response_vars, statistic) float64 264B 0.1501 ... 0.9891
    Y_harmonized       (observations, response_vars) float64 5kB 2.721e+03 .....
Attributes:
    real_ids:                       True
    is_scaled:                      False
    name:                           fcon1000_test
    unique_batch_effects:           {np.str_('sex'): [np.str_('F'), np.str_('...
    batch_effect_counts:            defaultdict(<function NormData.register_b...
    covariate_ranges:               {np.str_('age'): {'mean': np.float64(28.2...
    batch_effect_covariate_ranges:  {np.str_('sex'): {np.str_('F'): {np.str_(...

Plotting the centiles

With the fitted model, and some data, we can plot some centiles. There are a lot of different configurations possible, but here is a simple example.

plot_centiles(model, scatter_data=train)
../../_images/00_getting_started_13_0.png ../../_images/00_getting_started_13_1.png ../../_images/00_getting_started_13_2.png

We see that the model fits the data reasonably well. We can do better, but that is a topic for another tutorial.

Showing the evaluation metrics

We also computed evaluation metrics for the model. Those are saved in the save_dir/results/statistics.csv file, but are also added to the NormData object as a new data variable.

# We can use the `get_statistics_df` method to get a nicely formatted dataframe with the evaluation metrics.
display(train.get_statistics_df())
display(test.get_statistics_df())
statistic EXPV MACE MAPE MSLL NLL R2 RMSE Rho Rho_p SMSE ShapiroW
response_vars
Brain-Stem 0.000004 0.006311 0.096330 -7.800671 1.418910 0.000004 2442.168242 -0.050426 1.390658e-01 0.999996 0.996466
Left-Lateral-Ventricle 0.162276 0.053828 0.401734 -8.440148 1.330158 0.162276 3877.069187 0.269669 7.905570e-16 0.837724 0.877568
WM-hypointensities 0.132905 0.068724 0.410158 -6.778665 1.345555 0.132905 760.501165 0.019769 5.621687e-01 0.867095 0.722254
statistic EXPV MACE MAPE MSLL NLL R2 RMSE Rho Rho_p SMSE ShapiroW
response_vars
Brain-Stem -0.000274 0.014259 0.103240 -7.790895 1.525974 -0.000309 2692.120004 -0.105739 0.121292 1.000309 0.989058
Left-Lateral-Ventricle 0.175440 0.046296 0.430686 -8.443777 1.404512 0.171542 4168.272079 0.220610 0.001099 0.828458 0.898344
WM-hypointensities 0.150136 0.060741 0.444148 -6.693763 1.130787 0.143994 559.964146 -0.027495 0.687809 0.856006 0.972405

QQ plots

We also have a nice function to make QQ plots.

plot_qq(test, plot_id_line=True)
../../_images/00_getting_started_17_0.png ../../_images/00_getting_started_17_1.png ../../_images/00_getting_started_17_2.png

And those are the basics of Normative Modelling with the PCNtoolkit. We will go over some more advanced models in the next tutorials, but this should give you a good first impression.