Getting started with normative modelling#

Welcome to this tutorial notebook that will show you the very basics of normative modeling. It’s like the “Hello World” of normative modeling.

Let’s jump right in.

Imports#

import warnings
import pandas as pd
import matplotlib.pyplot as plt
from pcntoolkit import (
    BLR,
    NormativeModel,
    NormData,
    load_fcon1000,
    plot_centiles,
    plot_qq,
)
import pcntoolkit.util.output
import seaborn as sns

sns.set_style("darkgrid")
warnings.simplefilter(action="ignore", category=FutureWarning)
pd.options.mode.chained_assignment = None  # default='warn'
pcntoolkit.util.output.Output.set_show_messages(False)

Load data#

First we download a small example dataset from github.

# Download an example dataset
norm_data: NormData = load_fcon1000()
# Select only these three features to model for this example
norm_data = norm_data.sel({"response_vars": ["WM-hypointensities", "Left-Lateral-Ventricle", "Brain-Stem"]})
# Train-test split
train, test = norm_data.train_test_split()
# Inspect the data
df = train.to_dataframe()
fig, ax = plt.subplots(1, 2, figsize=(15, 5))

sns.countplot(data=df, y=("batch_effects", "site"), hue=("batch_effects", "sex"), ax=ax[0], orient="h")
ax[0].legend(title="Sex")
ax[0].set_title("Count of sites")
ax[0].set_xlabel("Site")
ax[0].set_ylabel("Count")

scatter_feature = "Left-Lateral-Ventricle"

sns.scatterplot(
    data=df,
    x=("X", "age"),
    y=("Y", scatter_feature),
    hue=("batch_effects", "site"),
    style=("batch_effects", "sex"),
    ax=ax[1],
)
ax[1].legend([], [])
ax[1].set_title(f"Scatter plot of age vs {scatter_feature}")
ax[1].set_xlabel("Age")
ax[1].set_ylabel(scatter_feature)

plt.show()
../_images/00_getting_started_6_0.png

Creating a Normative model#

save_dir = "/Users/stijndeboer/Projects/PCN/PCNtoolkit/examples/saves"
model = NormativeModel(BLR(), inscaler="standardize", outscaler="standardize")
model.has_batch_effect
False

Fit the model#

With all that configured, we can fit the model.

The fit_predict function will fit the model, evaluate it, save the results and plots, and return the test data with all the predictions added.

After that, it will compute Z-scores and centiles for the test set.

All results can be found in the save directory.

model.fit_predict(train, test)
<xarray.NormData> Size: 87kB
Dimensions:            (observations: 216, response_vars: 3, covariates: 1,
                        batch_effect_dims: 2, centile: 5, statistic: 11)
Coordinates:
  * observations       (observations) int64 2kB 756 769 692 616 ... 751 470 1043
  * response_vars      (response_vars) <U22 264B 'WM-hypointensities' ... 'Br...
  * covariates         (covariates) <U3 12B 'age'
  * batch_effect_dims  (batch_effect_dims) <U4 32B 'sex' 'site'
  * centile            (centile) float64 40B 0.05 0.25 0.5 0.75 0.95
  * statistic          (statistic) <U8 352B 'EXPV' 'MACE' ... 'SMSE' 'ShapiroW'
Data variables:
    subject_ids        (observations) object 2kB 'Munchen_sub96752' ... 'Quee...
    Y                  (observations, response_vars) float64 5kB 2.721e+03 .....
    X                  (observations, covariates) float64 2kB 63.0 ... 23.0
    batch_effects      (observations, batch_effect_dims) <U17 29kB 'F' ... 'Q...
    Z                  (observations, response_vars) float64 5kB 0.8681 ... -...
    centiles           (centile, observations, response_vars) float64 26kB 75...
    logp               (observations, response_vars) float64 5kB -1.254 ... -...
    Yhat               (observations, response_vars) float64 5kB 2.041e+03 .....
    statistics         (response_vars, statistic) float64 264B 0.1501 ... 0.9891
    Y_harmonized       (observations, response_vars) float64 5kB 2.721e+03 .....
Attributes:
    real_ids:                       True
    is_scaled:                      False
    name:                           fcon1000_test
    unique_batch_effects:           {np.str_('sex'): [np.str_('F'), np.str_('...
    batch_effect_counts:            defaultdict(<function NormData.register_b...
    covariate_ranges:               {np.str_('age'): {'mean': np.float64(28.2...
    batch_effect_covariate_ranges:  {np.str_('sex'): {np.str_('F'): {np.str_(...

Plotting the centiles#

With the fitted model, and some data, we can plot some centiles. There are a lot of different configurations possible, but here is a simple example.

plot_centiles(model, scatter_data=train)
../_images/00_getting_started_13_0.png ../_images/00_getting_started_13_1.png ../_images/00_getting_started_13_2.png

We see that the model fits the data reasonably well. We can do better, but that is a topic for another tutorial.

Showing the evaluation metrics#

We also computed evaluation metrics for the model. Those are saved in the save_dir/results/statistics.csv file, but are also added to the NormData object as a new data variable.

# We can use the `get_statistics_df` method to get a nicely formatted dataframe with the evaluation metrics.
display(train.get_statistics_df())
display(test.get_statistics_df())
statistic EXPV MACE MAPE MSLL NLL R2 RMSE Rho Rho_p SMSE ShapiroW
response_vars
Brain-Stem 0.000004 0.006311 0.096330 -7.800671 1.418910 0.000004 2442.168242 -0.050426 1.390658e-01 0.999996 0.996466
Left-Lateral-Ventricle 0.162276 0.053828 0.401734 -8.440148 1.330158 0.162276 3877.069187 0.269669 7.905570e-16 0.837724 0.877568
WM-hypointensities 0.132905 0.068724 0.410158 -6.778665 1.345555 0.132905 760.501165 0.019769 5.621687e-01 0.867095 0.722254
statistic EXPV MACE MAPE MSLL NLL R2 RMSE Rho Rho_p SMSE ShapiroW
response_vars
Brain-Stem -0.000274 0.014259 0.103240 -7.790895 1.525974 -0.000309 2692.120004 -0.105739 0.121292 1.000309 0.989058
Left-Lateral-Ventricle 0.175440 0.046296 0.430686 -8.443777 1.404512 0.171542 4168.272079 0.220610 0.001099 0.828458 0.898344
WM-hypointensities 0.150136 0.060741 0.444148 -6.693763 1.130787 0.143994 559.964146 -0.027495 0.687809 0.856006 0.972405

QQ plots#

We also have a nice function to make QQ plots.

plot_qq(test, plot_id_line=True)
../_images/00_getting_started_17_0.png ../_images/00_getting_started_17_1.png ../_images/00_getting_started_17_2.png

And those are the basics of Normative Modelling with the PCNtoolkit. We will go over some more advanced models in the next tutorials, but this should give you a good first impression.