Gaussian Process Regression

Created by

Mariam Zabihi @m_zabihi

Saige Rutherford @being_saige

Thomas Wolfers @ThomasWolfers _______________________________________________________________________________

https://colab.research.google.com/assets/colab-badge.svg

Background Story

Morten and Ingrid are concerned about the health of their father, Nordan. He recently turned 65 years. A few months ago he could not find his way home. Together, they visit a neurologist/psychiatrist to conduct a number of cognitive tests. However, those tests were inconclusive. While Nordan has a relatively low IQ it could not explain his trouble returning home.

Recently, the family heard about a new screening technique called normative modeling with which one can place individuals in reference to a population norm on for instance measures such as brain volume. Nordan would like to undertake this procedure to better know what is going on and to potentially find targets for treatment. Therefore, the family booked an appointment with you, the normative modeling specialist. To find out what is going on you compare Nordan’s hyppocampus to the norm and to a group of persons with Dementia disorders, who have a similar IQ, age as well as the same sex as Nordan.

Do your best to get as far as you can. However, you do not need to feel bad if you cannot complete everything during the tutorial.

Task 0: Load data and install the pcntoolkit

#install normative modeling
! pip install pcntoolkit==0.26

Option 1: Connect your Google Drive account, and load data from Google Drive. Having Google Drive connected will allow you to save any files created back to your Drive folder. This step will require you to download the csv files from Github to your computer, and then make a folder in your Google Drive account and upload the csv files to this folder.

from google.colab import drive
drive.mount('/content/drive')

#change dir to data on your google drive
import os
os.chdir('drive/My Drive/name-of-folder-where-you-uploaded-csv-files-from-Github/') #Change this path to match the path to your data in Google Drive

# code by T. Wolfers

Option 2: Import the files directly from Github, and skip adding them to Google Drive.

!wget -nc https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_demographics.csv
!wget -nc https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_demographics_nordan.csv
!wget -nc https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_features.csv
!wget -nc https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_features_nordan.csv

# code by S. Rutherford
--2022-02-17 15:03:58--  https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_demographics.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17484 (17K) [text/plain]
Saving to: ‘camcan_demographics.csv’

camcan_demographics 100%[===================>]  17.07K  --.-KB/s    in 0.001s

2022-02-17 15:03:58 (12.9 MB/s) - ‘camcan_demographics.csv’ saved [17484/17484]

--2022-02-17 15:03:58--  https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_demographics_nordan.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 332 [text/plain]
Saving to: ‘camcan_demographics_nordan.csv’

camcan_demographics 100%[===================>]     332  --.-KB/s    in 0s

2022-02-17 15:03:58 (15.5 MB/s) - ‘camcan_demographics_nordan.csv’ saved [332/332]

--2022-02-17 15:03:58--  https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_features.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 188944 (185K) [text/plain]
Saving to: ‘camcan_features.csv’

camcan_features.csv 100%[===================>] 184.52K  --.-KB/s    in 0.05s

2022-02-17 15:03:58 (3.88 MB/s) - ‘camcan_features.csv’ saved [188944/188944]

--2022-02-17 15:03:58--  https://raw.githubusercontent.com/predictive-clinical-neuroscience/PCNtoolkit-demo/master/tutorials/CPC_2020/data/camcan_features_nordan.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1695 (1.7K) [text/plain]
Saving to: ‘camcan_features_nordan.csv’

camcan_features_nor 100%[===================>]   1.66K  --.-KB/s    in 0s

2022-02-17 15:03:59 (25.3 MB/s) - ‘camcan_features_nordan.csv’ saved [1695/1695]

TASK 1: Format input data

You have four files. The features and demographics file for the normsample and two files of the same name for Nordan your test sample. As one of your coworkers has done the preporcessing and quality control there are more subjects in the demographics file than in the features file of the norm sample. Please select the overlap of participants between those two files.

Question for your understanding:

  1. Why do we have to select the overlap between participants in terms of featrues and demographics?

import pandas as pd

# read in the files.
norm_demographics = pd.read_csv('camcan_demographics.csv',
                                sep= ",",
                                index_col = 0)
norm_features = pd.read_csv('camcan_features.csv',
                            sep=",",
                            index_col = 0)

# check columns through print [there are other better options]
print(norm_demographics)
print(norm_features)

# find overlap in terms of participants between norm_sample_features and
# norm_sample_demographics

norm_demographics_features = pd.concat([norm_demographics, norm_features],
                                       axis = 1,
                                       join = 'inner') # inner checks overlap
                                                       # outer combines
print(norm_demographics_features)

# code by T. Wolfers
             age sex_name  sex  IQ_random
paricipants
CC110033      24     MALE    1         73
CC110037      18     MALE    1        103
CC110045      24   FEMALE    0        124
CC110056      22   FEMALE    0        124
CC110062      20     MALE    1        126
...          ...      ...  ...        ...
CC722542      79     MALE    1        116
CC722651      79   FEMALE    0        128
CC722891      84   FEMALE    0        129
CC723197      80   FEMALE    0         96
CC723395      86   FEMALE    0        145

[707 rows x 4 columns]
              left_Hippocampal_tail  ...  right_Whole_hippocampus
participants                         ...
CC110033                 482.768229  ...              3531.764896
CC110037                 595.269259  ...              3835.426137
CC110045                 655.847194  ...              3681.494304
CC110056                 561.345626  ...              3461.373764
CC110062                 756.521166  ...              4782.407821
...                             ...  ...                      ...
CC722542                 467.896808  ...              3284.108783
CC722651                 406.326167  ...              3210.272905
CC722891                 393.430481  ...              2423.675065
CC723197                 475.929914  ...              3043.146264
CC723395                 444.301617  ...              2988.001288

[651 rows x 26 columns]
          age sex_name  sex  ...  right_fimbria  right_HATA  right_Whole_hippocampus
CC110033   24     MALE    1  ...      87.127463   73.589184              3531.764896
CC110037   18     MALE    1  ...      99.657823   60.920924              3835.426137
CC110045   24   FEMALE    0  ...      69.436808   59.323542              3681.494304
CC110056   22   FEMALE    0  ...      60.505521   51.726283              3461.373764
CC110062   20     MALE    1  ...      92.215816   85.484454              4782.407821
...       ...      ...  ...  ...            ...         ...                      ...
CC722542   79     MALE    1  ...      46.144212   43.966509              3284.108783
CC722651   79   FEMALE    0  ...      68.730322   59.699644              3210.272905
CC722891   84   FEMALE    0  ...      27.913196   38.629828              2423.675065
CC723197   80   FEMALE    0  ...      51.893458   65.474967              3043.146264
CC723395   86   FEMALE    0  ...      68.335159   62.081225              2988.001288

[650 rows x 30 columns]

TASK 2: Prepare the covariate_normsample and testresponse_normsample file.

As mentioned in the introductory presentation those files need a specific format and the entries need to be seperated by spaces. Use whatever method you know to prepare those files based on the data provided in TASK 1. Save those files in .txt format in your drive. Also get rid of the column names and participant IDs.

Given that we only have limited time in this practical we have to make a selection for the features based on your prior knowledge. With the information in mind that Nordan does not remember his way home, which subfield of the hyppocampus is probably a good target for the investigations? Select a maximum of four hyppocampal regions as features.

NOTE: Normative modeling is a screening tool we just make this selection due to time constraints, in reality we build these models on millions of putative biomarkers that are not restricted to brain imaging.

Qestions for your understanding:

  1. What is the requirement for the features in terms of variable properties (e.g. dicotomous or continous)? 3) What is the requirement for the covariates in terms of these properties? 4) What are the requirements for both together? 5) How does this depent on the algorithm used?

# perpare covariate_normsample for sex and age
covariate_normsample = norm_demographics_features[['sex',
                                                   'age']]

covariate_normsample.to_csv('covariate_normsample.txt',
                            sep = ' ',
                            header = False,
                            index = False)

# perpare features_normsample for relevant hyppocampal subfields
features_normsample = norm_demographics_features[['left_CA1',
                                                 'left_CA3',
                                                 'right_CA1',
                                                 'right_CA3']]

features_normsample.to_csv('features_normsample.txt',
                           sep = ' ',
                           header = False,
                           index = False)

# code by T. Wolfers

TASK 3: Estimate normative model

Once you have prepared and saved all the necessary files. Look at the pcntoolkit for running normative modeling. Select an appropritate method set up the toolkit and run your analyses using 2-fold cross validation in the normsample. Change the output suffix from estimate to ’_2fold’.

HINT: You primarily need the estimate function.

SUGGESTION: While this process is running you can go to the next TASK 4, you will have no doubt when it is correctly running.

Question for your understaning:

  1. What does cvfolds mean and why do we use it? 7) What is the output of the estimate function and what does it mean?

import pcntoolkit as pcn

# run normative modeling using 2-fold cross-validation

pcn.normative.estimate(covfile = 'covariate_normsample.txt',
                       respfile = 'features_normsample.txt',
                       cvfolds = 2,
                       alg = 'gpr',
                       outputsuffix = '_2fold')

# code by T. Wolfers
Processing data in features_normsample.txt
Estimating model  1 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1856.502251
         Iterations: 40
         Function evaluations: 99
         Gradient evaluations: 99
Estimating model  2 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1596.239263
         Iterations: 42
         Function evaluations: 93
         Gradient evaluations: 93
Estimating model  3 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1862.316698
         Iterations: 47
         Function evaluations: 104
         Gradient evaluations: 104
Estimating model  4 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1587.950935
         Iterations: 30
         Function evaluations: 64
         Gradient evaluations: 64
Estimating model  1 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1916.461484
         Iterations: 44
         Function evaluations: 94
         Gradient evaluations: 87
Estimating model  2 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1611.661888
         Iterations: 34
         Function evaluations: 85
         Gradient evaluations: 85
Estimating model  3 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1912.665851
         Iterations: 61
         Function evaluations: 133
         Gradient evaluations: 126
Estimating model  4 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 1619.045647
         Iterations: 43
         Function evaluations: 110
         Gradient evaluations: 105
Evaluating the model ...
Writing outputs ...

TASK 4: Estimate the forward model of the normative model

In order to visulize the normative trajectories you first need to run the forward model. To this end you need to set up an appropriate covariate_forwardmodel file that covers the age range appropriately for both sexes. Save this file as .txt . Then you can input the files you made in TASK 1 as well as the file you made now and run the forward model using the appropriate specifications.

Question for your understaning:

  1. What is yhat and ys2? 9) Why does the output of the forward model does not inlcude the Z-scores?

# create covariate_forwardmodel.txt file
covariate_forwardmodel = {'sex': [0, 0, 0, 0, 0, 0, 0,
                                  1, 1, 1, 1, 1, 1, 1],
                          'age': [20, 30, 40, 50, 60, 70, 80,
                                  20, 30, 40, 50, 60, 70, 80]}
covariate_forwardmodel = pd.DataFrame(data=covariate_forwardmodel)

covariate_forwardmodel.to_csv('covariate_forwardmodel.txt',
                           sep = ' ',
                           header = False,
                           index = False)

# estimate forward model
pcn.normative.estimate(covfile = 'covariate_normsample.txt',
                       respfile = 'features_normsample.txt',
                       testcov = 'covariate_forwardmodel.txt',
                       cvfolds = None,
                       alg = 'gpr',
                       outputsuffix = '_forward')

# code by T. Wolfers
Processing data in features_normsample.txt
Estimating model  1 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3781.497401
         Iterations: 20
         Function evaluations: 61
         Gradient evaluations: 54
Estimating model  2 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3201.761309
         Iterations: 39
         Function evaluations: 108
         Gradient evaluations: 108
Estimating model  3 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3771.310488
         Iterations: 47
         Function evaluations: 181
         Gradient evaluations: 167
Estimating model  4 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3200.837262
         Iterations: 40
         Function evaluations: 104
         Gradient evaluations: 104
Writing outputs ...

TASK 5: Visualize forward model

Visualize the forward model of the normative model similar to the figure below.

1-s2.0-S245190221830329X-gr2.jpg

1-s2.0-S245190221830329X-gr2.jpg

HINT: First create a function that calculates the confidence intervals and then plot yhat, y2 of the forward model. Finally, plot the data of individual participants.

import numpy as np
import matplotlib.pyplot as plt

# confidence interval calculation at x_forward
def confidence_interval(s2,x,z):
  CI=np.zeros((len(x_forward),4))
  for i,xdot in enumerate(x_forward):
    ci_inx=np.isin(x,xdot)
    S2=s2[ci_inx]
    S_hat=np.mean(S2,axis=0)
    n=S2.shape[0]
    CI[i,:]=z*np.power(S_hat/n,.5)
  return CI


feature_names=['left_CA1','left_CA3','right_CA1','right_CA3']
sex_covariates=[ 'Female','Male']
# Creating plots for Female and male
for i,sex in enumerate(sex_covariates):
#forward model data
    forward_yhat = pd.read_csv('yhat_forward.txt', sep = ' ', header=None)
    yhat_forward=forward_yhat.values
    yhat_forward=yhat_forward[7*i:7*(i+1)]
    x_forward=[20, 30, 40, 50, 60, 70, 80]

# Find the index of the data exclusively for one sex. Female:0, Male: 1
    inx=np.where(covariate_normsample.sex==i)[0]
    x=covariate_normsample.values[inx,1]
# actual data
    y = pd.read_csv('features_normsample.txt', sep = ' ', header=None)
    y=y.values[inx]
# confidence Interval yhat+ z *(std/n^.5)-->.95 % CI:z=1.96, 99% CI:z=2.58
    s2= pd.read_csv('ys2_2fold.txt', sep = ' ', header=None)
    s2=s2.values[inx]

    CI_95=confidence_interval(s2,x,1.96)
    CI_99=confidence_interval(s2,x,2.58)

# Creat a trejactroy for each point
    for j,name in enumerate(feature_names):
         fig=plt.figure()
         ax=fig.add_subplot(111)
         ax.plot(x_forward,yhat_forward[:,j], linewidth=4, label='Normative trejactory')


         ax.plot(x_forward,CI_95[:,j]+yhat_forward[:,j], linewidth=2,linestyle='--',c='g', label='95% confidence interval')
         ax.plot(x_forward,-CI_95[:,j]+yhat_forward[:,j], linewidth=2,linestyle='--',c='g')

         ax.plot(x_forward,CI_99[:,j]+yhat_forward[:,j], linewidth=1,linestyle='--',c='k', label='99% confidence interval')
         ax.plot(x_forward,-CI_99[:,j]+yhat_forward[:,j], linewidth=1,linestyle='--',c='k')

         ax.scatter(x,y[:,j],c='r', label=name)
         plt.legend(loc='upper left')
         plt.title('Normative trejectory of' +name+' in '+sex+' cohort')
         plt.show()
         plt.close()

# code by M. Zabihi
../_images/normative_modelling_walkthrough_22_0.png ../_images/normative_modelling_walkthrough_22_1.png ../_images/normative_modelling_walkthrough_22_2.png ../_images/normative_modelling_walkthrough_22_3.png ../_images/normative_modelling_walkthrough_22_4.png ../_images/normative_modelling_walkthrough_22_5.png ../_images/normative_modelling_walkthrough_22_6.png ../_images/normative_modelling_walkthrough_22_7.png

TASK 6: Apply the normative model to Nordan’s data and the dementia patients.

# read in Nordan's as well as the patient's demographics and features
demographics_nordan = pd.read_csv('camcan_demographics_nordan.csv',
                                       sep= ",",
                                       index_col = 0)
features_nordan = pd.read_csv('camcan_features_nordan.csv',
                            sep=",",
                            index_col = 0)

# create a covariate file for Nordan's as well as the patient's demograhpics
covariate_nordan = demographics_nordan[['sex',
                                        'age']]
covariate_nordan.to_csv('covariate_nordan.txt',
                        sep = ' ',
                        header = False,
                        index = False)

# create the corresponding feature file
features_nordan = features_nordan[['left_CA1',
                                  'left_CA3',
                                  'right_CA1',
                                  'right_CA3']]

features_nordan.to_csv('features_nordan.txt',
                        sep = ' ',
                        header = False,
                        index = False)

# apply normative modeling
pcn.normative.estimate(covfile = 'covariate_normsample.txt',
                       respfile = 'features_normsample.txt',
                       testcov = 'covariate_nordan.txt',
                       testresp = 'features_nordan.txt',
                       cvfolds = None,
                       alg = 'gpr',
                       outputsuffix = '_nordan')

# code by T. Wolfers
Processing data in features_normsample.txt
Estimating model  1 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3781.497401
         Iterations: 20
         Function evaluations: 61
         Gradient evaluations: 54
Estimating model  2 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3201.761309
         Iterations: 39
         Function evaluations: 108
         Gradient evaluations: 108
Estimating model  3 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3771.310488
         Iterations: 47
         Function evaluations: 181
         Gradient evaluations: 167
Estimating model  4 of 4
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Warning: Estimation of posterior distribution failed
Optimization terminated successfully.
         Current function value: 3200.837262
         Iterations: 40
         Function evaluations: 104
         Gradient evaluations: 104
Evaluating the model ...
Writing outputs ...

TASK 7: In which hyppocampal subfield(s) does Nordan deviate extremely?

No coding necessary just create a presentation which includes recommendations to Nordan and his family. Use i) |Z| > 3.6 ii) |Z| > 1.96 as definitions for extreme normative deviations.

TASK 8 (OPTIONAL): Implement a function that calculates percentage change.

Percentage change = \frac{x1 - x2}{|x2|}*100

# function that calculates percentage change
def calculate_percentage_change(x1, x2):
  percentage_change = ((x1 - x2) / abs(x2)) * 100
  return percentage_change

# code by T. Wolfers

TASK 9 (OPTIONAL): Visualize percent change

Plot the prercentage change in Yhat of the forward model in reference to age 20. Do that for both sexes seperately.

import matplotlib.pyplot as plt

forward_yhat = pd.read_csv('yhat_forward.txt', sep = ' ', header=None)

# You can indicate here which hypocampal subfield you like to visualize
hyppocampal_subfield = 0

percentage_change_female = []
percentage_change_male = []
count = 0
lengths = len(forward_yhat[hyppocampal_subfield])
for entry in forward_yhat[hyppocampal_subfield]:
  if count > 0 and count < 7:
    loop_percentage_change_female = calculate_percentage_change(entry,
                                                                forward_yhat.iloc[0,
                                                                                  hyppocampal_subfield])
    percentage_change_female.append(loop_percentage_change_female)
  elif count > 7:
    loop_percentage_change_male = calculate_percentage_change(entry,
                                                              forward_yhat.iloc[9,
                                                                                hyppocampal_subfield])
    percentage_change_male.append(loop_percentage_change_male)
  count = count + 1

names = ['30 compared to 20 years',
         '40 compared to 20 years',
         '50 compared to 20 years',
         '60 compared to 20 years',
         '70 compared to 20 years',
         '80 compared to 20 years']

# females
plt.subplot(121)
plt.bar(names, percentage_change_female)
plt.xticks(rotation=90)
plt.ylim(-20, 2)

# males
plt.subplot(122)
plt.bar(names, percentage_change_male)
plt.xticks(rotation=90)
plt.ylim(-20, 2)

# code by T. Wolfers
(-20.0, 2.0)
../_images/normative_modelling_walkthrough_32_1.png