Module Index
- class bayesreg.BLR(**kwargs)[source]
Bases:
object
Bayesian linear regression
Estimation and prediction of Bayesian linear regression models
Basic usage:
B = BLR()
hyp = B.estimate(hyp0, X, y)
ys, s2 = B.predict(hyp, X, y, Xs)
where the variables are
- Parameters:
hyp – vector of hyperparameters.
X – N x D data array
y – 1D Array of targets (length N)
Xs – Nte x D array of test cases
hyp0 – starting estimates for hyperparameter optimisation
- Returns:
ys - predictive mean
s2 - predictive variance
The hyperparameters are:
hyp = ( log(beta), log(alpha) ) # hyp is a list or numpy array
The implementation and notation mostly follows Bishop (2006). The hyperparameter beta is the noise precision and alpha is the precision over lengthscale parameters. This can be either a scalar variable (a common lengthscale for all input variables), or a vector of length D (a different lengthscale for each input variable, derived using an automatic relevance determination formulation). These are estimated using conjugate gradient optimisation of the marginal likelihood.
Reference: Bishop (2006) Pattern Recognition and Machine Learning, Springer
Written by A. Marquand
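A minimal end-to-end sketch of this workflow on synthetic data is shown below. The import path (here assumed to be a top-level bayesreg module) and the optimizer choice are assumptions; adapt them to your installation:
import numpy as np
from bayesreg import BLR   # import path may differ by installation

N, D = 100, 5
X = np.random.randn(N, D)                                # N x D covariates
y = X @ np.random.randn(D) + 0.1 * np.random.randn(N)    # 1D targets of length N
Xs = np.random.randn(20, D)                              # test covariates

hyp0 = np.zeros(2)                             # starting values for ( log(beta), log(alpha) )
B = BLR()
hyp = B.estimate(hyp0, X, y, optimizer='cg')   # conjugate gradient optimisation
ys, s2 = B.predict(hyp, X, y, Xs)              # predictive mean and variance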
- estimate(hyp0, X, y, **kwargs)[source]
Function to estimate the model
- Parameters:
hyp – hyperparameter vector
X – covariates
y – responses
optimizer – optimisation algorithm (‘cg’, ‘powell’, ‘nelder-mead’, ‘l-bfgs-b’)
- penalized_loglik(hyp, X, y, Xv=None, l=0.1, norm='L1')[source]
Function to compute the penalized log (marginal) likelihood
- Parameters:
hyp – hyperparameter vector
X – covariates
y – responses
Xv – covariates for heteroskedastic noise
l – regularisation penalty
norm – type of regulariser (L1 or L2)
- post(hyp, X, y, Xv=None)[source]
Generic function to compute posterior distribution.
This function will save the posterior mean and precision matrix as self.m and self.A, and will also update internal parameters (e.g. N, D, the prior covariance (Sigma_a) and precision (Lambda_a)).
- Parameters:
hyp – hyperparameter vector
X – covariates
y – responses
Xv – covariates for heteroskedastic noise
- predict(hyp, X, y, Xs, var_groups_test=None, var_covariates_test=None, **kwargs)[source]
Function to make predictions from the model
- Parameters:
hyp – hyperparameter vector
X – covariates for training data
y – responses for training data
Xs – covariates for test data
var_covariates_test – test covariates for heteroskedastic noise
This always returns Gaussian predictions, i.e.
- Returns:
ys - predictive mean
s2 - predictive variance
- predict_and_adjust(hyp, X, y, Xs=None, ys=None, var_groups_test=None, var_groups_adapt=None, **kwargs)[source]
Function to transfer the model to a new site. This is done by first making predictions on the adaptation data given by X, then adjusting by the residuals with respect to y.
- Parameters:
hyp – hyperparameter vector
X – covariates for adaptation (i.e. calibration) data
y – responses for adaptation data
Xs – covariate data (for which predictions should be adjusted)
ys – true response variables (to be adjusted)
var_groups_test – variance groups (e.g. sites) for test data
var_groups_adapt – variance groups for adaptation data
There are two possible ways of using this function, depending on whether ys or Xs is specified.
If ys is specified, the adjustment is applied directly to the data, which are assumed to be in the input space (i.e. not warped). In this case the adjusted true data points are returned in the same space.
Alternatively, if Xs is specified, then predictions are made and adjusted. In this case the predictive variance is returned in the warped (i.e. Gaussian) space.
This function needs to know which sites are associated with which data points; this is provided by var_groups_xxx, which is a list or array of scalar ids.
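An illustrative sketch of the two modes, using hypothetical adaptation data (X_ad, y_ad) and site labels; all variable names below are made up for illustration, and the exact return structure follows the description above:
# Mode 1: Xs given - make predictions for new test data and adjust them
out = B.predict_and_adjust(hyp, X_ad, y_ad, Xs=Xs_new,
                           var_groups_test=sites_test,
                           var_groups_adapt=sites_adapt)

# Mode 2: ys given - adjust true responses directly (in the input space)
out = B.predict_and_adjust(hyp, X_ad, y_ad, ys=y_new,
                           var_groups_adapt=sites_adapt)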
- class gp.CovBase(x=None)[source]
Bases:
object
Base class for covariance functions.
All covariance functions must define the following methods:
CovFunction.get_n_params()
CovFunction.cov()
CovFunction.xcov()
CovFunction.dcov()
- abstract cov(theta, x, z=None)[source]
Return the full covariance (or cross-covariance if z is given)
- class gp.CovLin(x=None)[source]
Bases:
CovBase
Linear covariance function (no hyperparameters)
- dcov(theta, x, i)[source]
Return the derivative of the covariance function with respect to the i-th hyperparameter
- get_n_params()
Report the number of parameters required
- class gp.CovSqExp(x=None)[source]
Bases:
CovBase
Ordinary squared exponential covariance function. The hyperparameters are:
theta = ( log(ell), log(sf) )
where ell is a lengthscale parameter and sf2 is the signal variance
- dcov(theta, x, i)[source]
Return the derivative of the covariance function with respect to the i-th hyperparameter
- get_n_params()
Report the number of parameters required
- class gp.CovSqExpARD(x=None)[source]
Bases:
CovBase
Squared exponential covariance function with ARD. The hyperparameters are:
theta = ( log(ell_1), ..., log(ell_D), log(sf) )
where ell_i are lengthscale parameters and sf2 is the signal variance
- dcov(theta, x, i)[source]
Return the derivative of the covariance function with respect to the i-th hyperparameter
- get_n_params()
Report the number of parameters required
- class gp.CovSum(x=None, covfuncnames=None)[source]
Bases:
CovBase
Sum of covariance functions. These are passed in as a tuple and initialised automatically. For example:
C = CovSum(x, (CovLin, CovSqExpARD))
C = CovSum.cov(x, )
The hyperparameters are:
theta = ( log(ell_1), ..., log(ell_D), log(sf2) )
where ell_i are lengthscale parameters and sf2 is the signal variance
- dcov(theta, x, i)[source]
Return the derivative of the covariance function with respect to the i-th hyperparameter
- get_n_params()
Report the number of parameters required
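A small sketch of composing covariance functions with CovSum is given below. The constructor form follows the example above; some versions may expect the covariance function names as strings rather than classes, so treat the second argument as an assumption:
import numpy as np
from gp import CovSum, CovLin, CovSqExpARD   # import path may differ

x = np.random.randn(50, 3)
C = CovSum(x, (CovLin, CovSqExpARD))   # or ('CovLin', 'CovSqExpARD') in some versions

theta = np.zeros(C.get_n_params())     # one entry per (log) hyperparameter
K = C.cov(theta, x)                    # full covariance matrix
dK = C.dcov(theta, x, 0)               # derivative w.r.t. the first hyperparameter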
- class gp.GPR(hyp=None, covfunc=None, X=None, y=None, n_iter=100, tol=0.001, verbose=False, warp=None)[source]
Bases:
object
Gaussian process regression
Estimation and prediction of Gaussian process regression models
Basic usage:
G = GPR()
hyp = G.estimate(hyp0, cov, X, y)
ys, ys2 = G.predict(hyp, cov, X, y, Xs)
where the variables are
- Parameters:
hyp – vector of hyperparameters
cov – covariance function
X – N x D data array
y – 1D Array of targets (length N)
Xs – Nte x D array of test cases
hyp0 – starting estimates for hyperparameter optimisation
- Returns:
ys - predictive mean
ys2 - predictive variance
The hyperparameters are:
hyp = ( log(sn), (cov function params) ) # hyp is a list or array
The implementation and notation follows Rasmussen and Williams (2006). As in the gpml toolbox, these parameters are estimated using conjugate gradient optimisation of the marginal likelihood. Note that there is no explicit mean function, thus the gpr routines are limited to modelling zero-mean processes.
Reference: C. Rasmussen and C. Williams (2006) Gaussian Processes for Machine Learning
Written by A. Marquand
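A minimal sketch of this workflow, pairing GPR with one of the covariance functions above. The import path is an assumption, and y is centred because the routines assume a zero-mean process:
import numpy as np
from gp import GPR, CovSqExp   # import path may differ

N, D = 100, 2
X = np.random.randn(N, D)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(N)
y = y - y.mean()                            # gpr assumes a zero-mean process
Xs = np.random.randn(20, D)

cov = CovSqExp(X)
hyp0 = np.zeros(1 + cov.get_n_params())     # ( log(sn), cov function params )

G = GPR()
hyp = G.estimate(hyp0, cov, X, y)
ys, ys2 = G.predict(hyp, cov, X, y, Xs)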
- normative_parallel.bashwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, func='estimate', **kwargs)[source]
This function wraps normative modelling into a bash script to run it on a torque cluster system.
Basic usage:
bashwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path)
- Parameters:
processing_dir – Full path to the processing dir
python_path – Full path to the python distribution
normative_path – Full path to the normative.py
job_name – Name for the bash script that is the output of this function
covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the responsefile
respfile_path – Full path to a .txt that contains all features (subjects x features)
cv_folds – Number of cross validations
testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the testresponse file
testrespfile_path – Full path to a .txt file that contains all test features
alg – which algorithm to use
configparam – configuration parameters for this algorithm
- Outputs:
A bash (.sh) file containing the commands for normative modelling, saved to the processing directory (written to disk).
written by (primarily) T Wolfers, (adapted) S Rutherford.
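For example, a sketch of wrapping a single batch into a job script; all paths below are hypothetical and the import path may differ:
from normative_parallel import bashwrap_nm   # import path may differ

processing_dir = '/project/nm_processing/batch_1/'        # hypothetical path
bashwrap_nm(processing_dir,
            python_path='/usr/bin/python',                # hypothetical path
            normative_path='/path/to/normative.py',       # hypothetical path
            job_name='nm_batch_1',
            covfile_path=processing_dir + 'cov.txt',
            respfile_path=processing_dir + 'resp.txt',
            func='estimate',
            alg='gpr')                                     # optional algorithm kwarg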
- normative_parallel.check_job_status(jobs)[source]
A utility function to count the tasks with different status.
- Parameters:
jobs – List of job ids.
- Returns:
The number of tasks that are queued, running, completed, etc.
- normative_parallel.check_jobs(jobs, delay=60)[source]
A utility function for checking the status of submitted jobs.
- Parameters:
jobs – list of job ids.
delay – the delay (in seconds) between two consecutive checks, defaults to 60.
- normative_parallel.collect_nm(processing_dir, job_name, func='estimate', collect=False, binary=False, batch_size=None, outputsuffix='_estimate')[source]
Function to check and collect all batches.
Basic usage:
collect_nm(processing_dir, job_name)
- Parameters:
processing_dir – Full path to the processing directory
collect – If True data is checked for failed batches and collected; if False data is just checked
binary – Results in pkl format
- Outputs:
Text or pkl files containing the combined output across all batches (written to disk).
- Returns 0:
if batches fail
- Returns 1:
if batches complete successfully
written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.
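For example, a sketch of checking and collecting results after the jobs finish (paths are hypothetical; the numeric return codes follow the description above):
from normative_parallel import collect_nm   # import path may differ

ok = collect_nm('/project/nm_processing/', job_name='nm_run',
                collect=True, binary=False)
if ok == 0:
    print('some batches failed; consider rerunning them')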
- normative_parallel.delete_nm(processing_dir, binary=False)[source]
This function deletes all processing for normative modelling and just keeps the combined output.
Basic usage:
delete_nm(processing_dir)
- Parameters:
processing_dir – Full path to the processing directory.
binary – Results in pkl format.
written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.
- normative_parallel.execute_nm(processing_dir, python_path, job_name, covfile_path, respfile_path, batch_size, memory, duration, normative_path=None, func='estimate', interactive=False, **kwargs)[source]
Execute parallel normative models. This is a top-level function that executes all parallel normative modelling routines. Different specifications are possible using the sub-functions.
Basic usage:
execute_nm(processing_dir, python_path, job_name, covfile_path, respfile_path, batch_size, memory, duration)
- Parameters:
processing_dir – Full path to the processing dir
python_path – Full path to the python distribution
normative_path – Full path to normative.py. If None (default), the path is retrieved automatically from the installed package.
job_name – Name for the bash script that is the output of this function
covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the response file
respfile_path – Full path to a .txt that contains all features (subjects x features)
batch_size – Number of features in each batch
memory – Memory requirements written as string for example 4gb or 500mb
duration – The approximate duration of the job, a string in HH:MM:SS format, for example 01:01:01
cv_folds – Number of cross validations
testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the test response file
testrespfile_path – Full path to a .txt file that contains all test features
log_path – Path for saving log files
binary – If True uses binary format for response file otherwise it is text
interactive – If False (default), the user must manually rerun failed jobs and collect the results. If ‘auto’, the job status is checked until all jobs are completed, then failed jobs are rerun and the results are collected automatically. ‘query’ is similar to ‘auto’ except that it asks for user verification, which makes it immune to endless loops in the case of bugs in the code.
written by (primarily) T Wolfers, (adapted) SM Kia. The documentation was adapted by S Rutherford.
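A sketch of a full submission, using only the parameters documented above (all paths are hypothetical):
from normative_parallel import execute_nm   # import path may differ

execute_nm(processing_dir='/project/nm_processing/',    # hypothetical paths
           python_path='/usr/bin/python',
           job_name='nm_run',
           covfile_path='/project/data/cov.txt',
           respfile_path='/project/data/resp.txt',
           batch_size=10,                  # features per batch
           memory='4gb',
           duration='01:00:00',
           interactive='auto')             # rerun failures and collect automatically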
- normative_parallel.qsub_nm(job_path, log_path, memory, duration)[source]
This function submits a job.sh script to the torque cluster using the qsub command.
Basic usage:
qsub_nm(job_path, log_path, memory, duration)
- Parameters:
job_path – Full path to the job.sh file.
memory – Memory requirements written as string for example 4gb or 500mb.
duration – The approximate duration of the job, a string in HH:MM:SS format, for example 01:01:01.
- Outputs:
Submission of the job to the (torque) cluster.
written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.
- normative_parallel.rerun_nm(processing_dir, log_path, memory, duration, binary=False, interactive=False)[source]
This function reruns all failed batches in processing_dir after collect_nm has identified the failed batches. Basic usage:
rerun_nm(processing_dir, log_path, memory, duration)
- Parameters:
processing_dir – Full path to the processing directory
memory – Memory requirements written as string for example 4gb or 500mb.
duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01.
written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.
- normative_parallel.retrieve_jobs()[source]
A utility function to retrieve task status from the outputs of qstat.
- Returns:
a dictionary of jobs.
- normative_parallel.sbatch_nm(job_path, log_path)[source]
This function submits a job.sh script to a slurm cluster using the sbatch command.
Basic usage:
sbatch_nm(job_path, log_path)
- Parameters:
job_path – Full path to the job.sh file
log_path – The logs are currently stored in the working dir
- Outputs:
Submission of the job to the (slurm) cluster.
written by (primarily) T Wolfers, (adapted) S Rutherford.
- normative_parallel.sbatchrerun_nm(processing_dir, memory, duration, new_memory=False, new_duration=False, binary=False, **kwargs)[source]
This function reruns all failed batches in processing_dir after collect_nm has identified the failed batches.
Basic usage:
sbatchrerun_nm(processing_dir, memory, duration)
- Parameters:
processing_dir – Full path to the processing directory.
memory – Memory requirements written as string, for example 4gb or 500mb.
duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01.
new_memory – If you want to change the memory you have to indicate it here.
new_duration – If you want to change the duration you have to indicate it here.
- Outputs:
Re-runs failed batches.
written by (primarily) T Wolfers, (adapted) S Rutherford.
- normative_parallel.sbatchwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, memory, duration, func='estimate', **kwargs)[source]
This function wraps normative modelling into a bash script to run it on a slurm cluster system.
Basic usage:
sbatchwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, memory, duration)
- Parameters:
processing_dir – Full path to the processing dir
python_path – Full path to the python distribution
normative_path – Full path to the normative.py
job_name – Name for the bash script that is the output of this function
covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the responsefile
respfile_path – Full path to a .txt that contains all features (subjects x features)
cv_folds – Number of cross validations
testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the testresponse file
testrespfile_path – Full path to a .txt file that contains all test features
alg – which algorithm to use
configparam – configuration parameters for this algorithm
- Outputs:
A bash (.sh) file containing the commands for normative modelling, saved to the processing directory (written to disk).
written by (primarily) T Wolfers, (adapted) S Rutherford
- normative_parallel.split_nm(processing_dir, respfile_path, batch_size, binary, **kwargs)[source]
This function prepares the input files for normative_parallel.
Basic usage:
split_nm(processing_dir, respfile_path, batch_size, binary, testrespfile_path)
- Parameters:
processing_dir – Full path to the processing dir
respfile_path – Full path to the responsefile.txt (subjects x features)
batch_size – Number of features in each batch
testrespfile_path – Full path to the test responsefile.txt (subjects x features)
binary – If True, use binary (pkl) file format
- Outputs:
The creation of a folder structure for batch-wise processing.
written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.
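For example, splitting a response file into batches of 10 features (paths are hypothetical):
from normative_parallel import split_nm   # import path may differ

split_nm(processing_dir='/project/nm_processing/',
         respfile_path='/project/data/resp.txt',
         batch_size=10,
         binary=False,
         testrespfile_path='/project/data/resp_test.txt')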
- trendsurf.create_basis(X, basis, mask)[source]
Create a basis set
This will create a basis set for the trend surface model. This is currently fit using a polynomial model of a specified degree. The models are estimated on the basis of data stored on disk in ascii or neuroimaging data formats (currently nifti only). Ascii data should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format
- Parameters:
X – covariates
basis – model order for the interpolating polynomial
mask – mask used to apply to the data
- Returns:
Phi - basis set
- trendsurf.estimate(filename, maskfile, basis, ard=False, outputall=False, saveoutput=True, **kwargs)[source]
Estimate a trend surface model
This will estimate a trend surface model, independently for each subject. This is currently fit using a polynomial model of a specified degree. The models are estimated on the basis of data stored on disk in ascii or neuroimaging data formats (currently nifti only). Ascii data should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format
Basic usage:
estimate(filename, maskfile, basis)
where the variables are defined below. Note that either the cfolds parameter or (testcov, testresp) should be specified, but not both.
- Parameters:
filename – 4-d nifti file containing the images to be estimated
maskfile – nifti mask used to apply to the data
basis – model order for the interpolating polynomial
All outputs are written to disk in the same format as the input. These are:
- Outputs:
yhat - predictive mean
ys2 - predictive variance
trendcoeff - coefficients from the trend surface model
negloglik - Negative log marginal likelihood
hyp - hyperparameters
explainedvar - explained variance
rmse - standardised mean squared error
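A minimal sketch following the basic usage above; the file names are hypothetical and the polynomial order is passed via the basis parameter documented above:
import trendsurf   # import path may differ

# fit a 3rd-order polynomial trend surface for each subject and write the
# outputs (yhat, ys2, trendcoeff, ...) to disk in the same format as the input
trendsurf.estimate('data_4d.nii.gz', 'mask.nii.gz', basis=3, ard=True)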
- trendsurf.get_args(*args)[source]
Parse command line arguments
This will parse the command line arguments for the trend surface model. The arguments are:
- Parameters:
filename – 4-d nifti file containing the images to be estimated
maskfile – nifti mask used to apply to the data
basis – model order for the interpolating polynomial
covfile – file containing covariates
ard – use ARD
outputall – output all measures
- Returns:
filename - 4-d nifti file containing the images to be estimated
maskfile - nifti mask used to apply to the data
basis - model order for the interpolating polynomial
covfile - file containing covariates
ard - use ARD
outputall - output all measures
- trendsurf.load_data(datafile, maskfile=None)[source]
Load data from disk
This will load data from disk, either in nifti or ascii format. If the data are in ascii format, they should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format
- Parameters:
datafile – 4-d nifti file containing the images to be estimated
maskfile – nifti mask used to apply to the data
- Returns:
dat - data in vectorised form
world - voxel coordinates
mask - mask used to apply to the data
- trendsurf.write_nii(data, filename, examplenii, mask)[source]
Write data to nifti file
This will write data to a nifti file, using the header information from an example nifti file.
- Parameters:
data – data to be written
filename – name of file to be written
examplenii – example nifti file
mask – mask used to apply to the data
- Outputs:
The data, written to the specified nifti file on disk.
- class rfa.GPRRFA(hyp=None, X=None, y=None, n_feat=None, n_iter=100, tol=0.001, verbose=False)[source]
Bases:
object
Random Feature Approximation for Gaussian Process Regression
Estimation and prediction of Gaussian process regression models using a random feature approximation
Basic usage:
R = GPRRFA()
hyp = R.estimate(hyp0, X, y)
ys, s2 = R.predict(hyp, X, y, Xs)
where the variables are
- Parameters:
hyp – vector of hyperparameters.
X – N x D data array
y – 1D Array of targets (length N)
Xs – Nte x D array of test cases
hyp0 – starting estimates for hyperparameter optimisation
- Returns:
ys - predictive mean
s2 - predictive variance
The hyperparameters are:
hyp = [ log(sn), log(ell), log(sf) ] # hyp is a numpy array
where sn^2 is the noise variance, ell are lengthscale parameters and sf^2 is the signal variance. This provides an approximation to the covariance function:
k(x,z) = x'*z + sf2 * exp( -0.5 * (x-z)' * Lambda^-1 * (x-z) )
where Lambda = diag(ell_1^2, ..., ell_D^2)
Written by A. Marquand
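A minimal sketch of the usage above on synthetic data. The import path, the number of random features, and the hyperparameter vector length (one lengthscale per input dimension plus noise and signal terms) are assumptions:
import numpy as np
from rfa import GPRRFA   # import path may differ

N, D = 200, 4
X = np.random.randn(N, D)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(N)
Xs = np.random.randn(50, D)

R = GPRRFA(n_feat=100)            # number of random features
hyp0 = np.zeros(D + 2)            # [ log(sn), log(ell_1..ell_D), log(sf) ]
hyp = R.estimate(hyp0, X, y)
ys, s2 = R.predict(hyp, X, y, Xs)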
- fileio.alphanum_key(s)[source]
Turn a string into a list of string and number chunks (used for natural sorting)
Basic usage:
alphanum_key(s)
- Parameters:
s – string to convert
- fileio.create_mask(data_array, mask, verbose=False)[source]
Create a mask from a data array or a nifti file
Basic usage:
create_mask(data_array, mask, verbose)
- Parameters:
data_array – numpy array containing the data to write out
mask – nifti image containing a mask for the image
verbose – verbose output
- fileio.file_extension(filename)[source]
Determine the file extension of a file (e.g. .nii.gz)
Basic usage:
file_extension(filename)
- Parameters:
filename – name of the file to check
- fileio.file_stem(filename)[source]
Determine the file stem of a file (e.g. /path/to/file.nii.gz -> file)
Basic usage:
file_stem(filename)
- Parameters:
filename – name of the file to check
- fileio.file_type(filename)[source]
Determine the file type of a file
Basic usage:
file_type(filename)
- Parameters:
filename – name of the file to check
- fileio.load(filename, mask=None, text=False, vol=True)[source]
Load a numpy array from a file
Basic usage:
load(filename, mask, text, vol)
- Parameters:
filename – name of the file to load
mask – nifti image containing a mask for the image
text – whether to read the data from a text file
vol – whether to load the image as a volume
- fileio.load_ascii(filename)[source]
Load an ascii file into a numpy array
Basic usage:
load_ascii(filename)
- Parameters:
filename – name of the file to load
- fileio.load_cifti(filename, vol=False, mask=None, rmtmp=True)[source]
Load a cifti file into a numpy array
Basic usage:
load_cifti(filename, vol, mask, rmtmp)
- Parameters:
filename – name of the file to load
vol – whether to load the image as a volume
mask – nifti image containing a mask for the image
rmtmp – whether to remove temporary files
- fileio.load_nifti(datafile, mask=None, vol=False, verbose=False)[source]
Load a nifti file into a numpy array
Basic usage:
load_nifti(datafile, mask, vol, verbose)
- Parameters:
datafile – name of the file to load
mask – nifti image containing a mask for the image
vol – whether to load the image as a volume
verbose – verbose output
- fileio.load_pd(filename)[source]
Load a csv file into a pandas dataframe
Basic usage:
load_pd(filename)
- Parameters:
filename – name of the file to load
- fileio.predictive_interval(s2_forward, cov_forward, multiplicator)[source]
Calculates a predictive interval for the forward model
- fileio.save(data, filename, example=None, mask=None, text=False, dtype=None)[source]
Save a numpy array to a file
Basic usage:
save(data, filename, example, mask, text, dtype)
- Parameters:
data – numpy array containing the data to write out
filename – where to store it
example – example file to copy the geometry from
mask – nifti image containing a mask for the image
text – whether to write out a text file
dtype – data type for the output image (if different from the image)
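A short sketch of a typical load/process/save round trip using the generic load and save helpers above (file names are hypothetical; the import path may differ):
import fileio   # import path may differ

# load a nifti image as an array, restricted to the voxels in a mask
dat = fileio.load('data_4d.nii.gz', mask='mask.nii.gz')

# ... process dat ...

# write the result back out, copying geometry from an example image
fileio.save(dat, 'output.nii.gz', example='data_4d.nii.gz', mask='mask.nii.gz')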
- fileio.save_ascii(data, filename)[source]
Save a numpy array to an ascii file
Basic usage:
save_ascii(data, filename)
- Parameters:
data – numpy array containing the data to write out
filename – where to store it
- fileio.save_cifti(data, filename, example, mask=None, vol=True, volatlas=None)[source]
Save a cifti file from a numpy array
Basic usage:
save_cifti(data, filename, example, mask, vol, volatlas)
- Parameters:
data – numpy array containing the data to write out
filename – where to store it
example – example file to copy the geometry from
mask – nifti image containing a mask for the image
vol – whether to save the volumetric component of the image
volatlas – atlas to use for the volume
- fileio.save_nifti(data, filename, examplenii, mask, dtype=None)[source]
Write output to nifti
Basic usage:
save_nifti(data, filename, examplenii, mask, dtype)
- Parameters:
data – numpy array containing the data to write out
filename – where to store it
examplenii – nifti to copy the geometry and data type from
mask – nifti image containing a mask for the image
dtype – data type for the output image (if different from the image)
- fileio.save_pd(data, filename)[source]
Save a pandas dataframe to a csv file
Basic usage:
save_pd(data, filename)
- Parameters:
data – pandas dataframe containing the data to write out
filename – where to store it
- fileio.sort_nicely(l)[source]
Sort a list of strings in a natural way
Basic usage:
sort_nicely(l)
- Parameters:
l – list of strings to sort
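For example, natural sorting of batch file names (assuming, as in the common natural-sort recipe, that the list is sorted in place):
import fileio   # import path may differ

files = ['batch_10.txt', 'batch_2.txt', 'batch_1.txt']
fileio.sort_nicely(files)   # uses alphanum_key; sorts the list in place
# files is now ['batch_1.txt', 'batch_2.txt', 'batch_10.txt']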