Gaussian Processes

Gaussian process regression module for learning and predicting energies and forces.

Example:

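from mff.gp import GaussianProcess
# kernel: a two- or three-body kernel object from mff.kernels
gp = GaussianProcess(kernel=kernel, noise=noise)
gp.fit(train_configurations, train_forces)
predicted_forces = gp.predict(test_configurations)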

class mff.gp.GaussianProcess(kernel=None, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Gaussian process class for GP regression of QM energies and forces.

Parameters

kernel (obj) – A kernel object (typically a two-body or three-body kernel)
noise (float) – The regularising noise level (typically named sigma_n^2)
optimizer (str) – The optimizer used to maximise the marginal likelihood (not implemented yet)

X_train_

The configurations used for training

Type: list

alpha_

The coefficients obtained during training

Type: array

L_

The lower-triangular matrix from the Cholesky decomposition of the gram matrix

Type: array

K

The kernel gram matrix

Type: array

calc_gram_ee(X)

Calculate the energy-energy kernel gram matrix

Parameters: X (list) – list of N training configurations, which are M x 5 matrices
Returns: The energy-energy gram matrix, of dimensions N x N
Return type: K (matrix)

calc_gram_ff(X)

Calculate the force-force kernel gram matrix

Parameters: X (list) – list of N training configurations, which are M x 5 matrices
Returns: The force-force gram matrix, of dimensions 3N x 3N
Return type: K (matrix)
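The 3N x 3N shape comes from stacking one 3 x 3 matrix-valued kernel block per pair of training configurations. The following is a minimal NumPy sketch of that assembly, not the mff code: it assumes configuration-major ordering (rows 3i to 3i+2 belong to configuration i) and uses a hypothetical toy block kernel in place of the Theano-compiled force-force kernel.

```python
import numpy as np


def toy_block_kernel(x1, x2, length=1.0):
    """Hypothetical 3 x 3 matrix-valued kernel between two configurations.

    Stand-in for the real force-force kernel: a scalar squared-exponential
    similarity of the two configurations times the 3 x 3 identity.
    """
    d2 = np.sum((x1 - x2) ** 2)
    return np.exp(-d2 / (2 * length**2)) * np.eye(3)


def gram_ff_sketch(X, block_kernel=toy_block_kernel):
    """Assemble the 3N x 3N force-force gram matrix from 3 x 3 blocks."""
    n = len(X)
    K = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i, n):
            block = block_kernel(X[i], X[j])
            K[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            # The gram matrix is symmetric: block (j, i) is block (i, j) transposed
            K[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block.T
    return K


rng = np.random.default_rng(1)
# Four toy configurations with M = 2 neighbours each (xyz only)
X = [rng.normal(size=(2, 3)) for _ in range(4)]
K = gram_ff_sketch(X)
```

Only the upper triangle of blocks is computed, since symmetry gives the rest; this roughly halves the number of kernel evaluations.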

fit(X, y, ncores=1)

Fit a Gaussian process regression model on training forces

Parameters

X (list) – training configurations
y (np.ndarray) – training forces
ncores (int) – number of CPU workers to use, default is 1
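The alpha_ and L_ attributes correspond to the standard Cholesky-based training step of GP regression. Below is a minimal NumPy sketch of that step, not the mff implementation. It assumes the 3N x 3N force-force gram matrix has already been computed and that the training forces are flattened into a vector of length 3N; the function name fit_sketch is hypothetical.

```python
import numpy as np


def fit_sketch(K, y, noise=1e-10):
    """Return (L, alpha) for a GP with gram matrix K and targets y.

    Hypothetical helper: K is the (3N x 3N) force-force gram matrix and
    y the flattened (3N,) vector of training forces.
    """
    # Add the regularising noise sigma_n^2 to the diagonal
    K_noisy = K + noise * np.eye(K.shape[0])
    # Lower-triangular Cholesky factor: K_noisy = L @ L.T
    L = np.linalg.cholesky(K_noisy)
    # alpha = K_noisy^-1 y, obtained from two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha


# Toy example: a random symmetric positive-definite "gram matrix"
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
K = A @ A.T + 6 * np.eye(6)
y = rng.normal(size=6)
L, alpha = fit_sketch(K, y, noise=1e-3)
```

Solving through the Cholesky factor instead of inverting K is the usual choice for numerical stability; scipy.linalg.cho_solve would additionally exploit the triangular structure.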

fit_energy(X_glob, y, ncores=1)

Fit a Gaussian process regression model using local energies.

Parameters

X_glob (list of lists of arrays) – list of grouped training configurations
y (np.ndarray) – training total energies
ncores (int) – number of CPU workers to use, default is 1
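Learning total energies from local energies usually rests on the assumption that the total energy of a structure is the sum of the local energies of its atoms. Under that assumption, the energy-energy kernel between two structures is the double sum of a local kernel over their atomic environments. The following minimal sketch illustrates this with a hypothetical scalar local kernel. It assumes each element of X_glob is the list of local environments (Mx5 arrays, xyz in the first three columns) of one structure, and that all environments have the same number M of neighbours.

```python
import numpy as np


def toy_local_kernel(r1, r2, length=1.0):
    """Hypothetical scalar kernel between two local environments.

    Compares sorted neighbour distances (xyz assumed in columns 0-2);
    both environments must have the same number M of neighbours.
    """
    d1 = np.sort(np.linalg.norm(r1[:, :3], axis=1))
    d2 = np.sort(np.linalg.norm(r2[:, :3], axis=1))
    return np.exp(-np.sum((d1 - d2) ** 2) / (2 * length**2))


def global_energy_kernel(struct1, struct2, local_kernel=toy_local_kernel):
    """k(E_A, E_B) = sum_i sum_j k_local(env_i of A, env_j of B)."""
    return sum(local_kernel(a, b) for a in struct1 for b in struct2)


def gram_ee_sketch(X_glob):
    """N x N energy-energy gram matrix between N structures."""
    n = len(X_glob)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = global_energy_kernel(X_glob[i], X_glob[j])
    return K


rng = np.random.default_rng(2)
# Three toy structures with 2, 3 and 4 atoms; every environment has M = 3 neighbours
X_glob = [[rng.normal(size=(3, 5)) for _ in range(n_atoms)] for n_atoms in (2, 3, 4)]
K_ee = gram_ee_sketch(X_glob)
```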

fit_force_and_energy(X, y_force, X_glob, y_energy, ncores=1)

Fit a Gaussian process regression model using forces and energies

Parameters

X (list of arrays) – training configurations
y_force (np.ndarray) – training forces
X_glob (list of lists of arrays) – list of grouped training configurations
y_energy (np.ndarray) – training total energies
ncores (int) – number of CPU workers to use, default is 1
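When forces and energies are fitted together, the usual construction is a single joint gram matrix with force-force, force-energy and energy-energy blocks, where the off-diagonal blocks are transposes of each other. The following minimal sketch shows the assembly and the solve. It assumes forces come before energies in the target vector and uses one noise value for both. The blocks are random placeholders sliced from one positive-definite matrix, not kernels computed by mff, and the function name is hypothetical.

```python
import numpy as np


def fit_force_and_energy_sketch(K_ff, K_fe, K_ee, y_force, y_energy, noise=1e-10):
    """Solve the joint force/energy GP system.

    K_ff: (3N, 3N) force-force gram matrix
    K_fe: (3N, E)  force-energy gram matrix (its transpose is energy-force)
    K_ee: (E, E)   energy-energy gram matrix
    """
    K = np.block([[K_ff, K_fe],
                  [K_fe.T, K_ee]])
    K += noise * np.eye(K.shape[0])
    # Targets: flattened forces followed by total energies
    y = np.concatenate([np.ravel(y_force), np.ravel(y_energy)])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K, L, alpha


# Placeholder blocks sliced from one random positive-definite matrix
rng = np.random.default_rng(3)
n_force, n_energy = 6, 2  # N = 2 force configurations, 2 structures
A = rng.normal(size=(n_force + n_energy, n_force + n_energy))
G = A @ A.T + np.eye(n_force + n_energy)
K_ff, K_fe, K_ee = G[:n_force, :n_force], G[:n_force, n_force:], G[n_force:, n_force:]
y_force = rng.normal(size=(2, 3))
y_energy = rng.normal(size=2)
K, L, alpha = fit_force_and_energy_sketch(K_ff, K_fe, K_ee, y_force, y_energy, noise=1e-6)
```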

load(filename)

Load a saved GP model

Parameters: filename (str) – name of the file where the GP is saved

log_marginal_likelihood(theta=None, eval_gradient=False)

Returns log-marginal likelihood of theta for training data.

Parameters

theta (array-like of shape (n_kernel_params,), or None) – Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
eval_gradient (bool, default False) – If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is also returned. If True, theta must not be None.

Returns

log_likelihood (float) – Log-marginal likelihood of theta for the training data.
log_likelihood_gradient (array of shape (n_kernel_params,), optional) – Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.

Return type

log_likelihood (float), or the pair (log_likelihood, log_likelihood_gradient) when eval_gradient is True
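For a zero-mean GP the log-marginal likelihood has the closed form -1/2 y^T alpha - sum_i log L_ii - (n/2) log(2 pi). Its gradient with respect to a hyperparameter theta_j is 1/2 tr((alpha alpha^T - K^-1) dK/dtheta_j). The following minimal NumPy sketch computes both, but it is not the mff code. It assumes a toy one-dimensional squared-exponential kernel whose only hyperparameter is the lengthscale, taken directly rather than log-transformed. The helper names are hypothetical.

```python
import numpy as np


def rbf_gram(x, length):
    """Toy 1D squared-exponential gram matrix and its derivative w.r.t. length."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-d2 / (2 * length**2))
    dK_dlength = K * d2 / length**3
    return K, dK_dlength


def log_marginal_likelihood_sketch(x, y, length, noise=1e-2, eval_gradient=False):
    """log p(y | x, length) of a zero-mean GP, optionally with its gradient."""
    K, dK = rbf_gram(x, length)
    K = K + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi), with log|K| = 2 sum(log diag L)
    lml = (-0.5 * y @ alpha
           - np.sum(np.log(np.diag(L)))
           - 0.5 * len(x) * np.log(2 * np.pi))
    if not eval_gradient:
        return lml
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(len(x))))
    # d lml / d length = 1/2 tr((alpha alpha^T - K^-1) dK/dlength)
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)
    return lml, np.array([grad])


x = np.linspace(0.0, 5.0, 8)
y = np.sin(x)
lml, grad = log_marginal_likelihood_sketch(x, y, length=1.3, eval_gradient=True)
```

A central finite difference of log_marginal_likelihood_sketch with respect to length is a quick way to validate the analytic gradient.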

predict(X, return_std=False, ncores=1)

Predict forces using the Gaussian process regression model

Predictions can also be made with an unfitted model, in which case the GP prior is used. In addition to the mean of the predictive distribution, its standard deviation can also be returned (return_std=True).

Parameters

X (np.ndarray) – Target configurations where the GP is evaluated
return_std (bool) – If True, the standard deviation of the predictive distribution of the target configurations is returned along with the mean.
ncores (int) – number of CPU workers to use, default is 1

Returns

y_mean (np.ndarray) – Mean of the predictive distribution at the target configurations.
y_std (np.ndarray) – Standard deviation of the predictive distribution at the target configurations. Only returned when return_std is True.

Return type

y_mean (np.ndarray), or the pair (y_mean, y_std) when return_std is True
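The predictive mean and standard deviation follow the standard GP posterior equations. If the model is unfitted, the zero-mean prior is returned. The following minimal NumPy sketch shows both cases. It assumes plain feature vectors and a toy scalar squared-exponential kernel, whereas mff works with configurations and matrix-valued kernels; the helper names are hypothetical.

```python
import numpy as np


def sq_exp(A, B, length=1.0):
    """Toy squared-exponential kernel between rows of A and rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * length**2))


def predict_sketch(X_test, X_train=None, y_train=None, noise=1e-10, return_std=False):
    """GP posterior mean (and std); falls back to the zero-mean prior if unfitted."""
    K_ss = sq_exp(X_test, X_test)
    if X_train is None:
        # Unfitted model: prior mean is zero, prior variance is k(x, x)
        y_mean = np.zeros(len(X_test))
        var = np.diag(K_ss).copy()
    else:
        K = sq_exp(X_train, X_train) + noise * np.eye(len(X_train))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        K_s = sq_exp(X_train, X_test)           # (n_train, n_test)
        y_mean = K_s.T @ alpha                  # posterior mean
        v = np.linalg.solve(L, K_s)             # L^-1 K_s
        var = np.diag(K_ss) - np.sum(v**2, axis=0)
    if return_std:
        # Clip tiny negative variances caused by rounding
        return y_mean, np.sqrt(np.clip(var, 0.0, None))
    return y_mean


X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.0])
X_test = np.array([[1.0], [10.0]])
mean, std = predict_sketch(X_test, X_train, y_train, noise=1e-8, return_std=True)
prior_mean, prior_std = predict_sketch(X_test, return_std=True)
```

At a training point the mean reproduces the training target with near-zero uncertainty. Far from the data, the prediction relaxes back to the prior (mean 0, standard deviation 1 for this kernel).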

predict_energy(X, return_std=False, ncores=1, mapping=False, **kwargs)

Predict energies from forces only using the Gaussian process regression model

This function evaluates the GP energies for a set of test configurations.

Parameters

X (np.ndarray) – Target configurations where the GP is evaluated
return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
ncores (int) – number of CPU workers to use, default is 1

Returns

y_mean (np.ndarray) – Mean of the predictive distribution at the target configurations.
y_std (np.ndarray) – Standard deviation of the predictive distribution at the target configurations. Only returned when return_std is True.

Return type

y_mean (np.ndarray), or the pair (y_mean, y_std) when return_std is True

pseudo_log_likelihood()

Returns pseudo log-likelihood of the training data.

Returns

Pseudo log-likelihood of the training data.

Return type

float

save(filename)

Dump the current GP model for later use

Parameters: filename (str) – name of the file where to save the GP

class mff.gp.ThreeBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

build_grid(dists, element1)

Function that builds a cube of values and predicts the energies on it.
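One plausible reading of "a cube of values" for a three-body model is a grid over all triplets (r1, r2, r3) drawn from the distances in dists, with an energy evaluated at each point. The following minimal sketch illustrates that idea under this assumption. The triplet energy is a hypothetical stand-in for the GP prediction, and the element1 argument is ignored.

```python
import numpy as np


def toy_triplet_energy(r1, r2, r3):
    """Hypothetical symmetric three-body energy, standing in for GP predictions."""
    return np.exp(-(r1 + r2 + r3)) * (r1 * r2 * r3)


def build_grid_sketch(dists, energy_fn=toy_triplet_energy):
    """Evaluate energy_fn on every (r1, r2, r3) triplet drawn from dists.

    Returns an array of shape (n, n, n) with grid[i, j, k] = E(d_i, d_j, d_k).
    """
    r1, r2, r3 = np.meshgrid(dists, dists, dists, indexing="ij")
    return energy_fn(r1, r2, r3)


dists = np.linspace(1.0, 4.0, 5)
grid = build_grid_sketch(dists)
```

Such a precomputed grid can then be interpolated instead of re-evaluating the kernel for every new triplet.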

class mff.gp.TwoBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Two Body Kernel

Module that contains the expressions for the 2-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. It is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.

Example:

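from mff.kernels.twobodykernel import TwoBodySingleSpeciesKernel
# theta = [lengthscale, cutoff decay rate, cutoff radius]
kernel = TwoBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
# energy-energy gram matrix, computed with number_nodes processes
ee_gram_matrix = kernel.calc_gram_e(training_configurations, ncores=number_nodes)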
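A two-body energy-energy kernel compares the neighbour distances of two local environments. The following minimal NumPy sketch of a kernel of that type is for illustration only. It assumes columns 0-2 of each Mx5 array hold positions relative to the central atom, ignores the species columns (the single-species case), and uses a squared-exponential similarity with lengthscale sigma. The hypothetical smooth cutoff vanishes at r_cut. The exact cutoff used by mff, including how its decay rate theta[1] enters, is not reproduced here.

```python
import numpy as np


def cutoff(r, r_cut):
    """Hypothetical smooth cutoff: 1 at r = 0, decaying to 0 at r = r_cut."""
    return np.where(r < r_cut, 0.5 * (1.0 + np.cos(np.pi * r / r_cut)), 0.0)


def k2_ee_sketch(conf1, conf2, sigma=1.0, r_cut=4.0):
    """Two-body energy-energy kernel between two local environments.

    conf1, conf2: (M, 5) arrays; columns 0-2 are positions relative to the
    central atom (species columns are ignored in this single-species sketch).
    """
    d1 = np.linalg.norm(conf1[:, :3], axis=1)          # (M1,)
    d2 = np.linalg.norm(conf2[:, :3], axis=1)          # (M2,)
    diff = d1[:, None] - d2[None, :]                    # all neighbour pairs
    sim = np.exp(-diff**2 / (2 * sigma**2))
    return np.sum(sim * cutoff(d1, r_cut)[:, None] * cutoff(d2, r_cut)[None, :])


conf_a = np.array([[1.0, 0.0, 0.0, 1.0, 1.0], [0.0, 1.5, 0.0, 1.0, 1.0]])
# The second neighbour of conf_b lies beyond r_cut and contributes nothing
conf_b = np.array([[0.0, 0.0, 1.2, 1.0, 1.0], [5.0, 0.0, 0.0, 1.0, 1.0]])
k_ab = k2_ee_sketch(conf_a, conf_b)
```

Because the kernel depends only on interatomic distances, it is invariant to rotations and to the ordering of neighbours.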

class mff.kernels.twobodykernel.BaseTwoBody(kernel_name, theta, bounds)

Two-body kernel class. Handles the functions common to the single-species and multi-species two-body kernels.

Parameters

kernel_name (str) – Selects between the single-species and multi-species kernel
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
bounds (list) – bounds of the kernel hyperparameters.

k2_ee

Energy-energy kernel function

Type: object

k2_ef

Energy-force kernel function

Type: object

k2_ff

Force-force kernel function

Type: object

calc(X1, X2, ncores=1)

Calculate the force-force kernel between two sets of configurations.

Parameters

X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type

K (matrix)

calc_ee(X1, X2, ncores=1, mapping=False)

Calculate the energy-energy kernel between two global environments.

Parameters

X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2 matrix of the scalar-valued kernels

Return type

K (matrix)

calc_ef(X_glob, X, ncores=1, mapping=False)

Calculate the energy-force kernel between two sets of configurations.

Parameters

X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2*3 matrix of the vector-valued kernels

Return type

K (matrix)

calc_gram(X, ncores=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters

X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type

gram (matrix)

calc_gram_e(X, ncores=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters

X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N x N gram matrix of the scalar-valued kernels

Return type

gram (matrix)

calc_gram_ef(X, X_glob, ncores=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters

X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species
X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N2 x N1*3 gram matrix of the vector-valued kernels

Return type

gram (matrix)

class mff.kernels.twobodykernel.TwoBodyManySpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two-body many-species kernel.

Parameters

theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius

static compile_theano()

Generates Theano-compiled kernels for global energy and force learning.

The positions of the atoms relative to the central one, and their chemical species, are defined by matrices of dimension Mx5, here called r1 and r2.

Returns

k2_ee (func) – energy-energy kernel
k2_ef (func) – energy-force kernel
k2_ff (func) – force-force kernel

class mff.kernels.twobodykernel.TwoBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two-body single-species kernel.

Parameters

theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius

static compile_theano()

Generates Theano-compiled kernels for global energy and force learning.

The positions of the atoms relative to the central one, and their chemical species, are defined by matrices of dimension Mx5, here called r1 and r2.

Returns

k2_ee (func) – energy-energy kernel
k2_ef (func) – energy-force kernel
k2_ff (func) – force-force kernel
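compile_theano obtains the energy-force kernel by differentiating the energy-energy kernel with Theano. The idea can be illustrated without Theano by differentiating numerically. The following sketch substitutes central finite differences for automatic differentiation and uses a hypothetical two-body energy kernel (squared-exponential on neighbour distances, no cutoff) on M x 3 position arrays. It assumes that displacing the central atom by delta shifts every relative position by -delta, and that the force is minus the gradient of the local energy with respect to that displacement.

```python
import numpy as np


def k2_ee_toy(r1, r2, sigma=1.0):
    """Hypothetical two-body energy-energy kernel on neighbour distances (no cutoff)."""
    d1 = np.linalg.norm(r1, axis=1)
    d2 = np.linalg.norm(r2, axis=1)
    return np.sum(np.exp(-(d1[:, None] - d2[None, :]) ** 2 / (2 * sigma**2)))


def k2_ef_fd(r1, r2, h=1e-5, **kw):
    """Covariance of the local energy of r1 with the force on r2's central atom.

    Moving the central atom by delta shifts all relative positions by -delta,
    and F = -dE/d(delta), so k_ef = -d/d(delta) k_ee(r1, r2 - delta).
    Central finite differences stand in for Theano's automatic differentiation.
    """
    grad = np.zeros(3)
    for a in range(3):
        step = np.zeros(3)
        step[a] = h
        grad[a] = (k2_ee_toy(r1, r2 - step, **kw) - k2_ee_toy(r1, r2 + step, **kw)) / (2 * h)
    return -grad


r1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.2, 0.0]])
# Inversion-symmetric environment: the energy is even in delta, so k_ef vanishes
r2_sym = np.array([[1.1, 0.0, 0.0], [-1.1, 0.0, 0.0]])
k_sym = k2_ef_fd(r1, r2_sym)
```

The force-force kernel follows in the same way by differentiating k_ef once more with respect to the central atom of r1. Automatic differentiation avoids the truncation error and the extra kernel evaluations of this numerical approach.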

mff.kernels.twobodykernel.dummy_calc_ee(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)

mff.kernels.twobodykernel.dummy_calc_ef(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)

mff.kernels.twobodykernel.dummy_calc_ff(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)
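The dummy_calc_* helpers are per-process entry points: each receives a bundle of data, computes a block of kernel values and returns it. The following minimal sketch shows that pattern with the standard-library multiprocessing.Pool. It assumes a hypothetical scalar kernel and splits the rows of the kernel matrix into ncores chunks. The layout of the data tuple is an assumption, not the mff format.

```python
import multiprocessing as mp

import numpy as np


def toy_kernel(x1, x2):
    """Hypothetical scalar kernel between two configurations."""
    return float(np.exp(-np.sum((x1 - x2) ** 2) / 2.0))


def dummy_calc_sketch(data):
    """Worker: data = (rows, X1_chunk, X2); returns (rows, block of kernel values)."""
    rows, X1_chunk, X2 = data
    block = np.array([[toy_kernel(a, b) for b in X2] for a in X1_chunk])
    return rows, block


def calc_parallel(X1, X2, ncores=1):
    """Fill the len(X1) x len(X2) kernel matrix, optionally using ncores processes."""
    K = np.zeros((len(X1), len(X2)))
    row_chunks = np.array_split(np.arange(len(X1)), ncores)
    tasks = [(rows, [X1[i] for i in rows], X2) for rows in row_chunks if len(rows)]
    if ncores == 1:
        results = map(dummy_calc_sketch, tasks)
    else:
        # Workers must be top-level functions so that they can be pickled
        with mp.Pool(ncores) as pool:
            results = pool.map(dummy_calc_sketch, tasks)
    for rows, block in results:
        K[rows] = block
    return K


rng = np.random.default_rng(4)
X = [rng.normal(size=(3, 3)) for _ in range(5)]
K = calc_parallel(X, X, ncores=1)
```

Calling calc_parallel(X, X, ncores=2) from a script protected by an if __name__ == "__main__": guard should give the same matrix as ncores=1. The guard matters on platforms where worker processes re-import the main module.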

Three Body Kernel

Module that contains the expressions for the 3-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. It is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.

Example:

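from mff.kernels.threebodykernel import ThreeBodySingleSpeciesKernel
# theta = [lengthscale, cutoff decay rate, cutoff radius]
kernel = ThreeBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
# energy-energy gram matrix, computed with number_nodes processes
ee_gram_matrix = kernel.calc_gram_e(training_configurations, ncores=number_nodes)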
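Three-body kernels compare triplets formed by the central atom and two of its neighbours. For illustration only, the following sketch implements a generic permutation-invariant triplet kernel, which is not necessarily the exact mff expression. It assumes each triplet is described by its three interatomic distances (r_0j, r_0k, r_jk) computed from columns 0-2 of the Mx5 arrays. Species and cutoff functions are ignored, and a squared-exponential similarity is summed over both orderings of the two neighbours.

```python
import itertools

import numpy as np


def triplets(conf):
    """Distance triplets (r_0j, r_0k, r_jk) for all neighbour pairs j < k.

    conf: (M, 5) array; columns 0-2 are positions relative to the central atom.
    """
    pos = conf[:, :3]
    out = []
    for j, k in itertools.combinations(range(len(pos)), 2):
        out.append([np.linalg.norm(pos[j]), np.linalg.norm(pos[k]),
                    np.linalg.norm(pos[j] - pos[k])])
    return np.array(out).reshape(-1, 3)


def k3_ee_sketch(conf1, conf2, sigma=1.0):
    """Three-body energy-energy kernel, symmetric under swapping neighbours j and k."""
    t1, t2 = triplets(conf1), triplets(conf2)
    t2_swapped = t2[:, [1, 0, 2]]      # (r_0k, r_0j, r_jk)
    total = 0.0
    for t in (t2, t2_swapped):
        d2 = np.sum((t1[:, None, :] - t[None, :, :]) ** 2, axis=-1)
        total += np.sum(np.exp(-d2 / (2 * sigma**2)))
    return total


rng = np.random.default_rng(5)
conf_a = rng.normal(size=(4, 5))   # 4 neighbours -> 6 triplets
conf_b = rng.normal(size=(3, 5))   # 3 neighbours -> 3 triplets
k_ab = k3_ee_sketch(conf_a, conf_b)
```

Summing over both neighbour orderings makes the result independent of how the neighbours are listed in each Mx5 array.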

class mff.kernels.threebodykernel.BaseThreeBody(kernel_name, theta, bounds)

Three-body kernel class. Handles the functions common to the single-species and multi-species three-body kernels.

Parameters

kernel_name (str) – Selects between the single-species and multi-species kernel
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
bounds (list) – bounds of the kernel hyperparameters.

k3_ee

Energy-energy kernel function

Type: object

k3_ef

Energy-force kernel function

Type: object

k3_ef_loc

Local energy-force kernel function

Type: object

k3_ff

Force-force kernel function

Type: object

calc(X1, X2, ncores=1)

Calculate the force-force kernel between two sets of configurations.

Parameters

X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type

K (matrix)

calc_ee(X1, X2, ncores=1, mapping=False)

Calculate the energy-energy kernel between two global environments.

Parameters

X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2 matrix of the scalar-valued kernels

Return type

K (matrix)

calc_ef(X_glob, X, ncores=1, mapping=False)

Calculate the energy-force kernel between two sets of configurations.

Parameters

X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2*3 matrix of the vector-valued kernels

Return type

K (matrix)

calc_gram(X, ncores=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters

X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type

gram (matrix)

calc_gram_e(X, ncores=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters

X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N x N gram matrix of the scalar-valued kernels

Return type

gram (matrix)

calc_gram_ef(X, X_glob, ncores=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters

X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species
X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N2 x N1*3 gram matrix of the vector-valued kernels

Return type

gram (matrix)

class mff.kernels.threebodykernel.ThreeBodyManySpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three-body many-species kernel.

Parameters

theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius

static compile_theano()

Generates Theano-compiled kernels for energy and force learning.

The positions of the atoms relative to the central one, and their chemical species, are defined by matrices of dimension Mx5.

Returns

k3_ee (func) – energy-energy kernel
k3_ef (func) – energy-force kernel
k3_ff (func) – force-force kernel

class mff.kernels.threebodykernel.ThreeBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three-body single-species kernel.

Parameters

theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius

static compile_theano()

Generates Theano-compiled kernels for energy and force learning.

The positions of the atoms relative to the central one, and their chemical species, are defined by matrices of dimension Mx5.

Returns

k3_ee (func) – energy-energy kernel
k3_ef (func) – energy-force kernel
k3_ff (func) – force-force kernel

mff.kernels.threebodykernel.dummy_calc_ee(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)

mff.kernels.threebodykernel.dummy_calc_ef(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)

mff.kernels.threebodykernel.dummy_calc_ff(data)

Function used when multiprocessing.

Parameters: data – contains all the information required for the computation of the kernel values

Returns: the computed kernel values
Return type: result (array)