Gaussian Processes

Gaussian process regression module for learning and predicting energies and forces.

Example:

from mff.gp import GaussianProcess

gp = GaussianProcess(kernel, noise)
gp.fit(train_configurations, train_forces)
predicted_forces = gp.predict(test_configurations)

class mff.gp.GaussianProcess(kernel=None, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Gaussian process class. Performs GP regression of QM energies and forces.

Parameters
  • kernel (obj) – A kernel object (typically a two or three body)

  • noise (float) – The regularising noise level (typically named sigma_n^2)

  • optimizer (str) – The kind of optimization of marginal likelihood (not implemented yet)

X_train_

The configurations used for training

Type

list

alpha_

The coefficients obtained during training

Type

array

L_

The lower triangular matrix from the Cholesky decomposition of the gram matrix

Type

array

K

The kernel gram matrix

Type

array
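The way attributes such as alpha_, L_ and K usually relate to each other in GP regression can be sketched with NumPy. This is a generic illustration and not mff's own code: the toy kernel `rbf`, the 1-D inputs and the helper `fit_coefficients` are assumptions made for the example. Only the roles of the three quantities (gram matrix, its Cholesky factor, training coefficients) are taken from the descriptions above.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """Toy squared-exponential kernel on 1-D inputs (hypothetical stand-in)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def fit_coefficients(x_train, y_train, noise=1e-10):
    """Return (K, L, alpha) with K + noise*I = L L^T and (K + noise*I) alpha = y."""
    K = rbf(x_train, x_train)
    K_reg = K + noise * np.eye(len(x_train))  # regularising noise on the diagonal
    L = np.linalg.cholesky(K_reg)             # lower triangular factor
    z = np.linalg.solve(L, y_train)           # solve L z = y
    alpha = np.linalg.solve(L.T, z)           # solve L^T alpha = z
    return K, L, alpha

x_train = np.array([0.0, 0.5, 1.0, 1.5])
y_train = np.sin(x_train)
K, L, alpha = fit_coefficients(x_train, y_train, noise=1e-6)
```

Storing the Cholesky factor rather than an explicit inverse is the common choice because the same factor can be reused for predictions and for the likelihood.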

calc_gram_ee(X)

Calculate the energy-energy kernel gram matrix

Parameters

X (list) – list of N training configurations, which are M x 5 matrices

Returns

The energy-energy gram matrix, with dimensions N x N

Return type

K (matrix)

calc_gram_ff(X)

Calculate the force-force kernel gram matrix

Parameters

X (list) – list of N training configurations, which are M x 5 matrices

Returns

The force-force gram matrix, with dimensions 3N x 3N

Return type

K (matrix)
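The 3N x 3N shape follows from each pair of configurations contributing one 3 x 3 block. A minimal sketch of that layout is given here, assuming a placeholder block function `toy_block` and configurations reduced to single 3-vectors. Neither is part of mff, and the block ordering is an assumption of the example.

```python
import numpy as np

def toy_block(c1, c2):
    """Placeholder 3x3 block: a scalar similarity times the identity (hypothetical)."""
    d = c1 - c2
    return np.exp(-0.5 * d @ d) * np.eye(3)

def assemble_gram_ff(configs):
    """Place the block of pair (i, j) at rows 3i..3i+2 and columns 3j..3j+2."""
    n = len(configs)
    gram = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            gram[3 * i:3 * i + 3, 3 * j:3 * j + 3] = toy_block(configs[i], configs[j])
    return gram

configs = [
    np.array([0.0, 0.0, 0.0]),
    np.array([0.3, 0.1, 0.0]),
    np.array([0.0, 0.5, 0.2]),
]
gram = assemble_gram_ff(configs)  # N = 3 configurations give a 9 x 9 matrix
```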

fit(X, y, ncores=1)

Fit a Gaussian process regression model on training forces

Parameters
  • X (list) – training configurations

  • y (np.ndarray) – training forces

  • ncores (int) – number of CPU workers to use, default is 1

fit_energy(X_glob, y, ncores=1)

Fit a Gaussian process regression model using local energies.

Parameters
  • X_glob (list of lists of arrays) – list of grouped training configurations

  • y (np.ndarray) – training total energies

  • ncores (int) – number of CPU workers to use, default is 1
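If a total energy is modelled as a sum of local energies, the covariance between two total energies is the sum of the local kernel over all pairs of local environments. The sketch below illustrates that idea for grouped configurations. The kernel `local_kernel`, the flat 2-D environment vectors and the helper names are hypothetical, and whether mff evaluates the sum in exactly this way is an assumption.

```python
import numpy as np

def local_kernel(env_a, env_b, lengthscale=1.0):
    """Hypothetical local-energy kernel between two local environments."""
    d = env_a - env_b
    return float(np.exp(-0.5 * d @ d / lengthscale**2))

def global_kernel(group_a, group_b):
    """Kernel between two total energies: sum of local kernels over all pairs."""
    return sum(local_kernel(a, b) for a in group_a for b in group_b)

def gram_ee_global(X_glob):
    """N x N gram matrix for N groups of local environments."""
    n = len(X_glob)
    return np.array([[global_kernel(X_glob[i], X_glob[j]) for j in range(n)]
                     for i in range(n)])

# Two snapshots: the first groups 2 local environments, the second groups 3
X_glob = [
    [np.array([0.0, 0.0]), np.array([1.0, 0.0])],
    [np.array([0.0, 0.1]), np.array([0.9, 0.0]), np.array([0.5, 0.5])],
]
K = gram_ee_global(X_glob)
```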

fit_force_and_energy(X, y_force, X_glob, y_energy, ncores=1)

Fit a Gaussian process regression model using forces and energies

Parameters
  • X (list of arrays) – training configurations

  • y_force (np.ndarray) – training forces

  • X_glob (list of lists of arrays) – list of grouped training configurations

  • y_energy (np.ndarray) – training total energies

  • ncores (int) – number of CPU workers to use, default is 1
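Training on forces and energies together generally means working with one joint gram matrix built from the energy-energy, energy-force and force-force blocks. A minimal sketch of that assembly follows, assuming energies are ordered before forces and using random stand-in blocks. The helper `assemble_joint_gram` is hypothetical and the ordering used inside mff may differ.

```python
import numpy as np

def assemble_joint_gram(K_ee, K_ef, K_ff):
    """Stack the blocks into one symmetric matrix.

    K_ee: (n_e, n_e), K_ef: (n_e, 3 * n_f), K_ff: (3 * n_f, 3 * n_f).
    Energies come first, then forces (an assumption of this sketch).
    """
    return np.block([[K_ee, K_ef],
                     [K_ef.T, K_ff]])

n_e, n_f = 2, 3                       # 2 total energies, 3 force vectors
rng = np.random.default_rng(0)
A = rng.normal(size=(n_e + 3 * n_f, n_e + 3 * n_f))
full = A @ A.T                        # symmetric positive semi-definite stand-in
K_ee = full[:n_e, :n_e]
K_ef = full[:n_e, n_e:]
K_ff = full[n_e:, n_e:]

K_joint = assemble_joint_gram(K_ee, K_ef, K_ff)

# The targets are concatenated in the same order as the blocks
energies = np.array([-1.0, -1.2])
forces = rng.normal(size=(n_f, 3))
y_joint = np.concatenate([energies, forces.ravel()])
```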

load(filename)

Load a saved GP model

Parameters

filename (str) – name of the file where the GP is saved

log_marginal_likelihood(theta=None, eval_gradient=False)

Returns log-marginal likelihood of theta for training data.

Parameters
  • theta (array-like, shape = (n_kernel_params,) or None) – Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.

  • eval_gradient (bool, default: False) – If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.

Returns

  • log_likelihood (float) – Log-marginal likelihood of theta for training data.

  • log_likelihood_gradient (array, shape = (n_kernel_params,), optional) – Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.
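For reference, the standard expression for the log-marginal likelihood of a zero-mean GP can be evaluated from the Cholesky factor of the regularised gram matrix. The function below is a generic sketch of that formula and not mff's implementation. The name `log_marginal_likelihood_sketch` and the 1 x 1 check are assumptions made for the example.

```python
import numpy as np

def log_marginal_likelihood_sketch(K, y, noise=1e-10):
    """log p(y) = -1/2 y^T alpha - sum_i log L_ii - n/2 log(2 pi),
    where K + noise * I = L L^T and alpha = (K + noise * I)^{-1} y."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))   # equals -1/2 log det(K + noise * I)
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

# 1 x 1 case: must match the log density of N(0, 2) evaluated at y = 1
K = np.array([[2.0]])
y = np.array([1.0])
lml = log_marginal_likelihood_sketch(K, y, noise=0.0)
```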

predict(X, return_std=False, ncores=1)

Predict forces using the Gaussian process regression model

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, its standard deviation can also be returned (return_std=True).

Parameters
  • X (np.ndarray) – Target configuration where the GP is evaluated

  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.

Returns

  • y_mean (np.ndarray) – Mean of predictive distribution at target configurations.

  • y_std (np.ndarray) – Standard deviation of predictive distribution at target configurations. Only returned when return_std is True.
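The predictive mean and standard deviation, including the fallback to the prior for an unfitted model, can be sketched on scalar data as follows. This is a generic GP example under stated assumptions: the kernel `rbf`, the function `predict_sketch` and the 1-D inputs are hypothetical and stand in for mff's configuration-based kernels.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """Toy squared-exponential kernel on 1-D inputs (hypothetical stand-in)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def predict_sketch(x_test, x_train=None, y_train=None, noise=1e-10, return_std=False):
    prior_var = np.diag(rbf(x_test, x_test)).copy()
    if x_train is None:
        # Unfitted model: zero-mean GP prior
        mean, var = np.zeros(len(x_test)), prior_var
    else:
        K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        K_s = rbf(x_test, x_train)
        mean = K_s @ alpha
        v = np.linalg.solve(L, K_s.T)
        var = prior_var - np.sum(v**2, axis=0)
    if return_std:
        # Clip tiny negative values caused by round-off before the square root
        return mean, np.sqrt(np.clip(var, 0.0, None))
    return mean

x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
mean, std = predict_sketch(x_train, x_train, y_train, noise=1e-8, return_std=True)
prior_mean, prior_std = predict_sketch(np.array([0.5]), return_std=True)
```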

predict_energy(X, return_std=False, ncores=1, mapping=False, **kwargs)

Predict energies from forces only using the Gaussian process regression model

This function evaluates the GP energies for a set of test configurations.

Parameters
  • X (np.ndarray) – Target configurations where the GP is evaluated

  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.

Returns

  • y_mean (np.ndarray) – Mean of predictive distribution at target configurations.

  • y_std (np.ndarray) – Standard deviation of predictive distribution at target configurations. Only returned when return_std is True.

pseudo_log_likelihood()

Returns pseudo log-likelihood of the training data.

Parameters
  • theta (array-like, shape = (n_kernel_params,) or None) – Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.

  • eval_gradient (bool, default: False) – If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.

Returns

  • log_likelihood (float) – Log-marginal likelihood of theta for training data.

  • log_likelihood_gradient (array, shape = (n_kernel_params,), optional) – Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.

save(filename)

Dump the current GP model for later use

Parameters

filename (str) – name of the file where to save the GP

class mff.gp.ThreeBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

build_grid(dists, element1)

Builds a cube of values and predicts energies on it

class mff.gp.TwoBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Two Body Kernel

Module that contains the expressions for the 2-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. It is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.
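The relation between the three kernels can be illustrated without Theano. Since a force is the negative gradient of the energy, the energy-force kernel is minus the derivative of the energy-energy kernel with respect to the second position, and the force-force kernel is the mixed second derivative. The sketch below writes these derivatives by hand for a toy squared-exponential kernel between two 3-D points and checks one of them by finite differences. The kernel form and all function names are assumptions, not the expressions used in mff.

```python
import numpy as np

def k_ee(r1, r2, sigma=1.0):
    """Toy energy-energy kernel between two 3-D positions."""
    d = r1 - r2
    return np.exp(-0.5 * d @ d / sigma**2)

def k_ef(r1, r2, sigma=1.0):
    """cov(E(r1), F(r2)) = -d k_ee / d r2, a 3-vector."""
    d = r1 - r2
    return -k_ee(r1, r2, sigma) * d / sigma**2

def k_ff(r1, r2, sigma=1.0):
    """cov(F(r1), F(r2)) = d^2 k_ee / (d r1 d r2), a 3 x 3 matrix."""
    d = r1 - r2
    return k_ee(r1, r2, sigma) * (np.eye(3) / sigma**2 - np.outer(d, d) / sigma**4)

def numerical_k_ef(r1, r2, h=1e-6):
    """Central finite difference of -d k_ee / d r2, used only as a check."""
    grad = np.zeros(3)
    for b in range(3):
        step = np.zeros(3)
        step[b] = h
        grad[b] = (k_ee(r1, r2 + step) - k_ee(r1, r2 - step)) / (2 * h)
    return -grad

r1 = np.array([0.2, -0.1, 0.4])
r2 = np.array([0.0, 0.3, 0.1])
analytic = k_ef(r1, r2)
numeric = numerical_k_ef(r1, r2)
```

Automatic differentiation removes the need to derive and maintain such expressions by hand, which matters more as the energy kernel becomes more involved.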

Example:

from mff.kernels.twobodykernel import TwoBodySingleSpeciesKernel
kernel = TwoBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)

class mff.kernels.twobodykernel.BaseTwoBody(kernel_name, theta, bounds)

Two body kernel class. Handles the functions common to the single-species and multi-species two-body kernels.

Parameters
  • kernel_name (str) – To choose between single- and multi-species kernel

  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

  • bounds (list) – bounds of the kernel function.

k2_ee

Energy-energy kernel function

Type

object

k2_ef

Energy-force kernel function

Type

object

k2_ff

Force-force kernel function

Type

object

calc(X1, X2, ncores=1)

Calculate the force-force kernel between two sets of configurations.

Parameters
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type

K (matrix)

calc_ee(X1, X2, ncores=1, mapping=False)

Calculate the energy-energy kernel between two global environments.

Parameters
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2 matrix of the scalar-valued kernels

Return type

K (matrix)

calc_ef(X_glob, X, ncores=1, mapping=False)

Calculate the energy-force kernel between two sets of configurations.

Parameters
  • X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2*3 matrix of the vector-valued kernels

Return type

K (matrix)

calc_gram(X, ncores=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type

gram (matrix)

calc_gram_e(X, ncores=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N x N gram matrix of the scalar-valued kernels

Return type

gram (matrix)

calc_gram_ef(X, X_glob, ncores=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters
  • X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species

  • X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N2 x N1*3 gram matrix of the vector-valued kernels

Return type

gram (matrix)

class mff.kernels.twobodykernel.TwoBodyManySpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two body many species kernel.

Parameters
  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

static compile_theano()

This function generates theano compiled kernels for global energy and force learning

The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.

Returns

  • k2_ee (func) – energy-energy kernel

  • k2_ef (func) – energy-force kernel

  • k2_ff (func) – force-force kernel
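To make the Mx5 matrices concrete, the snippet below builds one such array by hand. The column layout used here (three relative coordinates, then the species of the central atom, then the species of the neighbour) is an assumption made for illustration, since the text above only states that positions relative to the central atom and chemical species are stored. The helper `within_cutoff` is likewise hypothetical.

```python
import numpy as np

# Hypothetical local configuration with M = 3 neighbours.
# Assumed columns: x, y, z relative to the central atom,
# species of the central atom, species of the neighbour.
conf = np.array([
    [ 1.0, 0.0, 0.0, 26, 26],
    [ 0.0, 1.2, 0.0, 26,  8],
    [-0.8, 0.0, 0.9, 26,  8],
])

relative_positions = conf[:, :3]
distances = np.linalg.norm(relative_positions, axis=1)
central_species = conf[:, 3].astype(int)
neighbour_species = conf[:, 4].astype(int)

def within_cutoff(conf, r_cut):
    """Keep only the neighbours closer than r_cut to the central atom."""
    return conf[np.linalg.norm(conf[:, :3], axis=1) < r_cut]

close_neighbours = within_cutoff(conf, r_cut=1.1)
```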

class mff.kernels.twobodykernel.TwoBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two body single species kernel.

Parameters
  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

static compile_theano()

This function generates theano compiled kernels for global energy and force learning

The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.

Returns

  • k2_ee (func) – energy-energy kernel

  • k2_ef (func) – energy-force kernel

  • k2_ff (func) – force-force kernel

mff.kernels.twobodykernel.dummy_calc_ee(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)

mff.kernels.twobodykernel.dummy_calc_ef(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)

mff.kernels.twobodykernel.dummy_calc_ff(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)
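Module-level helpers of this kind are commonly needed because worker processes receive a single picklable argument. The following sketch shows the general pattern with the standard library: the work is split into chunks, each chunk is packed with the kernel into one tuple, and a top-level worker unpacks it. All names (`toy_kernel`, `dummy_worker`, `split`, `kernel_values`) are hypothetical and the way mff packs its `data` argument is not documented here.

```python
import math
from multiprocessing import Pool

def toy_kernel(a, b):
    """Hypothetical scalar kernel between two numbers."""
    return math.exp(-0.5 * (a - b) ** 2)

def dummy_worker(data):
    """Unpack one work item and compute its share of the kernel values."""
    kernel, pairs = data
    return [kernel(a, b) for a, b in pairs]

def split(items, n_chunks):
    """Split a list into n_chunks contiguous parts of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(items[start:stop])
        start = stop
    return chunks

def kernel_values(pairs, ncores=1):
    """Evaluate the kernel on all pairs, optionally across ncores processes."""
    work = [(toy_kernel, chunk) for chunk in split(pairs, ncores)]
    if ncores > 1:
        with Pool(ncores) as pool:       # each worker receives one tuple
            results = pool.map(dummy_worker, work)
    else:
        results = list(map(dummy_worker, work))
    return [value for part in results for value in part]

pairs = [(0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (0.5, 1.5)]
serial = kernel_values(pairs, ncores=1)
```

On platforms that start worker processes with spawn, a call with ncores greater than 1 should be placed under an `if __name__ == "__main__":` guard.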

Three Body Kernel

Module that contains the expressions for the 3-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. It is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.

Example:

from mff.kernels.threebodykernel import ThreeBodySingleSpeciesKernel
kernel = ThreeBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)

class mff.kernels.threebodykernel.BaseThreeBody(kernel_name, theta, bounds)

Three body kernel class. Handles the functions common to the single-species and multi-species three-body kernels.

Parameters
  • kernel_name (str) – To choose between single- and multi-species kernel

  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

  • bounds (list) – bounds of the kernel function.

k3_ee

Energy-energy kernel function

Type

object

k3_ef

Energy-force kernel function

Type

object

k3_ef_loc

Local Energy-force kernel function

Type

object

k3_ff

Force-force kernel function

Type

object

calc(X1, X2, ncores=1)

Calculate the force-force kernel between two sets of configurations.

Parameters
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type

K (matrix)

calc_ee(X1, X2, ncores=1, mapping=False)

Calculate the energy-energy kernel between two global environments.

Parameters
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2 matrix of the scalar-valued kernels

Return type

K (matrix)

calc_ef(X_glob, X, ncores=1, mapping=False)

Calculate the energy-force kernel between two sets of configurations.

Parameters
  • X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species

  • X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species

Returns

N1 x N2*3 matrix of the vector-valued kernels

Return type

K (matrix)

calc_gram(X, ncores=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type

gram (matrix)

calc_gram_e(X, ncores=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N x N gram matrix of the scalar-valued kernels

Return type

gram (matrix)

calc_gram_ef(X, X_glob, ncores=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters
  • X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species

  • X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species

  • ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)

  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix

Returns

N2 x N1*3 gram matrix of the vector-valued kernels

Return type

gram (matrix)

class mff.kernels.threebodykernel.ThreeBodyManySpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three body many species kernel.

Parameters
  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

static compile_theano()

This function generates Theano-compiled kernels for energy and force learning.

The position of the atoms relative to the central one, and their chemical species, are defined by a matrix of dimension Mx5.

Returns

  • k3_ee (func) – energy-energy kernel

  • k3_ef (func) – energy-force kernel

  • k3_ff (func) – force-force kernel

class mff.kernels.threebodykernel.ThreeBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three body single species kernel.

Parameters
  • theta[0] (float) – lengthscale of the kernel

  • theta[1] (float) – decay rate of the cutoff function

  • theta[2] (float) – cutoff radius

static compile_theano()

This function generates Theano-compiled kernels for energy and force learning.

The position of the atoms relative to the central one, and their chemical species, are defined by a matrix of dimension Mx5.

Returns

  • k3_ee (func) – energy-energy kernel

  • k3_ef (func) – energy-force kernel

  • k3_ff (func) – force-force kernel

mff.kernels.threebodykernel.dummy_calc_ee(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)

mff.kernels.threebodykernel.dummy_calc_ef(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)

mff.kernels.threebodykernel.dummy_calc_ff(data)

Function used when multiprocessing.

Parameters

data – contains all the information required for the computation of the kernel values

Returns

the computed kernel values

Return type

result (array)