Gaussian Processes¶
Gaussian process regression module for learning and predicting energies and forces.
Example:
gp = GaussianProcess(kernel, noise)
gp.fit(train_configurations, train_forces)
gp.predict(test_configurations)
-
class
mff.gp.
GaussianProcess
(kernel=None, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶ Gaussian process class for GP regression of QM energies and forces.
- Parameters
kernel (obj) – A kernel object (typically a two or three body)
noise (float) – The regularising noise level (typically named sigma_n^2)
optimizer (str) – The kind of optimization of marginal likelihood (not implemented yet)
-
X_train_
¶ The configurations used for training
- Type
list
-
alpha_
¶ The coefficients obtained during training
- Type
array
-
L_
¶ The lower triangular matrix from cholesky decomposition of gram matrix
- Type
array
-
K
¶ The kernel gram matrix
- Type
array
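The attributes above are related in the usual way for Cholesky-based GP regression. The following is a minimal NumPy sketch of that relation, not the mff implementation: it assumes a generic squared-exponential kernel on 1-D inputs (`toy_kernel` is a hypothetical stand-in for an mff kernel) and assumes the noise is added directly to the diagonal of the gram matrix.

```python
import numpy as np

def toy_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs (hypothetical stand-in for an mff kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

X_train = np.array([0.0, 0.5, 1.0, 1.5])
y_train = np.sin(X_train)
noise = 1e-10

K = toy_kernel(X_train, X_train)              # gram matrix (compare with K)
K_reg = K + noise * np.eye(len(X_train))      # regularised gram matrix (assumed form)
L = np.linalg.cholesky(K_reg)                 # lower triangular factor (compare with L_)
# alpha solves (K + noise * I) alpha = y through two solves with L and L^T
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # compare with alpha_
```

With these quantities, a prediction at a new point is the kernel vector between that point and the training set multiplied by `alpha`.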
-
calc_gram_ee
(X)¶ Calculate the energy-energy kernel gram matrix
- Parameters
X (list) – list of N training configurations, which are M x 5 matrices
- Returns
The energy-energy gram matrix, has dimensions N x N
- Return type
K (matrix)
-
calc_gram_ff
(X)¶ Calculate the force-force kernel gram matrix
- Parameters
X (list) – list of N training configurations, which are M x 5 matrices
- Returns
The force-force gram matrix, has dimensions 3N x 3N
- Return type
K (matrix)
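The 3N x 3N shape arises because each pair of configurations contributes a 3 x 3 block, one entry per pair of Cartesian force components. A minimal sketch of this block assembly, assuming a hypothetical matrix-valued kernel `toy_ff_block` that is not part of mff and assuming the first three columns of each M x 5 array hold xyz coordinates:

```python
import numpy as np

def toy_ff_block(conf_a, conf_b):
    """Hypothetical 3 x 3 kernel block between two configurations (not the mff kernel)."""
    ca = conf_a[:, :3].mean(axis=0)   # assumed: xyz coordinates in the first three columns
    cb = conf_b[:, :3].mean(axis=0)
    return np.exp(-0.5 * np.sum((ca - cb) ** 2)) * np.eye(3)

rng = np.random.default_rng(0)
X = [rng.normal(size=(4, 5)) for _ in range(3)]   # N = 3 configurations with M = 4
N = len(X)

gram = np.zeros((3 * N, 3 * N))
for i in range(N):
    for j in range(i + 1):
        block = toy_ff_block(X[i], X[j])
        gram[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
        gram[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block.T   # fill the symmetric counterpart
```

Only the lower triangle of blocks is evaluated; the upper triangle follows from symmetry.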
-
fit
(X, y, ncores=1)¶ Fit a Gaussian process regression model on training forces
- Parameters
X (list) – training configurations
y (np.ndarray) – training forces
ncores (int) – number of CPU workers to use, default is 1
-
fit_energy
(X_glob, y, ncores=1)¶ Fit a Gaussian process regression model using local energies.
- Parameters
X_glob (list of lists of arrays) – list of grouped training configurations
y (np.ndarray) – training total energies
ncores (int) – number of CPU workers to use, default is 1
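The grouped input `X_glob` can be read as one list of local environments per structure, with one total energy per structure. A minimal sketch of how a kernel between two structures could be built under the assumption that the total energy is a sum of local energies, so that the global kernel is a double sum of local kernels. `toy_local_kernel` and `toy_global_kernel` are hypothetical names and do not reproduce the mff kernels.

```python
import numpy as np

def toy_local_kernel(env_a, env_b, sigma=1.0):
    """Hypothetical scalar kernel between two local environments (M x 5 arrays)."""
    ra = np.linalg.norm(env_a[:, :3], axis=1)   # assumed: xyz in the first three columns
    rb = np.linalg.norm(env_b[:, :3], axis=1)
    return float(np.exp(-0.5 * ((ra[:, None] - rb[None, :]) / sigma) ** 2).sum())

def toy_global_kernel(group_a, group_b):
    """Sum the local kernel over every pair of environments in two structures."""
    return sum(toy_local_kernel(a, b) for a in group_a for b in group_b)

rng = np.random.default_rng(1)
# 3 structures, each a list of 2 local environments with M = 4 neighbours
X_glob = [[rng.normal(size=(4, 5)) for _ in range(2)] for _ in range(3)]
K_ee = np.array([[toy_global_kernel(a, b) for b in X_glob] for a in X_glob])
```

The resulting matrix has one row and one column per structure, matching one total energy per entry of `y`.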
-
fit_force_and_energy
(X, y_force, X_glob, y_energy, ncores=1)¶ Fit a Gaussian process regression model using forces and energies
- Parameters
X (list of arrays) – training configurations
y_force (np.ndarray) – training forces
X_glob (list of lists of arrays) – list of grouped training configurations
y_energy (np.ndarray) – training total energies
ncores (int) – number of CPU workers to use, default is 1
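Fitting on forces and energies together amounts to regression on a joint gram matrix made of energy-energy, energy-force and force-force blocks. The sketch below only illustrates the block shapes: the blocks are generated from random features so that the joint matrix is positive semi-definite, and the ordering (energies first, then flattened forces) is an assumption rather than the ordering used inside mff.

```python
import numpy as np

rng = np.random.default_rng(0)
n_energy, n_force, n_feat = 2, 3, 16
phi_e = rng.normal(size=(n_energy, n_feat))       # one feature row per total energy
phi_f = rng.normal(size=(3 * n_force, n_feat))    # three feature rows per force vector

K_ee = phi_e @ phi_e.T    # n_energy x n_energy
K_ef = phi_e @ phi_f.T    # n_energy x 3 * n_force
K_ff = phi_f @ phi_f.T    # 3 * n_force x 3 * n_force

# Joint gram matrix and matching target vector (assumed ordering)
K_joint = np.block([[K_ee, K_ef], [K_ef.T, K_ff]])
y_energy = rng.normal(size=n_energy)
y_force = rng.normal(size=(n_force, 3))
y_joint = np.concatenate([y_energy, y_force.ravel()])
```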
-
load
(filename)¶ Load a saved GP model
- Parameters
filename (str) – name of the file where the GP is saved
-
log_marginal_likelihood
(theta=None, eval_gradient=False)¶ Returns log-marginal likelihood of theta for training data.
- Parameters
theta (array-like, shape = (n_kernel_params,), or None) – Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
eval_gradient (bool, default: False) – If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
- Returns
Log-marginal likelihood of theta for training data (float).
log_likelihood_gradient (array, shape = (n_kernel_params,), optional) – Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.
- Return type
log_likelihood
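For reference, the log-marginal likelihood of a zero-mean GP can be evaluated from the Cholesky factor of the regularised gram matrix. This is a generic NumPy sketch of the textbook formula, assuming the noise is added to the diagonal; the function below is illustrative and is not the mff method of the same name.

```python
import numpy as np

def log_marginal_likelihood(K, y, noise):
    """log p(y | X) for a zero-mean GP with gram matrix K and diagonal noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha                  # -1/2 y^T (K + noise I)^-1 y
    complexity = -np.log(np.diag(L)).sum()       # -1/2 log det(K + noise I)
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

K = np.array([[1.0, 0.5], [0.5, 1.0]])
y = np.array([0.3, -0.2])
lml = log_marginal_likelihood(K, y, noise=0.1)
```

The sum of the logarithms of the diagonal of the Cholesky factor equals half the log-determinant, which avoids computing the determinant directly.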
-
predict
(X, return_std=False, ncores=1)¶ Predict forces using the Gaussian process regression model
We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, its standard deviation can also be returned (return_std=True).
- Parameters
X (np.ndarray) – Target configuration where the GP is evaluated
return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
- Returns
Mean of the predictive distribution at the target configurations.
y_std (np.ndarray) – Standard deviation of the predictive distribution at the target configurations. Only returned when return_std is True.
- Return type
y_mean (np.ndarray)
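The predictive mean and standard deviation follow the standard GP regression equations. A minimal sketch on scalar data, assuming a squared-exponential kernel on 1-D inputs; `toy_kernel` and `gp_predict` are hypothetical helpers and not part of mff.

```python
import numpy as np

def toy_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs (hypothetical stand-in)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-8, return_std=False):
    """Posterior mean (and optionally standard deviation) of a zero-mean GP."""
    K = toy_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = toy_kernel(X_test, X_train)
    y_mean = K_star @ alpha
    if not return_std:
        return y_mean
    v = np.linalg.solve(L, K_star.T)
    # Posterior variance: prior variance minus the part explained by the training data
    var = toy_kernel(X_test, X_test).diagonal() - np.sum(v ** 2, axis=0)
    return y_mean, np.sqrt(np.clip(var, 0.0, None))

X_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([0.0, 1.0, 0.5])
y_mean, y_std = gp_predict(X_train, y_train, X_train, return_std=True)
```

Far from the training data the mean falls back to the prior mean (zero here) and the standard deviation to the prior value, which is the behaviour of a prediction from the GP prior.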
-
predict_energy
(X, return_std=False, ncores=1, mapping=False, **kwargs)¶ Predict energies from forces only using the Gaussian process regression model
This function evaluates the GP energies for a set of test configurations.
- Parameters
X (np.ndarray) – Target configurations where the GP is evaluated
return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
- Returns
Mean of the predictive distribution at the target configurations.
y_std (np.ndarray) – Standard deviation of the predictive distribution at the target configurations. Only returned when return_std is True.
- Return type
y_mean (np.ndarray)
-
pseudo_log_likelihood
()¶ Returns pseudo log-likelihood of the training data.
- Parameters
theta (array-like, shape = (n_kernel_params,), or None) – Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
eval_gradient (bool, default: False) – If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
- Returns
Log-marginal likelihood of theta for training data (float).
log_likelihood_gradient (array, shape = (n_kernel_params,), optional) – Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.
- Return type
log_likelihood
-
save
(filename)¶ Dump the current GP model for later use
- Parameters
filename (str) – name of the file where to save the GP
-
class
mff.gp.
ThreeBodySingleSpeciesGP
(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶ -
build_grid
(dists, element1)¶ Function that builds and predicts energies on a cube of values
-
-
class
mff.gp.
TwoBodySingleSpeciesGP
(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶
Two Body Kernel¶
Module that contains the expressions for the 2-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.
Example:
from mff.kernels.twobodykernel import TwoBodySingleSpeciesKernel
kernel = TwoBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
-
class
mff.kernels.twobodykernel.
BaseTwoBody
(kernel_name, theta, bounds)¶ Two body kernel class. Handles the functions common to the single-species and multi-species two-body kernels.
- Parameters
kernel_name (str) – To choose between single- and multi-species kernel
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
bounds (list) – bounds of the kernel function.
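To make the role of the three hyperparameters concrete, here is a sketch of a two-body energy-energy kernel between two local environments. The functional forms are assumptions chosen for illustration (a squared exponential on interatomic distances and an exponential cutoff) and may differ from the expressions compiled by mff. The names `cutoff` and `toy_k2_ee` are hypothetical.

```python
import numpy as np

def cutoff(r, decay, r_cut):
    """Smooth cutoff that vanishes for r >= r_cut (assumed functional form)."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    inside = r < r_cut
    out[inside] = np.exp(-decay / (r_cut - r[inside]))
    return out

def toy_k2_ee(conf1, conf2, theta):
    """Sum of squared-exponential terms over all pairs of neighbour distances."""
    sigma, decay, r_cut = theta              # lengthscale, cutoff decay rate, cutoff radius
    r1 = np.linalg.norm(conf1[:, :3], axis=1)   # assumed: relative xyz in the first three columns
    r2 = np.linalg.norm(conf2[:, :3], axis=1)
    se = np.exp(-0.5 * ((r1[:, None] - r2[None, :]) / sigma) ** 2)
    fc = cutoff(r1, decay, r_cut)[:, None] * cutoff(r2, decay, r_cut)[None, :]
    return float(np.sum(se * fc))

conf_a = np.array([[1.0, 0, 0, 1, 1], [0, 2.0, 0, 1, 1]])
conf_b = np.array([[0, 0, 1.5, 1, 1], [5.0, 0, 0, 1, 1]])   # second atom lies beyond r_cut
theta = (0.5, 0.4, 3.0)
k_ab = toy_k2_ee(conf_a, conf_b, theta)
```

In this sketch an atom beyond the cutoff radius contributes nothing, so removing it from the environment leaves the kernel value unchanged.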
-
k2_ee
¶ Energy-energy kernel function
- Type
object
-
k2_ef
¶ Energy-force kernel function
- Type
object
-
k2_ff
¶ Force-force kernel function
- Type
object
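Because forces are the negative gradient of the energy, the three kernel functions above are linked by differentiation. In generic notation (the symbols ρ, ρ′ for two local configurations and r, r′ for the positions being differentiated are assumptions of this note, not mff identifiers):

```latex
\begin{aligned}
k_{ef}(\rho, \rho') &= -\frac{\partial k_{ee}(\rho, \rho')}{\partial \mathbf{r}'}, \\
k_{ff}(\rho, \rho') &= \frac{\partial^{2} k_{ee}(\rho, \rho')}{\partial \mathbf{r}\, \partial \mathbf{r}'^{\mathsf{T}}}.
\end{aligned}
```

Here the energy-energy kernel is a scalar, the energy-force kernel a 1 x 3 row vector and the force-force kernel a 3 x 3 matrix, which is where the factors of 3 in the gram matrix dimensions come from.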
-
calc
(X1, X2, ncores=1)¶ Calculate the force-force kernel between two sets of configurations.
- Parameters
X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1*3 x N2*3 matrix of the matrix-valued kernels
- Return type
K (matrix)
-
calc_ee
(X1, X2, ncores=1, mapping=False)¶ Calculate the energy-energy kernel between two global environments.
- Parameters
X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1 x N2 matrix of the scalar-valued kernels
- Return type
K (matrix)
-
calc_ef
(X_glob, X, ncores=1, mapping=False)¶ Calculate the energy-force kernel between two sets of configurations.
- Parameters
X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1 x N2*3 matrix of the vector-valued kernels
- Return type
K (matrix)
-
calc_gram
(X, ncores=1, eval_gradient=False)¶ Calculate the force-force gram matrix for a set of configurations X.
- Parameters
X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N*3 x N*3 gram matrix of the matrix-valued kernels
- Return type
gram (matrix)
-
calc_gram_e
(X, ncores=1, eval_gradient=False)¶ Calculate the energy-energy gram matrix for a set of configurations X.
- Parameters
X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N x N gram matrix of the scalar-valued kernels
- Return type
gram (matrix)
-
calc_gram_ef
(X, X_glob, ncores=1, eval_gradient=False)¶ Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.
- Parameters
X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species
X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N2 x N1*3 gram matrix of the vector-valued kernels
- Return type
gram (matrix)
-
class
mff.kernels.twobodykernel.
TwoBodyManySpeciesKernel
(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Two body many species kernel.
- Parameters
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
-
static
compile_theano
()¶ This function generates theano compiled kernels for global energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.
- Returns
k2_ee (func) – energy-energy kernel
k2_ef (func) – energy-force kernel
k2_ff (func) – force-force kernel
-
class
mff.kernels.twobodykernel.
TwoBodySingleSpeciesKernel
(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Two body single species kernel.
- Parameters
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
-
static
compile_theano
()¶ This function generates theano compiled kernels for global energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.
- Returns
k2_ee (func) – energy-energy kernel
k2_ef (func) – energy-force kernel
k2_ff (func) – force-force kernel
-
mff.kernels.twobodykernel.
dummy_calc_ee
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)
-
mff.kernels.twobodykernel.
dummy_calc_ef
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)
-
mff.kernels.twobodykernel.
dummy_calc_ff
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)
Three Body Kernel¶
Module that contains the expressions for the 3-body single-species and multi-species kernels. The module uses the Theano package to create the energy-energy, energy-force and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multiprocessing. The module is called by the gp.py script.
Example:
from mff.kernels.threebodykernel import ThreeBodySingleSpeciesKernel
kernel = ThreeBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
-
class
mff.kernels.threebodykernel.
BaseThreeBody
(kernel_name, theta, bounds)¶ Three body kernel class. Handles the functions common to the single-species and multi-species three-body kernels.
- Parameters
kernel_name (str) – To choose between single- and multi-species kernel
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
bounds (list) – bounds of the kernel function.
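A three-body kernel compares triplets of atoms rather than pairs. As a hedged illustration of the input such a kernel works on, the sketch below enumerates the triplets formed by the central atom and each pair of its neighbours, and returns the three interatomic distances of every triplet. The helper `triplet_distances` is hypothetical, and it assumes the first three columns of the M x 5 array hold positions relative to the central atom.

```python
import numpy as np
from itertools import combinations

def triplet_distances(conf):
    """Return (r_ij, r_ik, r_jk) for every neighbour pair (j, k) of the central atom i."""
    pos = conf[:, :3]   # assumed: xyz relative to the central atom
    rows = []
    for j, k in combinations(range(len(pos)), 2):
        rows.append((np.linalg.norm(pos[j]),
                     np.linalg.norm(pos[k]),
                     np.linalg.norm(pos[j] - pos[k])))
    return np.array(rows)

conf = np.array([[1.0, 0, 0, 1, 1],
                 [0, 1.0, 0, 1, 1],
                 [0, 0, 2.0, 1, 1]])
triplets = triplet_distances(conf)   # one row per triplet, M * (M - 1) / 2 rows in total
```

The number of triplets grows quadratically with the number of neighbours M, which is one reason three-body gram matrices are more expensive to compute than two-body ones.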
-
k3_ee
¶ Energy-energy kernel function
- Type
object
-
k3_ef
¶ Energy-force kernel function
- Type
object
-
k3_ef_loc
¶ Local Energy-force kernel function
- Type
object
-
k3_ff
¶ Force-force kernel function
- Type
object
-
calc
(X1, X2, ncores=1)¶ Calculate the force-force kernel between two sets of configurations.
- Parameters
X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1*3 x N2*3 matrix of the matrix-valued kernels
- Return type
K (matrix)
-
calc_ee
(X1, X2, ncores=1, mapping=False)¶ Calculate the energy-energy kernel between two global environments.
- Parameters
X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1 x N2 matrix of the scalar-valued kernels
- Return type
K (matrix)
-
calc_ef
(X_glob, X, ncores=1, mapping=False)¶ Calculate the energy-force kernel between two sets of configurations.
- Parameters
X_glob (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
X (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
- Returns
N1 x N2*3 matrix of the vector-valued kernels
- Return type
K (matrix)
-
calc_gram
(X, ncores=1, eval_gradient=False)¶ Calculate the force-force gram matrix for a set of configurations X.
- Parameters
X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N*3 x N*3 gram matrix of the matrix-valued kernels
- Return type
gram (matrix)
-
calc_gram_e
(X, ncores=1, eval_gradient=False)¶ Calculate the energy-energy gram matrix for a set of configurations X.
- Parameters
X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N x N gram matrix of the scalar-valued kernels
- Return type
gram (matrix)
-
calc_gram_ef
(X, X_glob, ncores=1, eval_gradient=False)¶ Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.
- Parameters
X (list) – list of N1 M1x5 arrays containing xyz coordinates and atomic species
X_glob (list) – list of N2 M2x5 arrays containing xyz coordinates and atomic species
ncores (int) – Number of CPU nodes to use for multiprocessing (default is 1)
eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
- Returns
N2 x N1*3 gram matrix of the vector-valued kernels
- Return type
gram (matrix)
-
class
mff.kernels.threebodykernel.
ThreeBodyManySpeciesKernel
(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Three body many species kernel.
- Parameters
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
-
static
compile_theano
()¶ This function generates theano compiled kernels for energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5
- Returns
k3_ee (func) – energy-energy kernel
k3_ef (func) – energy-force kernel
k3_ff (func) – force-force kernel
-
class
mff.kernels.threebodykernel.
ThreeBodySingleSpeciesKernel
(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Three body single species kernel.
- Parameters
theta[0] (float) – lengthscale of the kernel
theta[1] (float) – decay rate of the cutoff function
theta[2] (float) – cutoff radius
-
static
compile_theano
()¶ This function generates theano compiled kernels for energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5
- Returns
k3_ee (func) – energy-energy kernel
k3_ef (func) – energy-force kernel
k3_ff (func) – force-force kernel
-
mff.kernels.threebodykernel.
dummy_calc_ee
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)
-
mff.kernels.threebodykernel.
dummy_calc_ef
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)
-
mff.kernels.threebodykernel.
dummy_calc_ff
(data)¶ Function used when multiprocessing.
- Parameters
data – contains all the information required for the computation of the kernel values
- Returns
the computed kernel values
- Return type
result (array)