API reference

Copyright 2020 Marco Arrigoni

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Stand-alone implementation of the CMA-ES

The evolution module

class clinamen.cmaes.evolution.CMAES(strategy_params, mean=None, covariance=None, step_size=None, random_seed=10, terminator=None)[source]

Implementation of the covariance matrix adaptation evolution strategy.

Parameters
  • strategy_params (StrategyParameters object) – contains the initial strategy parameters

  • mean (1D NumPy array) – the mean vector. If None, is taken as the zero vector

  • covariance (2D NumPy array) – the covariance matrix. If None, is taken as the identity matrix

  • step_size (float) – the global variance. If None, is taken as 1

  • random_seed (int) – the random seed for the random number generator

  • terminator (TerminationCriteria instance) – object that keeps track of termination criteria. If None, a default one will be used.

Notes

If given, mean must have shape (strategy_params.n, ) and covariance (strategy_params.n, strategy_params.n)
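
The fallbacks and shape requirements above can be illustrated with a minimal NumPy sketch. It is not clinamen's implementation: the helper name `sample_offspring` is hypothetical, and the sampling rule x_k = m + sigma * N(0, C) is the textbook CMA-ES step, assumed here rather than taken from the library source.

```python
import numpy as np


def sample_offspring(n, pop_size, mean=None, covariance=None,
                     step_size=None, random_seed=10):
    """Hypothetical helper: draw one generation around the mean."""
    # Documented fallbacks: zero vector, identity matrix, step size 1.
    mean = np.zeros(n) if mean is None else np.asarray(mean, dtype=float)
    covariance = (np.eye(n) if covariance is None
                  else np.asarray(covariance, dtype=float))
    step_size = 1.0 if step_size is None else float(step_size)

    # Shapes required by the note above.
    if mean.shape != (n,) or covariance.shape != (n, n):
        raise ValueError("mean must be (n,) and covariance (n, n)")

    rng = np.random.default_rng(random_seed)
    # Textbook CMA-ES sampling (assumed): x_k = m + sigma * N(0, C).
    steps = rng.multivariate_normal(np.zeros(n), covariance, size=pop_size)
    return mean + step_size * steps


offspring = sample_offspring(n=3, pop_size=5)  # one row per individual
```

Here `n` plays the role of strategy_params.n, and the returned array has one row per individual.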

property C

The covariance matrix for the current generation

property StrategyParameters

The current instance of StrategyParameters

property Terminator

The current instance of TerminationCriteria

as_dict()[source]

Returns a dictionary with the parameters of the CMAES instance

evolve(manual_mutation=False)[source]

Generator for the evolutionary process.

Parameters

manual_mutation (bool) – When True, mutated individuals must be inserted manually using the method set_mutated_offspring

Yields

a dictionary with the relevant parameters for the current generation

property g

The index of the current generation

classmethod load_status(json_status)[source]

From a CMAES status saved as json, returns a CMAES instance initialized with the information contained in json_status

property m

The mean vector for the current generation

property mutated_offspring

The offspring object parameters obtained after the mutation

property offspring

The offspring object parameters in the current generation

property pop_size

The population size

property random_seed

The user random seed

save_status(path=None)[source]

Save a json file with the data representing the current status of the evolution. Only data needed to restart a CMAES object are saved.

Parameters

path (str or None) – if not None, is the path to the folder where the .json file will be written
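
Saving and restoring a status, as described above, amounts to a JSON round trip. The sketch below uses only the standard library. The dictionary keys are hypothetical (they borrow the names of the documented properties g, step_size, m and C), and the file name is an assumption, since the source does not state what the library writes.

```python
import json
import os
import tempfile

# Hypothetical status: arrays are stored as nested lists because JSON
# cannot serialize NumPy arrays directly.
status = {
    "g": 12,
    "step_size": 0.2,
    "m": [0.0, 0.1],
    "C": [[1.0, 0.0], [0.0, 1.0]],
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "status.json")  # hypothetical file name
    with open(path, "w") as handle:
        json.dump(status, handle)
    with open(path) as handle:
        restored = json.load(handle)
```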

set_fitness_calculator(calculator)[source]

Set a fitness calculator: any object that can take object parameters representing the individuals (one row = one individual) and implements a method that calculates the fitness of individuals.

Basic interface of this fitness calculator:

  • A method called set_object_parameters which accepts a 2D NumPy array of shape (self.pop_size, self.dimension)

  • A method called get_fitness that returns an array of shape (self.pop_size, ) with the calculated fitness for each individual

  • Optionally, a method called get_gradients that returns an array of shape (self.pop_size, self.dimension) with the calculated gradients

Parameters

calculator (a fitness calculator instance) –
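
A minimal object satisfying the interface listed above could look like the following sketch. The class name `SphereCalculator` and the toy fitness function f(x) = sum(x**2) are illustrative assumptions. Only the three method names and the array shapes come from the description.

```python
import numpy as np


class SphereCalculator:
    """Toy fitness calculator: f(x) = sum(x**2) for each individual."""

    def set_object_parameters(self, x):
        # One row per individual, shape (pop_size, dimension).
        self._x = np.asarray(x, dtype=float)

    def get_fitness(self):
        # Shape (pop_size,): one fitness value per individual.
        return np.sum(self._x ** 2, axis=1)

    def get_gradients(self):
        # Optional method, shape (pop_size, dimension): gradient of f.
        return 2.0 * self._x


calculator = SphereCalculator()
calculator.set_object_parameters([[1.0, 2.0], [0.0, 3.0], [1.0, 1.0]])
fitness = calculator.get_fitness()
gradients = calculator.get_gradients()
```

An instance of such a class is the kind of object that set_fitness_calculator expects as its calculator argument.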

set_mutated_offspring(x)[source]

When manual mutation is selected, this method must be used to insert the mutated individuals

Parameters

x (2D NumPy array of shape (self.pop_size, self.dimension)) – the object parameters of the mutated individuals

property step_size

The step size for the current generation

class clinamen.cmaes.evolution.GpCMAES(*args, **kwargs)[source]
as_dict()[source]

Returns a dictionary with the parameters of the CMAES instance

property gradient_coefficient

The parameter controlling the gradient relevance

classmethod load_status(json_status)[source]

From a CMAES status saved as json, returns a CMAES instance initialized with the information contained in json_status

set_gradient_coefficient(alpha)[source]

Set the coefficient that multiplies the average gradient in the update of the mean.

Parameters

alpha (float) –

class clinamen.cmaes.evolution.StrategyParameters(dimension, pop_size=None, weights=None, c_sigma=None, d_sigma=None, c_c=None, c_1=None, c_mu=None, alpha_cov=None, c_m=None, std_min=None, c_g=None)[source]

Class for the initialization, update and tracking of the CMA-ES algorithm.

Parameters
  • dimension (int) – dimensionality of the problem.

  • pop_size (int or None) – population size

  • weights (tuple with pop_size entries or None) – weights used in the algorithm

  • c_sigma (float in (0, 1) or None) – learning rate for the conjugate evolution path used for step-size control

  • d_sigma (float > 0 or None) – damping term

  • c_c (float in [0, 1] or None) – learning rate for the evolution path used in the cumulation procedure

  • c_1 (float in [0, 1] or None) – learning rate for the rank-1 update of the covariance matrix

  • c_mu (float in [0, 1] or None) – learning rate for the rank-mu update of the covariance matrix

  • alpha_cov (float or None) – parameter for calculating default values of the learning rates

  • c_m (float or None) – learning rate for updating the mean. Generally 1, usually <= 1

  • std_min (float or None) – increase the global step size if the std of the individuals fitness is below this value

  • c_g (float or None) – learning rate for the evolution path of the gradient. It is used only when the CMAES instance supports gradient usage

Notes

If some parameters are None, default values will be used. It is suggested to leave all the parameters at their default values, with the possible exception of pop_size and alpha_cov.
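
To give a feeling for what such defaults look like, the sketch below evaluates two formulas commonly used in the CMA-ES literature: the population size 4 + floor(3 ln n) and log-decreasing recombination weights for the better half of the population. Whether clinamen uses exactly these expressions is an assumption. In particular, the weights parameter is documented as a tuple with pop_size entries, so the library's own default presumably also covers the remaining individuals, which this sketch does not attempt. Both function names are hypothetical.

```python
import math


def default_pop_size(dimension):
    # Common CMA-ES default: lambda = 4 + floor(3 * ln(n)).
    return 4 + math.floor(3 * math.log(dimension))


def positive_weights(pop_size):
    # Log-decreasing weights for the best mu = pop_size // 2 individuals,
    # normalized so that they sum to 1.
    mu = pop_size // 2
    raw = [math.log((pop_size + 1) / 2) - math.log(i)
           for i in range(1, mu + 1)]
    total = sum(raw)
    return tuple(w / total for w in raw)


pop_size = default_pop_size(10)
weights = positive_weights(pop_size)
```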

as_dict()[source]

Returns a dictionary with the parameters needed to initialize a StrategyParameters instance

exception clinamen.cmaes.evolution.TerminationConditionMet[source]
class clinamen.cmaes.evolution.TerminationCriteria(noeffectaxis=True, noeffectcoord=True, conditioncov=True, equalfunvalues=True, maxiter=1000, tolxup=True, smallstd=1e-15)[source]

A class for holding the various termination criteria suggested for the algorithm. If a value is set to None/False, the corresponding criterion will be ignored.

Parameters
  • noeffectaxis (bool) –

  • noeffectcoord (bool) –

  • conditioncov (bool) –

  • equalfunvalues (bool) –

  • maxiter (int) – maximum number of iterations

  • tolxup (bool) –

  • smallstd (float) – stop if the fitness std remains below smallstd for at least 15 iterations.
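
The smallstd criterion can be pictured as a counter of generations whose fitness spread stays below the threshold. The following is a sketch under stated assumptions, not the library's code: the class name `SmallStdTracker` is hypothetical, the generations are assumed to be consecutive, and the population standard deviation is used because the source does not say which estimator applies.

```python
import statistics


class SmallStdTracker:
    """Hypothetical tracker for the smallstd criterion."""

    def __init__(self, smallstd=1e-15, patience=15):
        self.smallstd = smallstd
        self.patience = patience
        self.count = 0  # consecutive generations below the threshold

    def update(self, fitness_values):
        # Population standard deviation of this generation's fitness.
        if statistics.pstdev(fitness_values) < self.smallstd:
            self.count += 1
        else:
            self.count = 0  # the streak is broken
        return self.count >= self.patience


tracker = SmallStdTracker(smallstd=1e-3, patience=3)
history = [tracker.update(f) for f in (
    [1.0, 2.0, 3.0],   # spread out: counter stays at 0
    [1.0, 1.0, 1.0],   # first flat generation
    [1.0, 1.0, 1.0],   # second flat generation
    [1.0, 1.0, 1.0],   # third flat generation: criterion met
)]
```

The example uses a patience of 3 only to keep it short. The documented behaviour corresponds to 15 iterations.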

as_dict()[source]

Returns a dictionary with the parameters needed to initialize a TerminationCriteria instance

set_params(cmaes)[source]

From an instance of a CMAES object, set the value of the needed parameters.

The fitness_calculators module

class clinamen.cmaes.fitness_calculators.FitnessCalculator(population)[source]

Fitness calculator for Population objects

Parameters

population (Population instance) – the current individual population

set_object_parameters(x)[source]

Set the object parameters to the individuals in self.population

Parameters

x (2D NumPy array) – shape (N, d), where N is the number of individuals in the population and d is the dimensionality of the search space

class clinamen.cmaes.fitness_calculators.FitnessGradientCalculator(population)[source]
get_gradients()[source]

From a Population instance whose individuals have an attached calculator that returns forces, computes and returns the gradients for the individuals in the population.

class clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator(population, atoms_within_cutoff, metamodel, data='Xy.hdf5', min_generation=0, train_kwargs={})[source]

Calculator for the fitness function using a surrogate fitness metamodel. The metamodel is used exclusively for predicting the total energy. Forces are saved in the training dataset, but are not used for training or prediction.

Parameters
  • population (Population instance) – the current individual population

  • atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace

  • metamodel (MetaModel instance) – the fitness surrogate

  • data (string) – the name of the file which is used to save/read the training data. It will be named data.hdf5

  • min_generation (int. Default 0) – use the meta-model only if the current generation is greater than or equal to min_generation

  • train_kwargs (dict) – the keyword-argument pairs to train the metamodel

class clinamen.cmaes.fitness_calculators.RSFitnessCalculator(population, atoms_within_cutoff)[source]

Fitness calculator for Population objects on a restricted subspace.

Parameters
  • population (Population instance) – the current individual population

  • atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace

set_object_parameters(x)[source]

Set the object parameters to the individuals in self.population

Parameters

x (2D NumPy array) – shape (N, d), where N is the number of individuals in the population and d is the dimensionality of the restricted subspace of interest.

class clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator(population, atoms_within_cutoff)[source]
get_gradients()[source]

From a Population instance whose individuals have an attached calculator that returns forces, computes and returns the gradients for the individuals in the population.

clinamen.cmaes.fitness_calculators.write_train_hdf5(file_name, population, energies, forces, name=None)[source]

Append new data to an existing dataset, if the dataset does not exist, create a new one

Parameters
  • file_name (string) – the dataset name

  • population (Population instance) – the new individuals to be added to the dataset

  • energies (array-like of shape (n_individuals, )) – the energies of the individuals in population

  • forces (2D array-like of shape (n_individuals, 3*no_atoms)) – the forces of the individuals in population

  • name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, the function checks that it corresponds to the system name

The population_evolver module

class clinamen.cmaes.population_evolver.AnalizeRun(evolver)[source]

Helper class for analyzing the evolution of the population

Parameters

Evolver (PopulationEvolver derived instance) – the evolver to be analyzed. If Evolver is set to None, then the class can be used to analyze an already existing simulation dataframe (set with the method load_dataframe).

evolve()[source]

Evolve generation-by-generation

Yields

dataframe (pandas DataFrame) – updated to the current generation

initialize()[source]

Initialize the evolution

Returns

dataframe – the elements for generation 0

Return type

pandas DataFrame

load_dataframe(df)[source]

Use this method to analyze a proper simulation dataframe without need of running the whole evolutionary process

Parameters

df (pandas DataFrame) –

static plot_data_vs_generation(df, keys, other_keys=None, samples=[1], serrors=[0], alpha=0.05, **kwargs)[source]

Plot the evolution of keys and, optionally, other_keys in df with respect to the generation number.

Parameters
  • df (pandas data frame) – it must contain at least 2 columns: ‘generation’, with the generation number, and the columns named in keys.

  • keys (list of strings) – the column labels in df to be plotted. If one of these is an averaged value, its sample std is given by serrors

  • other_keys (None or list of strings) – optional additional column labels to be plotted

  • samples (list of int) – if one of keys is an average, it is its sample size.

  • serrors (list of float) – if one of keys is an average, it is its sample std. This will be used to calculate the confidence intervals

  • alpha (float in (0, 1)) – defines the desired (1-alpha)*100% confidence interval

  • kwargs (dictionary) – keyword-value pairs for tuning the plot parameters; see the documentation of pandas.DataFrame.plot.line
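
How samples, serrors and alpha combine into a confidence interval can be sketched as follows. This is an assumption about the statistics, not a transcription of the library code: the function name is hypothetical, and a two-sided normal quantile is used where the implementation might equally rely on a Student-t quantile.

```python
import math
from statistics import NormalDist


def confidence_half_width(sample_std, sample_size, alpha=0.05):
    """Hypothetical helper: half-width of the (1-alpha)*100% interval."""
    # Two-sided normal quantile z_{1 - alpha/2} (assumed).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Quantile times the standard error of the mean.
    return z * sample_std / math.sqrt(sample_size)


half_width = confidence_half_width(sample_std=1.0, sample_size=100)
no_band = confidence_half_width(sample_std=0.0, sample_size=1)
```

Under this reading, the default values samples=[1] and serrors=[0] give an interval of zero width, so a plain line is drawn.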

plot_energy_vs_generation(**kwargs)[source]

Plots the evolution of the mean population energy as a function of the generation.

Parameters

kwargs (dictionary) – keyword-value pairs for tuning the plot parameters; see the documentation of pandas.DataFrame.plot.line

run()[source]

Evolve the population until a termination criterion is met.

Returns

dataframe – stores information about the evolution of Evolver

Return type

pandas DataFrame

class clinamen.cmaes.population_evolver.GpPopulationEvolver(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]

Exploit the gradient during the run.

Parameters
  • c_alpha (float) – coefficient describing the relevance of the gradient term

  • nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix

  • c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix

  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

cmaes_obj

alias of clinamen.cmaes.evolution.GpCMAES

fitness_calc_obj

alias of clinamen.cmaes.fitness_calculators.FitnessGradientCalculator

class clinamen.cmaes.population_evolver.PopulationEvolver(founder, step_size=0.2, covariance=None, dmin=None, random_seed=10)[source]

Evolves a Population instance using the CMA-ES algorithm

Parameters
  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • step_size (float > 0) – initial step size used in the CMA-ES algorithm. Default 0.2 Angstrom

  • covariance (None or float or 1D array or 2D array) – the initial covariance matrix. Default is the identity matrix. If covariance is a float, then the matrix is diagonal with that value on the diagonal. If it is 1D array, it is still diagonal with that array on the diagonal.

  • dmin (float) – minimum distance between two atoms to consider an individual to be valid. Default None (0.5 of the minimum bond distance)

  • random_seed (int) – random seed to be used for generating random variates. Defaults to 10
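
The accepted forms of the covariance argument can be made concrete with a small sketch. The helper `expand_covariance` is hypothetical and only mirrors the description above (None, float, 1D array, 2D array). It is not taken from the library.

```python
import numpy as np


def expand_covariance(covariance, n):
    """Hypothetical helper returning an (n, n) covariance matrix."""
    if covariance is None:
        return np.eye(n)                      # default: identity
    cov = np.asarray(covariance, dtype=float)
    if cov.ndim == 0:
        return cov * np.eye(n)                # float: constant diagonal
    if cov.ndim == 1:
        if cov.shape != (n,):
            raise ValueError("1D covariance must have n entries")
        return np.diag(cov)                   # 1D array: given diagonal
    if cov.shape != (n, n):
        raise ValueError("2D covariance must have shape (n, n)")
    return cov                                # 2D array: used as given


identity = expand_covariance(None, 2)
scaled = expand_covariance(0.5, 2)
diagonal = expand_covariance([1.0, 4.0], 2)
```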

cmaes_obj

alias of clinamen.cmaes.evolution.CMAES

property cmaes_parameters

Return a dictionary with the objects needed to initialize the instance’s CMAES object

evolve_population()[source]

Evolve the current population. Returns a generator with the relevant parameters of the current generation.

fitness_calc_obj

alias of clinamen.cmaes.fitness_calculators.FitnessCalculator

get_object_parameters()[source]

Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the search space dimension

property population

The current Population instance

save_population(generation)[source]

Append the current population to self.evolution_history file

Parameters

generation (int) – the index of the current generation. Used to create a corresponding new group in the hdf5 file

set_cmaes(cmaes)[source]

Set a custom CMAES object to overwrite the default one.

Parameters

cmaes (instance of CMAES) –

set_strategy_parameters(strategy_params)[source]

Set the values of the strategy parameters to overwrite the default ones.

Parameters

strategy_params (instance of StrategyParameters) –

set_termination_criteria(termination_criteria)[source]

Set the values of the termination criteria to overwrite the default ones.

Parameters

termination_criteria (instance of TerminationCriteria) –

class clinamen.cmaes.population_evolver.RSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]

Restricted-subspace population evolver: only the genotype for atoms inside a cutoff radius is considered

Parameters
  • nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix

  • c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix

  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

Notes

Similar to SSPopulationEvolver but only the atoms within the cutoff are moved.

fitness_calc_obj

alias of clinamen.cmaes.fitness_calculators.RSFitnessCalculator

get_object_parameters()[source]

Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the dimension of the restricted subspace

property use_reduced_population_size

Bool. If True, automatically choose the population size based on the dimension of the restricted subspace. If False, use the population size given by the StrategyParameters instance supplied at initialization. Default False.

class clinamen.cmaes.population_evolver.RSPopulationEvolverGrad(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]
Parameters
  • c_alpha (float) – coefficient describing the relevance of the gradient term

  • nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix

  • c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix

  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

Notes

Similar to RSPopulationEvolver but gradients are also used.

cmaes_obj

alias of clinamen.cmaes.evolution.GpCMAES

fitness_calc_obj

alias of clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator

class clinamen.cmaes.population_evolver.RSPopulationEvolverMetamodel(metamodel, dataset, nn_cutoff, c_r, founder, min_generation=0, train_kwargs={}, **kwargs)[source]

RS Fitness Calculator with a metamodel to be trained on-the-fly

Parameters
  • metamodel (Metamodel object) – used to make the energy predictions

  • dataset (string) – the name of the .hdf5 file which will be used to write/read the training data

  • nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix

  • c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix

  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • min_generation (int. Default 0) – use the meta-model only if the current generation is greater than or equal to min_generation

  • train_kwargs (dict) – the keyword-argument values used to train the metamodel

  • kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

fitness_calc_obj

alias of clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator

class clinamen.cmaes.population_evolver.SSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]

Add to the initial covariance matrix a rank-s matrix increasing the variance for coordinates representing atoms close to the point defect position

Parameters
  • nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix

  • c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix

  • founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.

  • kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

property atoms_within_cutoff

Indices of the atoms within the cutoff

property basis_coefficients

The distances of the atoms within the cutoff and the coefficients per atom which are used to add the rank-s matrix to the initial covariance matrix

property nn_cutoff

Cutoff selecting the NN to the defect which will form the basis for the selected subspace

property number_of_nn

The number of nearest-neighbor atoms within self.nn_cutoff

property selected_subspace_basis

The basis spanning the selected subspace, ordered with respect to the atomic distances from the defect

Objects describing the genotype of individuals and their populations

  • clinamen.evpd.core

The evpd.core.individual module

class clinamen.evpd.core.individual.Individual(*args, **kwargs)[source]

Class for representing an individual in a population.

calculate_comparison_distances(other)[source]

Calculates the relative distance and the maximum distance discrepancy between this individual and another one.

Parameters

other (Individual instance) –

Returns

rel_dist, max_dist – the relative distance and the maximum distance discrepancy between the two instances

calculation_required()[source]

Returns True if a new energy calculation is required

property chromosome

The chromosome of an individual is the set of displacements from an initial configuration, usually one very similar to the pristine system

clone()[source]

Copy the instance, return a new instance

property cost

Value of the cost function of the individual (its energy)

property defect_position

Location of the defect in the structure, if any

property distances_from_defect

Distance of each atom in the system from the defect

property dmax

Tolerance for the maximum distance discrepancy between two individuals. This distance is defined as:

\[d_{max}(i, j) = \max_k |d_i(k) - d_j(k)|\]

property dmin

Tolerance for the minimum distance at which two atoms can be located. Used to reject a structure where two atoms are too close.

Default value = 0.25 minimum bond length in the system

property drel

Tolerance for the relative distance between two individuals. The relative distance is defined as:

\[d_{rel}(i, j) = \frac{\sum_k |d_i(k) - d_j(k)|}{\sum_k d_i(k)}\]
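
The relative distance above, together with the maximum distance discrepancy defined under dmax, can be evaluated with a few lines of plain Python. The function name is hypothetical, and d_i(k) is assumed to be the k-th entry of a per-individual list of distances of equal length. The source does not specify how that list is built.

```python
def comparison_distances(d_i, d_j):
    """Hypothetical helper evaluating d_rel and d_max for two individuals."""
    if len(d_i) != len(d_j):
        raise ValueError("both individuals need the same number of distances")
    diffs = [abs(a - b) for a, b in zip(d_i, d_j)]
    rel_dist = sum(diffs) / sum(d_i)   # d_rel(i, j)
    max_dist = max(diffs)              # d_max(i, j)
    return rel_dist, max_dist


rel_dist, max_dist = comparison_distances([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

Two individuals would then be compared by checking these values against the drel and dmax tolerances.
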
property etol

Tolerance for comparing costs between two individuals

property fitness

Fitness value of the individual

get_forces(*args, **kwargs)[source]

Calculate atomic forces.

Ask the attached calculator to calculate the forces and apply constraints. Use apply_constraint=False to get the raw forces.

For molecular dynamics (md=True) we don’t apply the constraint to the forces but to the momenta. When holonomic constraints for rigid linear triatomic molecules are present, ask the constraints to redistribute the forces within each triple defined in the constraints (required for molecular dynamics with this type of constraints).

has_proper_structure()[source]

Returns False if any interatomic distance in the system is smaller than self.dmin.
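
The check described above can be sketched as a loop over all atom pairs. This is a simplified illustration: the function below is a stand-alone stand-in that works on Cartesian coordinates and ignores periodic boundary conditions, which a real implementation for crystal structures would probably need to handle.

```python
import itertools
import math


def has_proper_structure(positions, dmin):
    """Stand-alone stand-in: False if any two atoms are closer than dmin."""
    # Compare every unordered pair of atoms exactly once.
    for a, b in itertools.combinations(positions, 2):
        if math.dist(a, b) < dmin:
            return False   # two atoms are closer than the tolerance
    return True


positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 2.0, 0.0)]
accepted = has_proper_structure(positions, dmin=0.5)
rejected = has_proper_structure(positions, dmin=1.5)
```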

static make_individual_from_ase_atoms(atoms)[source]

Takes an ase.Atoms instance and returns an evpd.core.Individual instance.

The result is analogous to using atoms.copy(), but also the calculator will be copied.

property metric_tensor

The metric tensor of the cell

property my_name

Identifier

optimize_structure(**kwargs)[source]

Optimize the structure

Parameters

kwargs (dict) – parameters for running the geometry optimization. A mandatory key is optimizer, which is an ase optimizer. The other key:value pairs are the parameters to supply to optimizer.

set_calculator_factory(calc_factory, calc_parameters)[source]
Parameters
  • calc_factory – a class derived from ase.calculators.interface.Calculator, or a function generating a calculator

  • calc_parameters (dict) – keyword-value pairs used to initialize a calc_factory instance

property total_energy_calculations

Number of times the energy of the individual has been calculated

update_fitness()[source]

Calculate the total energy of the individual

write_poscar(path=None)[source]

If path is not None, it is the folder where the POSCAR file will be saved

The evpd.core.population module

exception clinamen.evpd.core.population.BadPopulationMember[source]

Raise the exception when one tries to add to a Population instance an object which is not an Individual instance

class clinamen.evpd.core.population.Population(*individuals)[source]

A population is a group of individuals of a given size.

Parameters

individuals (a list or tuple of individuals.) – the container of individuals forming the population. It can also be a single individual or empty.

property individuals_fitness

Return a list with the fitness of each individual

insert(index, value)[source]

S.insert(index, value) – insert value before index

Crystal structure fingerprint descriptors and utilities for general descriptors

The descriptors_cython module

The descriptors.utils module

clinamen.descriptors.utils.read_descriptors_by_id(file_name, ids)[source]

Given an id or a list thereof, returns the corresponding descriptors and Jacobians, if present

Parameters
  • file_name (string) – the hdf5 file from where the descriptors should be fetched

  • ids (iterable) – the identity keys of the descriptors we want to fetch

Returns

X, DX, indices – the descriptors and their Jacobians, each as a list, and a list representing the indices corresponding to the ids in ids that were found. If the Jacobians are not present, None is returned

Return type

tuple

clinamen.descriptors.utils.write_descriptors(file_name, descriptors, descriptors_grads, ids, name=None, flattened=True)[source]

Append new data to an existing dataset, if the dataset does not exist, create a new one

Parameters
  • file_name (string) – the dataset name

  • descriptors (2D array-like of shape (n, d) if flattened is True) – n is the number of structures for which the descriptors were calculated. d is the dimensionality of the descriptors. If flattened is False, descriptors can be a multidimensional array of shape (n, …).

  • descriptors_grads (2D array-like of shape (n, r) if flattened is True, otherwise a multidimensional array of shape (n, …); can also be None) – if not None, these are the (possibly flattened, if flattened is True) Jacobians of the descriptors.

  • name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, the function checks that it corresponds to the system name

  • ids (array-like of shape (n, )) – for each descriptor, a string that identifies the structure corresponding to that descriptor

Utilities for the unsupervised classification of clusters

The clustering.misc module

clinamen.clustering.misc.calculate_k_distances(dataset, k, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None)[source]

Calculate and return the k-NN distances for each point in the data set. It uses sklearn.neighbors.NearestNeighbors; see its documentation for the meaning of the parameters.

Parameters
  • dataset (2D array) – the dataset

  • k (int) – the nearest neighbor number to consider

Returns

k_distances – the k-NN distances for each point in the dataset sorted in descending order.

Return type

1D array
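
What the function returns can be reproduced on a tiny data set with a brute-force sketch. The documented function delegates to sklearn.neighbors.NearestNeighbors. The version below is only a standard-library stand-in with the Euclidean metric (the documented defaults metric='minkowski', p=2), and it assumes that a point is not counted as its own neighbor, which the source does not state.

```python
import math


def k_distances(dataset, k):
    """Hypothetical brute-force version: distance to the k-th neighbor."""
    result = []
    for i, point in enumerate(dataset):
        # Distances to every other point (the point itself is excluded).
        others = sorted(math.dist(point, other)
                        for j, other in enumerate(dataset) if j != i)
        result.append(others[k - 1])
    # Descending order, as documented for the return value.
    return sorted(result, reverse=True)


distances = k_distances([(0.0,), (1.0,), (3.0,), (7.0,)], k=1)
```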

clinamen.clustering.misc.find_centroids(data, labels)[source]

Finds the centroids locations of the clusters.

Parameters
  • data (2D array) – The dataset. Each row represents one structure

  • labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its centroid will not be returned.

Returns

centroids – the keys are the cluster indices, the values are the coordinates of the corresponding centroids

Return type

dict
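A minimal sketch of the described behavior, assuming that a centroid is the arithmetic mean of the points in a cluster (the source does not state how the centroid is defined). The name find_centroids_sketch is hypothetical.

```python
import numpy as np


def find_centroids_sketch(data, labels):
    """Return {cluster index: centroid}, skipping the noise label -1."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    centroids = {}
    for label in np.unique(labels):
        if label == -1:  # noise points get no centroid
            continue
        # Assumption: the centroid is the mean of the cluster members.
        centroids[int(label)] = data[labels == label].mean(axis=0)
    return centroids


data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [50.0, 50.0]])
labels = np.array([0, 0, 1, -1])
centroids = find_centroids_sketch(data, labels)
```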

clinamen.clustering.misc.get_structure_group_index(structure_name, groups)[source]

Given the name of a structure, returns the group index it belongs to.

Parameters
  • structure_name (string) – the name of the structure

  • groups (dict) – key:value pairs mapping each cluster index to the list of names of the structures belonging to that cluster

Returns

key – the index of the cluster

Return type

int
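The lookup can be pictured as a reverse search through the groups dictionary. The sketch below is an illustration, not the library code: the helper name is hypothetical, and raising KeyError for an unknown structure is an assumption.

```python
def get_group_index_sketch(structure_name, groups):
    """Return the key of the group whose list contains structure_name."""
    for key, names in groups.items():
        if structure_name in names:
            return key
    # Assumption: an unknown structure name is reported with an exception.
    raise KeyError(structure_name)


groups = {0: ["struct_a", "struct_b"], 1: ["struct_c"]}
index = get_group_index_sketch("struct_c", groups)
```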

clinamen.clustering.misc.group_structures_in_clusters(ordered_structures, cluster_labels)[source]

Group a list of structure names according to the cluster they belong to.

Parameters
  • ordered_structures (list) – Ordered list of structure names. The ordering is done by matching the dataset: the i-th element in the dataset is the structure corresponding to ordered_structures[i]

  • cluster_labels (list) – cluster_labels[i] is the label of the cluster to which ordered_structures[i] belongs.

Returns

groups – keys are cluster labels and the values are lists with the structures belonging to that cluster.

Return type

defaultdict
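A minimal sketch of such a grouping with collections.defaultdict. The helper name and the structure names are hypothetical, and keeping the noise label -1 as an ordinary key is an assumption.

```python
from collections import defaultdict


def group_structures_sketch(ordered_structures, cluster_labels):
    """Map each cluster label to the list of structure names carrying it."""
    groups = defaultdict(list)
    # The two sequences are aligned: label i belongs to structure i.
    for name, label in zip(ordered_structures, cluster_labels):
        groups[label].append(name)
    return groups


names = ["struct_a", "struct_b", "struct_c", "struct_d"]
labels = [0, 1, 0, -1]
groups = group_structures_sketch(names, labels)
```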

clinamen.clustering.misc.make_reachability_plot(optics_instance, x_lims=None)[source]

Make the reachability plot from a fitted scikit-learn OPTICS instance.

Parameters
  • optics_instance (scikit-learn OPTICS instance) – a fitted instance

  • x_lims (tuple) – x limits to be plotted

clinamen.clustering.misc.plot_cluster_plot(data, labels, title, ordered_structures, plot_kwargs={}, cmap=<matplotlib.colors.LinearSegmentedColormap object>, show_names=True, plot_chull=False, plot_centroids=True, ax=None)[source]

Make a scatter plot of the clusters.

Parameters
  • data (2D array) – The dataset. Each row represents one structure

  • labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its points are represented by crosses.

  • title (string) – the plot title

  • ordered_structures (list) – the i-th element is the structure name for data[i]

  • plot_kwargs (dict) – key:value pairs to fine-tune the plot

  • cmap (matplotlib cmap instance. Default cm.jet) – the colormap to be used in the plot

  • show_names (bool. Default True) – if True, the structure names will be shown in the plot

  • plot_chull (bool. Default False) – if True, plots also the convex hull of points in the cluster

  • plot_centroids (bool. Default True) – if True, the centroids of each cluster are also plotted

  • ax (matplotlib Axes instance or None. Default is None) – the axes for the plot. If None, the current axes is taken. (TODO)

clinamen.clustering.misc.write_clustered_structures(groups, key)[source]

Write to a text file all the structures belonging to a given cluster.

Parameters
  • groups (dict) – key:value pairs mapping each cluster index to the list of names of the structures belonging to that cluster

  • key (int) – the cluster index

Returns

fname – the name of the just-written text file

Return type

string

The clustering.stats_tools module

clinamen.clustering.stats_tools.calculate_B_coefficient_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]

Calculates the Bhattacharyya coefficient between two normal distributions

Parameters
  • mean_1 (1D np.ndarray) – the mean vector of the first distribution

  • mean_2 (1D np.ndarray) – the mean vector of the second distribution

  • covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution

  • covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

b_coeff – the Bhattacharyya coefficient

Return type

float
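For reference, the standard closed form for two multivariate normal distributions with means μ1, μ2 and covariances Σ1, Σ2 is given below. The symbols are notation introduced here, and it is an assumption, not confirmed by the source, that the function evaluates exactly this expression.

```latex
\begin{align}
\Sigma &= \tfrac{1}{2}\,(\Sigma_1 + \Sigma_2), \\
D_B &= \tfrac{1}{8}\,(\mu_1 - \mu_2)^{\mathsf T}\,\Sigma^{-1}\,(\mu_1 - \mu_2)
      + \tfrac{1}{2}\,\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}}, \\
BC &= \exp(-D_B).
\end{align}
```

The coefficient BC lies in (0, 1] and equals 1 only when the two distributions coincide.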

clinamen.clustering.stats_tools.calculate_B_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]

Calculates the Bhattacharyya distance between two normal distributions.

Parameters
  • mean_1 (1D np.ndarray) – the mean vector of the first distribution

  • mean_2 (1D np.ndarray) – the mean vector of the second distribution

  • covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution

  • covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

distance – the Bhattacharyya distance

Return type

float
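The sketch below evaluates the textbook closed form of the Bhattacharyya distance between two multivariate normal distributions with NumPy. It is an illustration under the assumption that the library uses this standard formula; the function name is hypothetical.

```python
import numpy as np


def bhattacharyya_distance_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    mu_1, mu_2 = np.asarray(mean_1, float), np.asarray(mean_2, float)
    cov_1, cov_2 = np.asarray(covariance_1, float), np.asarray(covariance_2, float)
    cov = 0.5 * (cov_1 + cov_2)  # averaged covariance
    diff = mu_1 - mu_2
    # Term measuring the separation of the means.
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Term measuring the mismatch of the covariances; slogdet returns
    # (sign, log|det|) and avoids overflow of the raw determinant.
    logdet = np.linalg.slogdet(cov)[1]
    logdet_1 = np.linalg.slogdet(cov_1)[1]
    logdet_2 = np.linalg.slogdet(cov_2)[1]
    term_cov = 0.5 * (logdet - 0.5 * (logdet_1 + logdet_2))
    return float(term_mean + term_cov)


identity = np.eye(2)
d_same = bhattacharyya_distance_sketch([0, 0], [0, 0], identity, identity)
d_shift = bhattacharyya_distance_sketch([0, 0], [2, 0], identity, identity)
```

With identical distributions the distance vanishes; shifting one mean by 2 along one axis with unit covariances gives 4/8 = 0.5.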

clinamen.clustering.stats_tools.calculate_H_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]

Calculate the Hellinger distance between two Gaussians.

Parameters
  • mean_1 (1D np.ndarray) – the mean vector of the first distribution

  • mean_2 (1D np.ndarray) – the mean vector of the second distribution

  • covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution

  • covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

distance – the Hellinger distance

Return type

float
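A sketch assuming the common normalized definition H = sqrt(1 - BC), where BC is the Bhattacharyya coefficient of the two Gaussians. Other conventions differ by a constant factor, and the source does not say which one the library uses. The function name is hypothetical.

```python
import numpy as np


def hellinger_distance_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Hellinger distance between two Gaussians, H = sqrt(1 - BC)."""
    mu_1, mu_2 = np.asarray(mean_1, float), np.asarray(mean_2, float)
    cov_1, cov_2 = np.asarray(covariance_1, float), np.asarray(covariance_2, float)
    cov = 0.5 * (cov_1 + cov_2)
    diff = mu_1 - mu_2
    # Logarithm of the Bhattacharyya coefficient:
    # BC = det(S1)^(1/4) det(S2)^(1/4) / det(S)^(1/2) * exp(-d^T S^-1 d / 8)
    log_bc = (0.25 * np.linalg.slogdet(cov_1)[1]
              + 0.25 * np.linalg.slogdet(cov_2)[1]
              - 0.5 * np.linalg.slogdet(cov)[1]
              - 0.125 * diff @ np.linalg.solve(cov, diff))
    # max() guards against tiny negative values caused by round-off.
    return float(np.sqrt(max(0.0, 1.0 - np.exp(log_bc))))


identity = np.eye(2)
h_same = hellinger_distance_sketch([0, 0], [0, 0], identity, identity)
h_shift = hellinger_distance_sketch([0, 0], [2, 0], identity, identity)
```

Under this definition the distance is bounded between 0 (identical distributions) and 1.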

clinamen.clustering.stats_tools.calculate_KL_divergence_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]

Calculate the Kullback-Leibler divergence between two Gaussians: KL(G1 || G2) = E_1[ln G1 - ln G2]

Parameters
  • mean_1 (1D np.ndarray) – the mean vector of the first distribution (G1)

  • mean_2 (1D np.ndarray) – the mean vector of the second distribution (G2)

  • covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution (G1)

  • covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution (G2)

Returns

divergence – the KL divergence

Return type

float
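For two Gaussians the expectation above has a well-known closed form, sketched here with NumPy. The function name is hypothetical and the code is an illustration of the formula, not the library implementation.

```python
import numpy as np


def kl_divergence_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """KL(G1 || G2) for two multivariate Gaussians in closed form."""
    mu_1, mu_2 = np.asarray(mean_1, float), np.asarray(mean_2, float)
    cov_1, cov_2 = np.asarray(covariance_1, float), np.asarray(covariance_2, float)
    k = mu_1.size  # dimension of the random vector
    diff = mu_2 - mu_1
    # tr(S2^-1 S1), computed without forming the inverse explicitly.
    trace_term = np.trace(np.linalg.solve(cov_2, cov_1))
    # (mu2 - mu1)^T S2^-1 (mu2 - mu1)
    mean_term = diff @ np.linalg.solve(cov_2, diff)
    # ln(det S2 / det S1)
    logdet_term = np.linalg.slogdet(cov_2)[1] - np.linalg.slogdet(cov_1)[1]
    return float(0.5 * (trace_term + mean_term - k + logdet_term))


identity = np.eye(2)
kl_same = kl_divergence_sketch([0, 0], [0, 0], identity, identity)
kl_forward = kl_divergence_sketch([0, 0], [1, 0], identity, 2 * identity)
kl_reverse = kl_divergence_sketch([1, 0], [0, 0], 2 * identity, identity)
```

The two last calls swap the roles of the distributions and return different values, which reflects that the KL divergence is not symmetric.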

clinamen.clustering.stats_tools.integral_multivariate_standard_normal_rectangular_region(region)[source]

Compute the probability that a standard normal random vector takes values in a rectangular region.

Parameters

region (tuple of 2-tuples) – region = ((a_1, b_1), (a_2, b_2), … , (a_k, b_k)). The number of 2-tuples gives the dimension of the random vector. Each 2-tuple contains the lower and upper integration limits along the corresponding direction.

Returns

probability – the corresponding probability

Return type

float
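Because the components of a standard normal vector are independent, the probability of a rectangular region factorizes into a product of one-dimensional probabilities Φ(b_i) - Φ(a_i), where Φ is the standard normal CDF. The sketch below uses only the standard library; the helper names are hypothetical, and the library may compute the integral differently.

```python
import math


def standard_normal_cdf(x):
    """CDF of the one-dimensional standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def box_probability_sketch(region):
    """Probability that a standard normal vector falls inside the box."""
    probability = 1.0
    for lower, upper in region:
        # Independent components: multiply the per-direction probabilities.
        probability *= standard_normal_cdf(upper) - standard_normal_cdf(lower)
    return probability


# Two dimensions: within one standard deviation along the first direction,
# and any negative value along the second.
region = ((-1.0, 1.0), (-math.inf, 0.0))
p = box_probability_sketch(region)
```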

clinamen.clustering.stats_tools.read_cmaes_status(json_status)[source]

Read and parse a CMAES status

Parameters

json_status (string) – the path to the json file describing the CMAES status

Returns

data – the dictionary with the retrieved data

Return type

dict

Objects representing the metamodel

The metamodel module

class clinamen.metamodel.metamodel.BaseExactGPMetaModel(descriptors_database, preprocessing_pipeline=None, std_value=0.01)[source]

Basic exact GP regressor with an RBF kernel for minimal initialization effort.

class clinamen.metamodel.metamodel.BasePCAExactGPMetaModel(descriptors_database, scaler_kwargs, pca_kwargs, std_value=0.01)[source]

Basic exact GP regressor with an RBF kernel for minimal initialization effort. The inputs are automatically passed through a pipeline that scales them and then performs PCA.
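The scale-then-PCA preprocessing can be pictured with the NumPy sketch below. It assumes that "scaling" means standardizing each feature to zero mean and unit variance, which the source does not state, and it does not reproduce the options passed through scaler_kwargs and pca_kwargs. The function name is hypothetical.

```python
import numpy as np


def scale_then_pca_sketch(X, n_components):
    """Standardize the features, then project onto the leading principal axes."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each feature (assumed scaling scheme).
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # constant features are only centered
    X_scaled = (X - X.mean(axis=0)) / std
    # Step 2: PCA via SVD; the rows of `components` are the principal axes,
    # ordered by decreasing explained variance.
    _, _, components = np.linalg.svd(X_scaled, full_matrices=False)
    return X_scaled @ components[:n_components].T


rng = np.random.default_rng(0)
descriptors = rng.normal(size=(20, 5))  # 20 structures, 5 descriptor entries
reduced = scale_then_pca_sketch(descriptors, n_components=2)
```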

class clinamen.metamodel.metamodel.ExactGPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]

Class for making a meta-model based on a Gaussian Process Regressor with exact inference.

initialize_model(X_train, y_train)[source]

This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.

All these initialized objects must be assigned to the respective attributes:

  • self._mean_function

  • self._kernel_function

  • self._likelihood

  • self._model

  • self._optimizer

  • self._mll

which can then be accessed through the corresponding property

class clinamen.metamodel.metamodel.GPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]

Class for creating a Gaussian Process metamodel.

fit(structures, y, epochs=10000, stopping=0.001, stopping_epochs=10, verbose=False)[source]

Train the meta-model

Parameters
  • structures (iterable of structures of length n_samples) –

  • y (np.ndarray of shape (n_samples, )) – the total energy of the structures in structures

  • epochs (int) – the number of epochs for training the metamodel

  • stopping (float) – the loss function minimum change to trigger early stopping

  • stopping_epochs (int) – the number of epochs for which the loss function must change by less than stopping in order to trigger early stopping

  • verbose (bool) – If True, prints the loss function every 100 epochs
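The interplay of epochs, stopping and stopping_epochs can be sketched as follows. This is one plausible reading of the criterion, assuming that the quiet epochs must be consecutive; the actual training loop may differ. The function name and the losses argument, which stands in for the loss values produced by the optimizer, are hypothetical.

```python
def epochs_until_stop_sketch(losses, epochs=10000, stopping=0.001,
                             stopping_epochs=10):
    """Return how many epochs run before early stopping (or exhaustion)."""
    previous = None
    quiet_epochs = 0  # consecutive epochs with a loss change below `stopping`
    epochs_run = 0
    for epoch, loss in zip(range(epochs), losses):
        epochs_run = epoch + 1
        if previous is not None and abs(previous - loss) < stopping:
            quiet_epochs += 1
        else:
            quiet_epochs = 0  # assumption: the counter resets on a large change
        if quiet_epochs >= stopping_epochs:
            break
        previous = loss
    return epochs_run


# A loss that drops for five epochs and then stays flat.
losses = [1.0 / (epoch + 1) for epoch in range(5)] + [0.2] * 100
n_run = epochs_until_stop_sketch(losses, stopping_epochs=10)
```

In this example training stops after 15 epochs: five epochs of decreasing loss followed by ten consecutive epochs without a change.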

abstract initialize_model(X_train, y_train)[source]

This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.

All these initialized objects must be assigned to the respective attributes:

  • self._mean_function

  • self._kernel_function

  • self._likelihood

  • self._model

  • self._optimizer

  • self._mll

which can then be accessed through the corresponding property

property loaded_state

If True, it means that the state of the model has been loaded from an external file

predict(structures)[source]

Predict the total energy for each individual in an iterable of structures.

Parameters
  • structures (iterable of structures of length n_samples) –

Returns
  • mean (np.array) – the predicted energies

  • std (np.array) – the predicted standard deviations

read_descriptors()[source]

Read the descriptors from the hdf5 database file.

save_model(filename='model_state.pth')[source]

Save the GP model

write_descriptors(structures)[source]

Save the descriptors into the database hdf5 file.