API reference¶

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Stand-alone implementation of the CMA-ES¶

clinamen.cmaes.evolution
clinamen.cmaes.fitness_calculators
clinamen.cmaes.population_evolver

The `evolution` module¶

class clinamen.cmaes.evolution.CMAES(strategy_params, mean=None, covariance=None, step_size=None, random_seed=10, terminator=None)[source]¶

Implementation of the covariance matrix adaptation evolution strategy.

Parameters

strategy_params (StrategyParameters object) – contains the initial strategy parameters
mean (1D NumPy array) – the mean vector. If None, is taken as the zero vector
covariance (2D NumPy array) – the covariance matrix. If None, is taken as the identity matrix
step_size (float) – the global variance. If None, is taken as 1
random_seed (int) – the random seed for the random number generator
terminator (TerminationCriteria instance) – object that keeps track of termination criteria. If None, a default one will be used.

Notes

If given, mean must have shape (strategy_params.n, ) and covariance (strategy_params.n, strategy_params.n)
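The defaults and shape constraints listed above can be illustrated with a small stand-alone sketch. The helper `resolve_initial_state` is hypothetical and not part of clinamen. It uses plain Python lists instead of NumPy arrays and only mirrors the documented rules: zero mean, identity covariance, unit step size, and the required shapes.

```python
def resolve_initial_state(n, mean=None, covariance=None, step_size=None):
    """Hypothetical helper mirroring the documented defaults for dimension n."""
    if mean is None:
        mean = [0.0] * n  # zero vector
    if covariance is None:
        # identity matrix
        covariance = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if step_size is None:
        step_size = 1.0
    # mean must have shape (n,), covariance must have shape (n, n)
    if len(mean) != n:
        raise ValueError(f"mean must have {n} entries, got {len(mean)}")
    if len(covariance) != n or any(len(row) != n for row in covariance):
        raise ValueError(f"covariance must have shape ({n}, {n})")
    return mean, covariance, step_size


mean, covariance, step_size = resolve_initial_state(3)
```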

property C¶: The covariance matrix for the current generation

property StrategyParameters¶: The current instance of StrategyParameters

property Terminator¶: The current instance of TerminationCriteria

as_dict()[source]¶: Returns a dictionary with the parameters of the CMAES instance

evolve(manual_mutation=False)[source]¶

Generator for the evolutionary process.

Parameters: manual_mutation (bool) – When True, mutated individuals must be inserted manually using the method set_mutated_offspring
Yields: a dictionary with the relevant parameters for the current generation

property g¶: The index of the current generation

classmethod load_status(json_status)[source]¶: From a CMAES status saved as json, returns a CMAES instance initialized with the information contained in json_status

property m¶: The mean vector for the current generation

property mutated_offspring¶: The offspring object parameters obtained after the mutation

property offspring¶: The offspring object parameters in the current generation

property pop_size¶: The population size

property random_seed¶: The user random seed

save_status(path=None)[source]¶

Save a json file with the data representing the current status of the evolution. Only data needed to restart a CMAES object are saved.

Parameters: path (str or None) – if not None, is the path to the folder where the .json file will be written

set_fitness_calculator(calculator)[source]¶

Set a fitness calculator: any object that can take object parameters representing the individuals (one row = one individual) and implements a method that calculates the fitness of individuals.

Basic interface of this fitness calculator:

Parameters: calculator (a fitness calculator instance) –
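A minimal sketch of an object that would fit the description above. The name `set_object_parameters` follows the one used by `FitnessCalculator` in this reference. The method name `get_fitness`, the class name, and the sphere function used as fitness are assumptions made purely for illustration.

```python
class SphereFitnessCalculator:
    """Toy fitness calculator: the fitness is the squared norm of each individual."""

    def __init__(self):
        self._x = []

    def set_object_parameters(self, x):
        # x is a 2D array-like: one row = one individual
        self._x = [list(row) for row in x]

    def get_fitness(self):  # hypothetical method name
        return [sum(v * v for v in row) for row in self._x]


calculator = SphereFitnessCalculator()
calculator.set_object_parameters([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
fitness = calculator.get_fitness()
```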

set_mutated_offspring(x)[source]¶

When manual mutation is selected, this method must be used to insert the mutated individuals

Parameters: x (2D NumPy array of shape (self.pop_size, self.dimension)) – the object parameters of the mutated individuals

property step_size¶: The step size for the current generation

class clinamen.cmaes.evolution.GpCMAES(*args, **kwargs)[source]¶

as_dict()[source]¶: Returns a dictionary with the parameters of the CMAES instance

property gradient_coefficient¶: The parameter controlling the gradient relevance

classmethod load_status(json_status)[source]¶: From a CMAES status saved as json, returns a CMAES instance initialized with the information contained in json_status

set_gradient_coefficient(alpha)[source]¶

Set the coefficient that multiplies the average gradient in the update of the mean.

Parameters: alpha (float) –

class clinamen.cmaes.evolution.StrategyParameters(dimension, pop_size=None, weights=None, c_sigma=None, d_sigma=None, c_c=None, c_1=None, c_mu=None, alpha_cov=None, c_m=None, std_min=None, c_g=None)[source]¶

Class for the initialization, update and tracking of the strategy parameters of the CMA-ES algorithm.

Parameters

dimension (int) – dimensionality of the problem.
pop_size (int or None) – population size
weights (tuple with pop_size entries or None) – weights used in the algorithm
c_sigma (float in (0, 1) or None) – learning rate for the conjugate evolution path used for step-size control
d_sigma (float > 0 or None) – damping term
c_c (float in [0, 1] or None) – learning rate for the evolution path used in the cumulation procedure
c_1 (float in [0, 1] or None) – learning rate for the rank-1 update of the covariance matrix
c_mu (float in [0, 1] or None) – learning rate for the rank-mu update of the covariance matrix
alpha_cov (float or None) – parameter for calculating default values of the learning rates
c_m (float or None) – learning rate for updating the mean. Typically 1, and in general <= 1
std_min (float or None) – increase the global step size if the std of the individuals' fitness is below this value
c_g (float or None) – learning rate for the evolution path of the gradient. It is used only when the CMAES instance supports gradient usage

Notes

If some parameters are None, default values will be used. It is suggested to leave all the parameters at their default values, with the possible exception of pop_size and alpha_cov.
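To give a feeling for what such defaults usually look like, the sketch below computes the values commonly quoted in CMA-ES tutorials for a given dimension. It is an assumption that clinamen uses these same formulas; the function name `default_strategy_parameters` is hypothetical, and the weights of the non-selected individuals are simply set to zero so that the tuple has pop_size entries.

```python
import math


def default_strategy_parameters(dimension, pop_size=None, alpha_cov=2.0):
    """Textbook CMA-ES defaults; the library's own defaults may differ."""
    n = dimension
    if pop_size is None:
        pop_size = 4 + int(3 * math.log(n))
    mu = pop_size // 2  # number of selected individuals
    # positive, decreasing recombination weights for the mu best individuals
    raw = [math.log((pop_size + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = tuple(w / total for w in raw) + (0.0,) * (pop_size - mu)
    mu_eff = 1.0 / sum(w * w for w in weights)  # variance-effective selection mass

    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)
    d_sigma = 1 + 2 * max(0.0, math.sqrt((mu_eff - 1) / (n + 1)) - 1) + c_sigma
    c_c = (4 + mu_eff / n) / (n + 4 + 2 * mu_eff / n)
    c_1 = alpha_cov / ((n + 1.3) ** 2 + mu_eff)
    c_mu = min(
        1 - c_1,
        alpha_cov * (mu_eff - 2 + 1 / mu_eff) / ((n + 2) ** 2 + alpha_cov * mu_eff / 2),
    )
    return {
        "pop_size": pop_size, "weights": weights, "c_sigma": c_sigma,
        "d_sigma": d_sigma, "c_c": c_c, "c_1": c_1, "c_mu": c_mu,
    }


params = default_strategy_parameters(10)
```

With these formulas the learning rates shrink as the dimension grows, which is one reason why only pop_size and alpha_cov are usually worth tuning by hand.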

as_dict()[source]¶: Returns a dictionary with the parameters needed to initialize a StrategyParameters instance

exception clinamen.cmaes.evolution.TerminationConditionMet[source]¶

class clinamen.cmaes.evolution.TerminationCriteria(noeffectaxis=True, noeffectcoord=True, conditioncov=True, equalfunvalues=True, maxiter=1000, tolxup=True, smallstd=1e-15)[source]¶

A class for holding the various termination criteria suggested for the algorithm. If a value is set to None/False, the corresponding criterion will be ignored.

Parameters

noeffectaxis (bool) –
noeffectcoord (bool) –
conditioncov (bool) –
equalfunvalues (bool) –
maxiter (int) – maximum number of iterations
tolxup (bool) –
smallstd (float) – stop if the fitness std remains below smallstd for at least 15 iterations.
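The smallstd rule can be pictured as a simple counter over generations. The sketch below is not the library's implementation: the class `SmallStdMonitor` is hypothetical, it assumes that the 15 iterations must be consecutive, and it uses the population standard deviation, which may differ from the estimator used internally.

```python
import statistics


class SmallStdMonitor:
    """Hypothetical counter for the documented smallstd rule."""

    def __init__(self, smallstd=1e-15, patience=15):
        self.smallstd = smallstd
        self.patience = patience
        self._count = 0

    def update(self, fitness_values):
        """Record one generation; return True when the criterion is met."""
        if statistics.pstdev(fitness_values) < self.smallstd:
            self._count += 1
        else:
            self._count = 0  # assumption: the generations must be consecutive
        return self._count >= self.patience


monitor = SmallStdMonitor()
flat = [1.0, 1.0, 1.0, 1.0]
results = [monitor.update(flat) for _ in range(15)]
```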

as_dict()[source]¶: Returns a dictionary with the parameters needed to initialize a TerminationCriteria instance

set_params(cmaes)[source]¶: From an instance of a CMAES object, set the value of the needed parameters.

The `fitness_calculators` module¶

class clinamen.cmaes.fitness_calculators.FitnessCalculator(population)[source]¶

Fitness calculator for Population objects

Parameters: population (Population instance) – the current individual population

set_object_parameters(x)[source]¶

Set the object parameters to the individuals in self.population

Parameters: x (2D NumPy array) – shape (N, d), where N is the number of individuals in the population and d is the dimensionality of the search space

class clinamen.cmaes.fitness_calculators.FitnessGradientCalculator(population)[source]¶

get_gradients()[source]¶: From a Population instance whose individuals have an attached calculator that returns forces, computes and returns the gradients for the individuals in the population.

class clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator(population, atoms_within_cutoff, metamodel, data='Xy.hdf5', min_generation=0, train_kwargs={})[source]¶

Calculator for the fitness function using a surrogate fitness metamodel. The metamodel is used exclusively for predicting the total energy. Forces are saved in the training dataset, but are not used for training or prediction.

Parameters

population (Population instance) – the current individual population
atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace
metamodel (MetaModel instance) – the fitness surrogate
data (string) – the name of the file which is used to save/read the training data. It will be named data.hdf5
min_generation (int. Default 0) – use the metamodel only if the current generation is greater than or equal to min_generation
train_kwargs (dict) – the keyword-argument pairs to train the metamodel

class clinamen.cmaes.fitness_calculators.RSFitnessCalculator(population, atoms_within_cutoff)[source]¶

Fitness calculator for Population objects on a restricted subspace.

Parameters

population (Population instance) – the current individual population
atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace

set_object_parameters(x)[source]¶

Set the object parameters to the individuals in self.population

Parameters: x (2D NumPy array) – shape (N, d), where N is the number of individuals in the population and d is the dimensionality of the restricted subspace of interest.
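One way to picture the restricted subspace is as a scatter of a short vector into the full set of atomic coordinates. The sketch below is only an illustration under stated assumptions: the helper `embed_restricted` is hypothetical, each row of x is assumed to hold three Cartesian coordinates per selected atom (so d = 3 * len(atoms_within_cutoff)), and the coordinates are treated as absolute positions, which may not match how the library stores object parameters.

```python
def embed_restricted(full_positions, atoms_within_cutoff, x_row):
    """Return a copy of full_positions where only the selected atoms are replaced.

    Assumptions: full_positions holds one [x, y, z] triple per atom, and x_row
    stores the coordinates of the selected atoms in the order given by
    atoms_within_cutoff.
    """
    if len(x_row) != 3 * len(atoms_within_cutoff):
        raise ValueError("x_row must have 3 entries per selected atom")
    new_positions = [list(p) for p in full_positions]
    for slot, atom_index in enumerate(atoms_within_cutoff):
        new_positions[atom_index] = list(x_row[3 * slot: 3 * slot + 3])
    return new_positions


positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
moved = embed_restricted(positions, [1, 3], [1.1, 0.2, 0.0, 2.9, -0.1, 0.0])
```

Atoms outside the cutoff keep their coordinates, so the search space shrinks from 3 times the number of atoms to 3 times the number of selected atoms.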

class clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator(population, atoms_within_cutoff)[source]¶

get_gradients()[source]¶: From a Population instance whose individuals have an attached calculator that returns forces, computes and returns the gradients for the individuals in the population.

clinamen.cmaes.fitness_calculators.write_train_hdf5(file_name, population, energies, forces, name=None)[source]¶

Append new data to an existing dataset. If the dataset does not exist, create a new one.

Parameters

file_name (string) – the dataset name
population (Population instance) – the new individuals to be added to the dataset
energies (array-like of shape (n_individuals, )) – the energies of the individuals in population
forces (2D array-like of shape (n_individuals, 3*no_atoms)) – the forces of the individuals in population
name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, it is checked that it corresponds to the system name

The `population_evolver` module¶

class clinamen.cmaes.population_evolver.AnalizeRun(evolver)[source]¶

Helper class for analyzing the evolution of the population

Parameters: evolver (PopulationEvolver derived instance) – the evolver to be analyzed. If evolver is set to None, then the class can be used to analyze an already existing simulation dataframe (set with the method load_dataframe).

evolve()[source]¶

Evolve generation-by-generation

Yields: dataframe (pandas DataFrame) – updated to the current generation

initialize()[source]¶

Initialize the evolution

Returns: dataframe – the elements for generation 0
Return type: pandas DataFrame

load_dataframe(df)[source]¶

Use this method to analyze a valid simulation dataframe without the need to run the whole evolutionary process

Parameters: df (pandas DataFrame) –

static plot_data_vs_generation(df, keys, other_keys=None, samples=[1], serrors=[0], alpha=0.05, **kwargs)[source]¶

Plot the evolution of keys and, optionally, other_keys in df with respect to the generation number.

Parameters

df (pandas DataFrame) – it must contain at least 2 columns: ‘generation’, with the generation number, and each column named in keys.
keys (list of strings) – the column labels in df to be plotted. If a column is an averaged value, its sample std is given by serrors
other_keys (None or list of strings) – optional additional column labels to be plotted
samples (list of int) – if one of keys is an average, it is its sample size.
serrors (list of float) – if one of keys is an average, it is its sample std. This will be used to calculate the confidence intervals
alpha (float in (0, 1)) – defines the desired (1-alpha)*100% confidence interval
kwargs (dictionary) – keyword-value pairs for tuning the plot parameters; see the documentation of pandas.DataFrame.plot.line
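The relation between a sample std, a sample size, and alpha can be sketched as follows. This is an assumption about how such an interval is typically built (normal approximation around the mean), not a statement about the exact formula used by the plotting routine; the function name `confidence_half_width` is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_half_width(sample_std, sample_size, alpha=0.05):
    """Half-width of a (1 - alpha) * 100% normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sample_std / math.sqrt(sample_size)


# A mean over 4 samples with sample std 1.0, at the 95% level
half_width = confidence_half_width(sample_std=1.0, sample_size=4)

# The default-like case: a single sample with zero std gives no error band
no_error = confidence_half_width(sample_std=0.0, sample_size=1)
```

The interval for an averaged column would then be the plotted value plus or minus this half-width.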

plot_energy_vs_generation(**kwargs)[source]¶

Plots the evolution of the mean population energy as a function of the generation.

Parameters: kwargs (dictionary) – keyword-value pairs for tuning the plot parameters; see the documentation of pandas.DataFrame.plot.line

run()[source]¶

Evolve the population until a termination criterion is met.

Returns: dataframe – stores information about the evolution of Evolver
Return type: pandas DataFrame

class clinamen.cmaes.population_evolver.GpPopulationEvolver(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]¶

Exploit the gradient during the run.

Parameters

c_alpha (float) – coefficient describing the relevance of the gradient term
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

cmaes_obj¶: alias of clinamen.cmaes.evolution.GpCMAES

fitness_calc_obj¶: alias of clinamen.cmaes.fitness_calculators.FitnessGradientCalculator

class clinamen.cmaes.population_evolver.PopulationEvolver(founder, step_size=0.2, covariance=None, dmin=None, random_seed=10)[source]¶

Evolves a Population instance using the CMA-ES algorithm

Parameters

founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
step_size (float > 0) – initial step size used in the CMA-ES algorithm. Default 0.2 Angstrom
covariance (None or float or 1D array or 2D array) – the initial covariance matrix. Default is the identity matrix. If covariance is a float, then the matrix is diagonal with that value on the diagonal. If it is a 1D array, it is still diagonal with that array on the diagonal.
dmin (float) – minimum distance between two atoms to consider an individual to be valid. Default None (0.5 of the minimum bond distance)
random_seed (int) – random seed to be used for generating random variates. Defaults to 10
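The accepted forms of covariance can be summarized with a small expansion routine. The sketch below is a stand-alone illustration with plain Python lists; the helper `build_covariance` is hypothetical and only follows the rules stated in the parameter description.

```python
def build_covariance(covariance, n):
    """Expand None, a float, a 1D sequence, or a 2D sequence into an n x n matrix."""
    if covariance is None:
        diagonal = [1.0] * n  # identity matrix
    elif isinstance(covariance, (int, float)):
        diagonal = [float(covariance)] * n  # constant diagonal
    elif all(isinstance(v, (int, float)) for v in covariance):
        diagonal = [float(v) for v in covariance]  # 1D input: used as the diagonal
    else:
        matrix = [[float(v) for v in row] for row in covariance]
        if len(matrix) != n or any(len(row) != n for row in matrix):
            raise ValueError(f"covariance must have shape ({n}, {n})")
        return matrix
    if len(diagonal) != n:
        raise ValueError(f"diagonal must have {n} entries")
    return [[diagonal[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


identity = build_covariance(None, 2)
scaled = build_covariance(0.5, 2)
diagonal = build_covariance([1.0, 4.0], 2)
full = build_covariance([[2.0, 0.1], [0.1, 3.0]], 2)
```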

cmaes_obj¶: alias of clinamen.cmaes.evolution.CMAES

property cmaes_parameters¶: Return a dictionary with the objects needed to initialize the instance’s CMAES object

evolve_population()[source]¶: Evolve the current population. Returns a generator with the relevant parameters of the current generation.

fitness_calc_obj¶: alias of clinamen.cmaes.fitness_calculators.FitnessCalculator

get_object_parameters()[source]¶: Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the search space dimension

property population¶: The current Population instance

save_population(generation)[source]¶

Append the current population to self.evolution_history file

Parameters: generation (int) – the index of the current generation. Used to create a corresponding new group in the hdf5 file

set_cmaes(cmaes)[source]¶

Set a custom CMAES object to overwrite the default one.

Parameters: cmaes (instance of CMAES) –

set_strategy_parameters(strategy_params)[source]¶

Set the values of the strategy parameters to overwrite the default ones.

Parameters: strategy_params (instance of StrategyParameters) –

set_termination_criteria(termination_criteria)[source]¶

Set the values of the termination criteria to overwrite the default ones.

Parameters: termination_criteria (instance of TerminationCriteria) –

class clinamen.cmaes.population_evolver.RSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]¶

Restricted-subspace population evolver: only the genotype for atoms inside a cutoff radius is considered

Parameters

nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

Notes

Similar to SSPopulationEvolver but only the atoms within the cutoff are moved.

fitness_calc_obj¶: alias of clinamen.cmaes.fitness_calculators.RSFitnessCalculator

get_object_parameters()[source]¶: Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the dimension of the restricted subspace

property use_reduced_population_size¶: Bool. If True, the population size is chosen automatically based on the dimension of the restricted subspace. If False, the population size given by the StrategyParameters instance supplied at initialization is used. Default False.

class clinamen.cmaes.population_evolver.RSPopulationEvolverGrad(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]¶

Parameters

c_alpha (float) – coefficient describing the relevance of the gradient term
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

Notes

Similar to RSPopulationEvolver but gradients are also used.

cmaes_obj¶: alias of clinamen.cmaes.evolution.GpCMAES

fitness_calc_obj¶: alias of clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator

class clinamen.cmaes.population_evolver.RSPopulationEvolverMetamodel(metamodel, dataset, nn_cutoff, c_r, founder, min_generation=0, train_kwargs={}, **kwargs)[source]¶

RS Fitness Calculator with a metamodel to be trained on-the-fly

Parameters

metamodel (Metamodel object) – the metamodel that will be used to make the energy predictions
dataset (string) – the name of the .hdf5 file which will be used to write/read the training data
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
min_generation (int. Default 0) – use the metamodel only if the current generation is greater than or equal to min_generation
train_kwargs (dict) – the keyword-argument values used to train the metamodel
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

fitness_calc_obj¶: alias of clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator

class clinamen.cmaes.population_evolver.SSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]¶

Add to the initial covariance matrix a rank-s matrix increasing the variance for coordinates representing atoms close to the point defect position

Parameters

nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance

property atoms_within_cutoff¶: Indices of the atoms within the cutoff

property basis_coefficients¶: The distances of the atoms within the cutoff and the coefficients per atom which are used to add the rank-s matrix to the initial covariance matrix

property nn_cutoff¶: Cutoff selecting the nearest neighbors (NN) of the defect, which form the basis for the selected subspace

property number_of_nn¶: The number of nearest-neighbor atoms within self.nn_cutoff

property selected_subspace_basis¶: The basis spanning the selected subspace, ordered with respect to the atomic distances from the defect

Objects describing the genotype of individuals and their populations¶

clinamen.evpd.core

The `evpd.core.individual` module¶

class clinamen.evpd.core.individual.Individual(*args, **kwargs)[source]¶

Class for representing an individual in a population.

calculate_comparison_distances(other)[source]¶

Calculates the relative distance and the maximum distance discrepancy between this individual and another one.

Parameters: other (Individual instance) –
Returns: rel_dist, max_dist – the relative and maximum discrepancy distance between the two instances

calculation_required()[source]¶: Returns True if a new energy calculation is required

property chromosome¶: The chromosome of an individual is the set of displacements from an initial configuration, usually one very similar to the pristine system

clone()[source]¶: Copy the instance, return a new instance

property cost¶: Value of the cost function of the individual (its energy)

property defect_position¶: Location of the defect in the structure, if any

property distances_from_defect¶: Distance of each atom in the system from the defect

property dmax¶: Tolerance for the maximum distance discrepancy between two individuals. This distance is defined as:

\[d_{max}(i, j) = \max_k |d_i(k) - d_j(k)|\]

property dmin¶

Tolerance for the minimum distance at which two atoms can be located. Used to reject a structure where two atoms are too close.

Default value: 0.25 times the minimum bond length in the system

property drel¶: Tolerance for the relative distance between two individuals. The relative distance is defined as:

\[d_{rel}(i, j) = \frac{\sum_k |d_i(k) - d_j(k)|}{\sum_k d_i(k)}\]
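Both distance measures defined for dmax and drel can be evaluated directly from two lists of distances. The sketch below assumes that d_i and d_j are equally long sequences of distances listed in the same order for both individuals; the function name `comparison_distances` is hypothetical and only transcribes the two formulas.

```python
def comparison_distances(d_i, d_j):
    """Return (relative distance, maximum distance discrepancy) for two distance lists."""
    if len(d_i) != len(d_j):
        raise ValueError("both individuals must provide the same number of distances")
    diffs = [abs(a - b) for a, b in zip(d_i, d_j)]
    d_max = max(diffs)                 # largest single discrepancy
    d_rel = sum(diffs) / sum(d_i)      # total discrepancy relative to individual i
    return d_rel, d_max


rel_dist, max_dist = comparison_distances([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

Two individuals would then be regarded as equivalent when both values fall below the corresponding tolerances.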

property etol¶: Tolerance for comparing costs between two individuals

property fitness¶: Fitness value of the individual

get_forces(*args, **kwargs)[source]¶

Calculate atomic forces.

Ask the attached calculator to calculate the forces and apply constraints. Use apply_constraint=False to get the raw forces.

For molecular dynamics (md=True) we don’t apply the constraint to the forces but to the momenta. When holonomic constraints for rigid linear triatomic molecules are present, ask the constraints to redistribute the forces within each triple defined in the constraints (required for molecular dynamics with this type of constraints).

has_proper_structure()[source]¶: Returns False if any interatomic distance in the system is smaller than self.dmin.
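A check of this kind can be sketched as a pairwise distance test. The version below is a simplified stand-in, not the library's code: it takes plain Cartesian coordinates and ignores periodic images, which a real crystal structure would need to account for.

```python
import math
from itertools import combinations


def has_proper_structure(positions, dmin):
    """Return False if any pair of atoms is closer than dmin (no periodic images)."""
    return all(math.dist(a, b) >= dmin for a, b in combinations(positions, 2))


good = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
bad = good + [[0.1, 0.0, 0.0]]  # this atom sits too close to the first one

good_ok = has_proper_structure(good, dmin=0.5)
bad_ok = has_proper_structure(bad, dmin=0.5)
```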

static make_individual_from_ase_atoms(atoms)[source]¶

Takes an ase.Atoms instance and returns an evpd.core.Individual instance.

The result is analogous to using atoms.copy(), but also the calculator will be copied.

property metric_tensor¶: The metric tensor of the cell

property my_name¶: Identifier

optimize_structure(**kwargs)[source]¶

Optimize the structure

Parameters: kwargs (dict) – parameters for running the geometry optimization. A mandatory key is optimizer, which is an ase optimizer. The other key:value pairs are the parameters to supply to optimizer.

set_calculator_factory(calc_factory, calc_parameters)[source]¶

Parameters

calc_factory (class or function) – a class derived from ase.calculators.interface.Calculator, or a function generating a calculator
calc_parameters (dict) – keyword-value pairs used to initialize a calc_factory instance

property total_energy_calculations¶: Number of times the energy of the individual has been calculated

update_fitness()[source]¶: Calculate the total energy of the individual

write_poscar(path=None)[source]¶: If path is not None, it is the folder where the POSCAR file will be saved

The `evpd.core.population` module¶

exception clinamen.evpd.core.population.BadPopulationMember[source]¶: Raise the exception when one tries to add to a Population instance an object which is not an Individual instance

class clinamen.evpd.core.population.Population(*individuals)[source]¶

A population is a group of individuals of a given size.

Parameters: individuals (a list or tuple of individuals.) – the container of individuals forming the population. It can also be a single individual or empty.

property individuals_fitness¶: Return a list with the fitness of each individual

insert(index, value)[source]¶: S.insert(index, value) – insert value before index
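The behaviour described for Population (a sequence-like container that rejects anything that is not an individual) can be mimicked with the standard library. Everything in the sketch below is a toy stand-in: `ToyIndividual`, `ToyPopulation` and `BadMember` are hypothetical names, and building on `collections.abc.MutableSequence` is an assumption suggested by the `insert` docstring, not a confirmed detail of the library.

```python
from collections.abc import MutableSequence


class ToyIndividual:
    def __init__(self, fitness):
        self.fitness = fitness


class BadMember(TypeError):
    """Stand-in for an exception such as BadPopulationMember."""


class ToyPopulation(MutableSequence):
    def __init__(self, *individuals):
        self._members = []
        # accept separate individuals or a single list/tuple of them
        if len(individuals) == 1 and isinstance(individuals[0], (list, tuple)):
            individuals = individuals[0]
        for individual in individuals:
            self.append(individual)  # append() is built on insert()

    @staticmethod
    def _check(value):
        if not isinstance(value, ToyIndividual):
            raise BadMember(f"{value!r} is not an individual")

    def __getitem__(self, index):
        return self._members[index]

    def __setitem__(self, index, value):
        self._check(value)
        self._members[index] = value

    def __delitem__(self, index):
        del self._members[index]

    def __len__(self):
        return len(self._members)

    def insert(self, index, value):
        self._check(value)
        self._members.insert(index, value)

    @property
    def individuals_fitness(self):
        return [individual.fitness for individual in self._members]


population = ToyPopulation(ToyIndividual(1.0), ToyIndividual(2.0))
population.insert(0, ToyIndividual(0.5))
```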

Crystal structure fingerprint descriptors and utilities for general descriptors¶

clinamen.descriptors.descriptors_cython
clinamen.descriptors.utils

The `descriptors_cython` module¶

The `descriptors.utils` module¶

clinamen.descriptors.utils.read_descriptors_by_id(file_name, ids)[source]¶

Given an id or a list thereof, returns the corresponding descriptors and Jacobians, if available

Parameters

file_name (string) – the hdf5 file from where the descriptors should be fetched
ids (iterable) – the identity keys of the descriptors we want to fetch

Returns

X, DX, indices – the descriptors and their Jacobians, each as a list, and a list representing the indices corresponding to the ids in ids that were found. If the Jacobians are not present, None is returned

Return type

tuple

clinamen.descriptors.utils.write_descriptors(file_name, descriptors, descriptors_grads, ids, name=None, flattened=True)[source]¶

Append new data to an existing dataset. If the dataset does not exist, create a new one.

Parameters

file_name (string) – the dataset name
descriptors (array-like) – 2D array-like of shape (n, d) if flattened is True, where n is the number of structures for which the descriptors were calculated and d is the dimensionality of the descriptors. If flattened is False, descriptors can be a multidimensional array of shape (n, …).
descriptors_grads (array-like or None) – 2D array-like of shape (n, r) if flattened is True; otherwise, it can be a multidimensional array of shape (n, …). It can also be None. If not None, these are the (possibly flattened, if flattened is True) Jacobians of the descriptors.
name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, it is checked that it corresponds to the system name
ids (array-like of shape (n, )) – for each descriptor, a string that identifies the structure corresponding to that descriptor

Utilities for the unsupervised classification of clusters¶

clinamen.clustering.misc
clinamen.clustering.stats_tools

The `clustering.misc` module¶

clinamen.clustering.misc.calculate_k_distances(dataset, k, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None)[source]¶

Calculate and return the k-NN distances for each point in the dataset. It uses sklearn.neighbors.NearestNeighbors; see its documentation for the meaning of the remaining parameters.

Parameters

dataset (2D array) – the dataset
k (int) – the nearest neighbor number to consider

Returns

k_distances – the k-NN distances for each point in the dataset sorted in descending order.

Return type

1D array
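The quantity returned here can be reproduced on a tiny dataset with a brute-force sketch. The version below uses the Euclidean distance (matching the default Minkowski metric with p=2) and excludes each point from its own neighbor list; whether the library counts the point itself as a neighbor is not stated, so that choice is an assumption. The name `k_distances` is hypothetical.

```python
import math


def k_distances(dataset, k):
    """Distance of every point to its k-th nearest neighbor, sorted in descending order."""
    result = []
    for i, point in enumerate(dataset):
        # distances to all other points (the point itself is excluded)
        others = sorted(math.dist(point, q) for j, q in enumerate(dataset) if j != i)
        result.append(others[k - 1])
    return sorted(result, reverse=True)


dataset = [[0.0], [1.0], [3.0]]
first_nn = k_distances(dataset, k=1)
second_nn = k_distances(dataset, k=2)
```

A sorted curve of this kind is commonly inspected to choose a neighborhood radius for density-based clustering.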

clinamen.clustering.misc.find_centroids(data, labels)[source]¶

Finds the centroids locations of the clusters.

Parameters

data (2D array) – The dataset. Each row represents one structure
labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its centroid will not be returned.

Returns

centroids – the keys are the cluster indices, the values are the coordinates of the centroids

Return type

dict
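The mapping from labels to centroids can be sketched in a few lines. The example below assumes that a centroid is the arithmetic mean of the points in a cluster, which the description does not state explicitly; the function name `centroids_by_label` is hypothetical. Points labelled -1 are treated as noise and skipped, as described above.

```python
from collections import defaultdict


def centroids_by_label(data, labels):
    """Map each cluster label (except -1) to the mean of its points."""
    members = defaultdict(list)
    for row, label in zip(data, labels):
        if label != -1:  # -1 marks noise: no centroid is computed for it
            members[label].append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in members.items()
    }


data = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 1, -1]
centroids = centroids_by_label(data, labels)
```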

clinamen.clustering.misc.get_structure_group_index(structure_name, groups)[source]¶

Given the name of a structure, returns the group index it belongs to.

Parameters

structure_name (string) – the name of the structure
groups (dict) – key:value pairs of cluster indices and lists of the names of the structures belonging to each cluster

Returns

key – the index of the cluster

Return type

int

clinamen.clustering.misc.group_structures_in_clusters(ordered_structures, cluster_labels)[source]¶

Group a list of structure names according to the cluster they belong to.

Parameters

ordered_structures (list) – Ordered list of structure names. The ordering is done by matching the dataset: the i-th element in the dataset is the structure corresponding to ordered_structures[i]
cluster_labels (list) – cluster_labels[i] is the label of the cluster to which ordered_structures[i] belongs.

Returns

groups – keys are cluster labels and the values are lists with the structures belonging to that cluster.

Return type

defaultdict
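The grouping described above, together with the reverse lookup performed by get_structure_group_index, can be sketched in plain Python as follows. Both function names are hypothetical stand-ins, and returning None for an unknown structure name is an assumption, since the documented behavior in that case is not stated.

```python
from collections import defaultdict


def group_sketch(ordered_structures, cluster_labels):
    """Collect structure names under the label of the cluster they belong to."""
    groups = defaultdict(list)
    for name, label in zip(ordered_structures, cluster_labels):
        groups[label].append(name)
    return groups


def group_index_sketch(structure_name, groups):
    """Return the key of the group that contains structure_name."""
    for key, names in groups.items():
        if structure_name in names:
            return key
    # Assumption: the behavior for a name that is in no group is undocumented
    return None


groups = group_sketch(["s0", "s1", "s2", "s3"], [0, 1, 0, -1])
index_of_s2 = group_index_sketch("s2", groups)
```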

clinamen.clustering.misc.make_reachability_plot(optics_instance, x_lims=None)[source]¶

Make the reachability plot from a trained scikit-learn OPTICS instance.

Parameters

optics_instance (scikit-learn OPTICS instance) – a trained instance
x_lims (tuple) – x limits to be plotted

clinamen.clustering.misc.plot_cluster_plot(data, labels, title, ordered_structures, plot_kwargs={}, cmap=<matplotlib.colors.LinearSegmentedColormap object>, show_names=True, plot_chull=False, plot_centroids=True, ax=None)[source]¶

Make a scatter plot of the clusters.

Parameters

data (2D array) – The dataset. Each row represents one structure
labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its points are represented by crosses.
title (string) – the plot title
ordered_structures (list) – the i-th element is the structure name for data[i]
plot_kwargs (dict) – key:value pairs to fine-tune the plot
cmap (matplotlib cmap instance. Default cm.jet) – the colormap to be used in the plot
show_names (bool. Default True) – if True, the structure names will be shown in the plot
plot_chull (bool. Default False) – if True, also plots the convex hull of the points in each cluster
plot_centroids (bool. Default True) – if True, the centroids of each cluster are also plotted
ax (matplotlib Axes instance or None. Default is None) – the axes for the plot. If None, the current axes is taken. (TODO)

clinamen.clustering.misc.write_clustered_structures(groups, key)[source]¶

Write to a text file all the structures belonging to a given cluster.

Parameters

groups (dict) – key:value pairs mapping each cluster index to the list of names of the structures belonging to that cluster
key (int) – the cluster index

Returns

fname – the name of the just-written text file

Return type

string

The `clustering.stats_tools` module¶

clinamen.clustering.stats_tools.calculate_B_coefficient_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶

Calculate the Bhattacharyya coefficient between two normal distributions.

Parameters

mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

b_coeff – the Bhattacharyya coefficient

Return type

float

clinamen.clustering.stats_tools.calculate_B_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶

Calculate the Bhattacharyya distance between two normal distributions.

Parameters

mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

distance – the Bhattacharyya distance

Return type

float

clinamen.clustering.stats_tools.calculate_H_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶

Calculate the Hellinger distance between two Gaussians.

Parameters

mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution

Returns

distance – the Hellinger distance

Return type

float
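The Bhattacharyya distance, the Bhattacharyya coefficient and the Hellinger distance are closely related, and for Gaussians all three have a closed form. The sketch below uses the textbook expressions: D_B = (1/8) d^T S^-1 d + (1/2) ln(det S / sqrt(det S_1 det S_2)) with S = (S_1 + S_2) / 2 and d = mean_1 - mean_2, then BC = exp(-D_B) and H = sqrt(1 - BC). The function names are hypothetical, and the normalization of the Hellinger distance to the range [0, 1] is an assumption about the convention, not something the documentation above confirms.

```python
import numpy as np


def bhattacharyya_distance_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Closed-form Bhattacharyya distance between two multivariate normals."""
    mean_1 = np.asarray(mean_1, dtype=float)
    mean_2 = np.asarray(mean_2, dtype=float)
    cov_1 = np.asarray(covariance_1, dtype=float)
    cov_2 = np.asarray(covariance_2, dtype=float)
    cov_avg = 0.5 * (cov_1 + cov_2)
    delta = mean_1 - mean_2
    # Contribution from the separation of the means
    mean_term = 0.125 * delta @ np.linalg.solve(cov_avg, delta)
    # Contribution from the covariance mismatch, via log-determinants
    logdet_avg = np.linalg.slogdet(cov_avg)[1]
    logdet_1 = np.linalg.slogdet(cov_1)[1]
    logdet_2 = np.linalg.slogdet(cov_2)[1]
    cov_term = 0.5 * (logdet_avg - 0.5 * (logdet_1 + logdet_2))
    return float(mean_term + cov_term)


def bhattacharyya_coefficient_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """BC = exp(-D_B); equals 1 for identical distributions."""
    distance = bhattacharyya_distance_sketch(mean_1, mean_2, covariance_1, covariance_2)
    return float(np.exp(-distance))


def hellinger_distance_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """H = sqrt(1 - BC) (assumed convention, with H in [0, 1])."""
    bc = bhattacharyya_coefficient_sketch(mean_1, mean_2, covariance_1, covariance_2)
    return float(np.sqrt(max(0.0, 1.0 - bc)))


identity = np.eye(2)
d_same = bhattacharyya_distance_sketch([0.0, 0.0], [0.0, 0.0], identity, identity)
d_shift = bhattacharyya_distance_sketch([0.0], [2.0], [[1.0]], [[1.0]])
bc_shift = bhattacharyya_coefficient_sketch([0.0], [2.0], [[1.0]], [[1.0]])
h_shift = hellinger_distance_sketch([0.0], [2.0], [[1.0]], [[1.0]])
```

For two unit-variance 1D Gaussians whose means differ by 2, only the mean term contributes, giving D_B = 4 / 8 = 0.5.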

clinamen.clustering.stats_tools.calculate_KL_divergence_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶

Calculate the Kullback-Leibler divergence between two Gaussians: KL(G1 || G2) = E_1[ln G1 - ln G2]

Parameters

mean_1 (1D np.ndarray) – the mean vector of the first distribution (G1)
mean_2 (1D np.ndarray) – the mean vector of the second distribution (G2)
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution (G1)
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution (G2)

Returns

divergence – the KL divergence

Return type

float
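For two k-dimensional Gaussians the expectation above has the well-known closed form KL(G1 || G2) = (1/2) [tr(S_2^-1 S_1) + (m_2 - m_1)^T S_2^-1 (m_2 - m_1) - k + ln(det S_2 / det S_1)]. The following is a minimal NumPy sketch of that formula; `kl_divergence_sketch` is a hypothetical name and the code is not taken from the library.

```python
import numpy as np


def kl_divergence_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Closed-form KL(G1 || G2) between two multivariate normals."""
    mean_1 = np.asarray(mean_1, dtype=float)
    mean_2 = np.asarray(mean_2, dtype=float)
    cov_1 = np.asarray(covariance_1, dtype=float)
    cov_2 = np.asarray(covariance_2, dtype=float)
    k = mean_1.shape[0]
    delta = mean_2 - mean_1
    # tr(S_2^-1 S_1), computed without forming the explicit inverse
    trace_term = np.trace(np.linalg.solve(cov_2, cov_1))
    # Mahalanobis-like term measured with the second covariance
    quad_term = delta @ np.linalg.solve(cov_2, delta)
    # ln(det S_2 / det S_1) through log-determinants
    logdet_term = np.linalg.slogdet(cov_2)[1] - np.linalg.slogdet(cov_1)[1]
    return float(0.5 * (trace_term + quad_term - k + logdet_term))


kl_same = kl_divergence_sketch([0.0, 0.0], [0.0, 0.0], np.eye(2), np.eye(2))
kl_shift = kl_divergence_sketch([0.0], [1.0], [[1.0]], [[1.0]])
kl_narrow_to_wide = kl_divergence_sketch([0.0], [0.0], [[1.0]], [[4.0]])
kl_wide_to_narrow = kl_divergence_sketch([0.0], [0.0], [[4.0]], [[1.0]])
```

The last two calls differ only in the order of the arguments and give different values, which illustrates that the KL divergence is not symmetric.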

clinamen.clustering.stats_tools.integral_multivariate_standard_normal_rectangular_region(region)[source]¶

Compute the probability that a standard normal random vector takes values in a rectangular region.

Parameters

region (tuple of 2-tuples) – region = ((a_1, b_1), (a_2, b_2), … , (a_k, b_k)). The number of 2-tuples gives the dimension of the random vector. Each 2-tuple contains the lower and upper integration limits along the corresponding direction

Returns

probability – the corresponding probability

Return type

float
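Because the components of a standard normal vector are independent, the probability of a rectangular region factorizes into a product of one-dimensional probabilities, each a difference of standard normal CDF values. The sketch below illustrates this with the standard library only; the function names are hypothetical and the library may compute the integral differently.

```python
import math


def std_normal_cdf(x):
    """CDF of the one-dimensional standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rectangular_probability_sketch(region):
    """Probability that a standard normal vector falls inside the region."""
    probability = 1.0
    for lower, upper in region:
        # Independent components: multiply the per-direction probabilities
        probability *= std_normal_cdf(upper) - std_normal_cdf(lower)
    return probability


p_1d = rectangular_probability_sketch(((-1.0, 1.0),))
p_2d = rectangular_probability_sketch(((-1.0, 1.0), (0.0, math.inf)))
```

The one-dimensional case recovers the familiar probability of about 0.68 of falling within one standard deviation, and adding a half-line in a second direction halves it.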

clinamen.clustering.stats_tools.read_cmaes_status(json_status)[source]¶

Read and parse a CMAES status.

Parameters

json_status (string) – the path to the JSON file describing the CMAES status

Returns

data – the dictionary with the retrieved data

Return type

dict

Objects representing the metamodel¶

clinamen.metamodel.metamodel

The `metamodel` module¶

class clinamen.metamodel.metamodel.BaseExactGPMetaModel(descriptors_database, preprocessing_pipeline=None, std_value=0.01)[source]¶: Basic Exact GP regressor with an RBF kernel for minimal initialization effort.

class clinamen.metamodel.metamodel.BasePCAExactGPMetaModel(descriptors_database, scaler_kwargs, pca_kwargs, std_value=0.01)[source]¶: Basic Exact GP regressor with an RBF kernel for minimal initialization effort. The inputs are automatically passed through a pipeline that scales them and then performs PCA.

class clinamen.metamodel.metamodel.ExactGPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]¶

Class for making a meta-model based on a Gaussian Process Regressor with exact inference.

initialize_model(X_train, y_train)[source]¶

This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.

All these initialized objects must be assigned to the respective attributes:

self._mean_function
self._kernel_function
self._likelihood
self._model
self._optimizer
self._mll

which can then be accessed through the corresponding property

class clinamen.metamodel.metamodel.GPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]¶

Class for creating a Gaussian Process metamodel.

fit(structures, y, epochs=10000, stopping=0.001, stopping_epochs=10, verbose=False)[source]¶

Train the meta-model

Parameters

structures (iterable of structures of length n_samples) – the structures used to train the metamodel
y (np.ndarray of shape (n_samples, )) – the total energy of the structures in structures
epochs (int) – the number of epochs for training the metamodel
stopping (float) – the threshold on the change in the loss function below which early stopping can be triggered
stopping_epochs (float) – the number of epochs for which the loss function must change by less than stopping in order to enforce early stopping
verbose (bool) – If True, prints the loss function every 100 epochs
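The interplay between stopping and stopping_epochs can be sketched in plain Python as follows. This is only an illustration of the rule described above, not the training loop of the library: the function name is hypothetical, and the sketch assumes that the change is measured as the absolute difference between consecutive epochs and that the qualifying epochs must be consecutive.

```python
def early_stopping_epoch(losses, stopping=0.001, stopping_epochs=10):
    """Return the epoch index at which training would stop, or None."""
    calm_epochs = 0
    for epoch in range(1, len(losses)):
        if abs(losses[epoch] - losses[epoch - 1]) < stopping:
            # The loss barely moved: count this epoch toward early stopping
            calm_epochs += 1
        else:
            # A larger change resets the count (assumption: consecutive epochs)
            calm_epochs = 0
        if calm_epochs >= stopping_epochs:
            return epoch
    return None


losses = [1.0, 0.5, 0.25, 0.25, 0.2501, 0.2502, 0.2502]
stop_at = early_stopping_epoch(losses, stopping=0.001, stopping_epochs=3)
never = early_stopping_epoch(losses, stopping=0.001, stopping_epochs=10)
```

With a patience of 3 the run stops at epoch index 5, the third consecutive epoch in which the loss changes by less than 0.001; with the default patience of 10 the short loss history never triggers the rule.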

abstract initialize_model(X_train, y_train)[source]¶

This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.

All these initialized objects must be assigned to the respective attributes:

self._mean_function
self._kernel_function
self._likelihood
self._model
self._optimizer
self._mll

which can then be accessed through the corresponding property

property loaded_state¶: If True, it means that the state of the model has been loaded from an external file

predict(structures)[source]¶

Predict the total energy of each individual in an iterable of structures.

Parameters

structures (iterable of structures of length n_samples) – the structures whose total energy is predicted

Returns

mean (np.array) – the predicted energies
std (np.array) – the predicted standard deviations

read_descriptors()[source]¶: Read the descriptors from the hdf5 database file.

save_model(filename='model_state.pth')[source]¶: Save the GP model

write_descriptors(structures)[source]¶: Save the descriptors into the database hdf5 file.
