API reference¶
Copyright 2020 Marco Arrigoni
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Stand-alone implementation of the CMA-ES¶
The evolution module¶
-
class
clinamen.cmaes.evolution.CMAES(strategy_params, mean=None, covariance=None, step_size=None, random_seed=10, terminator=None)[source]¶ Implementation of the covariance matrix adaptation evolution strategy.
- Parameters
strategy_params (StrategyParameters object) – contains the initial strategy parameters
mean (1D NumPy array) – the mean vector. If None, is taken as the zero vector
covariance (2D NumPy array) – the covariance matrix. If None, is taken as the identity matrix
step_size (float) – the global variance. If None, is taken as 1
random_seed (int) – the random seed for the random number generator
terminator (TerminationCriteria instance) – object that keeps track of termination criteria. If None, a default one will be used.
Notes
If given, mean must have shape (strategy_params.n, ) and covariance must have shape (strategy_params.n, strategy_params.n).
-
property
C¶ The covariance matrix for the current generation
-
property
StrategyParameters¶ The current instance of
StrategyParameters
-
property
Terminator¶ The current instance of
TerminationCriteria
-
evolve(manual_mutation=False)[source]¶ Generator for the evolutionary process.
- Parameters
manual_mutation (bool) – When True, mutated individuals must be inserted manually using the method set_mutated_offspring
- Yields
a dictionary with the relevant parameters for the current generation
-
property
g¶ The index of the current generation
-
classmethod
load_status(json_status)[source]¶ From a CMAES status saved as json, returns a CMAES instance initialized with the information contained in
json_status
-
property
m¶ The mean vector for the current generation
-
property
mutated_offspring¶ The offspring object parameters obtained after the mutation
-
property
offspring¶ The offspring object parameters in the current generation
-
property
pop_size¶ The population size
-
property
random_seed¶ The user random seed
-
save_status(path=None)[source]¶ Save a json file with the data representing the current status of the evolution. Only data needed to restart a CMAES object are saved.
- Parameters
path (str or None) – if not None, is the path to the folder where the .json file will be written
-
set_fitness_calculator(calculator)[source]¶ Set a fitness calculator: any object that can take object parameters representing the individuals (one row = one individual) and implements a method that calculates the fitness of individuals.
Basic interface of this fitness calculator:
A method called set_object_parameters which accepts a 2D NumPy array of shape (self.pop_size, self.dimension)
A method called get_fitness that returns an array of shape (self.pop_size, ) with the calculated fitness for each individual
Optionally, a method called get_gradients that returns an array of shape (self.pop_size, self.dimension) with the calculated gradients
- Parameters
calculator (a fitness calculator instance) –
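The interface described above can be illustrated with a toy calculator. This is a minimal sketch, not part of clinamen: the class name SphereFitnessCalculator and the sphere cost function are hypothetical, and plain Python lists stand in for the NumPy arrays that the interface expects, so that the example has no dependencies.

```python
class SphereFitnessCalculator:
    """Toy calculator (hypothetical): the fitness of a row x is sum(x_i ** 2)."""

    def __init__(self):
        self._params = None

    def set_object_parameters(self, params):
        # One row per individual; rows are copied into plain lists of floats.
        self._params = [[float(x) for x in row] for row in params]

    def get_fitness(self):
        # One fitness value per individual, in row order.
        return [sum(x * x for x in row) for row in self._rows()]

    def get_gradients(self):
        # The gradient of sum(x_i ** 2) with respect to x is 2 * x.
        return [[2.0 * x for x in row] for row in self._rows()]

    def _rows(self):
        if self._params is None:
            raise RuntimeError("call set_object_parameters first")
        return self._params


calc = SphereFitnessCalculator()
calc.set_object_parameters([[1.0, 2.0], [0.0, -3.0]])
fitness = calc.get_fitness()
gradients = calc.get_gradients()
```

An object of this kind would be handed to set_fitness_calculator. A calculator meant for real use would presumably return NumPy arrays of the documented shapes rather than lists.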
-
set_mutated_offspring(x)[source]¶ When manual mutation is selected, this method must be used to insert the mutated individuals
- Parameters
x (2D NumPy array of shape (
self.pop_size,self.dimension)) – the object parameters of the mutated individuals
-
property
step_size¶ The step size for the current generation
-
class
clinamen.cmaes.evolution.GpCMAES(*args, **kwargs)[source]¶ -
-
property
gradient_coefficient¶ The parameter controlling the gradient relevance
-
property
-
class
clinamen.cmaes.evolution.StrategyParameters(dimension, pop_size=None, weights=None, c_sigma=None, d_sigma=None, c_c=None, c_1=None, c_mu=None, alpha_cov=None, c_m=None, std_min=None, c_g=None)[source]¶ Class for the initialization, update and tracking of the CMA-ES algorithm.
- Parameters
dimension (int) – dimensionality of the problem.
pop_size (int or None) – population size
weights (tuple with pop_size entries or None) – weights used in the algorithm
c_sigma (float in (0, 1) or None) – learning rate for the conjugate evolution path used for step-size control
d_sigma (float > 0 or None) – damping term
c_c (float in [0, 1] or None) – learning rate for the evolution path used in the cumulation procedure
c_1 (float in [0, 1] or None) – learning rate for the rank-1 update of the covariance matrix
c_mu (float in [0, 1] or None) – learning rate for the rank-mu update of the covariance matrix
alpha_cov (float or None) – parameter for calculating default values of the learning rates
c_m (float or None) – learning rate for updating the mean. Generally 1, usually <= 1
std_min (float or None) – increase the global step size if the std of the individuals fitness is below this value
c_g (float or None) – learning rate for the evolution path of the gradient. It is used only when the CMAES instance supports gradient usage
Notes
If some parameters are None, default values will be used. It is suggested to leave all the parameters at their default values, with the possible exception of pop_size and alpha_cov.
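For orientation, the defaults commonly quoted in the CMA-ES literature for the population size and the positive recombination weights can be sketched as follows. Whether StrategyParameters uses exactly these formulas is an assumption; the function names are hypothetical. The sketch also returns only the weights of the best half of the population, whereas the weights parameter above is documented as having pop_size entries.

```python
import math


def default_population_size(dimension):
    # Commonly used default: lambda = 4 + floor(3 * ln(n)).
    return 4 + math.floor(3 * math.log(dimension))


def default_positive_weights(pop_size):
    # Log-decreasing weights for the best mu = pop_size // 2 individuals,
    # normalized so that they sum to 1.
    mu = pop_size // 2
    raw = [math.log((pop_size + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]


pop_size = default_population_size(10)
weights = default_positive_weights(pop_size)
```

With these formulas the population size grows only logarithmically with the dimension, which is why pop_size is one of the few parameters worth tuning by hand.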
-
class
clinamen.cmaes.evolution.TerminationCriteria(noeffectaxis=True, noeffectcoord=True, conditioncov=True, equalfunvalues=True, maxiter=1000, tolxup=True, smallstd=1e-15)[source]¶ A class for holding the various termination criteria suggested for the algorithm. If a value is set to None/False, the corresponding criterion will be ignored.
- Parameters
noeffectaxis (bool) –
noeffectcoord (bool) –
conditioncov (bool) –
equalfunvalues (bool) –
maxiter (int) – maximum number of iterations
tolxup (bool) –
smallstd (float) – stop if the fitness std remains below smallstd for at least 15 iterations.
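The smallstd criterion can be pictured as a counter that is reset whenever the spread of the fitness values recovers. The following is a sketch under assumptions, not the library's code: the class SmallStdMonitor is hypothetical, the population standard deviation is used, and the 15 iterations are taken to be consecutive.

```python
import statistics


class SmallStdMonitor:
    """Hypothetical helper tracking how long the fitness std has stayed small."""

    def __init__(self, smallstd=1e-15, patience=15):
        self.smallstd = smallstd
        self.patience = patience
        self.count = 0

    def update(self, fitness_values):
        # Count consecutive generations whose fitness std is below the threshold.
        if statistics.pstdev(fitness_values) < self.smallstd:
            self.count += 1
        else:
            self.count = 0
        # True means the criterion is met and the run should stop.
        return self.count >= self.patience


monitor = SmallStdMonitor(smallstd=1e-3, patience=3)
generations = [[1.0, 2.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
stop_flags = [monitor.update(values) for values in generations]
```

Here the short patience of 3 is chosen only to keep the example small; the documented behaviour corresponds to patience=15.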
The fitness_calculators module¶
-
class
clinamen.cmaes.fitness_calculators.FitnessCalculator(population)[source]¶ Fitness calculator for Population objects
- Parameters
population (Population instance) – the current individual population
-
class
clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator(population, atoms_within_cutoff, metamodel, data='Xy.hdf5', min_generation=0, train_kwargs={})[source]¶ Calculator for the fitness function using a surrogate fitness metamodel. The metamodel is used exclusively for predicting the total energy. Forces are saved in the training dataset, but are not used for training or prediction
- Parameters
population (Population instance) – the current individual population
atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace
metamodel (MetaModel instance) – the fitness surrogate
data (string) – the name of the file which is used to save/read the training data. It will be named data.hdf5
min_generation (int. Default 0) – use the meta-model only if the current generation is greater than or equal to min_generation
train_kwargs (dict) – the keyword-argument pairs to train the metamodel
-
class
clinamen.cmaes.fitness_calculators.RSFitnessCalculator(population, atoms_within_cutoff)[source]¶ Fitness calculator for Population objects on a restricted subspace.
- Parameters
population (Population instance) – the current individual population
atoms_within_cutoff (list of integers) – the indices of the atoms within the cutoff forming the restricted subspace
-
class
clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator(population, atoms_within_cutoff)[source]¶
-
clinamen.cmaes.fitness_calculators.write_train_hdf5(file_name, population, energies, forces, name=None)[source]¶ Append new data to an existing dataset. If the dataset does not exist, create a new one
- Parameters
file_name (string) – the dataset name
population (Population instance) – the new individuals to be added to the dataset
energies (array-like of shape (n_individuals, )) – the energies of the individuals in population
forces (2D array-like of shape (n_individuals, 3*no_atoms)) – the forces of the individuals in population
name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, it is checked that it corresponds to system name
The population_evolver module¶
-
class
clinamen.cmaes.population_evolver.AnalizeRun(evolver)[source]¶ Helper class for analyzing the evolution of the population
- Parameters
evolver (PopulationEvolver derived instance) – the evolver to be analyzed. If evolver is set to None, then the class can be used to analyze an already existing simulation dataframe (set with the method load_dataframe).
-
evolve()[source]¶ Evolve generation-by-generation
- Yields
dataframe (pandas DataFrame) – updated to the current generation
-
initialize()[source]¶ Initialize the evolution
- Returns
dataframe – the elements for generation 0
- Return type
pandas DataFrame
-
load_dataframe(df)[source]¶ Use this method to analyze a proper simulation dataframe without having to run the whole evolutionary process
- Parameters
df (pandas DataFrame) –
-
static
plot_data_vs_generation(df, keys, other_keys=None, samples=[1], serrors=[0], alpha=0.05, **kwargs)[source]¶ Plot the evolution of key and, optionally, other_keys in df with respect to the generation number.
- Parameters
df (pandas data frame) – it must contain at least 2 columns: ‘generation’, with the generation number, and key.
keys (list of strings) – the column labels in df to be plotted. If this is an averaged value, the sample stds are given by se
other_keys (None or list of strings) – any other column labels to be plotted
samples (list of int) – if one of keys is an average, it is its sample size.
serrors (list of float) – if one of keys is an average, it is its sample std. This will be used to calculate the confidence intervals
alpha (float in (0, 1)) – defines the desired (1-alpha)*100% confidence interval
kwargs (dictionary) – keyword-value pairs for tuning the plot parameters; see the documentation of pandas.DataFrame.plot.line
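How samples, serrors and alpha combine into a confidence interval can be sketched as below. This is an illustration under assumptions: a normal-approximation interval (mean ± z · s / √n) is used, and the function name confidence_interval is hypothetical. The library may use a different estimator, for example one based on Student's t distribution.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sample_std, sample_size, alpha=0.05):
    # Two-sided (1 - alpha) * 100% interval around an averaged value.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sample_std / math.sqrt(sample_size)
    return mean - half_width, mean + half_width


low, high = confidence_interval(mean=10.0, sample_std=2.0, sample_size=16)
# With the documented defaults (sample size 1, std 0) the interval collapses
# onto the plotted value itself.
degenerate = confidence_interval(mean=1.0, sample_std=0.0, sample_size=1)
```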
-
class
clinamen.cmaes.population_evolver.GpPopulationEvolver(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]¶ Exploit the gradient during the run.
- Parameters
c_alpha (float) – coefficient describing the relevance of the gradient term
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance
-
cmaes_obj¶ alias of
clinamen.cmaes.evolution.GpCMAES
-
fitness_calc_obj¶ alias of
clinamen.cmaes.fitness_calculators.FitnessGradientCalculator
-
class
clinamen.cmaes.population_evolver.PopulationEvolver(founder, step_size=0.2, covariance=None, dmin=None, random_seed=10)[source]¶ Evolves a Population instance using the CMA-ES algorithm
- Parameters
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
step_size (float > 0) – initial step size used in the CMA-ES algorithm. Default 0.2 Angstrom
covariance (None or float or 1D array or 2D array) – the initial covariance matrix. Default is the identity matrix. If covariance is a float, then the matrix is diagonal with that value on the diagonal. If it is a 1D array, it is still diagonal with that array on the diagonal.
dmin (float) – minimum distance between two atoms to consider an individual to be valid. Default None (0.5 of the minimum bond distance)
random_seed (int) – random seed to be used for generating random variates. Defaults to 10
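The four accepted forms of covariance can be made concrete with a small expansion routine. This is a sketch of the documented behaviour, not the library's implementation: the function expand_covariance is hypothetical, and nested Python lists stand in for NumPy arrays.

```python
def expand_covariance(covariance, dimension):
    """Return a full dimension x dimension matrix as a list of rows."""

    def diagonal(values):
        return [[values[i] if i == j else 0.0 for j in range(dimension)]
                for i in range(dimension)]

    if covariance is None:
        # Default: the identity matrix.
        return diagonal([1.0] * dimension)
    if isinstance(covariance, (int, float)):
        # A single number is repeated along the diagonal.
        return diagonal([float(covariance)] * dimension)
    rows = list(covariance)
    if len(rows) != dimension:
        raise ValueError("covariance does not match the dimension")
    if all(isinstance(v, (int, float)) for v in rows):
        # A 1D sequence provides the diagonal entries.
        return diagonal([float(v) for v in rows])
    # Otherwise a full 2D matrix is expected.
    matrix = [[float(v) for v in row] for row in rows]
    if any(len(row) != dimension for row in matrix):
        raise ValueError("covariance must be a square matrix")
    return matrix


identity = expand_covariance(None, 2)
scaled = expand_covariance(0.5, 2)
from_diag = expand_covariance([1, 2], 2)
full = expand_covariance([[2, 1], [1, 2]], 2)
```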
-
cmaes_obj¶ alias of
clinamen.cmaes.evolution.CMAES
-
property
cmaes_parameters¶ Return a dictionary with the objects needed to initialize the instance’s CMAES object
-
evolve_population()[source]¶ Evolve the current population. Returns a generator with the relevant parameters of the current generation.
-
fitness_calc_obj¶ alias of
clinamen.cmaes.fitness_calculators.FitnessCalculator
-
get_object_parameters()[source]¶ Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the search space dimension
-
property
population¶ The current
Population instance
-
save_population(generation)[source]¶ Append the current population to the self.evolution_history file
- Parameters
generation (int) – the index of the current generation. Used to create a corresponding new group in the hdf5 file
-
set_cmaes(cmaes)[source]¶ Set a custom CMAES object to overwrite the default one.
- Parameters
cmaes (instance of CMAES) –
-
class
clinamen.cmaes.population_evolver.RSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]¶ Restricted-subspace population evolver: only the genotype for atoms inside a cutoff radius is considered
- Parameters
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance
Notes
Similar to SSPopulationEvolver but only the atoms within the cutoff are moved.
-
fitness_calc_obj¶ alias of
clinamen.cmaes.fitness_calculators.RSFitnessCalculator
-
get_object_parameters()[source]¶ Returns the object parameters as a NumPy 2D array of shape (N, d), where N is the number of individuals in the population and d is the dimension of the restricted subspace
-
property
use_reduced_population_size¶ Bool. If True, the population size is chosen automatically based on the dimension of the restricted subspace. If False, the population size given by the StrategyParameters instance supplied at initialization is used. Default False.
-
class
clinamen.cmaes.population_evolver.RSPopulationEvolverGrad(c_alpha, nn_cutoff, c_r, founder, **kwargs)[source]¶ - Parameters
c_alpha (float) – coefficient describing the relevance of the gradient term
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance
Notes
Similar to RSPopulationEvolver but gradients are also used.
-
cmaes_obj¶ alias of
clinamen.cmaes.evolution.GpCMAES
-
fitness_calc_obj¶ alias of
clinamen.cmaes.fitness_calculators.RSFitnessGradientCalculator
-
class
clinamen.cmaes.population_evolver.RSPopulationEvolverMetamodel(metamodel, dataset, nn_cutoff, c_r, founder, min_generation=0, train_kwargs={}, **kwargs)[source]¶ RS Fitness Calculator with a metamodel to be trained on-the-fly
- Parameters
metamodel (Metamodel object) – used to make the energy predictions
dataset (string) – the name of the .hdf5 file which will be used to write/read the training data
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
min_generation (int. Default 0) – use the meta-model only if the current generation is greater than or equal to min_generation
train_kwargs (dict) – the keyword-argument values used to train the metamodel
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance
-
fitness_calc_obj¶ alias of
clinamen.cmaes.fitness_calculators.MetaRSFitnessCalculator
-
class
clinamen.cmaes.population_evolver.SSPopulationEvolver(nn_cutoff, c_r, founder, **kwargs)[source]¶ Add to the initial covariance matrix a rank-s matrix increasing the variance for coordinates representing atoms close to the point defect position
- Parameters
nn_cutoff (float) – cutoff radius, in Angstrom, including the atoms used to build the rank-s matrix
c_r (float) – coefficient to control the contribution of the rank-s matrix to the initial covariance matrix
founder (evpd.core.individual instance) – an Individual object representing the initial individual. The mean of the population is taken as the atomic position of this individual. It should ideally be an atomic configuration not too far from the global minimum in the PES. The founder must have a calculator set.
kwargs – other keyword arguments necessary to initialize a PopulationEvolver instance
-
property
atoms_within_cutoff¶ Indices of the atoms within the cutoff
-
property
basis_coefficients¶ The distances of the atoms within the cutoff and the coefficients per atom which are used to add the rank-s matrix to the initial covariance matrix
-
property
nn_cutoff¶ Cutoff selecting the NN to the defect which will form the basis for the selected subspace
-
property
number_of_nn¶ The number of nearest neighbors atoms within
self.nn_cutoff
-
property
selected_subspace_basis¶ The basis spanning the selected subspace. Order with respect to the atomic distances from the defect
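The selection behind atoms_within_cutoff and number_of_nn can be sketched as a distance filter around the defect. The following makes several assumptions that the reference does not confirm: plain Cartesian distances are used (periodic images are ignored), atoms exactly at the cutoff are included, and the helper name is hypothetical.

```python
import math


def indices_within_cutoff(positions, defect_position, nn_cutoff):
    # Distance of every atom from the defect (no periodic images considered).
    distances = [math.dist(p, defect_position) for p in positions]
    selected = [i for i, d in enumerate(distances) if d <= nn_cutoff]
    # Ordered by increasing distance from the defect.
    selected.sort(key=lambda i: distances[i])
    return selected


positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
within = indices_within_cutoff(positions, (0.0, 0.0, 0.0), nn_cutoff=2.5)
number_of_nn = len(within)
```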
Objects describing the genotype of individuals and their populations¶
clinamen.evpd.core
The evpd.core.individual module¶
-
class
clinamen.evpd.core.individual.Individual(*args, **kwargs)[source]¶ Class for representing an individual in a population.
-
calculate_comparison_distances(other)[source]¶ Calculates the relative distance and the maximum distance discrepancy between this individual and another one.
- Parameters
other (Individual instance) –
- Returns
rel_dist, max_dist – the relative and maximum discrepancy distance between the two instances
-
property
chromosome¶ The chromosome of an individual is the set of displacements from an initial configuration, usually one very similar to the pristine system
-
property
cost¶ Value of the cost function of the individual (its energy)
-
property
defect_position¶ Location of the defect in the structure, if any
-
property
distances_from_defect¶ Distance of each atom in the system from the defect
-
property
dmax¶ Tolerance for the maximum distance discrepancy between two individuals. This distance is defined as:
\[d_{max}(i, j) = \max_k |d_i(k) - d_j(k)|\]
-
property
dmin¶ Tolerance for the minimum distance at which two atoms can be located. Used to reject a structure where two atoms are too close.
Default value = 0.25 minimum bond length in the system
-
property
drel¶ Tolerance for the relative distance between two individuals. The relative distance is defined as:
\[d_{rel}(i, j) = \frac{\sum_k |d_i(k) - d_j(k)|}{\sum_k d_i(k)}\]
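The two measures, the relative distance and the maximum distance discrepancy, can be evaluated together once each individual is reduced to a list of distances. The sketch below assumes that d_i(k) is the k-th entry of a consistently ordered list of distances for individual i; the reference does not state how that list is built, and the function name is hypothetical.

```python
def comparison_distances(d_i, d_j):
    """Return (d_rel, d_max) for two equally long distance lists."""
    if len(d_i) != len(d_j):
        raise ValueError("both individuals need the same number of distances")
    abs_diffs = [abs(a - b) for a, b in zip(d_i, d_j)]
    # d_rel: summed absolute differences, normalized by the distances of i.
    rel_dist = sum(abs_diffs) / sum(d_i)
    # d_max: the largest single discrepancy.
    max_dist = max(abs_diffs)
    return rel_dist, max_dist


rel_dist, max_dist = comparison_distances([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

In this example the absolute differences are 0, 0.5 and 1.0, so the relative distance is 1.5 / 6 = 0.25 and the maximum discrepancy is 1.0. Two individuals would then be compared against the drel and dmax tolerances.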
-
property
etol¶ Tolerance for comparing costs between two individuals
-
property
fitness¶ Fitness value of the individual
-
get_forces(*args, **kwargs)[source]¶ Calculate atomic forces.
Ask the attached calculator to calculate the forces and apply constraints. Use apply_constraint=False to get the raw forces.
For molecular dynamics (md=True) we don’t apply the constraint to the forces but to the momenta. When holonomic constraints for rigid linear triatomic molecules are present, ask the constraints to redistribute the forces within each triple defined in the constraints (required for molecular dynamics with this type of constraints).
-
has_proper_structure()[source]¶ Returns False if any interatomic distance in the system is smaller than
self.dmin.
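The check can be pictured as a scan over all atom pairs. This is a minimal sketch under the assumption of plain Cartesian distances without periodic boundary conditions; the function name no_atoms_closer_than is hypothetical and the actual method may rely on ASE's distance routines.

```python
import itertools
import math


def no_atoms_closer_than(positions, dmin):
    # False as soon as one pair of atoms is closer than dmin.
    return all(math.dist(a, b) >= dmin
               for a, b in itertools.combinations(positions, 2))


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.3, 0.0)]
valid_strict = no_atoms_closer_than(positions, dmin=0.5)
valid_loose = no_atoms_closer_than(positions, dmin=0.2)
```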
-
static
make_individual_from_ase_atoms(atoms)[source]¶ Takes an ase.Atoms instance and returns an evpd.core.Individual instance.
The result is analogous to using
atoms.copy(), but the calculator is copied as well.
-
property
metric_tensor¶ The metric tensor of the cell
-
property
my_name¶ Identifier
-
optimize_structure(**kwargs)[source]¶ Optimize the structure
- Parameters
kwargs (dict) – parameters for running the geometry optimization. A mandatory key is optimizer, which is an ase optimizer. The other key:value pairs are the parameters to supply to optimizer.
-
set_calculator_factory(calc_factory, calc_parameters)[source]¶ - Parameters
calc_factory (a class derived from ase.calculators.interface.Calculator or a function generating a calculator) –
calc_parameters (dict) – keyword-value pairs used to initialize a calc_factory instance
-
property
total_energy_calculations¶ Number of times the energy of the individual has been calculated
-
The evpd.core.population module¶
-
exception
clinamen.evpd.core.population.BadPopulationMember[source]¶ Raised when one tries to add to a Population instance an object which is not an Individual instance
-
class
clinamen.evpd.core.population.Population(*individuals)[source]¶ A population is a group of individuals of a given size.
- Parameters
individuals (a list or tuple of individuals.) – the container of individuals forming the population. It can also be a single individual or empty.
-
property
individuals_fitness¶ Return a list with the fitness of each individual
Crystal structure fingerprint descriptors and utilities for general descriptors¶
clinamen.descriptors.descriptors_cython
The descriptors_cython module¶
The descriptors.utils module¶
-
clinamen.descriptors.utils.read_descriptors_by_id(file_name, ids)[source]¶ Given an id or a list thereof, returns the corresponding descriptors and Jacobians, if available
- Parameters
file_name (string) – the hdf5 file from where the descriptors should be fetched
ids (iterable) – the identity keys of the descriptors we want to fetch
- Returns
X, DX, indices – the descriptors and their Jacobians, each as a list, and a list representing the indices corresponding to the ids in ids that were found. If the Jacobians are not present, None is returned
- Return type
tuple
-
clinamen.descriptors.utils.write_descriptors(file_name, descriptors, descriptors_grads, ids, name=None, flattened=True)[source]¶ Append new data to an existing dataset. If the dataset does not exist, create a new one
- Parameters
file_name (string) – the dataset name
descriptors (2D array-like of shape (n, d), if flattened is True) – n is the number of structures for which the descriptors were calculated. d is the dimensionality of the descriptors. If flattened is False, descriptors can be a multidimensional array of shape (n, …).
descriptors_grads (2D array-like of shape (n, r) if flattened is True) – Otherwise, it can be a multidimensional array of shape (n, …). It can also be None. If not None, these are the (possibly flattened, if flattened is True) Jacobians of the descriptors.
name (string. Default None) – the system name. A tag that specifies the system when the dataset is created. If the dataset already exists, it is checked that it corresponds to system name
ids (array-like of shape (n, )) – for each descriptor, a string that identifies the structure corresponding to that descriptor
Utilities for the unsupervised classification of clusters¶
The clustering.misc module¶
-
clinamen.clustering.misc.calculate_k_distances(dataset, k, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None)[source]¶ Calculate and return the k-NN distances for each point in the data set. It uses sklearn.neighbors.NearestNeighbors; see its documentation for the meaning of the parameters.
- Parameters
dataset (2D array) – the dataset
k (int) – the nearest neighbor number to consider
- Returns
k_distances – the k-NN distances for each point in the dataset sorted in descending order.
- Return type
1D array
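What a sorted k-distance list contains can be shown with a brute-force version. This sketch does not call scikit-learn: it uses plain Euclidean distances (the p=2 Minkowski default), assumes that a point is not counted as its own neighbor, and the function name is hypothetical.

```python
import math


def k_distances_brute_force(dataset, k):
    result = []
    for i, point in enumerate(dataset):
        # Distances from this point to every other point, smallest first.
        others = sorted(math.dist(point, q)
                        for j, q in enumerate(dataset) if j != i)
        result.append(others[k - 1])
    # Descending order, as in the documented return value.
    return sorted(result, reverse=True)


dataset = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
first_nn = k_distances_brute_force(dataset, k=1)
second_nn = k_distances_brute_force(dataset, k=2)
```

A curve of this kind is often plotted to pick a neighborhood radius for density-based clustering, where a sharp bend separates dense regions from noise.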
-
clinamen.clustering.misc.find_centroids(data, labels)[source]¶ Finds the centroid locations of the clusters.
- Parameters
data (2D array) – The dataset. Each row represents one structure
labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its centroid will not be returned.
- Returns
centroids – the keys are the cluster indices, the values are the coordinates of the centroids
- Return type
dict
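The documented behaviour, a dictionary of per-cluster centroids with the noise label -1 skipped, can be sketched in a few lines. The centroid is taken here as the coordinate-wise mean of the cluster members; the function name centroids_by_cluster is hypothetical and lists stand in for NumPy arrays.

```python
from collections import defaultdict


def centroids_by_cluster(data, labels):
    members = defaultdict(list)
    for row, label in zip(data, labels):
        if label != -1:  # noise points do not contribute to any centroid
            members[label].append(row)
    # Coordinate-wise mean of the rows in each cluster.
    return {label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in members.items()}


data = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [5.0, 5.0]]
labels = [0, 0, 1, -1]
centroids = centroids_by_cluster(data, labels)
```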
-
clinamen.clustering.misc.get_structure_group_index(structure_name, groups)[source]¶ Given the name of a structure, returns the group index it belongs to.
- Parameters
structure_name (string) – the name of the structure
groups (dict) – key:value pairs of cluster indices and lists of the names of the structures belonging to that cluster
- Returns
key – the index of the cluster
- Return type
int
-
clinamen.clustering.misc.group_structures_in_clusters(ordered_structures, cluster_labels)[source]¶ Group a list of structure names according to the cluster they belong to.
- Parameters
ordered_structures (list) – Ordered list of structure names. The ordering is done by matching the dataset: the i-th element in the dataset is the structure corresponding to ordered_structures[i]
cluster_labels (list) – cluster_labels[i] is the label of the cluster that ordered_structures[i] belongs to.
- Returns
groups – keys are cluster labels and the values are lists with the structures belonging to that cluster.
- Return type
defaultdict
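The grouping and the reverse lookup described for get_structure_group_index can be sketched together. Both helper names are hypothetical, and raising KeyError for an unknown structure name is an assumption; the reference does not say what the library does in that case.

```python
from collections import defaultdict


def group_by_cluster(ordered_structures, cluster_labels):
    # Cluster label -> list of structure names, in dataset order.
    groups = defaultdict(list)
    for name, label in zip(ordered_structures, cluster_labels):
        groups[label].append(name)
    return groups


def group_index_of(structure_name, groups):
    # Reverse lookup: which cluster does a structure belong to?
    for key, names in groups.items():
        if structure_name in names:
            return key
    raise KeyError(structure_name)


groups = group_by_cluster(["a", "b", "c", "d"], [0, 1, 0, -1])
index_of_c = group_index_of("c", groups)
```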
-
clinamen.clustering.misc.make_reachability_plot(optics_instance, x_lims=None)[source]¶ Make the reachability plot from a trained scikit-learn OPTICS instance
- Parameters
optics_instance (scikit-learn OPTICS instance) – a trained instance
x_lims (tuple) – x limits to be plotted
-
clinamen.clustering.misc.plot_cluster_plot(data, labels, title, ordered_structures, plot_kwargs={}, cmap=<matplotlib.colors.LinearSegmentedColormap object>, show_names=True, plot_chull=False, plot_centroids=True, ax=None)[source]¶ Make a scatter plot of the clusters.
- Parameters
data (2D array) – The dataset. Each row represents one structure
labels (1D array) – labels[i] is the index of the cluster where data[i] belongs. A label with value -1 is considered to represent noise. Its points are represented by crosses.
title (string) – the plot title
ordered_structures (list) – the i-th element is the structure name for data[i]
plot_kwargs (dict) – key:value pairs to fine-tune the plot
cmap (matplotlib cmap instance. Default cm.jet) – the colormap to be used in the plot
show_names (bool. Default True) – if True, the structure names will be shown in the plot
plot_chull (bool. Default False) – if True, plots also the convex hull of points in the cluster
plot_centroids (bool. Default True) – if True, the centroids of each cluster are also plotted
ax (matplotlib Axes instance or None. Default is None) – the axes for the plot. If None, the current axes is taken. (TODO)
-
clinamen.clustering.misc.write_clustered_structures(groups, key)[source]¶ Write to a text file all the structures belonging to a given cluster
- Parameters
groups (dict) – key:value pairs of cluster indices and lists of the names of the structures belonging to that cluster
key (int) – the cluster index
- Returns
fname – the name of the just-written text file
- Return type
string
The clustering.stats_tools module¶
-
clinamen.clustering.stats_tools.calculate_B_coefficient_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶ Calculates the Bhattacharyya coefficient between two normal distributions
- Parameters
mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution
- Returns
b_coeff – the Bhattacharyya coefficient
- Return type
float
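For reference, the Bhattacharyya coefficient of two Gaussians has a standard closed form: it is exp(-D_B), where D_B is the Bhattacharyya distance. The NumPy sketch below evaluates that closed form. The function name bhattacharyya_coefficient_sketch is hypothetical, and the code is not taken from the clinamen source, whose numerical details may differ.

```python
import numpy as np


def bhattacharyya_coefficient_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Closed-form Bhattacharyya coefficient between two Gaussians (sketch)."""
    mean_1 = np.asarray(mean_1, dtype=float)
    mean_2 = np.asarray(mean_2, dtype=float)
    cov_1 = np.atleast_2d(np.asarray(covariance_1, dtype=float))
    cov_2 = np.atleast_2d(np.asarray(covariance_2, dtype=float))

    # Averaged covariance matrix entering the closed form
    cov = 0.5 * (cov_1 + cov_2)
    diff = mean_1 - mean_2

    # Mahalanobis-like term, computed without forming an explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)

    # Log-determinants are more stable than determinants
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_1 = np.linalg.slogdet(cov_1)
    _, logdet_2 = np.linalg.slogdet(cov_2)

    distance = 0.125 * maha + 0.5 * (logdet - 0.5 * (logdet_1 + logdet_2))
    return float(np.exp(-distance))


# Identical distributions overlap completely; shifted ones overlap less
same = bhattacharyya_coefficient_sketch([0.0, 0.0], [0.0, 0.0], np.eye(2), np.eye(2))
shifted = bhattacharyya_coefficient_sketch([0.0], [2.0], [[1.0]], [[1.0]])
```

The coefficient lies in (0, 1] and equals 1 only when the two distributions coincide.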
-
clinamen.clustering.stats_tools.calculate_B_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶ Calculates the Bhattacharyya distance between two normal distributions.
- Parameters
mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution
- Returns
distance – the Bhattacharyya distance
- Return type
float
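For two normal distributions with means \mu_1, \mu_2 and covariance matrices \Sigma_1, \Sigma_2, the Bhattacharyya distance has the standard closed form given below. The symbols are introduced here for illustration; whether the library evaluates exactly this expression is an assumption.

```latex
D_B = \frac{1}{8}\,(\mu_1 - \mu_2)^{\mathsf{T}}\,\Sigma^{-1}\,(\mu_1 - \mu_2)
      + \frac{1}{2}\,\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}},
\qquad
\Sigma = \frac{\Sigma_1 + \Sigma_2}{2}
```

The first term measures the separation of the means, the second the mismatch of the covariances. The distance vanishes when the two distributions coincide.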
-
clinamen.clustering.stats_tools.calculate_H_distance_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶ Calculate the Hellinger distance between two Gaussians.
- Parameters
mean_1 (1D np.ndarray) – the mean vector of the first distribution
mean_2 (1D np.ndarray) – the mean vector of the second distribution
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution
- Returns
distance – the Hellinger distance
- Return type
float
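The Hellinger distance between two Gaussians can be written in terms of their Bhattacharyya coefficient BC. The sketch below assumes the normalization H = sqrt(1 - BC), for which H lies in [0, 1]. Other conventions differ by a constant factor, and the one used by the library is not stated here. The function name hellinger_distance_sketch is hypothetical.

```python
import numpy as np


def hellinger_distance_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Hellinger distance between two Gaussians, assuming H = sqrt(1 - BC)."""
    mean_1 = np.asarray(mean_1, dtype=float)
    mean_2 = np.asarray(mean_2, dtype=float)
    cov_1 = np.atleast_2d(np.asarray(covariance_1, dtype=float))
    cov_2 = np.atleast_2d(np.asarray(covariance_2, dtype=float))

    cov = 0.5 * (cov_1 + cov_2)
    diff = mean_1 - mean_2

    _, logdet = np.linalg.slogdet(cov)
    _, logdet_1 = np.linalg.slogdet(cov_1)
    _, logdet_2 = np.linalg.slogdet(cov_2)

    # log(BC) = 1/4 ln det(S1) + 1/4 ln det(S2) - 1/2 ln det(S) - 1/8 d^T S^-1 d
    log_bc = (0.25 * (logdet_1 + logdet_2) - 0.5 * logdet
              - 0.125 * (diff @ np.linalg.solve(cov, diff)))

    # Clip at zero to guard against tiny negative values from rounding
    return float(np.sqrt(max(0.0, 1.0 - np.exp(log_bc))))


h_same = hellinger_distance_sketch([0.0], [0.0], [[1.0]], [[1.0]])
h_shifted = hellinger_distance_sketch([0.0], [2.0], [[1.0]], [[1.0]])
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in its two arguments and bounded.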
-
clinamen.clustering.stats_tools.calculate_KL_divergence_gaussians(mean_1, mean_2, covariance_1, covariance_2)[source]¶ Calculate the Kullback-Leibler divergence between two Gaussians: KL(G1 || G2) = E_1[ln G1 - ln G2]
- Parameters
mean_1 (1D np.ndarray) – the mean vector of the first distribution, G1
mean_2 (1D np.ndarray) – the mean vector of the second distribution, G2
covariance_1 (2D np.ndarray) – the covariance matrix of the first distribution, G1
covariance_2 (2D np.ndarray) – the covariance matrix of the second distribution, G2
- Returns
divergence – the KL divergence
- Return type
float
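The expectation E_1[ln G1 - ln G2] has a well-known closed form for Gaussians. The NumPy sketch below evaluates it and shows that the divergence is not symmetric in its arguments. The function name kl_divergence_sketch is hypothetical and the code is not the library's implementation.

```python
import numpy as np


def kl_divergence_sketch(mean_1, mean_2, covariance_1, covariance_2):
    """Closed-form KL(G1 || G2) between two Gaussians (sketch)."""
    mean_1 = np.asarray(mean_1, dtype=float)
    mean_2 = np.asarray(mean_2, dtype=float)
    cov_1 = np.atleast_2d(np.asarray(covariance_1, dtype=float))
    cov_2 = np.atleast_2d(np.asarray(covariance_2, dtype=float))
    k = mean_1.shape[0]  # dimension of the random vector

    diff = mean_2 - mean_1
    # tr(S2^-1 S1) and (m2 - m1)^T S2^-1 (m2 - m1), without explicit inverses
    trace_term = np.trace(np.linalg.solve(cov_2, cov_1))
    maha_term = diff @ np.linalg.solve(cov_2, diff)

    _, logdet_1 = np.linalg.slogdet(cov_1)
    _, logdet_2 = np.linalg.slogdet(cov_2)

    return float(0.5 * (trace_term + maha_term - k + logdet_2 - logdet_1))


# Swapping the arguments changes the result: the divergence is asymmetric
forward = kl_divergence_sketch([0.0], [0.0], [[1.0]], [[4.0]])
backward = kl_divergence_sketch([0.0], [0.0], [[4.0]], [[1.0]])
```

Because of this asymmetry, the order of the (mean_1, covariance_1) and (mean_2, covariance_2) pairs matters when calling a KL routine.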
-
clinamen.clustering.stats_tools.integral_multivariate_standard_normal_rectangular_region(region)[source]¶ Compute the probability that a standard normal random vector takes values in a rectangular region
- Parameters
region (tuple of 2-tuples) – region = ((a_1, b_1), (a_2, b_2), … , (a_k, b_k)). The number of 2-tuples gives the dimension of the random vector. Each 2-tuple contains the lower and upper integration limits along the corresponding direction
- Returns
probability – the corresponding probability
- Return type
float
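Because the components of a standard normal vector are independent, the probability of a rectangular region factorizes into a product of one-dimensional probabilities. The sketch below uses this property with the error function from the standard library. The function names are hypothetical, and the library may compute the integral differently (for example, numerically).

```python
import math


def standard_normal_cdf(x):
    """Cumulative distribution function of the 1D standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probability_in_box_sketch(region):
    """Probability that a standard normal vector falls inside `region`.

    `region` follows the documented layout ((a_1, b_1), ..., (a_k, b_k)).
    """
    probability = 1.0
    for lower, upper in region:
        # Independent components: multiply the per-direction probabilities
        probability *= standard_normal_cdf(upper) - standard_normal_cdf(lower)
    return probability


# Half-plane in 2D: unrestricted along the first axis, negative along the second
half_plane = probability_in_box_sketch(((-math.inf, math.inf), (-math.inf, 0.0)))
# Central interval in 1D
central = probability_in_box_sketch(((-1.96, 1.96),))
```

Infinite limits can be passed as math.inf, since math.erf saturates at ±1.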
Objects representing the metamodel¶
The metamodel module¶
-
class
clinamen.metamodel.metamodel.BaseExactGPMetaModel(descriptors_database, preprocessing_pipeline=None, std_value=0.01)[source]¶ Basic Exact GP regressor with an RBF kernel for minimal initialization effort.
-
class
clinamen.metamodel.metamodel.BasePCAExactGPMetaModel(descriptors_database, scaler_kwargs, pca_kwargs, std_value=0.01)[source]¶ Basic Exact GP regressor with an RBF kernel for minimal initialization effort. The inputs are automatically passed through a pipeline that scales them and then performs PCA.
-
class
clinamen.metamodel.metamodel.ExactGPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]¶ Class for making a meta-model based on a Gaussian Process Regressor with exact inference.
-
initialize_model(X_train, y_train)[source]¶ This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.
All these initialized objects must be assigned to the respective attributes:
self._mean_function
self._kernel_function
self._likelihood
self._model
self._optimizer
self._mll
which can then be accessed through the corresponding properties
-
-
class
clinamen.metamodel.metamodel.GPMetaModel(descriptors_database, mean_function=None, mean_function_kwargs=None, kernel_function=None, kernel_function_kwargs=None, likelihood_function=None, likelihood_function_kwargs=None, optimizer=None, optimizer_kwargs=None, marginal_likelihood_function=None, marginal_likelihood_function_kwargs=None, preprocessing_pipeline=None, std_value=0.01)[source]¶ Class for creating a Gaussian Process metamodel.
-
fit(structures, y, epochs=10000, stopping=0.001, stopping_epochs=10, verbose=False)[source]¶ Train the meta-model
- Parameters
structures (Iterable of structures of length n_samples) – the structures used to train the metamodel
y (np.ndarray of shape (n_samples, )) – the total energy of the structures in structures
epochs (int) – the number of epochs for training the metamodel
stopping (float) – the threshold on the change of the loss function below which early stopping can be triggered
stopping_epochs (float) – for how many epochs the loss function should change by less than stopping in order to enforce early stopping
verbose (bool) – If True, prints the loss function every 100 epochs
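The interplay between stopping and stopping_epochs can be sketched in plain Python. The snippet assumes that the criterion requires the loss to change by less than stopping for stopping_epochs consecutive epochs. This reading, the helper name should_stop_sketch, and the stand-in loss are assumptions, not the library's code.

```python
def should_stop_sketch(losses, stopping=0.001, stopping_epochs=10):
    """Return True if the last `stopping_epochs` loss changes are all below `stopping`."""
    if len(losses) <= stopping_epochs:
        # Not enough history to evaluate the criterion yet
        return False
    recent = losses[-(stopping_epochs + 1):]
    return all(abs(new - old) < stopping for old, new in zip(recent, recent[1:]))


losses = []
for epoch in range(10000):
    loss = 1.0 / (epoch + 1)  # stand-in for the training loss at this epoch
    losses.append(loss)
    if should_stop_sketch(losses):
        break  # early stopping: the loss has plateaued
```

With this reading, a larger stopping_epochs makes the criterion more conservative, since a single epoch with a large loss change resets the plateau.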
-
abstract
initialize_model(X_train, y_train)[source]¶ This function initializes the GP model (gpytorch.models) class, which means it initializes the mean function, kernel function, and the likelihood. The function also initializes the optimizer and the marginal log likelihood.
All these initialized objects must be assigned to the respective attributes:
self._mean_function
self._kernel_function
self._likelihood
self._model
self._optimizer
self._mll
which can then be accessed through the corresponding properties
-
property
loaded_state¶ If True, it means that the state of the model has been loaded from an external file
-