Code
nablachem.alchemy
nablachem.alchemy.Monomial
A single monomial in the multi-dimensional Taylor expansion.
nablachem.alchemy.Monomial.__init__(prefactor, powers={})
Define the monomial.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefactor | float | Weight or coefficient of the monomial. | required |
powers | dict[str, int] | Involved variables as keys and the exponent as value, by default {}. | {} |
nablachem.alchemy.Monomial.distance(pos, center)
Evaluate the distance term of the Taylor expansion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pos | dict[str, float] | The position at which the Monomial is evaluated. Keys are the variable names, values are the positions. | required |
center | dict[str, float] | The center of the Taylor expansion. Keys are the variable names, values are the positions. | required |
Returns:
Type | Description |
---|---|
float | Distance |
nablachem.alchemy.Monomial.prefactor()
Calculates the Taylor expansion prefactor.
Returns:
Type | Description |
---|---|
float | Prefactor for the summation in the Taylor expansion. |
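To make the roles of the prefactor and the distance term concrete, the following is a minimal standalone sketch of how one term of a multivariate Taylor series is commonly assembled. It does not call nablachem. The helper names `taylor_prefactor` and `distance_term` are hypothetical, and the convention that the coefficient is a partial derivative divided by the product of the factorials of the exponents is the textbook one, assumed here rather than confirmed for this class.

```python
import math


def taylor_prefactor(derivative: float, powers: dict[str, int]) -> float:
    """Textbook convention (assumed): derivative / prod(exponent!)."""
    denominator = 1
    for exponent in powers.values():
        denominator *= math.factorial(exponent)
    return derivative / denominator


def distance_term(
    powers: dict[str, int], pos: dict[str, float], center: dict[str, float]
) -> float:
    """Product of (pos - center) ** exponent over all involved variables."""
    term = 1.0
    for name, exponent in powers.items():
        term *= (pos[name] - center[name]) ** exponent
    return term


# Hypothetical mixed derivative d^3 f / dx^2 dy with value 6.0 at the center.
powers = {"x": 2, "y": 1}
pos = {"x": 1.5, "y": 2.0}
center = {"x": 1.0, "y": 1.0}

# Contribution of this single monomial to the expansion at `pos`.
contribution = taylor_prefactor(6.0, powers) * distance_term(powers, pos, center)
```

Summing such contributions over all monomials, plus the value at the center, gives the truncated Taylor estimate.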
nablachem.alchemy.MultiTaylor
Multi-dimensional multi-variate arbitrary order Taylor expansion from any evenly spaced finite difference stencil.
Examples:
>>> import pandas as pd
>>> from nablachem.alchemy import MultiTaylor
>>> df = pd.read_csv("some_file.csv")
>>> df.columns
Index(['RX', 'RY', 'RZ', 'QX', 'QY', 'QZ', 'E', 'BETA1', 'BETA2',
       'SIGMA'],
      dtype='object')
>>> mt = MultiTaylor(df, outputs="BETA1 BETA2 SIGMA".split())
>>> spatial_center, electronic_center = 3, 2.5
>>> mt.reset_center(
...     RX=spatial_center,
...     RY=spatial_center,
...     RZ=spatial_center,
...     QX=electronic_center,
...     QY=electronic_center,
...     QZ=electronic_center,
... )
>>> mt.reset_filter(E=4)
>>> mt.build_model(2)
>>> mt.query(RX=3.1, RY=3.1, RZ=3.1, QX=2.4, QY=2.4, QZ=2.4)
{'BETA1': 0.022412699999999976,
 'BETA2': 0.014047600000000134,
 'SIGMA': 0.0018744333333333316}
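For readers unfamiliar with evenly spaced finite difference stencils, here is a minimal sketch of the textbook three-point central differences for a single variable. It is independent of nablachem. How MultiTaylor selects stencil points and weights internally is not described here, so the function names and the choice of a three-point stencil are assumptions made for illustration only.

```python
def central_first_derivative(f, x: float, h: float) -> float:
    """Three-point central difference for f'(x) on an evenly spaced grid."""
    return (f(x + h) - f(x - h)) / (2 * h)


def central_second_derivative(f, x: float, h: float) -> float:
    """Three-point central difference for f''(x) on an evenly spaced grid."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def cubic(x: float) -> float:
    """Toy function with known derivatives: f'(1) = 3, f''(1) = 6."""
    return x**3


first = central_first_derivative(cubic, 1.0, 0.1)
second = central_second_derivative(cubic, 1.0, 0.1)
```

For the cubic, the first-derivative estimate carries an error of h squared, while the second-derivative estimate is exact up to rounding.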
nablachem.alchemy.MultiTaylor.__init__(dataframe, outputs)
Initialize the Taylor expansion from a dataframe of data points forming the superset of stencils.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | Holds all data points available for the vicinity of the future center of the expansion. | required |
outputs | list[str] | Those columns of the dataframe that are considered to be outputs rather than input coordinates. | required |
nablachem.alchemy.MultiTaylor.build_model(orders, additional_terms=[])
Sets up the model for a specific expansion order or list of terms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orders | int | All terms are included in the expansion up to this order. | required |
additional_terms | list[tuple[str]] | Terms to include in addition to those selected by orders, given as a list of tuples of column names. To include only d/dx, give [('x',)]; to include only d^2/dx^2, give [('x', 'x')]; to include only d^2/dxdy, give [('x', 'y')]; to include all three, give [('x',), ('x', 'x'), ('x', 'y')]. | [] |
Raises:
Type | Description |
---|---|
NotImplementedError | Center needs to be given in dataframe. |
ValueError | Center is not unique. |
ValueError | Duplicate points in the dataset. |
ValueError | Invalid column names for additional terms. |
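The tuple notation used for additional_terms can be illustrated by enumerating every derivative term up to a given order. The sketch below is standalone and uses only the standard library. The helper name `terms_up_to_order` is hypothetical, and whether build_model enumerates terms this way internally is an assumption.

```python
from itertools import combinations_with_replacement


def terms_up_to_order(columns: list[str], order: int) -> list[tuple[str, ...]]:
    """List all derivative terms of order 1..order as tuples of column names."""
    terms = []
    for k in range(1, order + 1):
        # Mixed partial derivatives commute, so ('x', 'y') and ('y', 'x')
        # are the same term; combinations with replacement lists it once.
        terms.extend(combinations_with_replacement(columns, k))
    return terms


second_order_terms = terms_up_to_order(["x", "y"], 2)

# One extra third-order term written in the same tuple notation.
additional_terms = [("x", "x", "x")]
all_terms = second_order_terms + additional_terms
```

Here `("x", "x")` stands for d^2/dx^2 and `("x", "y")` for d^2/dxdy, matching the notation in the parameter description.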
nablachem.alchemy.MultiTaylor.maximize(target, bounds)
See _optimize.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | str | Column name to maximize. | required |
bounds | dict[str, tuple[float, float]] | Bounds for the search space. | required |
Returns:
Type | Description |
---|---|
dict[str, float] | Optimal position found. |
nablachem.alchemy.MultiTaylor.minimize(target, bounds)
See _optimize.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | str | Column name to minimize. | required |
bounds | dict[str, tuple[float, float]] | Bounds for the search space. | required |
Returns:
Type | Description |
---|---|
dict[str, float] | Optimal position found. |
nablachem.alchemy.MultiTaylor.query(**kwargs)
Evaluate the Taylor expansion at a given point.
Returns:
Type | Description |
---|---|
float | Value from all terms. |
nablachem.alchemy.MultiTaylor.query_detail(output, **kwargs)
Breaks down the Taylor expansion into its monomials.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output | str | The output variable for which this analysis is done. | required |
Returns:
Type | Description |
---|---|
dict[tuple[str, int], float] | Keys are the variable names and the exponents, values are the contributions from each monomial. |
nablachem.alchemy.MultiTaylor.reset_center(**kwargs)
Sets the expansion center from named arguments for each column.
nablachem.alchemy.MultiTaylor.reset_filter(**kwargs)
Sets the filter for the dataframe from named arguments for each column.
All columns which are not filtered and not outputs are considered to be input coordinates.
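The split into filtered columns, outputs and input coordinates can be sketched without pandas. The column names below are taken from the MultiTaylor example. The row data is invented for illustration, and the assumption that a filter keeps exactly those rows whose column equals the given value is not confirmed by this reference.

```python
columns = ["RX", "RY", "RZ", "QX", "QY", "QZ", "E", "BETA1", "BETA2", "SIGMA"]
outputs = ["BETA1", "BETA2", "SIGMA"]
filters = {"E": 4}  # corresponds to reset_filter(E=4)

# Everything that is neither an output nor filtered is an input coordinate.
inputs = [c for c in columns if c not in outputs and c not in filters]

# Made-up rows; only the filtered column matters for this illustration.
rows = [
    {"RX": 3.0, "E": 4, "BETA1": 0.1},
    {"RX": 3.1, "E": 5, "BETA1": 0.2},
    {"RX": 2.9, "E": 4, "BETA1": 0.3},
]

# Assumed filter semantics: keep rows matching every filter value exactly.
kept = [row for row in rows if all(row[k] == v for k, v in filters.items())]
```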
nablachem.space
nablachem.space.ApproximateCounter
nablachem.space.ApproximateCounter.count(search_space, natoms, selection=None)
Counts the total number of molecules in a search space.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_space | SearchSpace | The search space. | required |
natoms | int | The number of atoms to restrict to. | required |
selection | Q | A subselection based on a query string, by default None. | None |
Returns:
Type | Description |
---|---|
int | Total count of molecules in this search space. |
nablachem.space.ApproximateCounter.count_cases(search_space, natoms)
Counts the total number of stoichiometries in a search space.
Note that different stoichiometries could yield the same sum formula.
Note that this only returns cases where all valences are saturated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_space | SearchSpace | The search space. | required |
natoms | int | The number of atoms for which to count. | required |
Returns:
Type | Description |
---|---|
int | Total number of stoichiometries. |
nablachem.space.ApproximateCounter.count_one(stoichiometry, natoms)
Counts the total number of molecules in a given stoichiometry.
The redundant specification of the number of atoms is a performance tweak.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stoichiometry | AtomStoichiometry | The stoichiometry to count. | required |
natoms | int | Number of atoms in that stoichiometry. | required |
Returns:
Type | Description |
---|---|
int | Total count of molecules. |
nablachem.space.ApproximateCounter.count_one_bare(label, natoms, cached_degree_sequence=False)
Counts the number of molecules of a given colored degree sequence.
The last two arguments are performance tweaks and not strictly necessary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label | tuple[int] | Degree sequence in groups ((degree, natoms), (degree, natoms)). | required |
natoms | int | The total number of atoms. | required |
cached_degree_sequence | bool | Whether this case has the same pure degree sequence as the previous call, by default False. | False |
Returns:
Type | Description |
---|---|
int | Total count. |
nablachem.space.ApproximateCounter.count_sum_formulas(search_space, natoms)
Counts the total number of sum formulas in a search space.
Note that this only returns cases where all valences are saturated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_space | SearchSpace | The search space. | required |
natoms | int | The number of atoms for which to count. | required |
Returns:
Type | Description |
---|---|
int | Total number of sum formulas. |
nablachem.space.ApproximateCounter.estimate_edit_tree_average_path_length(canonical_label, ngraphs)
staticmethod
Estimates the average path length via graph edit distance heuristics.
This is done by choosing ngraphs random molecules and calculating their pairwise graph distance by finding the minimal edit distance between them. The final answer is then averaged over all unique pairs in this list, so the runtime scales quadratically with ngraphs.
The shortest edit distance is defined as the Wasserstein metric between the adjacency matrices of two molecules.
The function also returns a statistic of the success of the different strategies to propose the minimal edit distance. For a description of those strategies, see the docstrings of the local functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
canonical_label | str | The case for which to run the computation. Format is "degree.natoms_degree.natoms_degree.natoms". | required |
ngraphs | int | Number of graphs to use for averaging. Suggested to be 30-50. | required |
Returns:
Type | Description |
---|---|
tuple[str, int, float, dict[str, int]] | The canonical label, the number of graphs, the average path length, the success counts of the individual strategies. |
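The averaging over all unique pairs, and the resulting quadratic cost, can be sketched as follows. The integer items and the absolute-difference distance are stand-ins chosen so the example runs on its own; they are not the graph edit distance or the Wasserstein metric used by this method, and the helper name `average_pairwise_distance` is hypothetical.

```python
from itertools import combinations


def average_pairwise_distance(items, distance) -> float:
    """Average a distance over all unique unordered pairs of items."""
    n = len(items)
    # n * (n - 1) / 2 unique pairs, hence quadratic scaling in n.
    total = sum(distance(a, b) for a, b in combinations(items, 2))
    return total / (n * (n - 1) / 2)


def toy_distance(a: int, b: int) -> int:
    """Placeholder metric; a real use would compare two molecular graphs."""
    return abs(a - b)


# Pairs: (0, 1) -> 1, (0, 3) -> 3, (1, 3) -> 2; the average is 6 / 3.
average = average_pairwise_distance([0, 1, 3], toy_distance)
```

With the suggested 30 to 50 graphs this amounts to between 435 and 1225 pairwise distance evaluations.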
nablachem.space.ApproximateCounter.estimated_in_cache(maxsize=None)
Builds a list of all those degree sequences that are estimated only.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
maxsize | int | Cap of the estimated size of the number of graphs with that degree sequence, by default None. | None |
Returns:
Type | Description |
---|---|
list[str] | List of canonical labels. |
nablachem.space.ApproximateCounter.missing_parameters(search_space, natoms, pure_only, selection=None)
Returns the colored degree sequences for which no parameters are available in the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_space | SearchSpace | The search space. | required |
natoms | int | Number of atoms to check for. | required |
pure_only | bool | Whether only pure degree sequences should be precomputed for this space. | required |
selection | Q | Subselection by query string, by default None. | None |
Returns:
Type | Description |
---|---|
list[tuple[int]] | The colored degree sequences for which there are no parameters. |
nablachem.space.ApproximateCounter.random_sample(search_space, natoms, nmols, selection=None)
Builds a fixed size random sample from a search space for a given number of atoms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_space | SearchSpace | The total search space. | required |
natoms | int | The total number of atoms of the chosen molecules. | required |
nmols | int | Number of random molecules to be drawn. | required |
selection | Q | Selecting subsets via query string, by default None. | None |
Returns:
Type | Description |
---|---|
list[Molecule] | Random molecules. |
Raises:
Type | Description |
---|---|
ValueError | If selection is empty. |
nablachem.space.ApproximateCounter.sample_connected(spec)
staticmethod
Find a random connected multigraph of a given degree sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spec | str | Canonical label from which the degree sequence is extracted. | required |
Returns:
Type | Description |
---|---|
MultiGraph | The resulting graph. |
nablachem.space.ApproximateCounter.score_database(search_space, natoms, sum_formulas, selection=None)
Implements the Kolmogorov-Smirnov statistic comparing the distribution of the database to the expected distribution.
The best score is 0, the worst is 1.
This does not test the distribution of molecules within a given sum formula.
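As a reminder of what the Kolmogorov-Smirnov statistic measures, the sketch below computes the largest absolute gap between two cumulative distributions over the same ordered categories. It is a generic illustration: the function name `ks_statistic`, the toy numbers, and the way categories are ordered are assumptions, and it does not reproduce how score_database builds its distributions from sum formulas.

```python
from itertools import accumulate


def ks_statistic(observed_counts, expected_probabilities) -> float:
    """Largest absolute difference between observed and expected CDFs."""
    total = sum(observed_counts)
    observed_cdf = accumulate(count / total for count in observed_counts)
    expected_cdf = accumulate(expected_probabilities)
    return max(abs(o - e) for o, e in zip(observed_cdf, expected_cdf))


# Identical distributions give the best score, 0.
perfect = ks_statistic([1, 1, 2], [0.25, 0.25, 0.5])

# All observed mass in the first category, all expected mass in the last
# one, gives the worst score, 1.
worst = ks_statistic([10, 0], [0.0, 1.0])

# An intermediate case on made-up numbers.
partial = ks_statistic([2, 1, 1], [0.25, 0.25, 0.5])
```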
nablachem.space.ApproximateCounter.spec_to_sequence(spec)
staticmethod
Converts a canonical label to a degree sequence and pseudoelement labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spec | str | Canonical label of the pseudomolecule ("degree.natoms_degree.natoms"). | required |
Returns:
Type | Description |
---|---|
tuple[list[int], list[str]] | Degree sequence and artificial element labels. |
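The "degree.natoms_degree.natoms" label format can be unpacked with a few lines of string handling. This is a minimal sketch based only on the format string given above. The function names are hypothetical, and it does not generate the artificial element labels that spec_to_sequence returns, since their naming scheme is not described here.

```python
def parse_canonical_label(spec: str) -> list[tuple[int, int]]:
    """Split "degree.natoms_degree.natoms" into (degree, natoms) groups."""
    groups = []
    for part in spec.split("_"):
        degree, natoms = part.split(".")
        groups.append((int(degree), int(natoms)))
    return groups


def expand_degree_sequence(groups: list[tuple[int, int]]) -> list[int]:
    """Repeat each degree once per atom to obtain the flat degree sequence."""
    return [degree for degree, natoms in groups for _ in range(natoms)]


# Four atoms of degree 1 and one atom of degree 4 (a methane-like case).
groups = parse_canonical_label("1.4_4.1")
sequence = expand_degree_sequence(groups)
```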
nablachem.space.ExactCounter
Python API for exact counts of molecular graphs via surge.
Note that this implementation avoids the built-in pruning of "infeasible" molecular graphs in surge by defining non-standard element labels.
nablachem.space.SearchSpace
nablachem.space.SearchSpace.covered_search_space(kind)
staticmethod
Returns the pre-defined chemical spaces from the original publication.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
kind | str | Label, either A or B. | required |
Returns:
Type | Description |
---|---|
SearchSpace | The chosen space. |
nablachem.space.SearchSpace.list_cases_bare(natoms, degree_sequences_only=False, pure_sequences_only=False, progress=True)
Lists all possible stoichiometries for a given number of atoms.
If degree_sequences_only is set to True, only unique degree sequences are returned, i.e. the element names are not considered.
Optimized for performance, so yields tuples. Use list_cases() for a more user-friendly interface.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
natoms | int | Number of atoms in the molecule. | required |
degree_sequences_only | bool | Flag to switch to degree sequence enumeration, by default False. | False |
pure_sequences_only | bool | Skips sequences where atoms of one valence belong to more than one element label. Implies degree_sequences_only. | False |
progress | bool | Whether to show a progress bar, by default True. | True |
Yields:
Type | Description |
---|---|
Iterator[tuple[str, int, int]] \| Iterator[tuple[int, int]] | Either tuples of (element, valence, count) or (valence, count). Guaranteed to be sorted by (valence, count). |
nablachem.utils
nablachem.utils.integer_partition(total, maxelements)
Builds all integer partitions of total split into maxelements parts.