Code

nablachem.alchemy

nablachem.alchemy.Monomial

A single monomial in the multi-dimensional Taylor expansion.

nablachem.alchemy.Monomial.__init__(prefactor, powers={})

Define the monomial.

Parameters:

Name Type Description Default
prefactor float

Weight or coefficient of the monomial.

required
powers dict[str, int]

Involved variables as keys and the exponent as value, by default {}.

{}

nablachem.alchemy.Monomial.distance(pos, center)

Evaluate the distance term of the Taylor expansion.

Parameters:

Name Type Description Default
pos dict[str, float]

The position at which the Monomial is evaluated. Keys are the variable names, values are the positions.

required
center dict[str, float]

The center of the Taylor expansion. Keys are the variable names, values are the positions.

required

Returns:

Type Description
float

Distance

nablachem.alchemy.Monomial.prefactor()

Calculates the Taylor expansion prefactor.

Returns:

Type Description
float

Prefactor for the summation in the Taylor expansion.
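As a rough illustration of how a single term of a multivariate Taylor expansion is assembled, the sketch below multiplies a prefactor by a distance term. It follows the textbook formula (derivative divided by the factorials of the exponents, times the product of displacements raised to those exponents) and is not taken from the nablachem source. The helper names `distance_term` and `taylor_prefactor` and the derivative value are hypothetical.

```python
from math import factorial, prod


def distance_term(powers, pos, center):
    """Product of (pos - center) ** exponent over all involved variables."""
    return prod((pos[name] - center[name]) ** exp for name, exp in powers.items())


def taylor_prefactor(coefficient, powers):
    """Coefficient divided by the factorial of each exponent (textbook convention)."""
    return coefficient / prod(factorial(exp) for exp in powers.values())


# Hypothetical term: third mixed derivative d^3 f / (dx^2 dy) with value 6.0.
powers = {"x": 2, "y": 1}
center = {"x": 1.0, "y": 1.0}
pos = {"x": 1.5, "y": 2.0}
derivative = 6.0

# Contribution of this monomial to the expansion at `pos`.
contribution = taylor_prefactor(derivative, powers) * distance_term(powers, pos, center)
print(contribution)
```

Whether `Monomial.prefactor()` applies exactly this factorial convention is an assumption; the documentation above only states that it returns the prefactor for the summation.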

nablachem.alchemy.MultiTaylor

Multi-dimensional, multi-variate, arbitrary-order Taylor expansion from any evenly spaced finite difference stencil.

Examples:

>>> import pandas as pd
>>> from nablachem.alchemy import MultiTaylor
>>> df = pd.read_csv("some_file.csv")
>>> df.columns
Index(['RX', 'RY', 'RZ', 'QX', 'QY', 'QZ', 'E', 'BETA1', 'BETA2',
   'SIGMA'],
  dtype='object')
>>> mt = MultiTaylor(df, outputs="BETA1 BETA2 SIGMA".split())
>>> spatial_center, electronic_center = 3, 2.5
>>> mt.reset_center(
...     RX=spatial_center,
...     RY=spatial_center,
...     RZ=spatial_center,
...     QX=electronic_center,
...     QY=electronic_center,
...     QZ=electronic_center,
... )
>>> mt.reset_filter(E=4)
>>> mt.build_model(2)
>>> mt.query(RX=3.1, RY=3.1, RZ=3.1, QX=2.4, QY=2.4, QZ=2.4)
{'BETA1': 0.022412699999999976,
'BETA2': 0.014047600000000134,
'SIGMA': 0.0018744333333333316}
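To make the idea of a Taylor expansion built from an evenly spaced finite difference stencil concrete, here is a minimal one-dimensional sketch that does not use nablachem at all. It takes three evenly spaced samples, estimates the first and second derivatives with central differences, and evaluates the resulting second-order expansion. The test function `f`, the spacing `h`, and the helper `taylor` are illustrative assumptions.

```python
def f(x):
    """Hypothetical quantity sampled on the stencil."""
    return 2.0 * x**2 + 3.0 * x + 1.0


center, h = 3.0, 0.1

# Three-point, evenly spaced stencil around the center.
f_minus, f_center, f_plus = f(center - h), f(center), f(center + h)

# Central finite differences for the first and second derivative.
first = (f_plus - f_minus) / (2 * h)
second = (f_plus - 2 * f_center + f_minus) / h**2


def taylor(x):
    """Second-order Taylor expansion around `center`."""
    dx = x - center
    return f_center + first * dx + 0.5 * second * dx**2


print(taylor(3.05), f(3.05))
```

For a quadratic test function the three-point stencil reproduces the function up to rounding; for general data the expansion is only an approximation near the center.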

nablachem.alchemy.MultiTaylor.__init__(dataframe, outputs)

Initialize the Taylor expansion from a dataframe of data points forming the superset of stencils.

Parameters:

Name Type Description Default
dataframe DataFrame

Holds all data points available for the vicinity of the future center of the expansion.

required
outputs list[str]

Those columns of the dataframe that are considered to be outputs rather than input coordinates.

required

nablachem.alchemy.MultiTaylor.build_model(orders, additional_terms=[])

Sets up the model for a specific expansion order or list of terms.

Parameters:

Name Type Description Default
orders int

All terms are included in the expansion up to this order.

required
additional_terms list[tuple[str]]

The terms to ADDITIONALLY include, i.e. list of tuples of column names.

To only include d/dx, give [('x',)].
To only include d^2/dx^2, give [('x', 'x')].
To only include d^2/dxdy, give [('x', 'y')].
To include all three, give [('x',), ('x', 'x'), ('x', 'y')].

[]

Raises:

Type Description
NotImplementedError

Center needs to be given in dataframe.

ValueError

Center is not unique.

ValueError

Duplicate points in the dataset.

ValueError

Invalid column names for additional terms.
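The tuple notation for derivative terms can be explored with a small standalone sketch. It lists every derivative term up to a given order for a set of column names, using the same convention as above (one column name per differentiation). The helper `terms_up_to` is hypothetical, and whether build_model enumerates or orders terms this way internally is not stated in this documentation.

```python
from itertools import combinations_with_replacement


def terms_up_to(columns, order):
    """All derivative terms from first order up to `order`, as tuples of column names."""
    terms = []
    for n in range(1, order + 1):
        # Each tuple names the variables differentiated with respect to, e.g. ('x', 'y').
        terms.extend(combinations_with_replacement(columns, n))
    return terms


print(terms_up_to(["x", "y"], 2))
# [('x',), ('y',), ('x', 'x'), ('x', 'y'), ('y', 'y')]
```

A list like this can help when deciding which tuples to pass as additional_terms beyond the order already covered by orders.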

nablachem.alchemy.MultiTaylor.maximize(target, bounds)

See _optimize.

Parameters:

Name Type Description Default
target str

Column name to maximize.

required
bounds dict[str, tuple[float, float]]

Bounds for the search space.

required

Returns:

Type Description
dict[str, float]

Optimal position found.

nablachem.alchemy.MultiTaylor.minimize(target, bounds)

See _optimize.

Parameters:

Name Type Description Default
target str

Column name to minimize.

required
bounds dict[str, tuple[float, float]]

Bounds for the search space.

required

Returns:

Type Description
dict[str, float]

Optimal position found.

nablachem.alchemy.MultiTaylor.query(**kwargs)

Evaluate the Taylor expansion at a given point.

Returns:

Type Description
float

Value from all terms.

nablachem.alchemy.MultiTaylor.query_detail(output, **kwargs)

Breaks down the Taylor expansion into its monomials.

Parameters:

Name Type Description Default
output str

The output variable for which this analysis is done.

required

Returns:

Type Description
dict[tuple[str, int], float]

Keys are the variable names and the exponents, values are the contributions from each monomial.

nablachem.alchemy.MultiTaylor.reset_center(**kwargs)

Sets the expansion center from named arguments for each column.

nablachem.alchemy.MultiTaylor.reset_filter(**kwargs)

Sets the filter for the dataframe from named arguments for each column.

All columns which are not filtered and not outputs are considered to be input coordinates.
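The rule for deriving input coordinates can be written down in a few lines. The sketch below is an illustration of the rule as described, not the library's code, and the helper name `input_columns` is hypothetical. It uses the column names from the MultiTaylor example.

```python
def input_columns(columns, outputs, filters):
    """Columns that are neither outputs nor filtered are treated as input coordinates."""
    return [c for c in columns if c not in outputs and c not in filters]


columns = ["RX", "RY", "RZ", "QX", "QY", "QZ", "E", "BETA1", "BETA2", "SIGMA"]
outputs = ["BETA1", "BETA2", "SIGMA"]
filters = {"E": 4}  # corresponds to reset_filter(E=4)

inputs = input_columns(columns, outputs, filters)
print(inputs)
# ['RX', 'RY', 'RZ', 'QX', 'QY', 'QZ']
```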

nablachem.space

nablachem.space.ApproximateCounter

nablachem.space.ApproximateCounter.count(search_space, natoms, selection=None)

Counts the total number of molecules in a search space.

Parameters:

Name Type Description Default
search_space SearchSpace

The search space.

required
natoms int

The number of atoms to restrict to.

required
selection Q

A subselection based on a query string, by default None

None

Returns:

Type Description
int

Total count of molecules in this search space.

nablachem.space.ApproximateCounter.count_cases(search_space, natoms)

Counts the total number of stoichiometries in a search space.

Note that different stoichiometries could yield the same sum formula.

Note that this only returns cases where all valences are saturated.

Parameters:

Name Type Description Default
search_space SearchSpace

The search space.

required
natoms int

The number of atoms for which to count.

required

Returns:

Type Description
int

Total number of stoichiometries.

nablachem.space.ApproximateCounter.count_one(stoichiometry, natoms)

Counts the total number of molecules in a given stoichiometry.

The redundant specification of the number of atoms is a performance tweak.

Parameters:

Name Type Description Default
stoichiometry AtomStoichiometry

The stoichiometry to count.

required
natoms int

Number of atoms in that stoichiometry.

required

Returns:

Type Description
int

Total count of molecules.

nablachem.space.ApproximateCounter.count_one_bare(label, natoms, cached_degree_sequence=False)

Counts the number of molecules of a given colored degree sequence.

The last two arguments are performance tweaks and not strictly necessary.

Parameters:

Name Type Description Default
label tuple[int]

Degree sequence in groups ((degree, natoms), (degree, natoms))

required
natoms int

The total number of atoms.

required
cached_degree_sequence bool

Whether this case has the same pure degree sequence as the previous call, by default False

False

Returns:

Type Description
int

Total count.

nablachem.space.ApproximateCounter.count_sum_formulas(search_space, natoms)

Counts the total number of sum formulas in a search space.

Note that this only returns cases where all valences are saturated.

Parameters:

Name Type Description Default
search_space SearchSpace

The search space.

required
natoms int

The number of atoms for which to count.

required

Returns:

Type Description
int

Total number of sum formulas.

nablachem.space.ApproximateCounter.estimate_edit_tree_average_path_length(canonical_label, ngraphs) staticmethod

Estimates the average path length via graph edit distance heuristics.

This is done by choosing ngraphs random molecules and calculating their pairwise graph distance by finding the minimal edit distance between them. The final answer is then averaged over all unique pairs in this list. Therefore runtime scales quadratically with ngraphs.

The shortest edit distance is defined as the Wasserstein metric between adjacency matrices of two molecules.

The function also returns a statistic of the success of the different strategies to propose the minimal edit distance. For a description of those strategies, see the docstrings of the local functions.

The average path length is the total path length divided by the number of unique pairs, ngraphs * (ngraphs - 1) / 2.

Parameters:

Name Type Description Default
canonical_label str

The case for which to run the computation. Format is "degree.natoms_degree.natoms_degree.natoms".

required
ngraphs int

Number of graphs to use for averaging. Suggested to be 30-50.

required

Returns:

Type Description
tuple[str, int, float, dict[str, int]]

The canonical label, the number of graphs, the average path length, the success counts of the individual strategies.
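The averaging over unique pairs, and the resulting quadratic scaling, can be sketched independently of the actual distance heuristics. In the example below, `entrywise_difference` is a deliberately simple stand-in for the graph distance (it is not the Wasserstein-based edit distance described above), and the small adjacency matrices are made up for illustration.

```python
from itertools import combinations


def average_pairwise_distance(items, distance):
    """Average `distance` over all unique unordered pairs: n * (n - 1) / 2 of them."""
    pairs = list(combinations(items, 2))
    total = sum(distance(a, b) for a, b in pairs)
    return total / len(pairs)


def entrywise_difference(a, b):
    """Stand-in distance: sum of absolute differences between adjacency matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical adjacency matrices of three small multigraphs.
graphs = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
    [[0, 2, 0], [2, 0, 0], [0, 0, 0]],
]

average = average_pairwise_distance(graphs, entrywise_difference)
print(average)
```

Because every unique pair is visited once, doubling the number of graphs roughly quadruples the number of distance evaluations.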

nablachem.space.ApproximateCounter.estimated_in_cache(maxsize=None)

Builds a list of all those degree sequences that are estimated only.

Parameters:

Name Type Description Default
maxsize int

Upper bound on the estimated number of graphs with that degree sequence, by default None

None

Returns:

Type Description
list[str]

List of canonical labels

nablachem.space.ApproximateCounter.missing_parameters(search_space, natoms, pure_only, selection=None)

Returns the colored degree sequences for which no parameters are available in the database.

Parameters:

Name Type Description Default
search_space SearchSpace

The search space.

required
natoms int

Number of atoms to check for.

required
pure_only bool

Whether only pure degree sequences should be precomputed for this space.

required
selection Q

Subselection by query string, by default None

None

Returns:

Type Description
list[tuple[int]]

The colored degree sequences for which there are no parameters.

nablachem.space.ApproximateCounter.random_sample(search_space, natoms, nmols, selection=None)

Builds a fixed size random sample from a search space for a given number of atoms.

Parameters:

Name Type Description Default
search_space SearchSpace

The total search space.

required
natoms int

The total number of atoms of the chosen molecules.

required
nmols int

Number of random molecules to be drawn.

required
selection Q

Selecting subsets via query string, by default None

None

Returns:

Type Description
list[Molecule]

Random molecules.

Raises:

Type Description
ValueError

If selection is empty.

nablachem.space.ApproximateCounter.sample_connected(spec) staticmethod

Find a random connected multigraph of a given degree sequence.

Parameters:

Name Type Description Default
spec str

Canonical label from which the degree sequence is extracted.

required

Returns:

Type Description
MultiGraph

The resulting graph

nablachem.space.ApproximateCounter.score_database(search_space, natoms, sum_formulas, selection=None)

Implements the Kolmogorov-Smirnov statistic comparing the distribution of the database to the expected distribution.

The best score is 0, the worst is 1.

This does not test the distribution of molecules within a given sum formula.
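For orientation, a Kolmogorov-Smirnov statistic over binned counts can be computed as the largest absolute gap between two cumulative distributions. The sketch below is a generic version of that idea under the assumption that observed and expected counts are given per bin in the same order; it does not reproduce how score_database groups molecules, and the helper name `ks_statistic` is hypothetical.

```python
from itertools import accumulate


def ks_statistic(observed_counts, expected_counts):
    """Largest absolute difference between the two normalized cumulative distributions."""
    obs_total = sum(observed_counts)
    exp_total = sum(expected_counts)
    obs_cdf = [c / obs_total for c in accumulate(observed_counts)]
    exp_cdf = [c / exp_total for c in accumulate(expected_counts)]
    return max(abs(o - e) for o, e in zip(obs_cdf, exp_cdf))


# Same shape, different totals: distributions agree, score is 0.
print(ks_statistic([1, 2, 3], [2, 4, 6]))

# All mass in opposite bins: worst possible score of 1.
print(ks_statistic([10, 0], [0, 10]))
```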

nablachem.space.ApproximateCounter.spec_to_sequence(spec) staticmethod

Converts a canonical label to a degree sequence and pseudoelement labels.

Parameters:

Name Type Description Default
spec str

Canonical label of the pseudomolecule ("degree.natoms_degree.natoms").

required

Returns:

Type Description
tuple[list[int], list[str]]

Degree sequence and artificial element labels.
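The canonical label format "degree.natoms_degree.natoms" can be unpacked with plain string handling. The sketch below is a minimal, assumed parser for that format: it returns the (degree, natoms) groups and the expanded degree sequence. It does not generate the artificial element labels, since their naming is not documented here, and both helper names and the example label are hypothetical.

```python
def parse_canonical_label(spec):
    """Split 'degree.natoms_degree.natoms' into a list of (degree, natoms) groups."""
    groups = []
    for part in spec.split("_"):
        degree, natoms = part.split(".")
        groups.append((int(degree), int(natoms)))
    return groups


def expand_degree_sequence(groups):
    """Repeat each degree once per atom in its group."""
    return [degree for degree, natoms in groups for _ in range(natoms)]


groups = parse_canonical_label("4.2_1.6")  # hypothetical label: 2 atoms of degree 4, 6 of degree 1
print(groups)
print(expand_degree_sequence(groups))
```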

nablachem.space.ExactCounter

Python API for exact counts of molecular graphs via surge.

Note that this implementation avoids the built-in pruning of "infeasible" molecular graphs in surge by defining non-standard element labels.

nablachem.space.SearchSpace

nablachem.space.SearchSpace.covered_search_space(kind) staticmethod

Returns the pre-defined chemical spaces from the original publication.

Parameters:

Name Type Description Default
kind str

Label, either A or B.

required

Returns:

Type Description
SearchSpace

The chosen space.

nablachem.space.SearchSpace.list_cases_bare(natoms, degree_sequences_only=False, pure_sequences_only=False, progress=True)

Lists all possible stoichiometries for a given number of atoms.

If degree_sequences_only is set to True, only unique degree sequences are returned, i.e. the element names are not considered.

Optimized for performance, so yields tuples. Use list_cases() for a more user-friendly interface.

Parameters:

Name Type Description Default
natoms int

Number of atoms in the molecule.

required
degree_sequences_only bool

Flag to switch to degree sequence enumeration, by default False

False
pure_sequences_only bool

Skips sequences where atoms of one valence belong to more than one element label. Implies degree_sequences_only.

False
progress bool

Whether to show a progress bar, by default True

True

Yields:

Type Description
Iterator[tuple[str, int, int]] | Iterator[tuple[int, int]]

Either tuples of (element, valence, count) or (valence, count). Guaranteed to be sorted by (valence, count).

nablachem.utils

nablachem.utils.integer_partition(total, maxelements)

Builds all integer partitions of total split into maxelements parts.
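As a generic illustration of integer partitions with a bounded number of parts, the sketch below yields every way to write a total as a non-increasing sum of at most `max_parts` positive integers. The output format of `nablachem.utils.integer_partition` (for example, whether shorter partitions are padded with zeros) is not specified in this documentation, so the function name `partitions` and its conventions are assumptions.

```python
def partitions(total, max_parts, largest=None):
    """Yield partitions of `total` into at most `max_parts` positive, non-increasing parts."""
    if largest is None:
        largest = total
    if total == 0:
        yield []  # nothing left to split: one valid (empty) completion
        return
    if max_parts == 0:
        return  # parts exhausted but total remains: dead end
    for first in range(min(total, largest), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield [first] + rest


print(list(partitions(4, 2)))
# [[4], [3, 1], [2, 2]]
```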