Chemical Space
Exploring chemical space
nablachem.space.SearchSpace(elements: str = None)
nablachem.space.SearchSpace.max_valence
property
nablachem.space.SearchSpace.add_element(element: Element)
nablachem.space.SearchSpace.covered_search_space(kind: str)
staticmethod
Returns one of the pre-defined chemical spaces from the original publication.
Parameters
kind : str Label, either A or B.
Returns
SearchSpace The chosen space.
nablachem.space.SearchSpace.get_elements_from_valence(valence)
nablachem.space.SearchSpace.list_cases(natoms: int, progress: bool = True) -> Iterator[AtomStoichiometry]
nablachem.space.SearchSpace.list_cases_bare(natoms: int, degree_sequences_only: bool = False, pure_sequences_only: bool = False, progress: bool = True) -> Iterator[tuple[str, int, int]] | Iterator[tuple[int, int]]
Lists all possible stoichiometries for a given number of atoms.
If degree_sequences_only is set to True, only unique degree sequences are returned, i.e. the element names are not considered.
Optimized for performance, so yields tuples. Use list_cases() for a more user-friendly interface.
Parameters
natoms : int Number of atoms in the molecule.
degree_sequences_only : bool, optional Flag to switch to degree sequence enumeration, by default False.
pure_sequences_only : bool, optional Skips sequences where atoms of one valence belong to more than one element label. Implies degree_sequences_only.
progress : bool, optional Whether to show a progress bar, by default True.
Yields
Iterator[tuple[str, int, int]] | Iterator[tuple[int, int]] Either tuples of (element, valence, count) or (valence, count). Guaranteed to be sorted by (valence, count).
nablachem.space.ApproximateCounter(other_cachedirs: list[pathlib.Path] = None, show_progress=True)
nablachem.space.ApproximateCounter.count(search_space: SearchSpace, natoms: int, selection: Q = None) -> int
Counts the total number of molecules in a search space.
Parameters
search_space : SearchSpace The search space.
natoms : int The number of atoms to restrict to.
selection : Q, optional A subselection based on a query string, by default None.
Returns
int Total count of molecules in this search space.
nablachem.space.ApproximateCounter.count_cases(search_space: SearchSpace, natoms: int) -> int
Counts the total number of stoichiometries in a search space.
Note that different stoichiometries could yield the same sum formula.
Note that this only returns cases where all valences are saturated.
Parameters
search_space : SearchSpace The search space.
natoms : int The number of atoms for which to count.
Returns
int Total number of stoichiometries.
nablachem.space.ApproximateCounter.count_one(stoichiometry: AtomStoichiometry, natoms: int) -> int
Counts the total number of molecules in a given stoichiometry.
The redundant specification of the number of atoms is a performance tweak.
Parameters
stoichiometry : AtomStoichiometry The stoichiometry to count.
natoms : int Number of atoms in that stoichiometry.
Returns
int Total count of molecules.
nablachem.space.ApproximateCounter.count_one_bare(label: tuple[int], natoms: int, cached_degree_sequence: bool = False) -> int
Counts the number of molecules of a given colored degree sequence.
The last two arguments are performance tweaks and not strictly necessary.
Parameters
label : tuple[int] Degree sequence in groups ((degree, natoms), (degree, natoms)).
natoms : int The total number of atoms.
cached_degree_sequence : bool, optional Whether this case has the same pure degree sequence as the previous call, by default False.
Returns
int Total count.
nablachem.space.ApproximateCounter.count_sum_formulas(search_space: SearchSpace, natoms: int) -> int
Counts the total number of sum formulas in a search space.
Note that this only returns cases where all valences are saturated.
Parameters
search_space : SearchSpace The search space.
natoms : int The number of atoms for which to count.
Returns
int Total number of sum formulas.
nablachem.space.ApproximateCounter.estimate_edit_tree_average_path_length(canonical_label: str, ngraphs: int)
staticmethod
Estimates the average path length via graph edit distance heuristics.
This is done by choosing ngraphs random molecules and calculating their pairwise graph distance by finding the minimal edit distance between them. The final answer is then averaged over all unique pairs in this list, i.e. total_path_length / (ngraphs * (ngraphs - 1) / 2). Therefore runtime scales quadratically with ngraphs.
The shortest edit distance is defined as the Wasserstein metric between adjacency matrices of two molecules.
The function also returns statistics on how often each strategy succeeded in proposing the minimal edit distance. For a description of those strategies, see the docstrings of the local functions.
Parameters
canonical_label : str The case for which to run the computation. Format is "degree.natoms_degree.natoms_degree.natoms".
ngraphs : int Number of graphs to use for averaging. Suggested to be 30-50.
Returns
tuple[str, int, float, dict[str, int]] The canonical label, the number of graphs, the average path length, the success counts of the individual strategies.
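The averaging over unique pairs described above can be sketched in a few lines. This is an illustration only, not nablachem's implementation: the names average_pairwise_distance and count_unique_pairs are hypothetical, and the toy distance (absolute difference of integers) merely stands in for the graph edit distance between molecules.

```python
from itertools import combinations


def count_unique_pairs(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def average_pairwise_distance(items, distance) -> float:
    """Average distance(a, b) over all unique unordered pairs of items.

    `distance` is any symmetric callable; here it is a placeholder for
    the (much more expensive) graph edit distance.
    """
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    total = sum(distance(a, b) for a, b in combinations(items, 2))
    return total / count_unique_pairs(n)


# Toy stand-in for molecules: integers, with |a - b| as the distance.
sample = [0, 1, 3, 6]
avg = average_pairwise_distance(sample, lambda a, b: abs(a - b))
```

With the suggested range of ngraphs, the number of distance evaluations is count_unique_pairs(30) == 435 up to count_unique_pairs(50) == 1225, which is where the quadratic runtime comes from.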
nablachem.space.ApproximateCounter.estimated_in_cache(maxsize: int = None) -> list[str]
Builds a list of all degree sequences in the cache whose counts are estimated only.
Parameters
maxsize : int, optional Upper limit on the estimated number of graphs with that degree sequence, by default None.
Returns
list[str] List of canonical labels
nablachem.space.ApproximateCounter.factorial(n)
cached
staticmethod
nablachem.space.ApproximateCounter.falling_factorial(n, k)
cached
staticmethod
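The two helpers above are listed as cached static methods without further description. A minimal sketch of that pattern follows, assuming the standard mathematical definitions (n! and the falling factorial n(n-1)...(n-k+1)). The class name CombinatoricsSketch is hypothetical, and functools.lru_cache is one possible way to cache; the library may use a different mechanism.

```python
from functools import lru_cache


class CombinatoricsSketch:
    """Hypothetical container showing cached static helpers."""

    @staticmethod
    @lru_cache(maxsize=None)
    def factorial(n: int) -> int:
        # Iterative product 1 * 2 * ... * n; returns 1 for n == 0.
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result

    @staticmethod
    @lru_cache(maxsize=None)
    def falling_factorial(n: int, k: int) -> int:
        # Product of k terms n * (n - 1) * ... * (n - k + 1); 1 for k == 0.
        result = 1
        for i in range(k):
            result *= n - i
        return result
```

Caching pays off when the same arguments recur many times, as is typical in counting formulas evaluated over many degree sequences.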
nablachem.space.ApproximateCounter.missing_parameters(search_space: SearchSpace, natoms: int, pure_only: bool, selection: Q = None)
Returns the colored degree sequences for which no parameters are available in the database.
Parameters
search_space : SearchSpace The search space.
natoms : int Number of atoms to check for.
pure_only : bool Whether only pure degree sequences should be precomputed for this space.
selection : Q, optional Subselection by query string, by default None.
Returns
list[tuple[int]] The colored degree sequences for which there are no parameters.
nablachem.space.ApproximateCounter.random_sample(search_space: SearchSpace, natoms: int, nmols: int, selection: Q = None) -> list[Molecule]
Builds a fixed size random sample from a search space for a given number of atoms.
Parameters
search_space : SearchSpace The total search space.
natoms : int The total number of atoms of the chosen molecules.
nmols : int Number of random molecules to be drawn.
selection : Q, optional Selecting subsets via query string, by default None.
Returns
list[Molecule] Random molecules.
Raises
ValueError If selection is empty.
nablachem.space.ApproximateCounter.sample_connected(spec: str) -> nx.MultiGraph
staticmethod
Find a random connected multigraph of a given degree sequence.
Parameters
spec : str Canonical label from which the degree sequence is extracted.
Returns
nx.MultiGraph The resulting graph
nablachem.space.ApproximateCounter.score_database(search_space: SearchSpace, natoms: int, sum_formulas: dict[str, int], selection: Q = None) -> float
Implements the Kolmogorov-Smirnov statistic comparing the distribution of the database to the expected distribution.
The best score is 0, the worst is 1.
This does not test the distribution of molecules within a given sum formula.
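To make the score's range concrete, here is a generic sketch of a Kolmogorov-Smirnov-style statistic between two count tables keyed by sum formula. It is not the library's code: the function name ks_statistic is hypothetical, the expected counts are supplied by hand rather than derived from a search space, and ordering the formulas alphabetically before accumulating is an assumption made only for this illustration.

```python
def ks_statistic(observed: dict[str, int], expected: dict[str, int]) -> float:
    """Largest absolute gap between two cumulative distributions.

    Both inputs map a sum formula to a count. Counts are normalised to
    probabilities, so the result lies between 0 (identical) and 1.
    """
    total_observed = sum(observed.values())
    total_expected = sum(expected.values())
    if total_observed == 0 or total_expected == 0:
        raise ValueError("both distributions need at least one count")

    cumulative_observed = 0.0
    cumulative_expected = 0.0
    largest_gap = 0.0
    # Assumption: a fixed alphabetical ordering of the sum formulas.
    for formula in sorted(set(observed) | set(expected)):
        cumulative_observed += observed.get(formula, 0) / total_observed
        cumulative_expected += expected.get(formula, 0) / total_expected
        gap = abs(cumulative_observed - cumulative_expected)
        largest_gap = max(largest_gap, gap)
    return largest_gap


same = ks_statistic({"CH4": 2, "C2H6": 2}, {"CH4": 1, "C2H6": 1})
skewed = ks_statistic({"C2H6": 1, "CH4": 3}, {"C2H6": 2, "CH4": 2})
disjoint = ks_statistic({"CH4": 1}, {"C2H6": 1})
```

Here same evaluates to 0 because the normalised distributions agree, and disjoint evaluates to 1 because the two tables share no sum formula.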
nablachem.space.ApproximateCounter.spec_to_sequence(spec: str) -> tuple[list[int], list[str]]
staticmethod
Converts a canonical label to a degree sequence and pseudoelement labels.
Parameters
spec : str Canonical label of the pseudomolecule ("degree.natoms_degree.natoms").
Returns
tuple[list[int], list[str]] Degree sequence and artificial element labels.
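The canonical label format "degree.natoms_degree.natoms" can be unpacked as sketched below. This is an independent illustration of the format, not the library's spec_to_sequence: the function name parse_canonical_label is hypothetical, and the placeholder labels "X0", "X1", ... are an assumption, since the actual pseudoelement naming is not documented here.

```python
def parse_canonical_label(spec: str) -> tuple[list[int], list[str]]:
    """Expand "degree.natoms_degree.natoms" into per-atom lists.

    Returns the degree of every atom and a placeholder label per atom.
    Atoms from the same group share one label ("X0", "X1", ...), which
    is a naming choice made only for this sketch.
    """
    degrees: list[int] = []
    labels: list[str] = []
    for index, group in enumerate(spec.split("_")):
        degree_text, count_text = group.split(".")
        degree, count = int(degree_text), int(count_text)
        degrees.extend([degree] * count)
        labels.extend([f"X{index}"] * count)
    return degrees, labels


# Four atoms of degree 1 and one atom of degree 4 (a methane-like case).
degrees, labels = parse_canonical_label("1.4_4.1")
```

For the label "1.4_4.1" this yields the degree sequence [1, 1, 1, 1, 4] with labels ["X0", "X0", "X0", "X0", "X1"].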
nablachem.space.ExactCounter(binary: str, timeout: int = None)
Python API for exact counts of molecular graphs via surge.
Note that this implementation avoids the built-in pruning of "infeasible" molecular graphs in surge by defining non-standard element labels.
Uses surge (https://doi.org/10.1186/s13321-022-00604-9) which in turn leverages nauty.
Sets up the environment.
Parameters
binary : str Path to the surge binary.
timeout : int, optional Limits the total runtime for counting any one chemical formula, by default None.