Chemical Space

Estimating the intrinsic dimensionality

`nablachem.dimensionality.Estimator(gradient_energy: np.ndarray, hessian_energy: np.ndarray, gradient_property: np.ndarray, hessian_property: np.ndarray, dt: float, scaling_groups: list[bool] = None)`

Estimate the intrinsic dimensionality (ID) of a scalar property in chemical space.

Parameters:

Name	Type	Description	Default
`gradient_energy`	`ndarray`	Gradient of the energy with respect to 4N atomic coordinates (Z, x, y, z).	required
`hessian_energy`	`ndarray`	Hessian of the energy with respect to 4N atomic coordinates.	required
`gradient_property`	`ndarray`	Gradient of the scalar property to be approximated (can be energy or another property).	required
`hessian_property`	`ndarray`	Hessian of the scalar property to be approximated.	required
`dt`	`float`	Threshold used to detect degeneracy between eigenvalues.	required
`scaling_groups`	`list[bool]`	Boolean mask indicating which variables belong to chemical coordinates (True) versus spatial coordinates (False). This allows separate scaling of both groups (from different physical origins) to make the Hessian matrix's eigenvalues degenerate.	`None`
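The `scaling_groups` mask and the `dt` threshold can be illustrated with a small standalone sketch. It assumes the 4N coordinates are stored atom by atom as (Z, x, y, z), which the table above does not confirm, and it uses a simple neighbour-distance rule for degeneracy. Both helper names are hypothetical and are not part of nablachem.

```python
def chemical_mask(natoms):
    """Boolean mask over 4N coordinates: True for Z, False for x, y, z.

    Assumption: coordinates are interleaved per atom as (Z, x, y, z).
    """
    return [True, False, False, False] * natoms


def group_degenerate(eigenvalues, dt):
    """Group eigenvalues whose sorted neighbours differ by less than dt.

    Illustrative rule only; the library may detect degeneracy differently.
    """
    groups = []
    for value in sorted(eigenvalues):
        if groups and abs(value - groups[-1][-1]) < dt:
            groups[-1].append(value)  # close enough to the previous value
        else:
            groups.append([value])  # start a new group
    return groups


mask = chemical_mask(2)
groups = group_degenerate([2.0, 1.0, 1.05], dt=0.1)
```

Here `mask` has one `True` entry per atom, and `groups` collects 1.0 and 1.05 together because they lie within `dt` of each other.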

`nablachem.dimensionality.Estimator.getID()`

Compute the intrinsic dimensionality (ID) by incrementally selecting eigenvectors that minimize the property estimation error.

Returns:

Type	Description
`dict`	Dictionary with the keys "ID" (list of int: dimensionality values after degeneracy correction), "Error" (list of float: the corresponding estimation errors), and "natoms" (int: number of atoms, N).
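One way to consume the returned dictionary is to pick the smallest dimensionality whose error falls below a chosen tolerance. The sketch below assumes only the three documented keys. The helper name, the tolerance, and the numbers in `result` are placeholders for illustration, not output of any calculation.

```python
def smallest_id_below(result, tolerance):
    """Return the smallest ID whose estimation error is <= tolerance, or None."""
    candidates = [
        dimension
        for dimension, error in zip(result["ID"], result["Error"])
        if error <= tolerance
    ]
    return min(candidates) if candidates else None


# Placeholder values shaped like the documented return dictionary.
result = {"ID": [1, 2, 4, 6], "Error": [0.9, 0.4, 0.05, 0.01], "natoms": 3}
chosen = smallest_id_below(result, tolerance=0.1)
```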

`nablachem.dimensionality.Estimator.get_bounds_analytical(mod_vec: np.ndarray, g: np.ndarray, h: np.ndarray) -> list[list[float]]`

Determines the thermally accessible bounds along each eigenvector direction from the energy gradient and Hessian.

Parameters:

Name	Type	Description	Default
`mod_vec`	`ndarray`	Eigenvectors of the Hessian.	required
`g`	`ndarray`	Energy gradient.	required
`h`	`ndarray`	Energy Hessian.	required

Returns:

Type	Description
`list[list[float]]`	Bounds (positive and negative) for each eigenvector direction.
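To make the idea of a positive and a negative bound concrete, here is a minimal sketch under a second-order energy model. It assumes the energy change along a unit direction v is slope·t + ½·curvature·t², with slope = g·v and curvature = vᵀhv, and that the bound is reached when this change equals a fixed energy budget. The function name and the `energy_budget` parameter are hypothetical; the library's actual criterion is not documented here.

```python
import math


def quadratic_bounds(slope, curvature, energy_budget):
    """Solve slope*t + 0.5*curvature*t**2 == energy_budget for t.

    Returns (negative_bound, positive_bound). Assumes positive curvature
    and a positive energy budget, so exactly one root lies on each side of 0.
    """
    if curvature <= 0 or energy_budget <= 0:
        raise ValueError("sketch only covers curvature > 0 and energy_budget > 0")
    root = math.sqrt(slope**2 + 2.0 * curvature * energy_budget)
    return ((-slope - root) / curvature, (-slope + root) / curvature)


lower, upper = quadratic_bounds(slope=0.0, curvature=2.0, energy_budget=1.0)
```

With zero slope the bounds are symmetric; a nonzero gradient component shifts them toward the downhill side.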

Exploring chemical space

`nablachem.space.SearchSpace(elements: str = None)`

`nablachem.space.SearchSpace.max_valence` `property`

`nablachem.space.SearchSpace.add_element(element: Element)`

`nablachem.space.SearchSpace.covered_search_space(kind: str)` `staticmethod`

Returns one of the pre-defined chemical spaces from the original publication.

Parameters

kind : str
    Label, either A or B.

Returns

SearchSpace
    The chosen space.

`nablachem.space.SearchSpace.get_elements_from_valence(valence)`

`nablachem.space.SearchSpace.list_cases(natoms: int, progress: bool = True) -> Iterator[AtomStoichiometry]`

`nablachem.space.SearchSpace.list_cases_bare(natoms: int, degree_sequences_only: bool = False, pure_sequences_only: bool = False, progress: bool = True) -> Iterator[tuple[str, int, int]] | Iterator[tuple[int, int]]`

Lists all possible stoichiometries for a given number of atoms.

If degree_sequences_only is set to True, only unique degree sequences are returned, i.e. the element names are not considered.

Optimized for performance, so yields tuples. Use list_cases() for a more user-friendly interface.

Parameters

natoms : int
    Number of atoms in the molecule.
degree_sequences_only : bool, optional
    Flag to switch to degree sequence enumeration, by default False.
pure_sequences_only : bool, optional
    Skips sequences where atoms of one valence belong to more than one element label. Implies degree_sequences_only.
progress : bool, optional
    Whether to show a progress bar, by default True.

Yields

Iterator[tuple[str, int, int]] | Iterator[tuple[int, int]]
    Either tuples of (element, valence, count) or (valence, count). Guaranteed to be sorted by (valence, count).
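The degree-sequence mode can be pictured with a small standalone enumeration. This is only a sketch of the idea, not the library's optimized implementation: it lists every multiset of valences for a given atom count and keeps those with an even total valence, a necessary condition for all valences to be saturated. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def degree_sequences(valences, natoms):
    """Yield degree sequences as tuples of (valence, count), sorted by valence."""
    for combo in combinations_with_replacement(sorted(valences), natoms):
        if sum(combo) % 2 != 0:
            continue  # an odd total valence cannot be paired into bonds
        yield tuple(sorted(Counter(combo).items()))


cases = list(degree_sequences((1, 4), natoms=5))
```

For valences 1 and 4 and five atoms, `cases` contains three entries, among them `((1, 4), (4, 1))`, the degree sequence of a methane-like molecule.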

`nablachem.space.ApproximateCounter(show_progress=True)`

`nablachem.space.ApproximateCounter.count(search_space: SearchSpace, natoms: int, selection: Q = None) -> int`

Counts the total number of molecules in a search space.

Parameters

search_space : SearchSpace
    The search space.
natoms : int
    The number of atoms to restrict to.
selection : Q, optional
    A subselection based on a query string, by default None.

Returns

int
    Total count of molecules in this search space.

`nablachem.space.ApproximateCounter.count_cases(search_space: SearchSpace, natoms: int) -> int`

Counts the total number of stoichiometries in a search space.

Note that different stoichiometries could yield the same sum formula.

Note that this only returns cases where all valences are saturated.

Parameters

search_space : SearchSpace
    The search space.
natoms : int
    The number of atoms for which to count.

Returns

int
    Total number of stoichiometries.
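The remark that different stoichiometries can share a sum formula is easiest to see when one element occurs with more than one valence. The sketch below represents a stoichiometry as (element, valence, count) tuples, as in `list_cases_bare`, and collapses it to a sum formula by ignoring the valence. The helper name and the example entries are hypothetical, and elements are ordered alphabetically rather than by any chemical convention.

```python
from collections import defaultdict


def sum_formula(stoichiometry):
    """Collapse (element, valence, count) tuples into a sum formula string."""
    totals = defaultdict(int)
    for element, _valence, count in stoichiometry:
        totals[element] += count  # the valence is dropped on purpose
    return "".join(
        f"{element}{count if count > 1 else ''}"
        for element, count in sorted(totals.items())
    )


# Two distinct stoichiometries: sulfur is divalent in one, mixed in the other.
case_a = [("H", 1, 4), ("S", 2, 2)]
case_b = [("H", 1, 4), ("S", 2, 1), ("S", 6, 1)]
same = sum_formula(case_a) == sum_formula(case_b)
```

This is why a count of stoichiometries can exceed a count of sum formulas for the same search space.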

`nablachem.space.ApproximateCounter.count_one(stoichiometry: AtomStoichiometry, natoms: int, validated: bool = False) -> int`

Counts the total number of molecules in a given stoichiometry.

The redundant specification of the number of atoms is a performance tweak.

Parameters

stoichiometry : AtomStoichiometry
    The stoichiometry to count.
natoms : int
    Number of atoms in that stoichiometry.
validated : bool, optional
    Whether the given degree sequence needs to be checked for feasibility. When in doubt, True.

Returns

int
    Total count of molecules.

`nablachem.space.ApproximateCounter.count_one_bare(label: tuple[int], natoms: int, cached_degree_sequence: bool = False, validated: bool = False) -> int`

Counts the number of molecules of a given colored degree sequence.

The last two arguments are performance tweaks and not strictly necessary.

Parameters

label : tuple[int]
    Degree sequence in groups ((degree, natoms), (degree, natoms)).
natoms : int
    The total number of atoms.
cached_degree_sequence : bool, optional
    Whether this case has the same pure degree sequence of the previous call, by default False.
validated : bool, optional
    Whether the given degree sequence needs to be checked for feasibility. When in doubt, True.

Returns

int
    Total count.
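The feasibility check that `validated` refers to is not spelled out here. As a rough illustration, the sketch below tests necessary conditions for a grouped degree sequence to belong to a connected multigraph without self-loops: an even degree sum, no atom needing more bonds than all others can offer, and enough bonds to connect every atom. The function name is hypothetical, and the library may apply different or stricter rules, for example a cap on bond order.

```python
def is_feasible(label, natoms):
    """Check necessary conditions for a connected, loop-free multigraph.

    label: groups of (degree, count), e.g. ((1, 4), (4, 1)).
    """
    degrees = [degree for degree, count in label for _ in range(count)]
    if len(degrees) != natoms:
        raise ValueError("label does not describe natoms atoms")
    total = sum(degrees)
    if total % 2 != 0:
        return False  # every bond contributes two to the degree sum
    if natoms == 1:
        return total == 0  # a single atom cannot bond without a self-loop
    largest = max(degrees)
    if min(degrees) < 1 or largest > total - largest:
        return False  # isolated atom, or one atom outnumbers all partners
    return total >= 2 * (natoms - 1)  # at least a spanning tree of bonds


methane_like = is_feasible(((1, 4), (4, 1)), natoms=5)
```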

`nablachem.space.ApproximateCounter.count_sum_formulas(search_space: SearchSpace, natoms: int) -> int`

Counts the total number of sum formulas in a search space.

Note that this only returns cases where all valences are saturated.

Parameters

search_space : SearchSpace
    The search space.
natoms : int
    The number of atoms for which to count.

Returns

int
    Total number of sum formulas.

`nablachem.space.ApproximateCounter.estimated_in_cache(maxsize: int = None) -> Generator[str, None, None]`

Builds a list of all those degree sequences that are estimated only.

Parameters

maxsize : int, optional
    Cap of the estimated size of the number of graphs with that degree sequence, by default None.

Returns

list[str]
    List of canonical labels.

`nablachem.space.ApproximateCounter.missing_parameters(search_space: SearchSpace, natoms: int, pure_only: bool, selection: Q = None)`

Returns the colored degree sequences for which no parameters are available in the database.

Parameters

search_space : SearchSpace
    The search space.
natoms : int
    Number of atoms to check for.
pure_only : bool
    Whether only pure degree sequences should be precomputed for this space.
selection : Q, optional
    Subselection by query string, by default None.

Returns

list[tuple[int]]
    The colored degree sequences for which there are no parameters.

`nablachem.space.ApproximateCounter.sample_connected(spec: str) -> nx.MultiGraph` `staticmethod`

Find a random connected multigraph of a given degree sequence.

Parameters

spec : str
    Canonical label from which the degree sequence is extracted.

Returns

nx.MultiGraph
    The resulting graph.
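For intuition about sampling a connected multigraph with a prescribed degree sequence, the following standalone sketch uses stub matching with rejection: shuffle one "stub" per unit of degree, pair neighbours, and retry on self-loops or disconnected results. This is a swapped-in textbook technique, not the library's algorithm. It does not sample graphs uniformly, it returns a plain edge list instead of an `nx.MultiGraph`, and all names are hypothetical.

```python
import random


def is_connected(natoms, edges):
    """Depth-first search from atom 0; assumes at least one atom."""
    neighbours = {atom: set() for atom in range(natoms)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == natoms


def sample_connected_multigraph(degrees, rng, max_tries=10000):
    """Return a list of edges (parallel edges allowed, no self-loops)."""
    stubs = [atom for atom, degree in enumerate(degrees) for _ in range(degree)]
    if len(stubs) % 2 != 0:
        raise ValueError("degree sum must be even")
    for _ in range(max_tries):
        rng.shuffle(stubs)
        edges = list(zip(stubs[::2], stubs[1::2]))
        if any(a == b for a, b in edges):
            continue  # reject self-loops
        if is_connected(len(degrees), edges):
            return edges
    raise RuntimeError("no connected graph found within max_tries")


edges = sample_connected_multigraph([1, 1, 1, 1, 4], random.Random(0))
```

For the degree sequence `[1, 1, 1, 1, 4]` the only valid outcome is a star around the last atom, so every returned edge involves atom 4.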

`nablachem.space.ApproximateCounter.spec_to_sequence(spec: str) -> tuple[list[int], list[str]]` `staticmethod`

Converts a canonical label to a degree sequence and pseudoelement labels.

Parameters

spec : str
    Canonical label of the pseudomolecule ("degree.natoms_degree.natoms").

Returns

tuple[list[int], list[str]]
    Degree sequence and artificial element labels.
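The label format "degree.natoms_degree.natoms" can be unpacked with a few lines of string handling. The sketch below is a standalone illustration of that format rather than the library's own parser. The function name and the pseudoelement labels `X0`, `X1`, ... are made up for the example, since the actual artificial labels are not documented here.

```python
def parse_spec(spec):
    """Expand "degree.natoms_degree.natoms" into per-atom degrees and labels."""
    degrees, labels = [], []
    for index, group in enumerate(spec.split("_")):
        degree, natoms = (int(part) for part in group.split("."))
        degrees.extend([degree] * natoms)
        labels.extend([f"X{index}"] * natoms)  # hypothetical pseudoelement name
    return degrees, labels


degrees, labels = parse_spec("1.4_4.1")
```

The label `"1.4_4.1"` describes four atoms of degree 1 and one atom of degree 4.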

`nablachem.space.ExactCounter(binary: str, timeout: int = None)`

Python API for exact counts of molecular graphs via surge.

Note that this implementation avoids the built-in pruning of "infeasible" molecular graphs in surge by defining non-standard element labels.

Uses surge (https://doi.org/10.1186/s13321-022-00604-9) which in turn leverages nauty.

Requires removing the following line from surge.c, which imposes an artificial constraint: `if (deg[i] + hyd[i] > 4 && hyd[i] > 0) return;`

Sets up the environment.

Parameters

binary : str
    Path to the surge binary.
timeout : int, optional
    Limits the total runtime for counting any one chemical formula, by default None.

`nablachem.space.ExactCounter.count(search_space: SearchSpace, natoms: int) -> int`

`nablachem.space.ExactCounter.count_one(stoichiometry: AtomStoichiometry)`

`nablachem.space.ExactCounter.list(search_space: SearchSpace, natoms: int, selection: Q = None) -> Iterator[Molecule]`

`nablachem.space.ExactCounter.list_one(stoichiometry: AtomStoichiometry) -> Iterator[Molecule]`

`nablachem.space.ExactCounter.load(path)`

`nablachem.space.ExactCounter.save(path)`