fraggle_similarity#
- skfp.distances.fraggle_similarity(mol_query: Mol, mol_ref: Mol, tversky_threshold: float = 0.8) float #
Fraggle similarity between molecules.
Computes the Fraggle similarity [1] [2] between two RDKit
Mol
objects. It is a hybrid between substructure and similarity search, using a “fuzzy” matching based on molecule fragments and Tanimoto similarity of path-based fingerprints.It is designed to properly recognize small changes in the “middle” of a molecule, where typical fingerprint-based measures would result in a very low similarity.
This is an asymmetric measure, with query and reference molecules. It consists of 3 phases: query fragmentation, fragment-reference Tversky similarity comparison, and Tanimoto similarity comparison using RDKit fingerprints.
Query molecule is fragmented into “interesting” substructures by acyclic and ring cuts, leaving only “large” parts of a molecule (>60%). Fragements are then compared with the reference molecule using Tversky similarity [3] (alpha=0.95, beta=0.05), keeping those with at least
tversky_threshold
similarity. Lastly, the Tanimoto similarity of RDKit fingerprints with path length 5 is computed for kept fragments and the reference molecule. The highest one is the Fraggle similarity value.The calculated similarity falls within the range \([0, 1]\). Passing all-zero vectors to this function results in a similarity of 0. Note that this measure is asymmetric, and order of query and reference molecules matters.
- Parameters:
mol_query (RDKit
Mol
object) – Query molecule.mol_ref (RDKit
Mol
object) – Reference molecule.tversky_threshold (float, default=0.8) – Required minimal Tversky similarity [3] between a fragment and the reference molecule.
- Returns:
similarity – Fraggle similarity between
mol_query
andmol_ref
.- Return type:
float
References
Examples
>>> from rdkit.Chem import MolFromSmiles >>> from skfp.distances import fraggle_similarity >>> mol_query = MolFromSmiles("COc1cc(CN2CCC(NC(=O)c3cncc(C)c3)CC2)c(OC)c2ccccc12") >>> mol_ref = MolFromSmiles("COc1ccccc1") >>> sim = fraggle_similarity(mol_query, mol_ref) >>> sim 0.1640625