mcs_similarity

skfp.distances.mcs_similarity(mol_a: Mol, mol_b: Mol, timeout: int = 3600) → float

MCS similarity between molecules.

Computes the Maximum Common Substructure (MCS) similarity [1] between two RDKit Mol objects, using the formula:

\[sim(mol_a, mol_b) = \frac{numAtoms(MCS(mol_a, mol_b))} {numAtoms(mol_a) + numAtoms(mol_b) - numAtoms(MCS(mol_a, mol_b))}\]

The number of atoms in the MCS measures the structural overlap between the two molecules. The MCS itself is computed with the FMCS algorithm [2] [3] [4] [5]. Because the denominator counts the atoms of both molecules, this measure penalizes differences in size (number of atoms) between them.

The calculated similarity falls within the range \([0, 1]\).
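As a rough illustration of the formula above, the sketch below evaluates it from atom counts alone. The helper `mcs_similarity_from_counts` is hypothetical and not part of skfp; it assumes the MCS size is already known and only reproduces the arithmetic. The counts of 31 and 8 atoms are an assumption based on counting heavy atoms in the SMILES strings from the Examples section.

```python
def mcs_similarity_from_counts(
    num_atoms_a: int, num_atoms_b: int, num_atoms_mcs: int
) -> float:
    """Hypothetical helper (not part of skfp): evaluate the MCS similarity
    formula from precomputed atom counts."""
    if num_atoms_a <= 0 or num_atoms_b <= 0:
        raise ValueError("both molecules must have at least one atom")
    if not 0 <= num_atoms_mcs <= min(num_atoms_a, num_atoms_b):
        raise ValueError("MCS cannot be larger than the smaller molecule")
    # Denominator: atoms in either molecule, counting shared atoms once.
    union_size = num_atoms_a + num_atoms_b - num_atoms_mcs
    return num_atoms_mcs / union_size


# Identical molecules: the MCS covers every atom, so the similarity is 1.
same = mcs_similarity_from_counts(8, 8, 8)

# No common substructure: the numerator is 0, so the similarity is 0.
disjoint = mcs_similarity_from_counts(8, 31, 0)

# Assumed counts for the documented example: a small molecule fully
# contained in a larger one still scores low (8 / 31, roughly 0.258),
# which is the size penalty described above.
contained = mcs_similarity_from_counts(31, 8, 8)
```

The two boundary cases show why the result stays within \([0, 1]\): the MCS can have at most as many atoms as the smaller molecule, so the numerator never exceeds the denominator.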

Parameters:
  • mol_a (RDKit Mol object) – First molecule.

  • mol_b (RDKit Mol object) – Second molecule.

  • timeout (int, default=3600) – MCS computation timeout, in seconds.

Returns:

similarity – MCS similarity between mol_a and mol_b.

Return type:

float

References

Examples

>>> from rdkit.Chem import MolFromSmiles
>>> from skfp.distances import mcs_similarity
>>> mol_a = MolFromSmiles("COc1cc(CN2CCC(NC(=O)c3cncc(C)c3)CC2)c(OC)c2ccccc12")
>>> mol_b = MolFromSmiles("COc1ccccc1")
>>> sim = mcs_similarity(mol_a, mol_b)
>>> sim
0.25806451612903225