tanimoto_count_similarity

skfp.distances.tanimoto_count_similarity(vec_a: ndarray | csr_array, vec_b: ndarray | csr_array) → float

Tanimoto similarity for vectors of count values.

Computes the Tanimoto similarity [1] for count data between two input arrays or sparse matrices using the formula:

\[sim(vec_a, vec_b) = \frac{vec_a \cdot vec_b}{\|vec_a\|^2 + \|vec_b\|^2 - vec_a \cdot vec_b}\]

For non-negative count vectors, the calculated similarity falls within the range [0, 1]. When both vectors are all-zero, the denominator is zero and the function returns a similarity of 1.
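The formula and the all-zero convention above can be sketched in plain NumPy. This is only an illustration under stated assumptions: `tanimoto_count_sketch` is a hypothetical helper name, it handles dense 1-D arrays only, and it is not the library's actual (Numba-compiled) implementation.

```python
import numpy as np


def tanimoto_count_sketch(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Hypothetical helper (not part of skfp) that mirrors the formula above."""
    # Work in floating point so integer counts cannot overflow.
    vec_a = np.asarray(vec_a, dtype=float)
    vec_b = np.asarray(vec_b, dtype=float)

    dot_ab = float(np.dot(vec_a, vec_b))  # vec_a . vec_b
    dot_aa = float(np.dot(vec_a, vec_a))  # ||vec_a||^2
    dot_bb = float(np.dot(vec_b, vec_b))  # ||vec_b||^2

    denominator = dot_aa + dot_bb - dot_ab
    # The denominator is zero only when both vectors are all-zero;
    # following the convention stated above, return 1 in that case.
    if denominator == 0.0:
        return 1.0
    return dot_ab / denominator


if __name__ == "__main__":
    print(tanimoto_count_sketch(np.array([7, 1, 1]), np.array([7, 1, 2])))
    print(tanimoto_count_sketch(np.zeros(3), np.zeros(3)))
```

Vectors with no overlapping non-zero positions give a dot product of zero and therefore a similarity of 0, the lower end of the range.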

Note that the NumPy version is optimized with the Numba JIT compiler, making it significantly faster than the SciPy sparse array version. The first call may be slightly slower due to Numba compilation.

Parameters:
  • vec_a ({ndarray, sparse matrix}) – First count input array or sparse matrix.

  • vec_b ({ndarray, sparse matrix}) – Second count input array or sparse matrix.

Returns:

similarity – Tanimoto similarity between vec_a and vec_b.

Return type:

float

References

Examples

>>> from skfp.distances import tanimoto_count_similarity
>>> import numpy as np
>>> vec_a = np.array([7, 1, 1])
>>> vec_b = np.array([7, 1, 2])
>>> sim = tanimoto_count_similarity(vec_a, vec_b)
>>> sim
0.9811320754716981

>>> from skfp.distances import tanimoto_count_similarity
>>> from scipy.sparse import csr_array
>>> vec_a = csr_array([[7, 1, 1]])
>>> vec_b = csr_array([[7, 1, 2]])
>>> sim = tanimoto_count_similarity(vec_a, vec_b)
>>> sim
0.9811320754716981
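
As a check on the output above, the value can be derived by hand from the formula, with vec_a = (7, 1, 1) and vec_b = (7, 1, 2):

\[vec_a \cdot vec_b = 7 \cdot 7 + 1 \cdot 1 + 1 \cdot 2 = 52, \quad \|vec_a\|^2 = 49 + 1 + 1 = 51, \quad \|vec_b\|^2 = 49 + 1 + 4 = 54\]

\[sim(vec_a, vec_b) = \frac{52}{51 + 54 - 52} = \frac{52}{53} \approx 0.9811\]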