kulczynski_binary_similarity
- skfp.distances.kulczynski_binary_similarity(vec_a: ndarray | csr_array, vec_b: ndarray | csr_array) → float
Kulczynski similarity for vectors of binary values.
Computes the Kulczynski II similarity [1] [2] [3] for binary data between two input arrays or sparse matrices using the formula:
\[sim(x, y) = \frac{1}{2} \left( \frac{a}{a+b} + \frac{a}{a+c} \right)\]
where \(a\), \(b\) and \(c\) count the bit relations between the two vectors:
\(a\) - both are 1 (\(|x \cap y|\), common “on” bits)
\(b\) - \(x\) is 1, \(y\) is 0
\(c\) - \(x\) is 0, \(y\) is 1
Note that this is the second Kulczynski similarity (Kulczynski II), which is also the variant used by RDKit. It differs from the Kulczynski I similarity used by e.g. SciPy.
The calculated similarity falls within the range \([0, 1]\). Passing two all-zero vectors to this function results in a similarity of 1. However, when only one of the vectors is all-zero (i.e. \(a+b=0\) or \(a+c=0\)), the similarity is 0.
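The formula and the edge cases above can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions, not the library's actual code: the function name `kulczynski_sketch` is hypothetical, and it handles only dense 1-D inputs.

```python
import numpy as np


def kulczynski_sketch(x, y):
    """Illustrative Kulczynski II similarity for dense binary vectors.

    Hypothetical helper for explanation only; not part of skfp.
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)

    a = int(np.sum(x & y))    # both are 1
    b = int(np.sum(x & ~y))   # x is 1, y is 0
    c = int(np.sum(~x & y))   # x is 0, y is 1

    # Both vectors all-zero: similarity is defined as 1.
    if a + b == 0 and a + c == 0:
        return 1.0
    # Exactly one vector all-zero: similarity is defined as 0.
    if a + b == 0 or a + c == 0:
        return 0.0

    return 0.5 * (a / (a + b) + a / (a + c))
```

For example, with `x = [1, 1, 0, 0]` and `y = [1, 0, 1, 0]` the counts are \(a = b = c = 1\), so the sketch gives \(\frac{1}{2}\left(\frac{1}{2} + \frac{1}{2}\right) = 0.5\).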
- Parameters:
vec_a ({ndarray, sparse matrix}) – First binary input array or sparse matrix.
vec_b ({ndarray, sparse matrix}) – Second binary input array or sparse matrix.
- Returns:
similarity – Kulczynski similarity between vec_a and vec_b.
- Return type:
float
References
Examples
>>> from skfp.distances import kulczynski_binary_similarity
>>> import numpy as np
>>> vec_a = np.array([1, 0, 1])
>>> vec_b = np.array([1, 0, 1])
>>> sim = kulczynski_binary_similarity(vec_a, vec_b)
>>> sim
1.0

>>> from scipy.sparse import csr_array
>>> vec_a = csr_array([[1, 0, 1]])
>>> vec_b = csr_array([[1, 0, 1]])
>>> sim = kulczynski_binary_similarity(vec_a, vec_b)
>>> sim
1.0