harris_lahey_binary_similarity

skfp.distances.harris_lahey_binary_similarity(vec_a: ndarray | csr_array, vec_b: ndarray | csr_array, normalized: bool = False) → float

Harris-Lahey similarity for vectors of binary values.

Computes the Harris-Lahey similarity [1] [2] [3] for binary data between two input arrays or sparse matrices, using the formula:

\[sim(x, y) = \frac{a * (2d + b + c)}{2 * (a + b + c)} + \frac{d * (2a + b + c)}{2 * (b + c + d)}\]

where \(a\), \(b\), \(c\) and \(d\) correspond to the number of bit relations between the two vectors:

  • \(a\) - both are 1 (\(|x \cap y|\), common “on” bits)

  • \(b\) - \(x\) is 1, \(y\) is 0

  • \(c\) - \(x\) is 0, \(y\) is 1

  • \(d\) - both are 0
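The four counts above can be sketched in plain NumPy. This is only an illustration of the definitions, not the library's implementation; the helper name `bit_relation_counts` is hypothetical, and the inputs are assumed to be dense, equal-length 0/1 vectors.

```python
import numpy as np


def bit_relation_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    Hypothetical helper for illustration; not part of skfp.
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    a = int(np.sum(x & y))    # both are 1
    b = int(np.sum(x & ~y))   # x is 1, y is 0
    c = int(np.sum(~x & y))   # x is 0, y is 1
    d = int(np.sum(~x & ~y))  # both are 0
    return a, b, c, d


counts = bit_relation_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(counts)  # (2, 1, 1, 1)
```

The four counts always sum to the vector length, since every position falls into exactly one of the four cases.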

The calculated similarity falls within the range \([0, n]\), where \(n\) is the length of the vectors. Use the normalized argument to scale the similarity to the range \([0, 1]\). Passing all-zero or all-one vectors to this function results in a similarity of \(n\); in those cases one of the denominators in the formula is zero, so the value is set by convention.
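As a rough cross-check of the formula, the following is a minimal pure-NumPy sketch. It is not the skfp implementation: the function name `harris_lahey_sketch` is hypothetical, inputs are assumed to be dense, non-empty, equal-length 0/1 vectors, and the degenerate case is read here as "both vectors all zeros or both all ones", which is an assumption about the wording above.

```python
import numpy as np


def harris_lahey_sketch(x, y, normalized=False):
    """Illustrative Harris-Lahey similarity for dense binary vectors.

    Hypothetical reference sketch; not the library's code.
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    n = x.size

    a = int(np.sum(x & y))    # both are 1
    b = int(np.sum(x & ~y))   # x is 1, y is 0
    c = int(np.sum(~x & y))   # x is 0, y is 1
    d = int(np.sum(~x & ~y))  # both are 0

    if a == n or d == n:
        # Assumed degenerate case: a denominator would be zero,
        # so return n as the documentation describes.
        sim = float(n)
    else:
        first = a * (2 * d + b + c) / (2 * (a + b + c))
        second = d * (2 * a + b + c) / (2 * (b + c + d))
        sim = first + second

    # Normalization divides by the vector length
    return sim / n if normalized else sim


print(harris_lahey_sketch([1, 0, 1], [1, 0, 1]))                   # 3.0
print(harris_lahey_sketch([1, 0, 1], [1, 0, 1], normalized=True))  # 1.0
```

For the identical vectors `[1, 0, 1]`, \(a = 2\), \(d = 1\) and \(b = c = 0\), so the two terms are \(1\) and \(2\), giving \(3\), which is the vector length.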

Parameters:
  • vec_a ({ndarray, sparse matrix}) – First binary input array or sparse matrix.

  • vec_b ({ndarray, sparse matrix}) – Second binary input array or sparse matrix.

  • normalized (bool, default=False) – Whether to divide the resulting similarity by the length of the vectors (their number of elements), normalizing values to the range [0, 1].

Returns:

similarity – Harris-Lahey similarity between vec_a and vec_b.

Return type:

float

References

Examples

>>> from skfp.distances import harris_lahey_binary_similarity
>>> import numpy as np
>>> vec_a = np.array([1, 0, 1])
>>> vec_b = np.array([1, 0, 1])
>>> sim = harris_lahey_binary_similarity(vec_a, vec_b)
>>> sim
3.0
>>> from scipy.sparse import csr_array
>>> vec_a = csr_array([[1, 0, 1]])
>>> vec_b = csr_array([[1, 0, 1]])
>>> sim = harris_lahey_binary_similarity(vec_a, vec_b)
>>> sim
3.0