mcconnaughey_binary_similarity

skfp.distances.mcconnaughey_binary_similarity(vec_a: ndarray | csr_array, vec_b: ndarray | csr_array, normalized: bool = False) → float

McConnaughey similarity for vectors of binary values.

Computes the McConnaughey similarity [1] [2] [3] for binary data between two input arrays or sparse matrices, using the formula:

\[sim(a, b) = \frac{|a \cap b| \cdot (|a| + |b|) - |a| \cdot |b|}{|a| \cdot |b|} = \frac{|a \cap b|}{|a|} + \frac{|a \cap b|}{|b|} - 1\]

The calculated similarity falls within the range \([-1, 1]\). Set the normalized argument to True to rescale it to the range \([0, 1]\). Passing two all-zero vectors results in a similarity of 1. Passing exactly one all-zero vector results in a similarity of -1 for the non-normalized variant and 0 for the normalized variant.
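
As a worked illustration of the rescaling, adding one and halving the formula above gives the mean of the two overlap ratios. The symbol \(sim_{norm}\) is notation introduced here for illustration only and is not used by the library:

\[sim_{norm}(a, b) = \frac{sim(a, b) + 1}{2} = \frac{1}{2} \left( \frac{|a \cap b|}{|a|} + \frac{|a \cap b|}{|b|} \right)\]

For example, with \(a = [1, 1, 0, 0]\) and \(b = [1, 0, 1, 0]\) we have \(|a \cap b| = 1\) and \(|a| = |b| = 2\), so:

\[sim(a, b) = \frac{1}{2} + \frac{1}{2} - 1 = 0, \qquad sim_{norm}(a, b) = \frac{0 + 1}{2} = 0.5\]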

Parameters:
  • vec_a ({ndarray, sparse matrix}) – First binary input array or sparse matrix.

  • vec_b ({ndarray, sparse matrix}) – Second binary input array or sparse matrix.

  • normalized (bool, default=False) – Whether to normalize values to the range [0, 1] by adding 1 and dividing the result by 2.

Returns:

similarity – McConnaughey similarity between vec_a and vec_b.

Return type:

float

References

Examples

>>> from skfp.distances import mcconnaughey_binary_similarity
>>> import numpy as np
>>> vec_a = np.array([1, 0, 1])
>>> vec_b = np.array([1, 0, 1])
>>> sim = mcconnaughey_binary_similarity(vec_a, vec_b)
>>> sim
1.0
>>> from scipy.sparse import csr_array
>>> vec_a = csr_array([[1, 0, 1]])
>>> vec_b = csr_array([[1, 0, 1]])
>>> sim = mcconnaughey_binary_similarity(vec_a, vec_b)
>>> sim
1.0
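
The formula and the all-zero edge cases described above can be reproduced with a short NumPy sketch. This is an illustrative re-implementation for dense inputs only, not the library's actual code; the function name mcconnaughey_sketch is hypothetical, and sparse inputs are deliberately not handled.

```python
import numpy as np


def mcconnaughey_sketch(vec_a, vec_b, normalized=False):
    """Illustrative sketch of the documented formula (hypothetical helper)."""
    a = np.asarray(vec_a).astype(bool)
    b = np.asarray(vec_b).astype(bool)

    n_common = int(np.sum(a & b))  # |a ∩ b|
    n_a = int(np.sum(a))           # |a|
    n_b = int(np.sum(b))           # |b|

    if n_a == 0 and n_b == 0:
        # Two all-zero vectors: documented similarity is 1.
        return 1.0
    if n_a == 0 or n_b == 0:
        # Exactly one all-zero vector: documented as -1, or 0 when normalized.
        return 0.0 if normalized else -1.0

    sim = n_common / n_a + n_common / n_b - 1
    if normalized:
        # Rescale from [-1, 1] to [0, 1].
        sim = (sim + 1) / 2
    return float(sim)


print(mcconnaughey_sketch([1, 0, 1], [1, 0, 1]))                         # 1.0
print(mcconnaughey_sketch([1, 1, 0, 0], [1, 0, 1, 0]))                   # 0.0
print(mcconnaughey_sketch([1, 1, 0, 0], [1, 0, 1, 0], normalized=True))  # 0.5
print(mcconnaughey_sketch([0, 0, 0], [1, 0, 1]))                         # -1.0
```

The edge cases are checked before the division so that an all-zero vector never triggers a division by zero.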