ct4_count_similarity

skfp.distances.ct4_count_similarity(vec_a: ndarray | csr_array, vec_b: ndarray | csr_array) → float

Consonni–Todeschini 4 similarity for vectors of count values.

Computes the Consonni–Todeschini 4 similarity [1] [2] [3] for count data between two input arrays or sparse matrices, using the formula:

\[sim(a, b) = \frac{\log (1 + a \cdot b)}{\log (1 + \|a\|^2 + \|b\|^2 - a \cdot b)}\]

The calculated similarity falls within the range \([0, 1]\). Passing two all-zero vectors to this function results in a similarity of 1; in that case the formula itself is undefined, since both logarithms evaluate to 0.
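The formula above can be reproduced by hand for dense vectors. The following is a minimal NumPy sketch, not the library's implementation: the helper name `ct4_count_sketch` is hypothetical, it handles only 1-D dense arrays, and it assumes the all-zero case is resolved by returning 1, as described above.

```python
import numpy as np


def ct4_count_sketch(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Illustrative CT4 count similarity for 1-D dense vectors (hypothetical helper)."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)

    dot_ab = float(np.dot(a, b))  # a . b
    sq_a = float(np.dot(a, a))    # ||a||^2
    sq_b = float(np.dot(b, b))    # ||b||^2

    numerator = np.log(1.0 + dot_ab)
    denominator = np.log(1.0 + sq_a + sq_b - dot_ab)

    # For non-negative counts the denominator is 0 only when both vectors
    # are all zeros; the documented result for that case is 1.
    if denominator == 0.0:
        return 1.0
    return float(numerator / denominator)


# For a = (7, 1, 1) and b = (7, 1, 2): a . b = 52, ||a||^2 = 51, ||b||^2 = 54,
# so the similarity is log(53) / log(54).
print(ct4_count_sketch(np.array([7, 1, 1]), np.array([7, 1, 2])))
```

For these inputs the sketch evaluates log(53) / log(54), roughly 0.9953.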

Parameters:
  • vec_a ({ndarray, sparse matrix}) – First count input array or sparse matrix.

  • vec_b ({ndarray, sparse matrix}) – Second count input array or sparse matrix.

Returns:

similarity – CT4 similarity between vec_a and vec_b.

Return type:

float

References

Examples

>>> from skfp.distances import ct4_count_similarity
>>> import numpy as np
>>> vec_a = np.array([7, 1, 1])
>>> vec_b = np.array([7, 1, 2])
>>> sim = ct4_count_similarity(vec_a, vec_b)
>>> sim
0.9953140617275088
>>> from scipy.sparse import csr_array
>>> vec_a = csr_array([[7, 1, 1]])
>>> vec_b = csr_array([[7, 1, 2]])
>>> sim = ct4_count_similarity(vec_a, vec_b)
>>> sim
0.9953140617275088