enrichment_factor

skfp.metrics.enrichment_factor(y_true: ndarray | list[int], y_score: ndarray | list[float], fraction: float = 0.05) -> float

Enrichment factor (EF).

EF at fraction X is the number of actives found among the top-ranked fraction X of molecules, divided by the number of actives expected there under a random ranking. See also [1] for details.

Let the test set contain N molecules, n of which are active, and let X be the fraction of top-ranked molecules considered (the fraction parameter, e.g. 0.05). Take the X * N compounds with the highest y_score values and let a be the number of actives among them. A random ranking would place X * n actives there on average. The enrichment factor EF(X) is then defined as the ratio:

\[EF(X) = \frac{a}{X*n}\]
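The definition above can be sketched in plain Python. This is an illustration only, not the skfp implementation: the name ef_sketch and the toy data are hypothetical, and the handling of a non-integer X * N (rounding up and using the actual number of selected compounds in the denominator) is an assumption that may differ from what the library does.

```python
import math


def ef_sketch(y_true, y_score, fraction=0.05):
    """Illustrative enrichment factor; NOT the skfp implementation."""
    n_total = len(y_true)    # N: number of test molecules
    n_actives = sum(y_true)  # n: number of actives
    # Assumption: round fraction * N up and always select at least one compound.
    k = max(1, math.ceil(round(fraction * n_total, 9)))
    # Rank molecules by descending score and count actives among the top k.
    order = sorted(range(n_total), key=lambda i: y_score[i], reverse=True)
    a = sum(y_true[i] for i in order[:k])
    # a / (k * n / N), rearranged so the toy example uses exact arithmetic.
    return a * n_total / (k * n_actives)


# Toy data: 20 molecules, 4 actives; the top 10% (2 molecules) are both active.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1] + [0] * 10
y_score = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50] + [0.1] * 10
print(ef_sketch(y_true, y_score, fraction=0.1))  # 2 / (0.1 * 4) = 5.0
```

When X * N is an integer, as in the toy data, the sketch reduces exactly to a / (X * n).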

The minimum value is 0. The maximum value depends on the fraction of actives in the dataset: it equals 1/X if X >= n/N (the selection is large enough to hold all n actives), and N/n otherwise (every selected compound is active). A model no better than random guessing scores about 1. Note that the attainable values depend on both the ratio of actives in the dataset and the chosen fraction.
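The two cases of the upper bound can be checked with a small helper. The function name max_enrichment_factor and the dataset sizes are hypothetical, chosen only to show how the attainable range shifts with the ratio of actives; it is not part of skfp.

```python
def max_enrichment_factor(n_actives, n_total, fraction):
    """Upper bound of EF(X) as stated in the text (hypothetical helper)."""
    if fraction >= n_actives / n_total:
        # The top fraction can contain every active: a = n, so EF = 1 / X.
        return 1 / fraction
    # The top fraction can be filled with actives only: a = X * N, so EF = N / n.
    return n_total / n_actives


# 1% actives, X = 5%: X >= n/N, so the bound is 1/X = 20.
print(max_enrichment_factor(10, 1000, 0.05))
# 20% actives, X = 5%: X < n/N, so the bound is N/n = 5.
print(max_enrichment_factor(200, 1000, 0.05))
```

With the same fraction, a perfect ranking therefore reaches about 20 on the first dataset but only 5 on the second, which is why EF values should be compared with the ratio of actives and the fraction in mind.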

Parameters:
  • y_true (array-like of shape (n_samples,)) – Ground truth (correct) target values.

  • y_score (array-like of shape (n_samples,)) – Target scores, e.g. probability of the positive class, or similarities to active compounds.

  • fraction (float, default=0.05) – Fraction of the dataset used for calculating the enrichment. Common values are 0.01 and 0.05. Note that this value affects the possible value range.

Returns:

score – Enrichment factor value.

Return type:

float

References

Examples

>>> import numpy as np
>>> from skfp.metrics import enrichment_factor
>>> y_true = [0, 0, 1]
>>> y_score = [0.1, 0.2, 0.7]
>>> enrichment_factor(y_true, y_score)
3.0