load_cyp2c19_veith

skfp.datasets.tdc.adme.load_cyp2c19_veith(data_dir: str | PathLike | None = None, as_frame: bool = False, verbose: bool = False) → DataFrame | tuple[list[str]] | ndarray

Load the CYP2C19 subset of CYP P450 Veith dataset.

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. The task is to predict CYP2C19 inhibition. The CYP2C19 gene provides instructions for making an enzyme found in the endoplasmic reticulum, a cell structure involved in protein processing and transport [1] [2].

All CYP P450 Veith subsets:

This dataset is part of the “metabolism” subset of ADME tasks.

  • Tasks: 1

  • Task type: classification

  • Total samples: 12665

  • Recommended split: scaffold

  • Recommended metric: AUPRC
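The recommended split and metric listed above can be combined into a small baseline. The following is a minimal sketch, not a reference implementation. It assumes that `scaffold_train_test_split` from `skfp.model_selection` and `ECFPFingerprint` from `skfp.fingerprints` are available and accept SMILES strings. The random forest is an arbitrary illustrative model. `average_precision` and `run_baseline` are hypothetical helper names; the first estimates AUPRC as step-wise average precision, the same quantity scikit-learn exposes as `average_precision_score`.

```python
from itertools import groupby


def average_precision(y_true, y_score):
    """Estimate AUPRC as step-wise average precision over score thresholds."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")

    # Rank samples from the highest to the lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = seen = 0
    ap = 0.0
    # Samples sharing a score form a single threshold, so handle them together.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        group_labels = [y for _, y in group]
        hits = sum(1 for y in group_labels if y == 1)
        true_pos += hits
        seen += len(group_labels)
        # Recall gained at this threshold, weighted by the precision reached.
        ap += (hits / n_pos) * (true_pos / seen)
    return ap


def run_baseline(test_size=0.2):
    """Hypothetical end-to-end run: scaffold split, ECFP features, AUPRC."""
    # Imported lazily; these are assumptions about the installed libraries.
    from sklearn.ensemble import RandomForestClassifier
    from skfp.datasets.tdc.adme import load_cyp2c19_veith
    from skfp.fingerprints import ECFPFingerprint
    from skfp.model_selection import scaffold_train_test_split

    smiles, labels = load_cyp2c19_veith()
    smiles_train, smiles_test, y_train, y_test = scaffold_train_test_split(
        smiles, labels, test_size=test_size
    )

    fingerprint = ECFPFingerprint()
    X_train = fingerprint.transform(smiles_train)
    X_test = fingerprint.transform(smiles_test)

    model = RandomForestClassifier(random_state=0)  # illustrative model choice
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return average_precision(y_test, scores)
```

Calling `run_baseline()` needs scikit-fingerprints and scikit-learn installed, plus access to the dataset files. `average_precision` has no dependencies and can be checked on its own.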

Parameters:
  • data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, by default $HOME/scikit_learn_data.

  • as_frame (bool, default=False) – If True, returns the raw DataFrame with columns: “SMILES”, “label”. Otherwise, returns SMILES as a list of strings and labels as a NumPy array (1D integer binary vector).

  • verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.

Returns:

data – Depending on the as_frame argument, one of:
  • Pandas DataFrame with columns: “SMILES”, “label”
  • tuple of: list of strings (SMILES), NumPy array (labels)

Return type:

pd.DataFrame or tuple(list[str], np.ndarray)
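A short usage sketch of the two return forms described above. `class_balance` and `main` are hypothetical helpers added for illustration. The loader import sits inside `main` so that the helper works without scikit-fingerprints installed.

```python
from collections import Counter


def class_balance(labels):
    """Count how many samples fall into each binary class."""
    return dict(Counter(int(y) for y in labels))


def main():
    # Imported lazily so the helper above has no third-party dependencies.
    from skfp.datasets.tdc.adme import load_cyp2c19_veith

    # Default form: list of SMILES strings and a 1D integer label array.
    smiles, labels = load_cyp2c19_veith()
    print(len(smiles), class_balance(labels))

    # DataFrame form with the documented "SMILES" and "label" columns.
    df = load_cyp2c19_veith(as_frame=True, verbose=True)
    print(df.columns.tolist())
```

Calling `main()` loads the dataset into the default data directory. Pass `data_dir` to the loader to store the files elsewhere.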

References