load_cyp1a2_veith

skfp.datasets.tdc.adme.load_cyp1a2_veith(data_dir: str | PathLike | None = None, as_frame: bool = False, verbose: bool = False) → DataFrame | tuple[list[str]] | ndarray

Load the CYP1A2 subset of the CYP P450 Veith dataset.

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. The task is to predict CYP1A2 inhibition. CYP1A2 localizes to the endoplasmic reticulum and its expression is induced by some polycyclic aromatic hydrocarbons (PAHs), some of which are found in cigarette smoke. It is able to metabolize some PAHs to carcinogenic intermediates. Other xenobiotic substrates for this enzyme include caffeine, aflatoxin B1, and acetaminophen [1] [2].

This dataset is part of the “metabolism” subset of ADME tasks.

All CYP P450 Veith subsets:

load_cyp1a2_veith()

load_cyp2c9_veith()

load_cyp2c19_veith()

load_cyp2d6_veith()

load_cyp3a4_veith()

Tasks	1
Task type	classification
Total samples	12579
Recommended split	scaffold
Recommended metric	AUPRC
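
The recommended metric, AUPRC, is commonly estimated as average precision: the mean of the precision values at the rank of each positive sample. The block below is a dependency-free sketch of that estimate on made-up toy data, not the library's own evaluation code. The function name `average_precision` is hypothetical, and tied scores are simply ranked in input order. In practice, `sklearn.metrics.average_precision_score` is the usual choice.

```python
def average_precision(y_true, y_score):
    """Step-wise average precision; tied scores keep input order (a simplification)."""
    # Rank samples from the highest to the lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            total += hits / rank
    return total / n_pos


# Toy labels and scores, made up for illustration (not taken from the dataset).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
ap = average_precision(y_true, y_score)
print(round(ap, 4))
```

Unlike ROC AUC, this score depends on the fraction of positive samples, so it is worth checking the class balance of each split before comparing results.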

Parameters:

data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, which defaults to $HOME/scikit_learn_data.
as_frame (bool, default=False) – If True, returns the raw DataFrame with columns: “SMILES”, “label”. Otherwise, returns SMILES as a list of strings, and labels as a NumPy array (1D integer binary vector).
verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.

Returns:

data – Depending on the as_frame argument, one of:

- Pandas DataFrame with columns: “SMILES”, “label”
- tuple of: list of strings (SMILES), NumPy array (labels)

Return type:

pd.DataFrame or tuple(list[str], np.ndarray)
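
As a minimal usage sketch, the two return modes described above could be exercised as follows. It assumes scikit-fingerprints is installed and that the dataset can be downloaded on first use. The helper `class_balance` and the `main` wrapper are hypothetical names introduced for illustration only and are not part of the library.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of positive (label == 1) samples."""
    counts = Counter(int(y) for y in labels)
    total = sum(counts.values())
    return counts[1] / total if total else 0.0


def main():
    # Imported here so the helper above stays usable without scikit-fingerprints.
    from skfp.datasets.tdc.adme import load_cyp1a2_veith

    # Default mode: SMILES as a list of strings, labels as a 1D NumPy integer array.
    smiles, labels = load_cyp1a2_veith()
    print(len(smiles), class_balance(labels))

    # DataFrame mode: a single pandas DataFrame with columns "SMILES" and "label".
    df = load_cyp1a2_veith(as_frame=True)
    print(df.columns.tolist())


if __name__ == "__main__":
    main()
```

The list-and-array form is convenient for passing SMILES straight into a fingerprint pipeline, while the DataFrame form suits inspection and filtering.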

References
