load_ogb_splits

skfp.datasets.moleculenet.load_ogb_splits(dataset_name: str, data_dir: str | PathLike | None = None, as_dict: bool = False, verbose: bool = False) → tuple[list[int], list[int], list[int]] | dict[str, list[int]]

Load and return the MoleculeNet dataset splits from Open Graph Benchmark (OGB).

OGB [1] uses a precomputed scaffold split with an 80/10/10% ratio between the train/valid/test subsets. The test set consists of the smallest scaffold groups, following the MoleculeNet paper [2]. These splits are widely used in the literature.

Dataset names are the same as those returned by the load_moleculenet_benchmark function and are case-sensitive.
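To make the split description above concrete, here is a minimal sketch of a deterministic 80/10/10 scaffold split in which the largest scaffold groups fill the training set first, so the smallest groups end up in the test set. It is an illustration only, not the code OGB or scikit-fingerprints runs: OGB ships precomputed indexes, which this function simply loads. The name scaffold_split, its parameters, and the toy scaffold labels are hypothetical, and the scaffold of each molecule is assumed to be already computed.

```python
from collections import defaultdict


def scaffold_split(scaffolds, train_pct=80, valid_pct=10):
    """Split sample indexes by scaffold; larger groups are assigned first.

    Hypothetical helper for illustration; `scaffolds` holds one
    precomputed scaffold label per molecule.
    """
    # Group sample indexes by their scaffold label.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties are broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    # Integer cutoffs avoid floating-point rounding surprises.
    n = len(scaffolds)
    train_cutoff = n * train_pct // 100
    valid_cutoff = n * (train_pct + valid_pct) // 100

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold labels for 10 molecules (assumed, not real data).
scaffolds = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
train, valid, test = scaffold_split(scaffolds)
```

Whole scaffold groups are never divided between subsets, so the realized proportions can deviate slightly from 80/10/10 on real data.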

Parameters:
  • dataset_name ({"ESOL", "FreeSolv", "Lipophilicity", "BACE", "BBBP", "HIV", "ClinTox", "MUV", "SIDER", "Tox21", "ToxCast", "PCBA"}) – Name of the dataset to load splits for.

  • data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently configured scikit-learn data directory is used, by default $HOME/scikit_learn_data.

  • as_dict (bool, default=False) – If True, returns the splits as a dictionary with keys “train”, “valid” and “test”, and index lists as values. Otherwise, returns three lists of split indexes.

  • verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.

Returns:

data – Depending on the as_dict argument, one of:
  • three lists of integer indexes
  • a dictionary with “train”, “valid” and “test” keys, whose values are lists of split indexes

Return type:

tuple(list[int], list[int], list[int]) or dict
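Because the return shape depends on as_dict, downstream code may want to accept both forms. The sketch below shows one way to do that and to select the matching dataset entries. The helper subset_by_split and the stand-in index lists and SMILES strings are hypothetical; in practice the splits would come from load_ogb_splits("ESOL") or load_ogb_splits("ESOL", as_dict=True), and the molecules from the corresponding dataset.

```python
def subset_by_split(items, splits):
    """Return the train/valid/test subsets of `items` as a dictionary.

    Hypothetical helper: accepts either return shape of load_ogb_splits.
    """
    if not isinstance(splits, dict):
        # Tuple form (as_dict=False): the order is train, valid, test.
        train_idx, valid_idx, test_idx = splits
        splits = {"train": train_idx, "valid": valid_idx, "test": test_idx}
    return {name: [items[i] for i in idxs] for name, idxs in splits.items()}


# Stand-in values (assumed, not real OGB indexes).
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCC"]
tuple_splits = ([0, 1, 2], [3], [4])
dict_splits = {"train": [0, 1, 2], "valid": [3], "test": [4]}

from_tuple = subset_by_split(smiles, tuple_splits)
from_dict = subset_by_split(smiles, dict_splits)
```

Both calls yield the same dictionary, so the rest of a pipeline does not need to know which form was requested.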

References