load_lrgb_mol_splits
- skfp.datasets.lrgb.load_lrgb_mol_splits(dataset_name: str, data_dir: str | PathLike | None = None, as_dict: bool = False, verbose: bool = False) → tuple[list[int], list[int], list[int]] | dict[str, list[int]]
Load and return the official LRGB splits for molecular datasets.
The Long Range Graph Benchmark (LRGB) [1] uses a precomputed stratified random split for both the Peptides-func and Peptides-struct datasets.
Dataset names are the same as those returned by the load_moleculenet_benchmark function, and they are case-sensitive.
- Parameters:
dataset_name ({"Peptides-func", "Peptides-struct"}) – Name of the dataset to load splits for.
data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, by default $HOME/scikit_learn_data.
as_dict (bool, default=False) – If True, returns the splits as a dictionary with keys “train”, “valid” and “test”, and index lists as values. Otherwise, returns three lists of split indexes.
verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.
- Returns:
data – Depending on the as_dict argument, one of:
- three lists of integer indexes
- dictionary with “train”, “valid” and “test” keys, and values as lists with splits indexes
- Return type:
tuple(list[int], list[int], list[int]) or dict
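Because the return value is either a 3-tuple or a dictionary, downstream code may want to handle both shapes uniformly. The following is a minimal sketch of one way to do that, not part of the library: the helpers `select_by_index` and `apply_splits` and the toy data are hypothetical, introduced here only for illustration. The real call is shown in a comment, since it requires scikit-fingerprints and may download files.

```python
from collections.abc import Sequence


def select_by_index(items: Sequence, indexes: list[int]) -> list:
    """Pick the elements of `items` at the given positions (hypothetical helper)."""
    return [items[i] for i in indexes]


def apply_splits(items: Sequence, splits) -> dict[str, list]:
    """Accept either return shape (3-tuple or dict) and return a dict of subsets."""
    if not isinstance(splits, dict):
        # Tuple form: the order is train, valid, test.
        train, valid, test = splits
        splits = {"train": train, "valid": valid, "test": test}
    return {name: select_by_index(items, idx) for name, idx in splits.items()}


# Toy stand-ins (made-up values, not real LRGB data or real split indexes).
toy_smiles = ["C", "CC", "CCC", "CCO", "CCN"]
toy_splits = ([0, 3], [1], [2, 4])

subsets = apply_splits(toy_smiles, toy_splits)

# With scikit-fingerprints installed, the splits would instead come from:
# from skfp.datasets.lrgb import load_lrgb_mol_splits
# splits = load_lrgb_mol_splits("Peptides-func", as_dict=True)
```

Passing as_dict=True avoids relying on tuple order, which may be the safer choice when the splits are handed to other code.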
References