run_in_parallel

skfp.utils.run_in_parallel(func: Callable, data: Sequence, n_jobs: int | None = None, batch_size: int | None = None, flatten_results: bool = False, verbose: int = 0) → list

Run a function in parallel on provided data in batches, using joblib.

Results are returned in the same order as the input data. func must take a batch of data, e.g. a list of integers, not a single integer.

If func returns lists, the result will be a list of lists. To get a flat list of results, use flatten_results=True.

Note that the progress bar enabled by the verbose option tracks the processing of data batches, not individual data points.
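The batching contract described above can be illustrated with a minimal sequential sketch. It is not the library's implementation and uses no joblib; run_in_batches_sequential and add_one are hypothetical names introduced only for illustration. It shows that func receives a whole batch, that results keep the input order, and what flattening does to the nested result.

```python
from itertools import chain
from typing import Callable, Sequence


def run_in_batches_sequential(
    func: Callable,
    data: Sequence,
    batch_size: int,
    flatten_results: bool = False,
) -> list:
    # Hypothetical helper, not part of skfp: same batching contract,
    # but batches are processed one after another instead of in parallel.
    batches = [data[i : i + batch_size] for i in range(0, len(data), batch_size)]

    # func is called once per batch, and results keep the order of the input.
    results = [func(batch) for batch in batches]

    if flatten_results:
        # Turn a list of per-batch lists into a single flat list.
        return list(chain.from_iterable(results))
    return results


def add_one(batch):
    # Takes a batch (list of integers), not a single integer.
    return [x + 1 for x in batch]


data = list(range(10))
nested = run_in_batches_sequential(add_one, data, batch_size=4)
flat = run_in_batches_sequential(add_one, data, batch_size=4, flatten_results=True)

print(nested)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(flat)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

In this sketch, 10 data points with a batch size of 4 give 3 batches, so a progress bar that tracks batches would count to 3, not 10.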

Parameters:
  • func (Callable) – The function to run in parallel. It must take only a single argument, a batch of data.

  • data ({sequence, array-like} of shape (n_samples,)) – Sequence containing data to process.

  • n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Scikit-learn documentation on n_jobs for more details.

  • batch_size (int, default=None) – Number of inputs processed in each batch. None divides input data into equal-sized parts, as many as n_jobs.

  • flatten_results (bool, default=False) – Whether to flatten the results, e.g. to change list of lists of integers into a list of integers.

  • verbose (int, default=0) – Controls the verbosity. If higher than zero, a progress bar is shown, tracking the processing of batches.
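As a rough illustration of the batch_size=None default, the sketch below splits the data into as many parts as there are jobs. The rounding rule is not stated in this documentation, so ceiling division is an assumption here, and default_batch_size and split_into_batches are hypothetical helpers rather than library functions.

```python
import math
from typing import Sequence


def default_batch_size(n_samples: int, n_jobs: int) -> int:
    # Hypothetical helper: ceiling division is an assumed rounding rule,
    # chosen so that at most n_jobs batches are created.
    return max(1, math.ceil(n_samples / n_jobs))


def split_into_batches(data: Sequence, batch_size: int) -> list:
    # Consecutive slices of at most batch_size elements each.
    return [data[i : i + batch_size] for i in range(0, len(data), batch_size)]


data = list(range(10))
size = default_batch_size(len(data), n_jobs=5)
batches = split_into_batches(data, size)

print(size)     # 2
print(batches)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

When the number of samples is not divisible by the number of jobs, the last batch in this sketch is shorter than the others. Passing an explicit batch_size avoids depending on the default split.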

Returns:

X – The processed data. If the processing function returns lists, this will be a list of lists, unless flatten_results=True.

Return type:

list of length (n_samples,)

Examples

>>> from skfp.utils import run_in_parallel
>>> func = lambda X: [x + 1 for x in X]
>>> data = list(range(10))
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1)
[[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1, flatten_results=True)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]