run_in_parallel
- skfp.utils.run_in_parallel(func: Callable, data: Sequence, n_jobs: int | None = None, batch_size: int | None = None, flatten_results: bool = False, verbose: int | dict = 0) -> list
Run a function in parallel on provided data in batches, using joblib.
Results are returned in the same order as input data.
func must take a batch of data, e.g. a list of integers, not a single integer.
If func returns lists, the result will be a list of lists. To get a flat list of results, use flatten_results=True.
Note that the progress bar for the verbose option tracks processing of data batches, not individual data points.
- Parameters:
func (Callable) – The function to run in parallel. It must take only a single argument, a batch of data.
data ({sequence, array-like} of shape (n_samples,)) – Sequence containing data to process.
n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Scikit-learn documentation on n_jobs for more details.
batch_size (int, default=None) – Number of inputs processed in each batch. None divides input data into equal-sized parts, as many as n_jobs.
flatten_results (bool, default=False) – Whether to flatten the results, e.g. to change a list of lists of integers into a list of integers.
verbose (int or dict, default=0) – Controls the verbosity. If higher than zero, a progress bar will be shown, tracking the processing of batches. If a dict object is provided, it will be used to configure the tqdm progress bar.
- Returns:
X – The processed data. If the processing function returns lists, this will be a list of lists.
- Return type:
list of length (n_samples,)
Examples
>>> from skfp.utils import run_in_parallel
>>> func = lambda X: [x + 1 for x in X]
>>> data = list(range(10))
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1)
[[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1, flatten_results=True)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
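The batching and flattening behavior described above can be illustrated without the library. The following is a minimal sketch using only the standard library, not the skfp implementation: it uses threads from concurrent.futures instead of joblib, and the names split_into_batches, run_batched and n_workers are hypothetical, introduced only for this illustration. It shows why func receives a batch rather than a single item, and what flatten_results changes when batches hold more than one element.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Callable, Sequence


def split_into_batches(data: Sequence, batch_size: int) -> list:
    """Slice data into consecutive batches of at most batch_size items."""
    return [data[i : i + batch_size] for i in range(0, len(data), batch_size)]


def run_batched(
    func: Callable,
    data: Sequence,
    batch_size: int,
    flatten_results: bool = False,
    n_workers: int = 2,  # hypothetical stand-in for n_jobs
) -> list:
    """Apply func to each batch concurrently, keeping input order."""
    batches = split_into_batches(data, batch_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of the input batches.
        results = list(pool.map(func, batches))
    if flatten_results:
        # Concatenate the per-batch lists into one flat list.
        results = list(chain.from_iterable(results))
    return results


def add_one(batch):
    # Receives a whole batch (a list), not a single integer.
    return [x + 1 for x in batch]


data = list(range(10))
nested = run_batched(add_one, data, batch_size=4)
flat = run_batched(add_one, data, batch_size=4, flatten_results=True)
print(nested)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(flat)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

In this sketch, the unflattened result has one entry per batch, so its length depends on batch_size; flattening restores one entry per input item.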