You can use the threadpoolctl (Thread-pool Controls) Python package to limit the number of threads used by native libraries that manage their own internal thread pools (BLAS and OpenMP implementations). These are the libraries that back packages such as scikit-learn, NumPy, and xgboost.
To install this package, run the following command in a terminal:
pip install threadpoolctl
To check the current state of the threadpool-enabled runtime libraries that are loaded when importing Python packages, you can use the threadpool_info() function.
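Here are examples after importing numpy and then xgboost. The second listing also shows the libraries bundled with SciPy and scikit-learn, which were loaded as a side effect of importing xgboost: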
>>> from threadpoolctl import threadpool_info
>>> from pprint import pprint
>>> import numpy
>>> pprint(threadpool_info())
[{'filepath': '/usr/local/lib/python3.8/dist-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.17'}]
>>> import xgboost
>>> pprint(threadpool_info())
[{'filepath': '/usr/local/lib/python3.8/dist-packages/numpy.libs/libopenblasp-r0-2d23e62b.3.17.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.17'},
 {'filepath': '/usr/local/lib/python3.8/dist-packages/scikit_learn.libs/libgomp-f7e03b3e.so.1.0.0',
  'internal_api': 'openmp',
  'num_threads': 8,
  'prefix': 'libgomp',
  'user_api': 'openmp',
  'version': None},
 {'filepath': '/usr/local/lib/python3.8/dist-packages/scipy.libs/libopenblasp-r0-085ca80a.3.9.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.9'},
 {'filepath': '/usr/local/lib/python3.8/dist-packages/xgboost.libs/libgomp-a34b3233.so.1.0.0',
  'internal_api': 'openmp',
  'num_threads': 8,
  'prefix': 'libgomp',
  'user_api': 'openmp',
  'version': None}]
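The full listing gets long once several packages are loaded. As a rough sketch, the entries can be grouped by their 'user_api' key to see at a glance which thread pools are active. The helper name summarize_threadpools and the trimmed sample list are made up for illustration; only the dictionary keys come from the output above.

```python
from collections import defaultdict


def summarize_threadpools(info):
    """Group threadpool_info()-style entries by user_api.

    Hypothetical helper: returns {user_api: [(prefix, num_threads), ...]}.
    """
    summary = defaultdict(list)
    for entry in info:
        summary[entry["user_api"]].append((entry["prefix"], entry["num_threads"]))
    return dict(summary)


# Sample entries trimmed from the output above (only the keys the helper reads).
sample = [
    {"user_api": "blas", "prefix": "libopenblas", "num_threads": 8},
    {"user_api": "openmp", "prefix": "libgomp", "num_threads": 8},
]

print(summarize_threadpools(sample))
```

In a live session you would pass the real list instead of the sample, i.e. summarize_threadpools(threadpool_info()).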
To limit the number of threads, you can use the threadpool_limits() function. Here is an example that sets the thread limit to 4 for the BLAS implementation. To limit OpenMP instead, pass user_api='openmp'.
>>> from threadpoolctl import threadpool_limits
>>> import numpy as np
>>> with threadpool_limits(limits=4, user_api='blas'):
...     a = np.random.randn(1000, 1000)
...     aa = a @ a
...
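To confirm that a limit actually took effect, one option is to call threadpool_info() inside the with block and compare it with the values outside. The sketch below assumes numpy and threadpoolctl are installed; thread_counts is a hypothetical helper, and a limit of 1 is used because it can be honored on any machine. It also shows that limits accepts a dict, so BLAS and OpenMP can be limited in a single call.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits


def thread_counts(user_api):
    """Hypothetical helper: num_threads of every loaded library for one user_api."""
    return [lib["num_threads"] for lib in threadpool_info()
            if lib["user_api"] == user_api]


before = thread_counts("blas")

# Limit BLAS only; the previous limits are restored when the block exits.
with threadpool_limits(limits=1, user_api="blas"):
    inside = thread_counts("blas")
    a = np.random.randn(200, 200)
    aa = a @ a

after = thread_counts("blas")
print("before:", before, "inside:", inside, "after:", after)

# A dict keyed by user_api sets a separate limit for each API in one call.
with threadpool_limits(limits={"blas": 1, "openmp": 1}):
    both = thread_counts("blas") + thread_counts("openmp")
print("blas + openmp inside dict limit:", both)
```

The lists may be empty if no supported BLAS or OpenMP runtime has been loaded in your environment, in which case there is nothing to limit.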