Python acceleration with hardware-optimized computing

A guide to linking numpy, scipy, and scikit-learn to BLAS and LAPACK for hardware-optimized linear algebra execution

Who is this guide for?

This guide is for anyone working with the Python scientific computing stack who is looking to accelerate the runtime performance of their programs. Its applicability spans many use cases, from running code locally to setting up production runtime environments for data processing and model execution. I wrote it to be comprehensive enough to help folks in a variety of situations, such as:

  • You’ve never heard of hardware optimization and just want a few quick modifications to your environment setup.
  • You’re deep in the rabbit hole of environment debugging and looking for tips to help dig yourself out.
  • You’re somewhere in the middle, or maybe you just want to read up more broadly on this technology.

Who am I?

I’ve worked as a data scientist and machine learning engineer at Capital One for about four and a half years and have over a decade of experience programming in Python. During that time, I’ve worked extensively with the Python scientific computing stack and contributed to a variety of open source projects, including gensim and pyarrow. I’ve explored everything from cython to numba to f2py to speed up dozens of Python programs and have often found that vectorizing code in NumPy and leveraging hardware optimizations is the lowest-hanging fruit. After delving into dozens of articles on this topic and helping many colleagues troubleshoot their environments, I realized it would be great to have all this content in one place.

Overview of this guide

So let’s get to it! The Python scientific computing stack has a few core libraries, including NumPy, SciPy, and scikit-learn, and at the heart of these libraries sits a collection of linear algebra routines. This guide explains how to link the hardware-optimized libraries BLAS and LAPACK to these core Python libraries, which will make them run much faster, often by an order of magnitude. This can be hard and confusing because these libraries are not written in Python, and they come packaged in many different distributions that perform differently depending on your hardware. The focus of this guide is on selecting the ideal distribution of BLAS and LAPACK for your system and ensuring your installation procedure properly links to it.
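If you want to see what's at stake on your own machine, you can time a large matrix multiply before and after changing your linkage, since the multiply dispatches to whichever BLAS is wired in. Here's a minimal sketch (the matrix size is arbitrary and timings will vary by machine):

    # Time a large matrix multiply, which dispatches to the linked BLAS.
    # Run this before and after re-linking to compare distributions.
    import time
    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    start = time.perf_counter()
    _ = a @ b
    print(f"2000x2000 matmul took {time.perf_counter() - start:.2f}s")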

In this guide, we will cover:

  • How to choose a BLAS and LAPACK distribution
  • How to install the MKL or OpenBLAS distributions
  • Troubleshooting linking issues in MKL or OpenBLAS
  • Tips for good linking results

For those looking for a skimming strategy, first check out the TL;DR at the end of the first section. Then:

  • Starting fresh: If you're starting with a fresh environment and just want to get started quickly, run the relevant command in the second section.
  • Am I good? If you're trying to figure out whether your environment is already set up correctly, skip ahead to the section on identifying which libraries you're linked to.
  • I know I’m not good! If you already know the libraries are not linked properly and you need to modify an environment setup script to fix this, jump to the last section.

How to choose a BLAS and LAPACK distribution

Let’s start with the basics. In this section I’ll explain what BLAS and LAPACK are, cover their most popular distributions, and provide guidance on which distribution to choose for your machine. I’ll provide more specific guidance for a local desktop/laptop and for AWS instances, since these are the machines where I do most of my work these days.

What are BLAS and LAPACK?

BLAS stands for Basic Linear Algebra Subprograms and contains optimized routines for standard linear algebra operations. LAPACK stands for Linear Algebra Package; it's built on top of BLAS and provides higher-level optimized functions for things like linear solvers and matrix factorization. Their implementations are usually packaged together in common distributions, though you can have BLAS without LAPACK (but not the other way around).
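To make this concrete, here's a small sketch of everyday NumPy calls and the routines they dispatch to under the hood (the routine names follow the BLAS and LAPACK specifications):

    # Matrix products dispatch to BLAS; solvers and factorizations to LAPACK.
    import numpy as np

    a = np.random.rand(500, 500)
    b = np.random.rand(500)

    c = a @ a                  # BLAS: general matrix multiply (gemm)
    x = np.linalg.solve(a, b)  # LAPACK: linear system solver (gesv)
    q, r = np.linalg.qr(a)     # LAPACK: QR factorization (geqrf)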

Both BLAS and LAPACK are API specifications rather than specific software libraries. In other words, there can be multiple implementations of the BLAS and LAPACK interfaces, each a drop-in replacement for the others. They share a common public API (a set of function signatures) and differ only in their underlying implementation details. Yet as with most software, those details can make a big difference in the performance each provides. Note that the term "distribution" here refers to both the actual packaging of the binaries and the underlying implementation details.
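You can see this shared API from Python: SciPy exposes the raw BLAS routines directly, and the exact same call works no matter which distribution is linked underneath. A quick illustration:

    # dgemm is the double-precision general matrix multiply from the BLAS spec.
    # The signature is fixed by the specification; only the speed depends on
    # which implementation (MKL, OpenBLAS, NetLIB, ...) is linked.
    import numpy as np
    from scipy.linalg.blas import dgemm

    a = np.random.rand(300, 300)
    b = np.random.rand(300, 300)
    c = dgemm(alpha=1.0, a=a, b=b)  # computes alpha * (a @ b)
    print(np.allclose(c, a @ b))    # True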

What are the most popular distributions of BLAS and LAPACK?

The two most popular distributions of BLAS and LAPACK are OpenBLAS and MKL. The list below covers OpenBLAS, MKL, and a few other distributions you’ll likely run into at some point, though it's by no means exhaustive. For each, I note what it is and where it performs best.

  • MKL: Developed and maintained by Intel, with a focus on optimizing performance on Intel chips. Performs best on most Intel chips, particularly the latest generations.
  • OpenBLAS: Successor to ATLAS, developed by the open source community. Competitive with or outperforms MKL and BLIS on AMD chips.
  • NetLIB: The reference implementation of BLAS and LAPACK, developed by NetLIB. This is often what you'll get if you use conda to install NumPy, SciPy, etc. without either MKL or OpenBLAS installed. Performs best nowhere; it exists only for reference purposes, but it's better than nothing!
  • ATLAS: One of the earliest distributions after the reference implementation from NetLIB. Performs best nowhere; it has been replaced by OpenBLAS.
  • BLIS: Stands for BLAS-like Library Instantiation Software. It does not include LAPACK, but the same developers also distribute a library called libFLAME that is compatible with the LAPACK API. Tends to outperform OpenBLAS, and even beats out MKL on some Intel chips, according to the BLIS benchmarks.

What is the best distribution of BLAS and LAPACK?

So which distribution should you use? The answer is that it depends on your hardware. One of the main differences between the various distributions is the amount of processor-specific hand-tuning of algorithms each provides. In general, the precedence NumPy uses is a good starting point:

  • For BLAS:  MKL > BLIS > OpenBLAS > ATLAS > NetLIB
  • For LAPACK:  MKL > OpenBLAS > libFLAME > ATLAS > NetLIB

With that being said, my reading on this topic and perusal of various benchmarks have led me to a slightly different answer. BLIS seems to match or outperform OpenBLAS on most chips and MKL on non-Intel chips (and even on some Intel chips), but it can be trickier to install. This leads me to currently prefer MKL or OpenBLAS in my work. Between the two, MKL typically does better on Intel chips, and OpenBLAS tends to match or outperform MKL on AMD chips. If you're working on a local dev machine, it's most likely an Intel chip. If you're on AWS, you can look up the chip type by instance type.

TL;DR: For most use cases, I would go with MKL if using an Intel chip (most local workloads and some AWS instance types) and OpenBLAS if using an AMD chip (some AWS instance types).
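If you're not sure which vendor's chip you're on, here's a quick sketch for checking from Python; it assumes Linux, where /proc/cpuinfo exists (on macOS, sysctl -n machdep.cpu.brand_string gives similar information):

    # Read the CPU vendor from /proc/cpuinfo (Linux only).
    # 'GenuineIntel' points toward MKL; 'AuthenticAMD' toward OpenBLAS.
    def cpu_vendor():
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("vendor_id"):
                    return line.split(":")[1].strip()
        return "unknown"

    print(cpu_vendor())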

How to install MKL or OpenBLAS

According to Anaconda's docs on MKL, the four libraries that have specific BLAS and LAPACK optimizations are NumPy, SciPy, scikit-learn, and NumExpr.

In general, it's preferable to use conda to install Python scientific computing libraries. Automatically downloading and linking useful C and Fortran libraries like BLAS and LAPACK is something conda is very good at. It is particularly convenient to be able to install these non-Python libraries and your standard Python libraries with the same package manager.

If you use conda, you'll most likely already have some distribution of BLAS and LAPACK. It will be downloaded and linked when you install any of the Python libraries that rely on them, without you doing anything extra. If you're just installing Python libraries using pip, then NumPy's installation procedure will still look for BLAS and LAPACK distributions installed on your system and try to link to them. If you have multiple distributions available, it will use the precedence detailed above. We can override this by specifying "libblas=*=*<distribution_name>" in the install command, as shown below.

To install these libraries with MKL:

conda install -c conda-forge numpy scipy scikit-learn numexpr "libblas=*=*mkl"

To install these libraries with OpenBLAS:

conda install -c conda-forge numpy scipy scikit-learn numexpr "libblas=*=*openblas"

Of course, you can also update these commands to pin specific versions of the packages, include other packages, etc.
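For example, a pinned variant of the MKL command might look like the following (the version numbers here are purely illustrative; use whatever your project requires):

conda install -c conda-forge "numpy=1.21.*" "scipy=1.7.*" scikit-learn numexpr "libblas=*=*mkl"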

Where linking goes awry

That seemed pretty simple, right? So why is there more to this guide? It turns out that things get tricky when you either:

  1. Don't use conda to install things.
  2. Mix conda and pip to install things.

The former situation is where many software engineers find themselves, while the latter is often the world that data scientists live in. There are many good, and many less good, reasons you might find yourself in these situations. For instance, if no one in your organization is using conda, it might not make sense to blaze new trails and introduce conda into your build scripts and package management infrastructure. Alternatively, conda may be in active use, but the particular packages your code relies on may not be released on any conda channels you have available to you. In other cases, you may be working with a pre-existing setup script that mixes conda and pip for no particular reason.

You’ll have to assess for yourself the best path forward. If possible, switch to using conda only and use the commands above. If not, read on.

Troubleshooting linking issues in MKL and OpenBLAS

The first step in troubleshooting is figuring out what's currently happening. This starts with figuring out which libraries you're actually linking to. This section describes how to do this. Then you'll want to figure out how to avoid landing yourself (or others) in this situation again. That's what the next section covers.

Identifying which libraries you're linked to in MKL and OpenBLAS

To check which hardware libs are linked to your NumPy installation, run this:

    import numpy as np
    np.show_config()

You'll see some output that looks something like this:

    blas_mkl_info:
      NOT AVAILABLE
    blis_info:
      NOT AVAILABLE
    openblas_info:
      NOT AVAILABLE
    blas_opt_info:
      NOT AVAILABLE
    lapack_mkl_info:
      NOT AVAILABLE
    openblas_lapack_info:
      NOT AVAILABLE
    lapack_opt_info:
      NOT AVAILABLE

If your output looks exactly like this, you have absolutely nothing linked. That's not a bad starting point. It's likely you used pip to install NumPy and you didn't first install any of the distributions discussed above. So go install the one you want, then uninstall and reinstall NumPy (and the other three libraries mentioned above, if you're using them).
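As a sketch of that remedy, assuming you're moving the environment over to conda as recommended above and want OpenBLAS (swap in mkl if that's your pick):

    conda install -c conda-forge "libblas=*=*openblas"
    pip uninstall -y numpy scipy scikit-learn numexpr
    conda install -c conda-forge numpy scipy scikit-learn numexpr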

Alternatively, you might see something like this:

    blas_mkl_info:
      NOT AVAILABLE
    blis_info:
      NOT AVAILABLE
    openblas_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    blas_opt_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    lapack_mkl_info:
      NOT AVAILABLE
    openblas_lapack_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    lapack_opt_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]

In this case, you have OpenBLAS and not MKL. If you see this on a machine with an AMD chip, you're good to go.

If you're running on a machine with an Intel chip, you'll probably want to install MKL (conda install "libblas=*=*mkl"), then uninstall and reinstall your Python libs. On my Mac, I see the following after successfully linking MKL:

    blas_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/mack/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/mack/anaconda3/include']
    blas_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/mack/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/mack/anaconda3/include']
    lapack_mkl_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/mack/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/mack/anaconda3/include']
    lapack_opt_info:
        libraries = ['mkl_rt', 'pthread']
        library_dirs = ['/Users/mack/anaconda3/lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['/Users/mack/anaconda3/include']

Note: You don't need to reinstall everything that depends on those four libraries; dependents don't link anything at install (compile) time, they just import these libraries as Python packages at runtime.

It's also worth noting that the output from np.show_config() can be a bit obtuse at times. For instance, you may also see something like this:

    blas_info:
        libraries = ['cblas', 'blas', 'cblas', 'blas']
        library_dirs = ['/home/mack/.conda/envs/demo36/lib']
        include_dirs = ['/home/mack/.conda/envs/demo36/include']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    blas_opt_info:
        define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
        libraries = ['cblas', 'blas', 'cblas', 'blas']
        library_dirs = ['/home/mack/.conda/envs/demo36/lib']
        include_dirs = ['/home/mack/.conda/envs/demo36/include']
        language = c
    lapack_info:
        libraries = ['lapack', 'blas', 'lapack', 'blas']
        library_dirs = ['/home/mack/.conda/envs/demo36/lib']
        language = f77
    lapack_opt_info:
        libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
        library_dirs = ['/home/mack/.conda/envs/demo36/lib']
        language = c
        define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
        include_dirs = ['/home/mack/.conda/envs/demo36/include']

This also has BLAS and LAPACK linked, but it's not immediately clear which distributions we have. You can usually clarify this by running conda list | grep "blas\|lapack".

In the example environment I pulled the output above from, this outputs:

    libblas                   3.9.0               10_openblas    conda-forge
    libcblas                  3.9.0               10_openblas    conda-forge
    liblapack                 3.9.0               10_openblas    conda-forge
    libopenblas               0.3.17          pthreads_h8fe5266_1    conda-forge

So in this case, we're still pulling from OpenBLAS, which is good. Here's an example of what you might see instead:

    blas                      1.0                          mkl    conda-main
    libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge
    libcblas                  3.9.0           5_h92ddd45_netlib    conda-forge
    liblapack                 3.9.0           5_h92ddd45_netlib    conda-forge

Here we actually have a BLAS from MKL, as well as a BLAS and LAPACK from NetLIB. You might find yourself in a similar situation with other distributions, e.g., if you have both OpenBLAS and NetLIB, or MKL and OpenBLAS, or all three. In these cases, you can assume the linked versions are those listed for libblas, libcblas, and liblapack.

So in the case just above, we'd assume NetLIB is linked. If instead we see the following, we can safely say MKL is linked.

    blas        1.0             mkl    conda-forge
    libblas     3.8.0        21_mkl    conda-forge
    libcblas    3.8.0        21_mkl    conda-forge
    liblapack   3.8.0        21_mkl    conda-forge

If you want a lower-level way to make absolutely sure you know which library is linked, the section below is for you.

Identifying linked libraries directly

At the risk of getting a bit esoteric, here's a little section explaining how you can check for sure which libraries are linked. Please feel free to skip this if it's overkill for you.

First, figure out where your NumPy is installed, by running:

    import numpy as np
    print(np.__file__)

For me, this outputs: /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/__init__.py

Now, find the shared object (.so) files in the core and linalg packages in the NumPy distribution at that location. Filter to those without "tests" in the name, by running:

    ls -al /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/core/ | grep .so | grep -v "tests"
    ls -al /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/linalg/ | grep .so | grep -v "tests"

For me, this outputs:

    -rwxrwxr-x. 2 mack mack 3718456 Jul 19 08:11 _multiarray_umath.cpython-36m-x86_64-linux-gnu.so

and

    -rwxrwxr-x. 2 mack mack 25960 Jul 19 08:11 lapack_lite.cpython-36m-x86_64-linux-gnu.so
    -rwxrwxr-x. 2 mack mack 211376 Jul 19 08:11 _umath_linalg.cpython-36m-x86_64-linux-gnu.so

Now, use the ldd command to see which libraries these are linked to, filtering to "blas" or "lapack".

    ldd /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so | grep "blas\|lapack"
    ldd /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so | grep "blas\|lapack"

For me, this outputs:

    libcblas.so.3 => /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/core/../../../../libcblas.so.3 (0x00007fda31e0d000)
    liblapack.so.3 => /home/mack/.conda/envs/demo36/lib/python3.6/site-packages/numpy/linalg/../../../../liblapack.so.3 (0x00007f4cad370000)

So in this case, my NumPy installation is linked directly to the libcblas and liblapack shared libraries, and the conda list output shown earlier tells us these come from NetLIB.
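As a higher-level alternative to ldd, the third-party threadpoolctl package (pip install threadpoolctl) can report which BLAS/LAPACK libraries are actually loaded into the running process; a minimal sketch:

    # threadpool_info() lists the loaded native libraries, including which
    # API each implements ('openblas', 'mkl', 'blis', ...) and the exact
    # shared-object file that was loaded.
    import numpy as np  # importing NumPy loads the linked BLAS
    from threadpoolctl import threadpool_info

    for lib in threadpool_info():
        print(lib["internal_api"], lib["filepath"], lib.get("version"))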

Tips for good linking results

To ensure proper linking, you'll want to keep in mind two principles:

  1. Install as much as possible using conda.
  2. If you need to install things using pip, first conda install pip so the two play well together.

For the second principle, you can either include pip in the initial conda create command or install it explicitly, like this:

    conda install <packages available via conda>
    conda install pip
    pip install <remaining packages>

If, when you are pip installing things, your properly linked versions of the Python packages get overridden, you should either:

  • Change the initial conda install portion to use the versions compatible with the stuff you'll be pip installing (ideal, to avoid multiple installs).
  • Reinstall those dependencies using conda afterwards, pinning the specific versions you know will be compatible with stuff you already installed (still works, but not ideal; only do this if you want to pin versions other than those pinned by the stuff you're pip installing). 

One of these two approaches should work; however, there is one problem case to watch out for. Sometimes the conda channels you're using will not provide the same pinned version of a dependency that pip has available on PyPI. If this is the case, you can revert to installing your Python packages with pip, once you've ensured pip has been installed with conda and you've used conda to install the necessary BLAS and LAPACK distributions. If all goes well, the linking should still work.

Fixing a poorly linked environment

Here's one final note, which may be of use if you're trying to modify an existing environment to fix linking issues. Oftentimes, linking issues arise because:

  • You're not using conda.
  • You're using a mix of conda and pip and you didn't first conda install pip.

If it's the latter, you may already be in a situation where the two package managers are not playing well together. Specifically, they may be consulting and maintaining different indexes of installed packages. If this happens, you may try to uninstall some packages with conda, only to be told it cannot find those packages. If you're sure they're installed, you need to uninstall them with pip first, then reinstall them with conda (or conda install pip and reinstall with this new pip). Otherwise you'll end up with two versions installed, which can cause all sorts of headaches.
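For instance, a hedged sketch of that cleanup for a single package (assuming NumPy is the one conda can't see):

    pip uninstall -y numpy   # remove the copy conda doesn't know about
    conda install numpy      # reinstall so conda tracks it going forward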

Conclusion

I hope you’ve found this guide helpful for optimizing your own Python workflows using the hardware-optimized linear algebra libraries BLAS and LAPACK. As I mentioned in the introduction, there are a variety of ways in which this guide can provide value. Hopefully you were able to get the benefits of hardware optimization with just a few modifications to your install commands. If you were instead deep in the rabbit hole of troubleshooting linking issues, I hope this has helped you dig yourself out. In my experience, we’ll all find ourselves there at some point. Either way, you may find it useful to bookmark this guide for future reference.

Thanks for reading!

