Pip vs Conda

Jun 28, 2022

Pip: Python libraries only

For example, let’s say you want to install Python 3.9 with NumPy, Pandas, and the gnuplot rendering tool, a tool that is unrelated to Python. Here’s what the pip requirements.txt would look like:

numpy
pandas

Installing Python and gnuplot is out of scope for pip. You as a user must deal with this yourself. You might, for example, do so with a Docker image:

FROM ubuntu:20.04
RUN apt-get update && apt-get install -y gnuplot python3.9
COPY requirements.txt .
RUN pip install -r requirements.txt

Both the Python interpreter and gnuplot need to come from system packages, in this case Ubuntu’s packages.

Conda: Any dependency can be a Conda package (almost)

With Conda, Python and gnuplot are just more Conda packages, no different than NumPy or Pandas. The environment.yml that corresponds (somewhat) to the requirements.txt we saw above will include all of these packages:

name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pandas
  - gnuplot

Conda only relies on the operating system for basic facilities, like the standard C library. Everything above that is Conda packages, not system packages.

We can see the difference if the corresponding Dockerfile; there is no need to install any system packages:

FROM continuumio/miniconda3
COPY environment.yml .
RUN conda env create

This base image ships with Conda pre-installed, but we’re not relying on any existing Python install, we’re installing a new one in the new environment.

Why Conda packages everything

Why did Conda make the decision to package everything, Python interpreter included? How does this benefit you? In part it’s about portability and reproducibility.

Portability across operating systems: Instead of installing Python in three different ways on Linux, macOS, and Windows, you can use the same environment.yml on all three.
Reproducibility: It’s possible to pin almost the whole stack, from the Python interpreter upwards.
Consistent configuration: You don’t need to install system packages and Python packages in two different ways; (almost) everything can go in one file, the environment.yml.

But it also addresses another problem: how to deal with Python libraries that require compiled code. That’s a big enough topic that it gets a whole new section, next.

Beyond pure Python: Packaging compiled extensions

In the early days of Python packaging, a package included just the source code that needed to be installed. For pure Python packages, this worked fine, and still does. But what happens when you need to compile some Rust or C or C++ or Fortran code as part of building the package?

Solution #1: Compile it yourself

The original solution was to have each user compile the code themselves at install time. This can be quite slow, wastes resources, is often painful to configure, and still doesn’t solve a big part of the problem: shared library dependencies.

The Pillow image graphics library, for example, relies on third party shared libraries like libpng and libjpeg. In order to compile Pillow yourself, you have to install all of them, plus their development headers. On Linux or macOS you can install the system packages or the Homebrew packages; for Windows this can be more difficult. But you’re going to have to write different configuration for every single OS and even Linux distribution.

Solution #2: Pip wheels

The way pip solves this problem is with packages called “wheels” that can include compiled code. In order to deal with shared library dependencies like libpng, any shared library external dependencies get bundled inside the wheel itself.

For example, let’s look at a Pillow wheel for Linux; a wheel is just a ZIP file so we can use standard ZIP tools:

$ zipinfo Pillow.whl
...
Pillow.libs/libpng16-213e245f.so.16.37.0
Pillow.libs/libjpeg-183418da.so.9.4.0
...
PIL/FpxImagePlugin.py
PIL/PalmImagePlugin.py
...
PIL/_imagingcms.cpython-39-x86_64-linux-gnu.so
...

The wheel includes both Python code, a compiled Python extension, and third-party shared libraries like libpng and libjpeg. This can sometimes make packages larger, as multiple copies of third-party shared libraries may be installed, one per wheel.

Solution #3: Conda packages

Conda packages take a different approach to third-party shared libraries. libjpeg and libpng are packaged as additional Conda packages:

$ conda install -c conda-forge pillow
...
The following NEW packages will be INSTALLED:

...
  jpeg               conda-forge/linux-64::jpeg-9d-h36c2ea0_0
...
  libpng             conda-forge/linux-64::libpng-1.6.37-h21135ba_2
...
  pillow             conda-forge/linux-64::pillow-7.2.0-py38h9776b28_2
  zstd               conda-forge/linux-64::zstd-1.5.0-ha95c52a_0
...

Those installed libjpeg and libpng can then be depended on by other installed packages. They’re not wheel-specific, they’re available to any package in the Conda environment.

Conda can do this because it’s not a packaging system only for Python code; it can just as easily package shared libraries or executables.

Summary: pip vs Conda

	pip	Conda
Installs Python	No	Yes, as package
3rd-party shared libraries	Inside the wheel	Yes, as package
Executables and tools	No	Yes, as package
Python source code	Yes, as package	Yes, as package

PyPI vs. Conda-Forge

Another fundamental difference between pip and Conda is less about the tools themselves, and more about the package repositories they rely on and how they work. In particular, most Python programs will rely on open source libraries, and these need to be downloaded from somewhere. For these, pip relies on PyPI, whereas Conda supports multiple different “channels” hosted on Anaconda.

The default Conda channel is maintained by Anaconda Inc, the company that created Conda. It tends to have limited package selection and be somewhat less up-to-date, with some potential benefits regarding stability and GPU support. Beyond that I don’t know that much about it.

But there’s also the Conda-Forge community channel, which packages far more packages, tends to be up-to-date, and is where you probably want to get your Conda packages most of the time. You can mix packages from the default channel and Conda-Forge, if you want the default channel’s GPU packages.

You need to login in order to like this post: click here