For example, let’s say you want to install Python 3.9 with NumPy, Pandas, and the gnuplot rendering tool, a tool that is unrelated to Python. Here’s what the pip requirements.txt would look like:
numpy
pandas
Installing Python and gnuplot is out of scope for pip. You as a user must deal with this yourself. You might, for example, do so with a Docker image:
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y gnuplot python3.9
COPY requirements.txt .
RUN pip install -r requirements.txt
Both the Python interpreter and gnuplot need to come from system packages, in this case Ubuntu’s packages.
With Conda, Python and gnuplot are just more Conda packages, no different from NumPy or Pandas. The environment.yml that corresponds (somewhat) to the requirements.txt we saw above will include all of these packages:
name: myenv
channels:
- conda-forge
dependencies:
- python=3.9
- numpy
- pandas
- gnuplot
Conda only relies on the operating system for basic facilities, like the standard C library. Everything above that is Conda packages, not system packages.
We can see the difference in the corresponding Dockerfile; there is no need to install any system packages:
FROM continuumio/miniconda3
COPY environment.yml .
RUN conda env create
This base image ships with Conda pre-installed, but we’re not relying on any existing Python install; we’re installing a new one in the new environment.
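One way to convince yourself of this is to ask Python where it and gnuplot actually live. The following is a minimal sketch, not part of Conda itself: the helper names `is_under` and `report` are made up for illustration, and it assumes you run it with the interpreter from the environment you want to inspect.

```python
import os
import shutil
import sys


def is_under(path, prefix):
    """Return True if `path` is located inside the directory `prefix`."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:
        # Raised e.g. on Windows when the two paths are on different drives.
        return False


def report(tool="gnuplot"):
    """Describe where the running interpreter and `tool` come from."""
    lines = [
        f"interpreter: {sys.executable}",
        f"environment prefix: {sys.prefix}",
    ]
    tool_path = shutil.which(tool)  # searches PATH, like a shell would
    if tool_path is None:
        lines.append(f"{tool}: not found on PATH")
    elif is_under(tool_path, sys.prefix):
        lines.append(f"{tool}: {tool_path} (inside this environment)")
    else:
        lines.append(f"{tool}: {tool_path} (outside this environment)")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

In an activated Conda environment you would expect both paths to sit under the environment prefix; in the pip-on-Ubuntu setup, gnuplot would be reported as living outside it, typically somewhere like /usr/bin.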
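Why did Conda make the decision to package everything, Python interpreter included? How does this benefit you? In part it’s about portability and reproducibility: instead of setting up Python differently on each operating system, you can use the same environment.yml on all three (Linux, macOS, and Windows), and the interpreter and tools are specified alongside your libraries in that environment.yml.
But it also addresses another problem: how to deal with Python libraries that require compiled code. That’s a big enough topic that it gets a whole new section, next.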
In the early days of Python packaging, a package included just the source code that needed to be installed. For pure Python packages, this worked fine, and still does. But what happens when you need to compile some Rust or C or C++ or Fortran code as part of building the package?
The original solution was to have each user compile the code themselves at install time. This can be quite slow, wastes resources, is often painful to configure, and still doesn’t solve a big part of the problem: shared library dependencies.
The Pillow image graphics library, for example, relies on third-party shared libraries like libpng and libjpeg. In order to compile Pillow yourself, you have to install all of them, plus their development headers. On Linux or macOS you can install the system packages or the Homebrew packages; for Windows this can be more difficult. But you’re going to have to write different configuration for every single OS and even Linux distribution.
The way pip solves this problem is with packages called “wheels” that can include compiled code. To deal with shared library dependencies like libpng, any external shared libraries get bundled inside the wheel itself.
For example, let’s look at a Pillow wheel for Linux; a wheel is just a ZIP file so we can use standard ZIP tools:
$ zipinfo Pillow.whl
...
Pillow.libs/libpng16-213e245f.so.16.37.0
Pillow.libs/libjpeg-183418da.so.9.4.0
...
PIL/FpxImagePlugin.py
PIL/PalmImagePlugin.py
...
PIL/_imagingcms.cpython-39-x86_64-linux-gnu.so
...
The wheel includes Python code, a compiled Python extension, and third-party shared libraries like libpng and libjpeg. This can sometimes make packages larger, as multiple copies of third-party shared libraries may be installed, one per wheel.
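Since a wheel is just a ZIP file, you can also do this inspection from Python using the standard library. Here is a rough sketch that sorts a wheel’s contents into the three groups just described. The function name `classify_wheel_members` is hypothetical, and the rules are assumptions based on the Linux listing above: bundled libraries sit in a top-level directory ending in `.libs`, and compiled extensions end in `.so` (or `.pyd` on Windows).

```python
import zipfile


def classify_wheel_members(wheel):
    """Group the files in a wheel by what kind of content they are.

    `wheel` can be a path or a file-like object. The rules below are
    heuristics based on the Pillow listing, not an official specification.
    """
    groups = {"python": [], "extension": [], "bundled_library": [], "other": []}
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            top_level = name.split("/", 1)[0]
            if top_level.endswith(".libs"):
                groups["bundled_library"].append(name)
            elif name.endswith(".py"):
                groups["python"].append(name)
            elif name.endswith((".so", ".pyd")):
                groups["extension"].append(name)
            else:
                groups["other"].append(name)
    return groups


if __name__ == "__main__":
    import sys

    # Usage: python classify_wheel.py Pillow.whl
    for wheel_path in sys.argv[1:]:
        for kind, names in classify_wheel_members(wheel_path).items():
            print(f"{kind}: {len(names)} file(s)")
```

Run against two different wheels that both bundle libpng, this would show a separate copy of the library in each, which is where the extra size comes from.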
Conda packages take a different approach to third-party shared libraries. libjpeg and libpng are packaged as additional Conda packages:
$ conda install -c conda-forge pillow
...
The following NEW packages will be INSTALLED:
...
jpeg conda-forge/linux-64::jpeg-9d-h36c2ea0_0
...
libpng conda-forge/linux-64::libpng-1.6.37-h21135ba_2
...
pillow conda-forge/linux-64::pillow-7.2.0-py38h9776b28_2
zstd conda-forge/linux-64::zstd-1.5.0-ha95c52a_0
...
Those installed libjpeg and libpng packages can then be depended on by other installed packages. They’re not wheel-specific; they’re available to any package in the Conda environment.
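If you want to see those shared copies for yourself, you can look in the environment’s library directory. The sketch below is only illustrative: `find_shared_libraries` is a made-up helper, and it assumes the Linux/macOS layout where an environment keeps its shared libraries under `<prefix>/lib` (Windows environments are laid out differently). It reads the `CONDA_PREFIX` environment variable that Conda sets on activation, falling back to `sys.prefix`.

```python
import os
import sys
from pathlib import Path


def find_shared_libraries(prefix, stem):
    """List file names in <prefix>/lib whose name starts with `stem`.

    Assumes a Unix-style environment layout; returns an empty list if
    the directory does not exist.
    """
    lib_dir = Path(prefix) / "lib"
    return sorted(
        path.name
        for path in lib_dir.glob(f"{stem}*")
        if path.is_file() or path.is_symlink()
    )


if __name__ == "__main__":
    # CONDA_PREFIX points at the active environment, if there is one.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    for stem in ("libpng", "libjpeg"):
        print(stem, find_shared_libraries(prefix, stem))
```

There is one copy per environment here, rather than one copy per package that happens to need it.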
Conda can do this because it’s not a packaging system only for Python code; it can just as easily package shared libraries or executables.
| | pip | Conda |
|---|---|---|
| Installs Python | No | Yes, as package |
| 3rd-party shared libraries | Inside the wheel | Yes, as package |
| Executables and tools | No | Yes, as package |
| Python source code | Yes, as package | Yes, as package |
Another fundamental difference between pip and Conda is less about the tools themselves, and more about the package repositories they rely on and how they work. In particular, most Python programs will rely on open source libraries, and these need to be downloaded from somewhere. For these, pip relies on PyPI, whereas Conda supports multiple different “channels” hosted on Anaconda.
The default Conda channel is maintained by Anaconda Inc, the company that created Conda. It tends to have limited package selection and be somewhat less up-to-date, with some potential benefits regarding stability and GPU support. Beyond that I don’t know that much about it.
But there’s also the Conda-Forge community channel, which offers far more packages, tends to be up-to-date, and is where you probably want to get your Conda packages most of the time. You can mix packages from the default channel and Conda-Forge, if you want the default channel’s GPU packages.