Learning how to package your code is very useful for any python developer. It gives you a better understanding of how Python works and, above all, enables you to share your code with others or simply deploy it in a runtime environment.
So how does it work? Why not just share your code via git? It's actually more complex than it sounds. There's even a whole working group, the Python Packaging Authority (aka pypa), which has been working on this subject since 2011, and it's a constantly evolving field.
In this article, we'll take a quick look at packaging in Python and present a simple method for packaging your code in 2023.
TL;DR
- Use the pyproject.toml file, following the setuptools guide.
- You can simply use pip install build && python -m build to build your package.
- Adopt the src-layout.
Packages in Python
First, a few definitions:
Python module
A Python module is a text file with the .py extension containing Python code.
It's called a module because it "modularizes" the Python source code into several .py files. We then use Python's import functionality to import elements from another module.
For example, if you create a count_lines.py file containing a count_lines_file function, you can then import this function:
# count_lines.py
def count_lines_file(filepath: str) -> int:
    """Count the number of lines in a file"""
    return sum(1 for _ in open(filepath))

>>> from count_lines import count_lines_file
>>> count_lines_file("count_lines.py")
3
When you execute the instruction from count_lines import count_lines_file
, python looks for the module in the current directory (where python was launched); then in the python installation directory /usr/lib/python3.11
; then in the directory where python installs default packages: /usr/lib/python3.11/site-packages
.
As soon as the module is found, it is executed in the python environment and the elements defined in it become available.
The list of directories in which Python searches for modules is provided by the sys module:
import sys
sys.path
["", "/usr/lib/python3.11/python311.zip", "/usr/lib/python3.11/python311", "/usr/lib/python3.11/site-packages"]
Python package
A package is a folder that groups together a set of python modules and facilitates access to them by creating a namespace: from numpy.linalg import norm
.
To create a package, simply create a __init__.py
file (which can be empty) in a folder.
The folder is then considered by python as a package.
Let's create a package for the count_lines.py
module:
count_package
├── __init__.py
└── count_lines.py
We can now use the dotted notation to import the count_lines
module:
from count_package.count_lines import count_lines_file
count_lines_file("count_lines.py")
3
When executing the instruction from count_package.count_lines import count_lines_file
, python looks for a count_package
folder containing a __init__.py
file in the sys.path
directories (the current directory and the default directories seen above).
If the package is found, the modules present in it can be accessed using the dotted notation.
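As a side note, a common pattern (not required) is to re-export the functions you care about from the package's __init__.py, so users don't have to know the submodule name. A minimal sketch for our example package:

# count_package/__init__.py
from .count_lines import count_lines_file

With this in place, from count_package import count_lines_file works directly.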
Distribute your package
To distribute your package, create a distribution. This is an archive containing the package to be distributed, which can then be installed using the pip package manager.
There are two main distribution formats:
- The Source Distribution (sdist) format: this is an archive containing all source code and metadata.
- The Built Distribution format: this is a distribution format in which a number of things have been pre-compiled to facilitate installation on other environments. This is particularly useful for modules written in C / C++.
The wheel
format is the reference Built Distribution format. It is the format developed by the Python Packaging Authority and is widely used to distribute packages.
There are many tools available for creating distributions, but here we'll focus mainly on the tools created by the Python Packaging Authority, which have become indispensable: setuptools
, build
and twine
.
setuptools
setuptools
is the tool used by the vast majority of projects to build their distributions.
Let's take our count_package
example from earlier and see how to create a distribution with setuptools
.
Our project tree might look something like this:
projet_genial
├── count_package
│ ├── __init__.py
│ └── count_lines.py
├── tests
│ └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md
setuptools
needs a configuration file to know what to include in the distribution and all the project metadata.
Unfortunately, there are currently three configuration file standards in use (and you can use them all at the same time...):
- The setup.py file, the oldest and most popular (3.7 million results on GitHub)
- The setup.cfg file, the second most popular (0.4 million hits on GitHub)
- The pyproject.toml file, the latest arrival and now the official standard for all Python packaging tools (0.2 million hits on GitHub)
pyproject.toml
is the official standard, but is still in the minority compared to setup.py
, so we'll be looking at all three file formats.
setup.py
The setup.py
file, as its extension suggests, is a python file. It has the following form:
# setup.py
from setuptools import setup

setup(
    name='count_package',
    author='me',
    description='Package for counting the number of lines in files.',
    version='0.0.1',
    python_requires='>=3.7, <4',
    install_requires=[
        'pandas',
        'importlib-metadata; python_version >= "3.8"',
    ],
)
The very fact that it's a python file is both its strength and its weakness: it's possible to build the configuration dynamically in the code, but this makes it difficult to parse and interface with other external tools.
In addition, since this format is specific to setuptools
, distributions of the sdist type can only be installed if setuptools has been installed on the target environment and in a compatible version.
Sadly, since the vast majority of projects started out using setuptools and setup.py, it became difficult to propose alternatives: projects like flit had to be built "on top" of setuptools. In short, setup.py doesn't encourage innovation.
Moreover, its use is often problematic.
To take the example from this article, you might be tempted to introduce an if/else condition in your setup.py to manage a dependency needed in Python 2.7 based on sys.version, but in doing so you would be introducing an insidious bug: the dependency will be included or not depending on the environment that builds the distribution, not on the environment that installs it.
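To make this concrete, here is a hedged sketch of both approaches (the importlib-metadata backport is just a hypothetical example of a version-conditional dependency):

# setup.py
import sys
from setuptools import setup

# BAD: this condition is evaluated on the machine that *builds* the distribution,
# so the extra dependency is baked in (or left out) regardless of where it is installed.
install_requires = ["pandas"]
if sys.version_info < (3, 8):
    install_requires.append("importlib-metadata")

# GOOD: an environment marker is shipped as-is in the metadata and
# evaluated by pip on the machine that *installs* the package.
install_requires = [
    "pandas",
    'importlib-metadata; python_version < "3.8"',
]

setup(name="count_package", install_requires=install_requires)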
It's also tempting to import your own package from setup.py to manage the version. But by doing so, sdist distributions will crash at installation because the package you are trying to import is not yet present in the Python environment.
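A sketch of that second pitfall (the __version__ attribute is hypothetical):

# setup.py
from setuptools import setup

# BAD: when an sdist is being installed, count_package is not importable yet
# (and its own dependencies are certainly not installed), so this can crash.
from count_package import __version__

setup(name="count_package", version=__version__)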
In short: please do not use setup.py anymore.
And if you do, use it in a declarative way.
Using the file as a script (python setup.py ...) is deprecated, as clearly explained in the setuptools documentation:
It is important to remember, however, that running this file as a script (e.g. python setup.py sdist) is strongly discouraged, and that the majority of the command line interfaces are (or will be) deprecated (e.g. python setup.py install, python setup.py bdist_wininst, ...). We also recommend users to expose as much as possible configuration in a more declarative way via the pyproject.toml or setup.cfg, and keep the setup.py minimal with only the dynamic parts (or even omit it completely if applicable). See Why you shouldn't invoke setup.py directly for more background.
setup.cfg
To address the issues mentioned above and make configuration more declarative, in 2016 pypa created the setup.cfg
file format.
The example setup.py
file above is equivalent to the following setup.cfg
file:
# setup.cfg
[metadata]
name = count_package
version = 0.0.1
author = me
description = Package for counting the number of lines in files.
[options]
python_requires = >=3.7,<4
install_requires =
    pandas
    importlib-metadata; python_version >= "3.8"
This format has had many fans but has recently been superseded by the pyproject.toml
format, which is now the official way of declaring python package configuration.
pyproject.toml
In addition to adopting the declarative approach of setup.cfg
, the pyproject.toml
format introduces a number of new features.
It is now possible (and even mandatory) to specify the package builder.
It is also a means of centralizing the configuration of numerous development tools in an agnostic way, rather than multiplying configuration files such as tox.ini
, .coveragerc
, etc.
The pyproject.toml
format includes a mandatory section to define the builder to be used to build the package:
# pyproject.toml
[build-system]
requires = [
"setuptools>=60",
"wheel>=0.30.0",
"cython>=0.29.4",
]
build-backend = "setuptools.build_meta"
With pyproject.toml
it is now possible to declare to pip
the dependencies needed for the build!
It is then perfectly possible to specify the use of a builder other than setuptools, such as flit:
# pyproject.toml
[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"
This format is gradually becoming the preferred way of centralizing package configuration.
It is the format preferred by setuptools and a number of third-party tools use it to store their configuration: black, pytest, isort, etc.
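For example (the values below are purely illustrative defaults), the same pyproject.toml can carry the configuration of black, isort and pytest next to the build metadata:

# pyproject.toml
[tool.black]
line-length = 88

[tool.isort]
profile = "black"

[tool.pytest.ini_options]
testpaths = ["tests"]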
Here's a sample pyproject.toml
file for our count_package
example package:
# pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "count_package"
version = "0.0.1"
description = "Package for counting the number of lines in files."
authors = [
    {name = "me", email = "email@me.fr"},
]
requires-python = ">=3.8,<4"
dependencies = [
    "pandas",
    'importlib-metadata; python_version >= "3.8"',
]
The setuptools documentation gives more details on configuration in pyproject.toml
format.
Build a distribution of your package
Once you've written your configuration file (preferably pyproject.toml
), all that's left to do is build your package.
The modern way to do this is to use the build
package developed by pypa:
pip install --upgrade build
python -m build
build
will first install the builder specified in your pyproject.toml
file, then use it to build an sdist distribution and a wheel.
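If all goes well, you should end up with something like this in a dist/ folder (the exact file names depend on your package name and version):

dist
├── count_package-0.0.1.tar.gz            # the sdist
└── count_package-0.0.1-py3-none-any.whl  # the wheel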
You can then use the twine
package to publish it on the official pypi repository.
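A hedged sketch of that publishing step (you will need a PyPI account and an API token; TestPyPI is handy for a dry run):

pip install --upgrade twine
twine check dist/*                         # verify the metadata renders correctly
twine upload --repository testpypi dist/*  # dry run on test.pypi.org
twine upload dist/*                        # real upload to pypi.org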
For backward compatibility with older versions of packaging libraries, you can create a minimal setup.py file in addition to the pyproject.toml file:

# setup.py
from setuptools import setup

setup()
Version management
In the pyproject.toml
example seen above, the package version is set manually. You therefore need to change it each time you want to publish a new version of your package.
I find the tool setuptools-scm
very useful for managing package versions using git or mercurial.
This is done very simply by adding the setuptools_scm dependency to pyproject.toml and specifying that the version is dynamic:
# pyproject.toml
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

[project]
# version = "0.0.1"  # Remove any existing version parameter.
dynamic = ["version"]

[tool.setuptools_scm]
write_to = "src/pkg/_version.py"
When building a package, setuptools-scm
will search for the last tag with a valid version number and then deduce the package version number. By default, the version is built from three elements:
- The last tag with a valid version number (example: v1.2.3)
- The distance to this tag (the number of revisions since this tag)
- The working directory status (whether there are any uncommitted changes)
Once the version number has been deduced, a _version.py
file will be created inside the distribution at the specified location (e.g. src/pkg/_version.py
), allowing the package version to be known from the distribution without the git history being present on the target environment.
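To read that version at runtime, you can query the installed metadata (available in the standard library since Python 3.8) rather than importing the generated file:

from importlib.metadata import version

# Returns the version computed by setuptools-scm at build time, e.g. "0.0.1"
print(version("count_package"))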
If you are in the habit of entering the version number of your packages yourself, please note that valid version formats are governed by PEP 440.
If you don't comply with these specifications, you're likely to run into problems when publishing or installing your packages.
In particular, versions v1.2.3-local or v1.2.3-dev are invalid.
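If in doubt, the packaging library (pip install packaging) lets you check a version string against PEP 440; a quick sketch:

from packaging.version import Version, InvalidVersion

print(Version("1.2.3+local"))   # "+" is the PEP 440 separator for local versions
try:
    Version("1.2.3-local")      # "-local" is not a valid PEP 440 suffix
except InvalidVersion as err:
    print(err)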
The layout
When configuring your package, regardless of the method used (setup.py
, setup.cfg
or pyproject.toml
), you must specify the packages and subpackages you wish to include in your distribution:
# pyproject.toml
[tool.setuptools]
packages = ["mypkg", "mypkg.subpkg1", "mypkg.subpkg2"]
Fortunately, setuptools
has an automatic discovery feature for your packages and subpackages. This is compatible with two classic project layouts:
flat-layout:
count_package
├── count_package
│ ├── __init__.py
│ └── count_lines.py
├── tests
│ └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md
and src-layout, where the package lives in a src folder:
projet_genial
├── src
│   └── count_package
│       ├── __init__.py
│       └── count_lines.py
├── tests
│   └── test_count_lines.py
├── .gitignore
├── LICENSE.md
└── README.md
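With recent versions of setuptools, the src folder is discovered automatically; if you prefer (or need) to be explicit, the equivalent pyproject.toml configuration is roughly:

# pyproject.toml
[tool.setuptools.packages.find]
where = ["src"]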
The difference may seem minimal, but personally I have a strong preference for the src-layout because it prevents bad habits and forces you to understand how the package import and installation system works in Python.
In fact, when you develop your package, you test the functionalities you add to it as you go along.
To do this, it's very tempting to simply import your package directly from your test files:
# test_file.py
from count_package import count_lines
This will work if you use a flat-layout and run your test module from the directory containing the count_package
folder, because as we saw above python includes the current directory in the list of directories where it searches for modules.
However, this is a bad habit for two reasons:
- Firstly, a setup.py located at the root of your project is able to import the count_package package it is supposed to install, which can cause bugs in sdist mode if you're not careful.
- Secondly, you're not really testing the package as it will be installed for others! For example, you may have forgotten to include data files in your configuration file, and your tests ought to fail as a result. But since these files are present in your working directory, you won't notice a thing.
For these reasons, I think it's best to opt for the src-layout and use an editable installation with the command pip install -e . for local development. This installs the package by linking it to your source code, so that any changes you make are immediately reflected in the installed package.
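A quick sanity check of that workflow (the paths are illustrative):

pip install -e .
python -c "import count_package; print(count_package.__file__)"
# .../projet_genial/src/count_package/__init__.py  -> edits are picked up immediately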
For an in-depth analysis of the benefits of src-layout, I refer you to this article (which dates back to 2014).
Going further
If you want to learn more about the packaging ecosystem, I found the resources listed in this article very useful: 🐍 Best resources on Python packaging 📖
In the meantime, here is a non-exhaustive list of alternatives to setuptools you should consider:
- Pipenv: allows you to jointly manage your project's virtual environment and its dependencies. Adds a valuable feature: generation of Pipfile.lock files, which reference exact versions of dependencies to enable identical reproduction of the development environment.
- Poetry: a powerful tool for managing virtual environments, dependencies (and their dependencies), generating a poetry.lock file similar to Pipfile.lock, publishing your package, etc. However, it does not comply with certain PEP standards.
- PDM: a next-generation package manager for Python. Unlike Poetry, it respects PEP standards.
- Hatch: pypa's new tool for managing Python projects. It has many interesting features.
- uv: a pip replacement written in Rust, making it 10 to 100 times faster.
Bonus
- Try a next-gen manager: Hatch, PDM or Poetry.
- I'm sharing my simple cookie-cutter template for your python projects.
Let's be happy
Everything I've told you here may seem like a lot, and yet I've only skimmed the surface. In any case, I think we can count ourselves lucky when we see what the Sam & Max site wrote in 2018:
First we had distutils, setuptools, distribute, and distribute2, which were all at one time the "standard recommended for packaging a lib".
Then came the days of eggs, exes, and other stuff that easy_install would go and find anywhere in the wild, blindly following links on PyPI.
Not to mention the stuff that had to be compiled at every turn.
Besides, nothing was encrypted when downloaded, and pip wasn't packaged with Python.
Python would die on stupid errors like badly managed encoding...
On top of that, virtualenv was a separate thing, with lots of competitors, and linked the system packages by default.
Not to mention that we didn't have python -m.
In short, Python packaging was a real mess. Not to mention the shitty documentation.
It's a lot simpler these days.
References
- pypa's guide to python packaging: An Overview of Packaging for Python
- An article on wheels: What Are Python Wheels and Why Should You Care?
- A series of three very enlightening articles on how Python packaging works, written by Bernát Gábor in 2019:
  - The state of Python Packaging
  - [Python Packaging - Past, Present, Future](https://bernat.tech/posts/pep-517-518/)
  - Python packaging - Growing Pains
- An article in praise of setup.cfg on the late Sam & Max site: about setup.cfg
- Stackoverflow question: What is pyproject.toml file for