I was recently tasked with turning our simple Python project (say, a very simple REST API with 1 or 2 endpoints) into a private package, installable using pip or poetry. What sounded simple and quick at first turned out to be a real challenge.
There are many resources online, but some are not up to date, and others do not use best practices. So today, I’ll walk you through what I did to convert a Python project into a Python package.
The story
At Datalynx we initially started with a two-level API to achieve separation of concerns: a main API, ‘backend’, that serves the frontend directly, and a secondary API, ‘ml_backend’, that serves the main backend application.
That also required us to spin up 2 ECS services (more 💰 at the end of the month 😔) to serve a single web interface! With time we realized that ‘ml_backend’ was not as computationally intensive as expected, and that it would be nice to import its classes, which were previously reached by hitting an API, directly in ‘backend’.
Prepare code for transition
- Remove environment dependencies
Our application relied on a number of API keys that were stored in .env files. We addressed that by requiring every class that used these variables to accept them as additional constructor parameters.
Users (projects that use the library) are now required to pass these parameters.
From:
import os
from dotenv import load_dotenv

class User:
    def __init__(self):
        load_dotenv()
        self.api_key = os.getenv("API_KEY")
To:
class User:
    def __init__(self, api_key):
        self.api_key = api_key
At the end of this process you should be able to use the classes without any .ini or .env files. (Make sure you test for that!)
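With this change, configuration becomes the caller's job. Here is a minimal sketch of both sides, assuming the `User` class from above; the `build_user` helper and the `API_KEY` variable name on the caller's side are hypothetical and only there for illustration:

```python
import os


class User:
    """Library-side class: configuration arrives through the constructor."""

    def __init__(self, api_key):
        self.api_key = api_key


def build_user(environ=os.environ):
    """Caller-side helper (hypothetical): the consuming project reads its
    own configuration and hands the value to the library class."""
    api_key = environ.get("API_KEY")
    if api_key is None:
        raise RuntimeError("API_KEY is not set in the calling project")
    return User(api_key=api_key)


# The library class works with no environment variables or .env file at all.
standalone = User(api_key="dummy-key")

# The consuming project decides where the key comes from; a plain dict
# stands in for os.environ here.
from_caller = build_user({"API_KEY": "key-from-caller"})
```

Constructing the class with a dummy value, as in `standalone`, is also a cheap way to test that nothing in the library still reaches for a .env file.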
- Rename project and module names to avoid confusion
PEP8 defines a standard for how to name packages and modules:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
To comply, we renamed ‘ml_backend’ to ‘mlbackend’ and renamed or deleted redundant folders along the way. I used this opportunity to rename classes and variables as well (like Sunday cleaning).
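If you have many names to review, a small check can flag the ones that stray from the PEP 8 wording quoted above. This is only a sketch: the function names are made up, and the package rule is stricter than PEP 8 itself, which merely discourages underscores in package names.

```python
import re

# Packages: all-lowercase, no underscores (stricter than PEP 8, see above).
PACKAGE_NAME = re.compile(r"[a-z][a-z0-9]*")
# Modules: all-lowercase, underscores allowed (leading ones too, e.g. __init__).
MODULE_NAME = re.compile(r"_*[a-z][a-z0-9_]*")


def follows_pep8_package_name(name):
    """True if the whole name is lowercase letters and digits only."""
    return PACKAGE_NAME.fullmatch(name) is not None


def follows_pep8_module_name(name):
    """True if the whole name is lowercase letters, digits and underscores."""
    return MODULE_NAME.fullmatch(name) is not None
```

With these rules, `follows_pep8_package_name("ml_backend")` is false while `follows_pep8_package_name("mlbackend")` is true, which matches the rename described above.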
Create pyproject.toml metadata file
The pyproject.toml is a config file that contains metadata and build instructions for your package. It is read by a build backend (like setuptools or hatch), which builds your package and creates the associated distribution files. Those files are then uploaded to your package repository (in our case AWS CodeArtifact).
The build frontend (like pip or poetry) is responsible for downloading the package and managing its installation in the user's environment.
In our case this file includes the project info, a version, and the dependencies. Nothing else.
Here is a sample pyproject.toml that works for this simple project:
[project]
name = "mlbackend"
version = "1.0.0"
dependencies = [
    "boto3>=1.16.0",
    "numpy>=1.21.5",
    "pydantic>=2.5.2",
    "pytest>=7.4.3",
    "requests>=2.31.0",
    "websockets>=11.0.3"
]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
Here is a list of other config info you can add.
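Before building, it can help to confirm that the file parses and contains the fields you expect. Below is a small sketch using `tomllib`, assuming Python 3.11 or newer (on older interpreters the third-party `tomli` backport offers the same `loads` function). The `summarize` helper is hypothetical, and the TOML string is a shortened copy of the sample above:

```python
try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

PYPROJECT = """
[project]
name = "mlbackend"
version = "1.0.0"
dependencies = [
    "boto3>=1.16.0",
    "requests>=2.31.0",
]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
"""


def summarize(pyproject_text):
    """Parse pyproject.toml content and pull out the fields used in this post."""
    data = tomllib.loads(pyproject_text)
    project = data["project"]
    return {
        "name": project["name"],
        "version": project["version"],
        "dependencies": project.get("dependencies", []),
        "build_backend": data["build-system"]["build-backend"],
    }


summary = summarize(PYPROJECT)
```

For a real project you would read the text with `Path("pyproject.toml").read_text()` instead of using an inline string.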
You are now ready to build your project. First, make sure you install build:
python3 -m pip install build
Then build the package:
python3 -m build --sdist
You will notice that it creates a dist/ folder, and inside it is your .tar.gz package 🎉
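It is worth peeking inside the archive before uploading it, to confirm your modules actually made it in. A minimal sketch with the standard `tarfile` module follows; the helper names are hypothetical, and the demo at the bottom builds a stand-in archive (with an assumed `mlbackend-1.0.0.tar.gz` file name) so that it runs without a real build:

```python
import io
import tarfile
import tempfile
from pathlib import Path


def sdist_file_names(sdist_path):
    """Return the sorted paths of regular files inside a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def missing_files(sdist_path, expected_suffixes):
    """Return the expected path suffixes that no archive member ends with."""
    names = sdist_file_names(sdist_path)
    return [s for s in expected_suffixes if not any(n.endswith(s) for n in names)]


# Demo on a stand-in archive with two empty files.
with tempfile.TemporaryDirectory() as tmp:
    demo_sdist = Path(tmp) / "mlbackend-1.0.0.tar.gz"
    with tarfile.open(demo_sdist, "w:gz") as archive:
        for name in (
            "mlbackend-1.0.0/pyproject.toml",
            "mlbackend-1.0.0/mlbackend/__init__.py",
        ):
            info = tarfile.TarInfo(name)
            info.size = 0
            archive.addfile(info, io.BytesIO(b""))
    demo_names = sdist_file_names(demo_sdist)
    demo_missing = missing_files(demo_sdist, ["pyproject.toml", "user.py"])
```

Against a real build you would point `sdist_file_names` at the file in dist/ and pass the modules you expect to ship to `missing_files`.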
Upload package to AWS CodeArtifact
We now want to upload our package to a private repository. At Datalynx we use AWS for basically everything, so we’ll use AWS CodeArtifact to create a private repository and upload our package to it.
After you create your private repository, AWS provides a command to authenticate with it from your local machine. It looks like this:
aws codeartifact login --tool twine --repository [your-repository] --domain [your-domain] --domain-owner [aws-account-id] --region [your-region]
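In a script or CI job you may prefer to fetch the credentials programmatically. The sketch below is not what the login command does internally. It is an alternative that assumes boto3 is installed and AWS credentials are already configured, and it relies on twine reading `TWINE_USERNAME`, `TWINE_PASSWORD` and `TWINE_REPOSITORY_URL` from the environment. The function names and the example token and URL are made up:

```python
def fetch_codeartifact_credentials(domain, domain_owner, repository, region):
    """Ask CodeArtifact for a temporary token and the repository's PyPI endpoint."""
    import boto3  # imported lazily so the helper below works without AWS access

    client = boto3.client("codeartifact", region_name=region)
    token = client.get_authorization_token(
        domain=domain, domainOwner=domain_owner
    )["authorizationToken"]
    endpoint = client.get_repository_endpoint(
        domain=domain,
        domainOwner=domain_owner,
        repository=repository,
        format="pypi",
    )["repositoryEndpoint"]
    return token, endpoint


def twine_environment(token, endpoint):
    """Map the credentials onto the environment variables twine reads.

    CodeArtifact expects the literal user name "aws" with the token as password.
    """
    return {
        "TWINE_USERNAME": "aws",
        "TWINE_PASSWORD": token,
        "TWINE_REPOSITORY_URL": endpoint,
    }


# Placeholder values only; real ones come from fetch_codeartifact_credentials().
demo_env = twine_environment("example-token", "https://example.invalid/pypi/my-repo/")
```

With those variables exported, `twine upload dist/*` should work without the `--repository` flag, since the URL is taken from the environment instead of a named entry in .pypirc.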
Once authentication is done you can go ahead and upload your package. There are a few tools for this, but I’ll stick to the most popular one: twine.
Install twine:
python3 -m pip install twine
Then upload your package:
twine upload --repository codeartifact dist/*
And that’s it!
Next up would be to build a pipeline that automates this process on your preferred trigger (a new PR, a push to a branch, and so on).
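As a starting point for such a pipeline, the build and upload steps can be wrapped in a small script that your CI system calls. This is a rough sketch, not a finished pipeline: the function names are hypothetical, it assumes build and twine are installed, and it assumes the `aws codeartifact login` step has already run in the same job.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_command():
    """Command that builds the source distribution into dist/."""
    return [sys.executable, "-m", "build", "--sdist"]


def upload_command(dist_dir="dist", repository="codeartifact"):
    """Command that uploads every .tar.gz found in dist_dir with twine.

    No shell is involved, so the dist/* wildcard is expanded here instead.
    """
    archives = sorted(str(p) for p in Path(dist_dir).glob("*.tar.gz"))
    if not archives:
        raise FileNotFoundError(f"no .tar.gz archives found in {dist_dir}")
    return [sys.executable, "-m", "twine", "upload",
            "--repository", repository, *archives]


def release(dist_dir="dist"):
    """Build, then upload; check=True stops the job on the first failure."""
    subprocess.run(build_command(), check=True)
    subprocess.run(upload_command(dist_dir), check=True)


# Demo: show the upload command for a stand-in archive without running it.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mlbackend-1.0.0.tar.gz").write_bytes(b"")
    demo_upload = upload_command(tmp)
```

Calling `release()` from the project root then performs both steps in order.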