References:

- PYTHONPATH Variable
- Official Python Packaging Guide
A module in Python is a single file (with a .py extension) that contains Python code. It typically consists of classes, functions, and variables that can be used by other Python code. Modules are used to organize code into logical units and facilitate code reusability.
For example, consider a module named math_operations.py that contains functions to perform mathematical operations like addition, subtraction, multiplication, etc. You can import and use these functions in other Python scripts.
```python
# math_operations.py
def add(x, y):
    return x + y


def subtract(x, y):
    return x - y
```

A package in Python is a collection of modules grouped together in a directory. A package is typically represented by a directory containing an __init__.py file (which can be empty) and one or more Python modules. The __init__.py file indicates to Python that the directory should be treated as a package.
For example, consider a package named my_package:
```text
my_package/
├── __init__.py
├── module1.py
└── module2.py
```
Here, my_package is a package containing module1.py and module2.py, which can be imported using dot notation (my_package.module1, my_package.module2).
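A quick sketch of what that looks like in practice (the function name do_something is hypothetical; any name defined in module1 would do):

```python
# import a module from the package using dot notation
from my_package import module1

# call something defined in module1 (hypothetical name)
module1.do_something()
```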
A sub-package in Python is a package nested within another package. This means that a package can contain other packages as well as modules. Sub-packages are created by organizing directories and adding __init__.py files appropriately to define the package structure.
For example:
```text
my_parent_package/
├── __init__.py
└── my_sub_package/
    ├── __init__.py
    ├── module3.py
    └── module4.py
```
In this structure, my_sub_package is a sub-package of my_parent_package, and it can contain its own modules (module3.py, module4.py). The sub-package can be imported using dot notation (my_parent_package.my_sub_package.module3).
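For example, importing the sub-package's modules could look like this (any_function is a hypothetical name):

```python
# dotted imports reach through the parent package into the sub-package
from my_parent_package.my_sub_package import module3

module3.any_function()
```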
A distribution package (or simply distribution) in Python refers to a packaged collection of Python code and resources that is made available for installation. It typically includes modules, packages, data files, configuration files, and other resources needed for a specific purpose (e.g., a library or application).
Distribution packages are often distributed and installed using Python package managers such as pip and can be uploaded to package repositories like PyPI (Python Package Index) for easy distribution and installation by other developers.
For example, popular distribution packages include numpy, fastapi, and pandas, which are installed using pip and provide functionality that can be used in Python projects.
### sdist format

With a setup.py in place:

- python setup.py build sdist builds the source distribution.
- pip install ./dist/<package_name>.tar.gz installs the package.
- pip list shows whether the package is installed.
- If changes are made in the package, use pip install ., which will build and install the latest package on the fly.

Or simply use an editable install so that you don't have to rebuild the package every time new changes are made:

```bash
pip install --editable .
```

sdist is short for "source distribution": a .tar file containing our code. What that means is that the distribution package only contains a subset of our source code. A "source distribution" is essentially a zipped folder containing our source code.
### wheel format

Source Distribution (sdist):

- Contains the source code files (*.py), configuration files, and other project-related assets.
- Built with python setup.py sdist.

Wheels (bdist_wheel):

- A pre-built distribution format that installs faster than an sdist.
- Has a .whl extension.
- Built with python setup.py bdist_wheel.

Faster Installation:

- Installing from wheels (*.whl) is typically faster than installing from source distributions (*.tar.gz).

Ease of Use for Users:

- Users installing from a wheel do not need compilers or other build tooling on their machine.

sdist and bdist_wheel:

- It is common to build both sdist and bdist_wheel for Python packages to accommodate different use cases and platforms:
```bash
python setup.py sdist bdist_wheel
```

Compilation Requirements:

- Unfortunately, building wheels for all operating systems gets difficult if a compilation step is required, so some OSS maintainers only build wheels for a single OS.
- pip actually executes the setup.py (or equivalent files) on your machine, right after downloading the sdist.
- Building the wheel might require compiling code: gcc if the source code is in C, but other languages require their own compilers. The user must install these on their own machine or the pip install ... will simply fail. This can happen when you install numpy, pandas, scipy, pytorch, tensorflow, etc.
- setup.py may contain arbitrary code. This is highly insecure: a setup.py might contain malicious code.

Dependency Management:

- Users need build tools (gcc, etc.) to successfully install packages that require compilation.

Security Concerns:
- Relying on setup.py for package installation can be insecure, as it executes arbitrary code.

### wheel file

A wheel filename is broken down into parts separated by hyphens:

```text
{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
```

Each section in {brackets} is a tag, i.e., a component of the wheel name that carries some meaning about what the wheel contains and where the wheel will or will not work.
For example, dist/packaging-0.0.0-py3-none-any.whl:

- packaging is the package name.
- 0.0.0 is the version number.
- py3 denotes that it is built for Python 3.
- none is the ABI tag (ABI stands for "application binary interface").
- any means this package is built to work on any platform.

Other examples:

- cryptography-2.9.2-cp35-abi3-macosx_10_9_x86_64.whl
- chardet-3.0.4-py2.py3-none-any.whl
- PyYAML-5.3.1-cp38-cp38-win_amd64.whl
- numpy-1.18.4-cp38-cp38-win32.whl
- scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl

The reason for these longer names is that the wheel file contains pre-compiled binary code, which is what helps the package install fast.
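If you ever need to inspect these tags programmatically, the packaging library on PyPI (unrelated to our packaging-demo project) can parse them; a minimal sketch:

```python
# pip install packaging
from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename(
    "cryptography-2.9.2-cp35-abi3-macosx_10_9_x86_64.whl"
)
print(name, version)  # cryptography 2.9.2
for tag in tags:
    # each tag is interpreter-abi-platform, e.g. cp35-abi3-macosx_10_9_x86_64
    print(tag.interpreter, tag.abi, tag.platform)
```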
Package Maintenance:

- Maintainers often provide both sdist and bdist_wheel distributions to maximize compatibility and ease of use for users.

Consider User Experience:

- Wherever possible, ship wheels so users get fast installs without needing a build toolchain.
### build CLI tool and pyproject.toml

"Build dependencies" are anything that must be installed on your system in order to build your distribution package into an sdist or wheel.

For example, we needed to pip install wheel in order to run python setup.py bdist_wheel, so wheel is a build dependency for building wheels.
setup.py files can get complex.
You may need to pip install ... external libraries and import them into your setup.py file to accommodate complex build processes.
The lecture shows pytorch and airflow as examples of packages with complex setup.py files.
Somehow you need to be able to document build dependencies outside of setup.py.
If the build dependencies were documented inside setup.py itself (say, in a list somewhere in the file), you would have to execute setup.py just to read them, which may not be possible before those very dependencies are installed.
This is the original problem pyproject.toml was meant to solve.
```toml
# pyproject.toml
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools>=62.0.0", "wheel"]
```

pyproject.toml sits adjacent to setup.py in the file tree.
The build CLI tool (pip install build) is a special project by the Python Packaging Authority (PyPA) which reads the [build-system] table in pyproject.toml, installs the declared build requirements, and then builds your package.

```bash
pip install build

# both setup.py and pyproject.toml should be together, ideally in the root directory
# python -m build --sdist --wheel path/to/dir/with/setup.py/and/pyproject.toml
python -m build --sdist --wheel .
```

### Moving from setup.py to setup.cfg config file

FROM:
```python
# setup.py
from pathlib import Path

from setuptools import find_packages, setup

import wheel  # build dependency needed for bdist_wheel


# Function to read the contents of README.md
def read_file(filename: str) -> str:
    filepath = Path(__file__).resolve().parent / filename
    with open(filepath, encoding="utf-8") as file:
        return file.read()


setup(
    name="packaging-demo",
    version="0.0.0",
    packages=find_packages(),
    # package meta-data
    author="Amit Vikram Raj",
    author_email="[email protected]",
    description="Demo for Python Packaging",
    license="MIT",
    # Set the long description from README.md
    long_description=read_file("README.md"),
    long_description_content_type="text/markdown",
    # install_requires: libraries that are needed for the package to work
    install_requires=[
        "numpy",  # our package depends on numpy
    ],
    # setup_requires: the libraries that are needed to set up/build
    # the package distribution
    # setup_requires=[
    #     "wheel",  # to build the binary distribution we need the wheel package
    # ],
)
```

TO:
```python
# setup.py
from setuptools import setup

# Now setup.py takes its configuration from the setup.cfg file
setup()
```

```ini
# setup.cfg
[metadata]
name = packaging-demo
version = attr: packaging_demo.VERSION
author = Amit Vikram Raj
author_email = [email protected]
description = Demo for Python Packaging
long_description = file: README.md
keywords = one, two
license = MIT
classifiers =
    Framework :: Django
    Programming Language :: Python :: 3

[options]
zip_safe = False
include_package_data = True
# same as find_packages() in setup()
packages = find:
python_requires = >=3.8
install_requires =
    numpy
    importlib-metadata; python_version<"3.10"
```

Also, an additional setting is passed to the pyproject.toml file. Here we have specified build-system, similar to setup_requires in setup.py:
```toml
# pyproject.toml
[build-system]
# Minimum requirements for the build system to execute
requires = ["setuptools", "wheel", "numpy<1.24.3"]

# Adding ruff.toml to pyproject.toml
[tool.ruff]
line-length = 99

[tool.ruff.lint]
# 1. Enable flake8-bugbear (`B`) rules, in addition to the defaults.
select = ["E", "F", "B", "ERA"]
# 2. Avoid enforcing line-length violations (`E501`)
ignore = ["E501"]
# 3. Avoid trying to fix flake8-bugbear (`B`) violations.
unfixable = ["B"]

# 4. Ignore `E402` (import violations) in all `__init__.py` files, and in select subdirectories.
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["E402"]
"**/{tests,docs,tools}/*" = ["E402"]

# copying isort configuration from .isort.cfg to pyproject.toml
[tool.isort]
profile = "black"
multi_line_output = "VERTICAL_HANGING_INDENT"
force_grid_wrap = 2
line_length = 99

# copying black config from .black.toml to pyproject.toml
[tool.black]
line-length = 99
exclude = ".venv"

# copying flake8 config from .flake8 to pyproject.toml
[tool.flake8]
docstring-convention = "all"
extend-ignore = ["D107", "D212", "E501", "W503", "W605", "D203", "D100",
                 "E305", "E701", "DAR101", "DAR201"]
exclude = [".venv"]
max-line-length = 99
# radon
radon-max-cc = 10

# copying pylint config from .pylintrc to pyproject.toml
[tool.pylint."messages control"]
disable = [
    "line-too-long",
    "trailing-whitespace",
    "missing-function-docstring",
    "consider-using-f-string",
    "import-error",
    "too-few-public-methods",
    "redefined-outer-name",
]
```

We have been treating setup.py as a glorified config file, not really taking advantage of the fact that it is a Python file by adding logic to it.
This is more common than not. Also, there has been a general shift away from using Python for config files because doing so adds complexity to using the config files (like having to install libraries in order to execute the config file).
setup.cfg is a companion file to setup.py that allows us to define our package configuration in a static text file, specifically an INI-format file.
Any values that we do not directly pass as arguments to setup() will be looked for by the setup() invocation in a setup.cfg file, which is meant to sit adjacent to setup.py in the file tree if used.
Now we are accumulating a lot of files!
- setup.py
- setup.cfg
- pyproject.toml
- README.md
- .pylintrc, .flake8, .blackrc, ruff.toml, .mypy, pre-commit-config.yaml, etc.
- CHANGELOG or CHANGELOG.md
- VERSION or version.txt

It turns out that nearly all of these files can be replaced with pyproject.toml. Nearly every linting / code quality tool supports parsing a section called [tool.<name>] (e.g. the [tool.black] section) of pyproject.toml to read its configuration!
The docs of each individual tool should tell you how to accomplish this.
Shown above is a pyproject.toml with configurations for many of the linting tools we have used in the course.
Can setup.cfg and setup.py be replaced as well?
### Moving setup.cfg to pyproject.toml

FROM setup.cfg:

```ini
# setup.cfg
[metadata]
name = packaging-demo
version = attr: packaging_demo.VERSION
author = Amit Vikram Raj
author_email = [email protected]
description = Demo for Python Packaging
long_description = file: README.md
keywords = one, two
license = MIT
classifiers =
    Programming Language :: Python :: 3

[options]
zip_safe = False
include_package_data = True
# same as find_packages() in setup()
packages = find:
python_requires = >=3.8
install_requires =
    numpy
    importlib-metadata; python_version<"3.10"
```

TO:
```toml
# pyproject.toml
[build-system]
# Minimum requirements for the build system to execute
requires = ["setuptools>=61.0.0", "wheel"]

# Adding these from setup.cfg in the pyproject.toml file
[project]
name = "packaging-demo"
authors = [{ name = "Amit Vikram Raj", email = "[email protected]" }]
description = "Demo for Python Packaging"
readme = "README.md"
requires-python = ">=3.8"
keywords = ["one", "two"]
license = { text = "MIT" }
classifiers = ["Programming Language :: Python :: 3"]
dependencies = ["numpy", 'importlib-metadata; python_version<"3.10"']
dynamic = ["version"]
# version = "0.0.3"

# https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata
[tool.setuptools.dynamic]
# whenever you make changes to the package, bump the version in one of these files
# version = {attr = "packaging_demo.VERSION"}  # version read from 'packaging_demo/__init__.py'
version = {file = ["version.txt"]}  # version read from 'version.txt' in the root folder
```
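Since the version is now declared dynamically, our code (and our users) can read it back from the installed package's metadata; a small sketch using only the standard library:

```python
# works on Python 3.8+; the distribution name matches the [project] name
from importlib.metadata import version

print(version("packaging-demo"))  # prints whatever version.txt contains
```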
```bash
python -m build --sdist --wheel .
```

- Runs perfectly, and we got rid of another config file (setup.cfg).
### Replacing setup.py with a build-backend

PEP 517 added a build-backend argument to pyproject.toml, like so:
```toml
[build-system]
# Defined by PEP 518:
requires = ["flit"]
# Defined by this PEP:
build-backend = "flit.api:main"
```

```python
# The above toml config is equivalent to
import flit.api

backend = flit.api.main
```

The build-backend defines an entrypoint (an executable Python module, in this case) that the build CLI uses to actually do the work of parsing pyproject.toml and building the wheel and sdist.
This means you could implement your own build backend today by writing a program that does that, and you could use it by adding your package to requires = [...] and specifying the entrypoint in build-backend = ....
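To make that concrete, here is a minimal sketch of a custom backend (the module name my_backend is hypothetical; for simplicity it just delegates to setuptools' own backend, but it could do anything):

```python
# my_backend.py -- a toy PEP 517 build backend (hypothetical module name)
from setuptools import build_meta as _setuptools_backend


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # a real backend could generate code, validate metadata, etc. here
    print("custom backend: building wheel")
    return _setuptools_backend.build_wheel(
        wheel_directory, config_settings, metadata_directory
    )


def build_sdist(sdist_directory, config_settings=None):
    print("custom backend: building sdist")
    return _setuptools_backend.build_sdist(sdist_directory, config_settings)
```

You would then point build-backend = "my_backend" at it and add the package providing it to requires = [...].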
If you do not specify a build-backend in pyproject.toml, setuptools is assumed and the package will get built perfectly fine.

Even if you delete setup.py and run python -m build --sdist --wheel ., it runs perfectly without it, because the build CLI defaults to build-backend = "setuptools.build_meta", which builds our package. But you can still explicitly declare setuptools as your build backend like this:
```toml
# pyproject.toml
...
[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"
...
```

Each build backend typically extends the pyproject.toml file with its own configuration options. For example:
```toml
# pyproject.toml
...
[tool.setuptools.package-data]
packaging_demo = ["*.json"]

[tool.setuptools.dynamic]
version = {file = "version.txt"}
long_description = {file = "README.md"}
...
```

If you choose to use setuptools in your project, you can add these sections to pyproject.toml. You can read more about this in the setuptools documentation.
It's often beneficial to include non-Python files, like data files or binary files, inside your package, because your Python code often relies on them.

If we're going to include those files, they need to end up inside our package folder, because it's the package folder that lands in our users' virtual environments upon installation.

By default, non-Python files do not make it into that final virtual environment folder; that is, they don't actually make it into our package folder during the build procedure.

So how do we make sure these files end up in the built wheel/sdist of our package? Here we demo with a cities.json file that we want inside the package, since it's used by states_info.py.
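For context, this is roughly how states_info.py might read the bundled file at runtime; a sketch using the standard library (the loading code is our assumption; only the file and package names come from the demo):

```python
# states_info.py -- load cities.json bundled inside the packaging_demo package
import json
from importlib import resources

# open the data file relative to the installed package, not the current directory
with resources.open_text("packaging_demo", "cities.json") as f:
    cities = json.load(f)
```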
Official setuptools docs for data file support.
```toml
# pyproject.toml
[tool.setuptools]
# ...
# By default, include-package-data is true in pyproject.toml, so you do
# NOT have to specify this line.
include-package-data = true
```

So setuptools already has include-package-data set to true by default, as shown in the official docs, but we still need to create an extra file, MANIFEST.in, in the root dir, specifying the data files we want to include in our package.
Important: all the folders in the package directory, including the data directory we want to include, must contain an __init__.py file, because the recursive find_packages() process that setuptools runs will not descend into folders that do not have an __init__.py file.
```
# MANIFEST.in
include packaging_demo/*.json
include packaging_demo/my_folder/*.json
```

OR recursively include all JSON files in the package directory:

```
# MANIFEST.in
recursive-include packaging_demo *.json
```

Docs on configuring the MANIFEST.in file.
### Replacing the MANIFEST.in file

From the setuptools docs, we can add this to our pyproject.toml file instead:
```toml
# this is true by default, so there is no need to explicitly add it,
# but as mentioned in the docs, it is false for other methods like setup.py or setup.cfg
[tool.setuptools]
include-package-data = true

# add the data here; the files are found recursively
[tool.setuptools.package-data]
packaging_demo = ["*.json"]
```

### Other build-backend systems than setuptools

Other than setuptools, we can use the build backends below. The point to note is that when using another system, the pyproject.toml configuration should follow that system's standards.
Hatch:

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

Poetry:

```toml
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```

### Pinning dependencies

Pinning means documenting the exact versions of our dependencies, their dependencies, and so on.
It's advisable to keep the number of dependencies of our package as small as possible: a large dependency tree can lead to dependency hell, i.e., dependency conflicts with other packages, as explained in the lectures.

The more complex the dependency tree, the higher the chance of conflicts with future versions of other libraries.
Dependency Graph Analysis By Eric
Keeping pinned versions of dependencies and Python versions is advisable for troubleshooting purposes:

```bash
pip freeze > requirements.txt
```

To visualize the dependency graph:

```bash
pip install pipdeptree graphviz
sudo apt-get install graphviz

# generate the dependency graph
pipdeptree -p packaging-demo --graph-output png > dependency-graph.png
```

### Optional dependencies

```toml
[project.optional-dependencies]
dev = ["ruff", "mypy", "black"]
```

```bash
# installing our package with optional dependencies
pip install '.[dev]'
```
```toml
[project.optional-dependencies]
# for development
dev = ["ruff", "mypy", "black"]
# plugin-based architecture
colors = ["rich"]
```

```bash
# plugin-based installation: here we demo with the rich library;
# if users want colorized output, they can install our package like this
pip install '.[colors]'

# multiple optional dependency groups can be installed at once
pip install '.[colors, dev]'
```
```toml
[project.optional-dependencies]
# for development
dev = ["ruff", "mypy", "black"]
# plugin-based architecture
colors = ["rich"]
# install all dependencies
all = ["packaging-demo[dev, colors]"]
```

```bash
# installing all dependencies at once
pip install '.[all]'
```

We can use Snyk to check how stable and well supported a dependency is, and whether it has any known security issues, before deciding to use it in our project.
### Publishing to PyPI

To publish our package to PyPI (the Python Package Index), as stated in the official guide, we use the twine CLI tool:

```bash
pip install twine
twine upload --help
```

1. Generate an API token for PyPI Test or PyPI Prod.
2. Build your Python package: python -m build --sdist --wheel "${PACKAGE_DIR}". Here we're building both the sdist and the wheel, as recommended.
3. Run the twine tool: twine upload --repository testpypi ./dist/*, uploading to test-pypi.
Task runners that can automate these build and publish steps:

- CMake and Makefile (sudo apt-get install make)
- Taskfile
- justfile
- pyinvoke
Dev → QA/Staging → Prod
```yaml
# .github/workflows/publish.yaml
name: Build, Test, and Publish

# triggers: whenever there are new changes pulled/pushed on this
# repo under the given conditions, run the jobs below
on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches:
      - main
  # Manually trigger a workflow
  # https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch
  workflow_dispatch:

jobs:
  build-test-and-publish:
    runs-on: ubuntu-latest
    steps:
      # github actions checks out (clones) our repo and switches to the branch we're working in
      - uses: actions/checkout@v3
        with:
          # Number of commits to fetch. 0 indicates all history for all branches and tags.
          # Fetching all tags avoids duplicate version tagging in 'Tag with the Release Version'.
          fetch-depth: 0
      - name: Set up Python 3.8
        uses: actions/setup-python@v3
        with:
          python-version: 3.8
      # tagging the release version to avoid duplicate releases
      - name: Tag with the Release Version
        run: |
          git tag $(cat version.txt)
      - name: Install Python Dependencies
        run: |
          /bin/bash -x run.sh install
      - name: Lint, Format, and Other Static Code Quality Checks
        run: |
          /bin/bash -x run.sh lint:ci
      - name: Build Python Package
        run: |
          /bin/bash -x run.sh build
      - name: Publish to Test PyPI
        # setting -x in publish:test below will not leak any secrets, as they are masked by github
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
        run: |
          /bin/bash -x run.sh publish:test
        env:
          TEST_PYPI_TOKEN: ${{ secrets.TEST_PYPI_TOKEN }}
      - name: Publish to Prod PyPI
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
        run: |
          /bin/bash -x run.sh publish:prod
        env:
          PROD_PYPI_TOKEN: ${{ secrets.PROD_PYPI_TOKEN }}
      - name: Push Tags
        if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
        run: |
          git push origin --tags
```

### Locking Requirements
### Dependency Caching
Whenever a GitHub Actions workflow executes in CI, it runs on a fresh container, so we would have to download and re-install dependencies from pip again and again, which is inefficient and slows down our workflow.

Instead, we want to install all the dependencies the first time the workflow runs and reuse them every time a new workflow runs.

GitHub Actions provides this functionality by caching dependencies: it stores the installed dependencies (~/.cache/pip) and restores them every time a new workflow is run.
Docs
```yaml
- uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```

Caching with the setup-python GitHub Action:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.9'
      cache: 'pip' # caching pip dependencies
  - run: pip install -r requirements.txt
```

### Parallelization
```yaml
# See .github/workflows/publish.yaml
jobs:
  check-version-txt:
    ...
  lint-format-and-static-code-checks:
    ...
  build-wheel-and-sdist:
    ...
  publish:
    needs:
      - check-version-txt
      - lint-format-and-static-code-checks
      - build-wheel-and-sdist
    ...
```