Development guide

This document aims to provide a comprehensive set of instructions for continuing development of this package. Good knowledge of Python development is assumed. Some ways of working are subjective and preferential; as such we try to be as minimal in our proscription of other methods as possible.

Development environment setup

Python

The package currently supports Python 3.9, 3.10 and 3.11. We recommend installing all of these versions; at minimum, the latest supported version of Python should be used. Many people use pyenv for managing multiple Python versions. On macOS, Homebrew is a good, less invasive option for this (provided you then use a virtual environment manager too). For virtual environment management, we recommend Python's in-built venv functionality, but conda or some similar system would suffice (note that in the section below it may not be necessary to use any specific virtual environment management at all, depending on the setup of Poetry).
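
As a small illustration, a script could check that the running interpreter is one of the supported versions. This is only a sketch: SUPPORTED and is_supported are hypothetical names, not part of NHSSynth, and the version list is copied from the paragraph above.

```python
import sys

# Supported (major, minor) pairs, copied from this guide (an assumption of this sketch).
SUPPORTED = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair of version_info is supported."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {is_supported()}")
```

The default argument means is_supported() checks the current interpreter, while passing a tuple such as (3, 10) allows the check to be exercised without switching Python versions.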

Poetry

We use Poetry to manage dependencies and the actual packaging and publishing of NHSSynth to PyPI. Poetry is a more robust alternative to a requirements.txt file, allowing for grouped dependencies and advanced build options. Rather than freezing a specific pip state, Poetry only specifies the top-level dependencies and then handles the resolution and installation of the latest compatible versions of the full dependency tree per these top-level dependencies. See the pyproject.toml in the GitHub repository and Poetry's documentation for further context.

Once Poetry is installed (in your preferred way per the instructions on their website), you can choose one of two options:

  1. Allow Poetry to control virtual environments in its own proprietary way, such that when you install and develop the package Poetry will automatically create a virtual environment for you.

  2. Change poetry's configuration to manage your own virtual environments:

    poetry config virtualenvs.create false
    poetry config virtualenvs.in-project false
    

    In this setup, a virtual environment can be instantiated and activated in whichever way you prefer. For example, using venv:

    python3.11 -m venv nhssynth-3.11
    source nhssynth-3.11/bin/activate
    

Package installation

At this point, the project dependencies can be installed via poetry install --with dev (add optional flags: --with aux to work with the auxiliary notebooks, --with docs to work with the documentation). This will install the package in editable mode, meaning that changes to the source code will be reflected in the installed package without needing to reinstall it. Note that if you are using your own virtual environment, you will need to activate it before running this command.

You can then interact with the package in one of two ways:

  1. Via the CLI module, which is accessed using the nhssynth command, e.g.

    poetry run nhssynth ...
    

    Note that you can omit the poetry run part and just type nhssynth if you followed the optional steps above to manage and activate your own virtual environment, or if you have executed poetry shell beforehand.

  2. Through directly importing parts of the package to use in an existing project (from nhssynth.modules... import ...).

Secure mode

Note that in order to train a generator in secure mode (see the documentation for details), the PyTorch extension package csprng must be installed separately. Currently this package's dependencies are not compatible with recent versions of PyTorch (the authors plan on rectifying this - watch this space), so you will need to install it manually. You can do this in your environment by running:

git clone git@github.com:pytorch/csprng.git
cd csprng
git branch release "v0.2.2-rc1"
git checkout release
python setup.py install

Coding practices

Style

We use black for code formatting. This is a fairly opinionated formatter, but it is widely used and has a good reputation. We also use ruff to manage imports and lint the code. Both of these tools are run automatically via pre-commit hooks. Ensure you have installed the package with the dev group of dependencies and then run the following command to install the hooks:

pre-commit install

Note that you may need to prepend this command with poetry run if you are not using your own virtual environment.

This will ensure that your code conforms to the two formatters' / linters' requirements each time you commit to a branch. black and ruff are also run as part of the CI workflow discussed below, so even without these hooks the code will be checked and an error raised on GitHub if it is not formatted consistently.

Configuration for both packages can be found in the pyproject.toml. This configuration should be picked up automatically by the pre-commit hooks, by your IDE, and when running the tools manually on the command line. The main configuration is as follows:

[tool.black]
line-length = 120

[tool.ruff]
include = ["*.py", "*.pyi", "**/pyproject.toml", "*.ipynb"]
select = ["E4", "E7", "E9", "F", "C90", "I"]

[tool.ruff.per-file-ignores]
"src/nhssynth/common/constants.py" = ["F403", "F405"]

[tool.ruff.isort]
known-first-party = ["nhssynth"]

The final setting ensures that absolute imports from NHSSynth are sorted separately from the rest of the imports in a file.

There are a number of other hooks used as part of this repository's pre-commit configuration, including one that automatically mirrors the Poetry version of these packages in the dev group, per the list of supported packages and .poetry-sync-db.json. Roughly, these other hooks ensure correct formatting of .yaml and .toml files, check for large files being added to a commit, strip notebook output from the files, and fix whitespace and end-of-file issues. These are mostly consistent with the NHSx analytics project template's hooks.

Documentation

There should be Google-style docstrings on all non-trivial functions and classes. Ideally a docstring should take the form:

def func(arg1: type1, arg2: type2) -> returntype:
    """
    One-line summary of the function.
    AND / OR
    Longer description of the function, including any caveats or assumptions where appropriate.

    Args:
        arg1: Description of arg1.
        arg2: Description of arg2.

    Returns:
        Description of the return value.
    """
    ...
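
As a concrete illustration of the template above, here is a minimal sketch of a documented function. The function clip_value is hypothetical and is not part of NHSSynth; it only shows how the summary, description, Args, Returns and (where relevant) Raises sections fit together.

```python
def clip_value(value: float, lower: float, upper: float) -> float:
    """
    Restrict a value to the closed interval [lower, upper].

    Values already inside the interval are returned unchanged. The bounds are
    assumed to be ordered; a ValueError is raised if they are not.

    Args:
        value: The number to restrict.
        lower: The smallest value that may be returned.
        upper: The largest value that may be returned.

    Returns:
        The value itself if it lies within the bounds, otherwise the nearest bound.

    Raises:
        ValueError: If lower is greater than upper.
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    # Clamp from above first, then from below.
    return max(lower, min(value, upper))
```

Types are given in the signature rather than repeated in the Args section, which keeps the docstring short.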

These docstrings are then compiled into a full API documentation tree as part of a larger MkDocs documentation site hosted via GitHub (the one you are reading right now!). This process is derived from this tutorial.

The MkDocs page is built using the mkdocs-material theme. The documentation is built and hosted automatically via GitHub Pages.

The other parts of this site comprise markdown documents in the docs folder. Adding new pages is handled in the mkdocs.yml file as in any other Material MkDocs site. See their documentation if more complex changes to the site are required.

Testing

We use tox to manage the execution of tests for the package against multiple versions of Python, and to ensure that they are being run in a clean environment. To run the tests, simply execute tox in the root directory of the repository. This will run the tests against all supported versions of Python. To run the tests against a specific version of Python, use tox -e py311 (or py310 or py39).

Configuration

See the tox.ini file for more information on the testing configuration. We follow the Poetry documentation on tox support to ensure that, for each version of Python, tox will create an sdist package of the project and use pip to install it in a fresh environment. Dependencies are thus first resolved by pip and then updated to the locked versions in poetry.lock by running poetry install ... in this fresh environment. The tests are then run using poetry run pytest, with pytest configured in the pyproject.toml file. This configuration is fairly minimal: it simply specifies the tests folder as the testing directory and filters some known warnings.

[tool.pytest.ini_options]
testpaths = "tests"
filterwarnings = ["ignore::DeprecationWarning:pkg_resources"]
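
To make the filter string more concrete: its fields are action, message, category and module, separated by colons, so the entry above ignores DeprecationWarning raised from pkg_resources. The sketch below shows a roughly comparable filter using Python's standard warnings module. It is an approximation for illustration only (pytest parses its own filter strings), and visible_warnings is a hypothetical helper.

```python
import warnings


def visible_warnings(module_name: str) -> int:
    """Count the warnings that survive the filter when raised from module_name."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from a state where every warning would be recorded.
        warnings.simplefilter("always")
        # Rough stdlib counterpart of "ignore::DeprecationWarning:pkg_resources".
        warnings.filterwarnings("ignore", category=DeprecationWarning, module="pkg_resources")
        # Emit a DeprecationWarning attributed to the given module name.
        warnings.warn_explicit(
            "deprecated API",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
        )
    return len(caught)
```

Calling visible_warnings("pkg_resources") records nothing, while the same warning attributed to any other module is still recorded.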

We can also use coverage to check the test coverage of the package. This is configured in the pyproject.toml file as follows:

[tool.coverage.run]
source = ["src/nhssynth/cli", "src/nhssynth/common", "src/nhssynth/modules"]
omit = [
    "src/nhssynth/common/debugging.py",
]

We omit debugging.py as it is a wrapper for reading full tracebacks of warnings and is not to be imported directly.

Adding Tests

We use the pytest framework for testing. The testing directory structure mirrors that of src. The usual testing practices apply.
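
As a minimal sketch of what a test file might look like, assume a hypothetical module src/nhssynth/common/strings.py, whose tests would live in tests/common/test_strings.py. Neither file nor the function normalise_column_name is claimed to exist in the package; the function is defined inline here only so that the example is self-contained.

```python
# Hypothetical contents of tests/common/test_strings.py.


def normalise_column_name(name: str) -> str:
    """Stand-in for the function under test; a real test would import it from the package."""
    return "_".join(name.strip().lower().split())


def test_normalise_column_name_lowercases_and_joins_words():
    assert normalise_column_name("Date Of Birth") == "date_of_birth"


def test_normalise_column_name_strips_surrounding_whitespace():
    assert normalise_column_name("  NHS Number ") == "nhs_number"


def test_normalise_column_name_leaves_clean_names_unchanged():
    assert normalise_column_name("age") == "age"
```

pytest collects functions whose names start with test_ and reports each failing assert with the values involved, so each test is kept to a single behaviour.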

Releases

Version management

The package's version should be updated following the semantic versioning framework. The package is currently in a pre-release state, such that major version 1.0.0 should only be tagged once the package is functionally complete and stable.
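
As a rough illustration of how semantic version numbers change, the sketch below increments one part of a MAJOR.MINOR.PATCH string and resets the lower parts. The helper bump_version is hypothetical and deliberately simplified: it does not handle pre-release or build metadata suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part ("major", "minor" or "patch") incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible changes: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible features: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible fixes.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("0.3.1", "minor"))  # prints 0.4.0
```

While the package remains in its pre-release state, the major part stays at 0 and only the minor and patch parts are bumped.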

To update the package's metadata, we can use Poetry's version command:

poetry version <version>

We can then commit and push the change to the version file:

git add pyproject.toml
git commit -m "Bump version to <version>"
git push

We should then tag the release using GitHub's CLI (or manually via git if you prefer):

gh release create <version> --generate-notes

This will create a new release on GitHub, and will automatically generate a changelog based on the commit messages and PRs closed since the last release. This changelog can then be edited to add more detail if necessary.

Building and publishing to PyPI

Poetry offers not only dependency management, but also a simple way to build and distribute the package.

After tagging a release per the section above, we can build the package using Poetry's build command:

poetry build

This will create a dist folder containing the built package. To publish this to PyPI, we can use the publish command:

poetry publish

This will prompt for PyPI credentials, and then publish the package. Note that this will only work if you have been added as a Maintainer of the package on PyPI.

It might be preferable at some point in the future to set up Trusted Publisher Management via OpenID Connect (OIDC) to allow for automated publishing of the package via a GitHub workflow. See the "Publishing" tab of NHSSynth's project management panel on PyPI to set this up.

GitHub

Continuous integration

We use GitHub Actions for continuous integration. The different workflows comprising this can be found in the .github/workflows folder. In general, the CI workflow is triggered on every push to the main or a feature branch - as appropriate - and runs tests against all supported versions of Python. It also runs black and ruff to check that the code is formatted correctly, and builds the documentation site.

There are also scripts to update the dynamic badges in the README. These work via a gist associated with the repository. It is not easy to transfer ownership of this process, so if they break please feel free to contact me.

Branching

We encourage the use of the Gitflow branching model for development. This means that the main branch is always in a stable state, and that all development work is done on feature branches. These feature branches are then merged into main via pull requests. The main branch is protected, such that pull requests must be reviewed and approved before they can be merged.

At minimum, the main branch's protection should be maintained, and roughly one branch per issue should be used. Ensure that all of the CI checks pass before merging.

Security and vulnerability management

The GitHub repository for the package has Dependabot, code scanning, and other security features enabled. These should be monitored continuously and any issues resolved as soon as possible. When issues of this type require a specific version of a dependency to be specified (and it is one that is not already amongst the dependency groups of the package), the version should be referenced as part of the security group of dependencies (i.e. with poetry add <package> --group security) and a new release created (see above).