Development guide
This document aims to provide a comprehensive set of instructions for continuing development of this package. Good knowledge of Python development is assumed. Some ways of working are a matter of personal preference; where this is the case, we try to avoid prescribing one method over another.
Development environment setup
Python
The package currently supports Python 3.9, 3.10 and 3.11. We recommend installing all of these versions; at minimum, the latest supported version should be used. Many people use pyenv for managing multiple Python versions. On macOS, Homebrew is a good, less invasive option for this (provided you then also use a virtual environment manager). For virtual environment management, we recommend Python's built-in venv module, but conda or a similar system would suffice (note that, depending on how Poetry is configured, you may not need a separate virtual environment manager at all; see the section below).
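If you want a quick sanity check that the interpreter in your active environment is one of the supported versions, a minimal sketch follows. The SUPPORTED_VERSIONS tuple is an assumption that simply restates the versions listed above; it is not read from the project's metadata, so it would need updating by hand if support changes.

```python
import sys
from typing import Sequence

# Supported (major, minor) versions, restated by hand from this guide.
# Assumption: kept in sync manually; this is not read from pyproject.toml.
SUPPORTED_VERSIONS = ((3, 9), (3, 10), (3, 11))


def is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) part of `version_info` is a supported version."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "is" if is_supported() else "is not"
    print(f"Python {current} {status} a supported version.")
```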
Poetry
We use Poetry to manage dependencies and to handle the packaging and publishing of NHSSynth to PyPI. Poetry is a more robust alternative to a requirements.txt file, allowing for grouped dependencies and advanced build options. Rather than freezing a specific pip state, Poetry only requires the top-level dependencies to be specified; it then resolves the full dependency tree to the latest compatible versions and records the result in poetry.lock. See the pyproject.toml in the GitHub repository and Poetry's documentation for further context.
Once Poetry is installed (in your preferred way, per the instructions on its website), you can choose one of two options:
1. Allow Poetry to control virtual environments in its default way, such that when you install and develop the package, Poetry will automatically create a virtual environment for you.
2. Change Poetry's configuration to manage your own virtual environments, e.g. by running poetry config virtualenvs.create false so that Poetry does not create environments of its own. In this setup, a virtual environment can be instantiated and activated in whichever way you prefer. For example, using venv, create one with python -m venv <path> and activate it with source <path>/bin/activate (on Windows, <path>\Scripts\activate).
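If you prefer to script environment creation, the standard library's venv module can do the same from Python. This is a minimal sketch: create_env is a hypothetical helper, the target directory is your choice, and you would still need to activate the environment (or call its interpreter directly) before running poetry install.

```python
import os
import venv
from pathlib import Path


def create_env(path: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return the path to its Python interpreter."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,  # bootstrap pip into the new environment via ensurepip
        symlinks=(os.name != "nt"),  # match `python -m venv` defaults: symlinks on POSIX only
    )
    builder.create(path)
    # Windows places executables in Scripts\, POSIX systems in bin/
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return path / scripts_dir / exe_name
```

For example, create_env(Path(".venv")) creates an environment in a .venv folder and returns the interpreter you would then use (or activate) for development.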
Package installation
At this point, the project dependencies can be installed via poetry install --with dev (add the optional flags --with aux to work with the auxiliary notebooks, and --with docs to work with the documentation). This installs the package in editable mode, meaning that changes to the source code are reflected in the installed package without needing to reinstall it. Note that if you are using your own virtual environment, you will need to activate it before running this command.
You can then interact with the package in one of two ways:
1. Via the CLI module, which is accessed using the nhssynth command, e.g. poetry run nhssynth followed by the relevant arguments. Note that you can omit the poetry run part and just type nhssynth if you followed the optional steps above to manage and activate your own virtual environment, or if you have executed poetry shell beforehand.
2. By directly importing parts of the package to use in an existing project (from nhssynth.modules... import ...).
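To confirm that both routes are available in your active environment (for example, after running poetry shell or activating your own virtual environment), a small check like the following could be used. It is a sketch built only on the standard library: check_install is a hypothetical helper, it does not exercise any of NHSSynth's functionality, and the command name nhssynth is taken from the text above.

```python
import importlib.util
import shutil


def check_install(package: str = "nhssynth", command: str = "nhssynth") -> dict:
    """Report whether `package` is importable and whether `command` is on the PATH."""
    return {
        # find_spec returns None when the top-level package cannot be found
        "importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when no executable of that name is on the PATH
        "on_path": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    print(check_install())
```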
Secure mode
Note that in order to train a generator in secure mode (see the documentation for details), the PyTorch extension package csprng must be installed separately. Currently, csprng's dependencies are not compatible with recent versions of PyTorch (a fix is planned, so watch this space), so you will need to install it manually. You can do this in your environment by running:
git clone git@github.com:pytorch/csprng.git
cd csprng
git branch release "v0.2.2-rc1"
git checkout release
python setup.py install
Coding practices
Style
We use black for code formatting. This is a fairly opinionated formatter, but it is widely used and well regarded. We also use ruff to sort imports and lint the code. Both tools are run automatically via pre-commit hooks. Ensure you have installed the package with the dev group of dependencies, and then run the following command to install the hooks:
pre-commit install
Note that you may need to prefix this command with poetry run if you are not using your own virtual environment.
This ensures that your code meets the requirements of both tools each time you commit to a branch. black and ruff are also run as part of the CI workflow discussed below, so even without these hooks, the code will be checked and an error will be raised on GitHub if it is not formatted consistently.
Configuration for both tools can be found in pyproject.toml; it should be picked up automatically by the pre-commit hooks, by your IDE, and when running the tools manually from the command line. The main configuration is as follows:
[tool.black]
line-length = 120
[tool.ruff]
include = ["*.py", "*.pyi", "**/pyproject.toml", "*.ipynb"]
select = ["E4", "E7", "E9", "F", "C90", "I"]
[tool.ruff.per-file-ignores]
"src/nhssynth/common/constants.py" = ["F403", "F405"]
[tool.ruff.isort]
known-first-party = ["nhssynth"]
The known-first-party setting ensures that absolute imports from nhssynth are sorted separately from the rest of the imports in a file.
There are a number of other hooks in this repository's pre-commit configuration, including one that automatically keeps the versions of these tools used by the hooks in sync with the versions Poetry uses in the dev group, per the list of supported packages and .poetry-sync-db.json. Roughly, the other hooks ensure correct formatting of .yaml and .toml files, check for large files being added to a commit, strip output from notebooks, and fix whitespace and end-of-file issues. These are mostly consistent with the hooks in the NHSx analytics project template.
Documentation
There should be Google-style docstrings on all non-trivial functions and classes. Ideally a docstring should take the form:
def func(arg1: type1, arg2: type2) -> returntype:
    """
    One-line summary of the function.
    AND / OR
    Longer description of the function, including any caveats or assumptions where appropriate.

    Args:
        arg1: Description of arg1.
        arg2: Description of arg2.

    Returns:
        Description of the return value.
    """
    ...
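As a filled-in illustration of the template above, here is a small hypothetical helper (it is not part of NHSSynth) documented in the same Google style, including the optional Raises section that Google-style docstrings also support.

```python
from typing import List


def clip_values(values: List[float], lower: float, upper: float) -> List[float]:
    """Clip each value into the closed interval [lower, upper].

    Values below `lower` are replaced by `lower` and values above `upper` by
    `upper`; values already inside the interval are returned unchanged.

    Args:
        values: Numbers to clip.
        lower: Inclusive lower bound.
        upper: Inclusive upper bound; must not be smaller than `lower`.

    Returns:
        A new list of the same length as `values` with every element clipped.

    Raises:
        ValueError: If `lower` is greater than `upper`.
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return [min(max(v, lower), upper) for v in values]
```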
These docstrings are then compiled into a full API documentation tree as part of a larger MkDocs documentation site hosted via GitHub (the one you are reading right now!). This process is derived from this tutorial.
The MkDocs site uses the mkdocs-material theme. The documentation is built and hosted automatically via GitHub Pages.
The other parts of this site comprise Markdown documents in the docs folder. New pages are added via the mkdocs.yml file, as in any other Material for MkDocs site. See the Material for MkDocs documentation if more complex changes to the site are required.
Testing
We use tox to manage the execution of tests for the package against multiple versions of Python, and to ensure that they are run in a clean environment. To run the tests, simply execute tox in the root directory of the repository. This runs the tests against all supported versions of Python. To run the tests against a specific version of Python, use tox -e py311 (or py310 or py39).
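If you script test runs (for example, only for the interpreter you are currently using), the environment name follows tox's pyXY convention shown above. The helpers below are a hypothetical sketch, not part of the repository, and SUPPORTED_VERSIONS is an assumption restated from this guide; tox.ini remains the source of truth.

```python
import sys
from typing import List, Sequence

# Assumption: restated from this guide; tox.ini remains the source of truth.
SUPPORTED_VERSIONS = ((3, 9), (3, 10), (3, 11))


def tox_env_name(version_info: Sequence[int] = sys.version_info) -> str:
    """Map a Python version to tox's pyXY environment name, e.g. (3, 11) -> "py311"."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) not in SUPPORTED_VERSIONS:
        raise ValueError(f"Python {major}.{minor} is not a supported version")
    return f"py{major}{minor}"


def tox_command(version_info: Sequence[int] = sys.version_info) -> List[str]:
    """Build the argument list that runs the tests against a single Python version."""
    return ["tox", "-e", tox_env_name(version_info)]
```

From the repository root, subprocess.run(tox_command(), check=True) would then run the tests for the current interpreter, provided it is one of the supported versions.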
Configuration
See the tox.ini file for more information on the testing configuration. We follow the Poetry documentation on tox support to ensure that, for each version of Python, tox creates an sdist package of the project and uses pip to install it in a fresh environment. Dependencies are therefore resolved by pip in the first instance, and afterwards updated to the locked dependencies in poetry.lock by running poetry install ... in this fresh environment. The tests are then run using poetry run pytest, which is configured in the pyproject.toml file. This configuration is fairly minimal, simply specifying the tests folder as the testing directory and filtering some known warnings:
[tool.pytest.ini_options]
testpaths = "tests"
filterwarnings = ["ignore::DeprecationWarning:pkg_resources"]
We can also use coverage to check the test coverage of the package. This is configured in the pyproject.toml file as follows:
[tool.coverage.run]
source = ["src/nhssynth/cli", "src/nhssynth/common", "src/nhssynth/modules"]
omit = [
"src/nhssynth/common/debugging.py",
]
We omit debugging.py as it is a wrapper for reading full tracebacks of warnings and is not meant to be imported directly.
Adding Tests
We use the pytest framework for testing. The testing directory structure mirrors that of src. The usual testing practices apply.
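Because pytest collects test_*.py files under the tests folder and runs the test_* functions in them, a new test module can be as simple as a set of plain functions using bare assert statements. The sketch below is illustrative only: clip_values is a hypothetical function defined inline so the example runs on its own, and the file name is an assumption.

```python
# Hypothetical test module, e.g. tests/<path mirroring the source module>/test_clip_values.py
from typing import List


# In a real test module the function under test would be imported from the package
# (from nhssynth.modules... import ...); it is defined inline here only so that this
# example runs on its own. clip_values is hypothetical and not part of NHSSynth.
def clip_values(values: List[float], lower: float, upper: float) -> List[float]:
    return [min(max(v, lower), upper) for v in values]


def test_values_inside_bounds_are_unchanged():
    assert clip_values([0.2, 0.8], 0.0, 1.0) == [0.2, 0.8]


def test_values_outside_bounds_are_clipped():
    assert clip_values([-5.0, 5.0], 0.0, 1.0) == [0.0, 1.0]


def test_empty_input_gives_empty_output():
    assert clip_values([], 0.0, 1.0) == []
```

Running tox, or poetry run pytest, from the repository root would collect these tests via the testpaths setting in pyproject.toml.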
Releases
Version management
The package's version should be updated following the semantic versioning framework. The package is currently in a pre-release state, so version 1.0.0 should only be tagged once the package is functionally complete and stable.
To update the version in the package's metadata, we can use Poetry's version command with a bump rule (patch, minor or major) or an explicit version number, e.g. poetry version minor.
We can then commit and push the changes to the version file, and create a new tag and release using GitHub's CLI (or manually via git if you prefer), e.g. gh release create <tag> --generate-notes. If the tag does not already exist, gh creates it.
This creates a new release on GitHub and automatically generates a changelog based on the commits and pull requests merged since the last release. This changelog can then be edited to add more detail if necessary.
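To make the versioning framework concrete, the sketch below shows how the three standard bump rules change a MAJOR.MINOR.PATCH version string. For plain versions it mirrors what Poetry's version command does with the major, minor and patch rules, but bump_version is a hypothetical illustration rather than Poetry's implementation, and it deliberately ignores pre-release identifiers.

```python
def bump_version(version: str, rule: str) -> str:
    """Return `version` bumped according to `rule` ("major", "minor" or "patch").

    Only plain MAJOR.MINOR.PATCH strings are handled; pre-release and build
    suffixes (e.g. "1.0.0-rc1") are not supported in this sketch.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    if rule == "major":
        # Incompatible API changes: reset minor and patch
        return f"{major + 1}.0.0"
    if rule == "minor":
        # Backwards-compatible new functionality: reset patch
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        # Backwards-compatible bug fixes only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown rule {rule!r}; expected 'major', 'minor' or 'patch'")
```

Note that semantic versioning treats 0.y.z releases as initial development, in which anything may change at any time.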
Building and publishing to PyPI
Poetry offers not only dependency management, but also a simple way to build and distribute the package.
After tagging a release per the section above, we can build the package using Poetry's build command:
poetry build
This creates a dist folder containing the built package (a source distribution and a wheel). To publish this to PyPI, we can use the publish command:
poetry publish
This will prompt for PyPI credentials and then publish the package. Note that this will only work if you have been added as a Maintainer of the package on PyPI.
It might be preferable at some point in the future to set up Trusted Publisher Management via OpenID Connect (OIDC) to allow automated publishing of the package via a GitHub workflow. See the "Publishing" tab of NHSSynth's project management panel on PyPI to set this up.
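Before publishing, it can be worth confirming that dist actually contains the artifacts for the version you just set, especially if older builds are still in that folder. The helper below is a hypothetical sketch, not part of the repository; it only inspects file names and assumes the standard <name>-<version>.tar.gz and <name>-<version>-<tags>.whl naming.

```python
from pathlib import Path
from typing import List


def find_release_artifacts(dist_dir: Path, version: str) -> List[Path]:
    """Return the sdist and wheel(s) built for `version`, raising if either kind is missing."""
    # Source distribution, e.g. <name>-<version>.tar.gz
    sdists = sorted(dist_dir.glob(f"*-{version}.tar.gz"))
    # Wheels, e.g. <name>-<version>-<python tag>-<abi tag>-<platform tag>.whl
    wheels = sorted(dist_dir.glob(f"*-{version}-*.whl"))
    if not sdists or not wheels:
        raise FileNotFoundError(f"missing sdist or wheel for version {version} in {dist_dir}")
    return sdists + wheels
```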
GitHub
Continuous integration
We use GitHub Actions for continuous integration. The workflows comprising this can be found in the .github/workflows folder. In general, the CI workflow is triggered on every push to main or to a feature branch (as appropriate) and runs the tests against all supported versions of Python. It also runs black and ruff to check that the code is formatted correctly, and builds the documentation site.
There are also scripts to update the dynamic badges in the README. These work via a gist associated with the repository. It is not easy to transfer ownership of this process, so if they break please feel free to contact me.
Branching
We encourage the use of the Gitflow branching model for development. This means that the main branch is always in a stable state, and that all development work is done on feature branches. These feature branches are then merged into main via pull requests. The main branch is protected, such that pull requests must be reviewed and approved before they can be merged.
At minimum, the main branch's protection should be maintained, and roughly one branch per issue should be used. Ensure that all of the CI checks pass before merging.
Security and vulnerability management
The GitHub repository for the package has Dependabot, code scanning and other security features enabled. These should be monitored continuously and any issues resolved as soon as possible. When resolving an issue of this type requires a specific version of a dependency to be specified (and that dependency is not already in one of the package's dependency groups), the version should be added to the security group of dependencies (i.e. with poetry add <package> --group security) and a new release created (see above).