3. Setting up open source LLMs

It is possible to run LLMs locally on consumer-grade CPUs and GPUs with usable inference speed. The following table is a (non-exhaustive) list of projects that document the process of setting up a local open source LLM.

| Project | Notes | Tested on | Link |
| --- | --- | --- | --- |
| llama.cpp | Convert LLMs to GGML format and quantise them (note: GGML has since been superseded by GGUF); supports builds for many other LLMs; building from source requires additional software, e.g. Make, CMake, Zig or GMake; see also the project's Manifesto | M1 MBP | GitHub |
| Ollama | Download and run pre-built LLMs locally from the ollama.ai library; supports LLMs not in the library by importing GGUF files | M1 MBP | GitHub |
| PrivateGPT | Production-ready AI project that allows you to ask questions about your documents | | GitHub |
| GPT4All | LLMs that run locally on your CPU and nearly any GPU | | GitHub |
| TheBloke | Pre-built quantised LLMs on Hugging Face | | Hugging Face 🤗 |
| ExLlamaV2 | Inference library for running local LLMs on modern consumer GPUs | | GitHub |
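As a rough illustration of the workflow these projects enable, the sketch below loads a locally downloaded, quantised GGUF model using the llama-cpp-python bindings for llama.cpp. The package must be installed separately (e.g. `pip install llama-cpp-python`), and the model file name and prompt are placeholders, not part of any of the projects above.

```python
# Minimal sketch: run a quantised GGUF model on CPU with llama-cpp-python.
# The model path is a placeholder -- substitute any GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,    # context window size in tokens
    n_threads=8,   # CPU threads to use for inference
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],   # stop generating when the model starts a new question
)

print(output["choices"][0]["text"])
```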

3.1 Quantisation

The ability to run LLMs on consumer-grade hardware has been achieved through quantisation, i.e. the "rounding" of floating-point data types. This is accomplished by mapping ranges of floating-point values onto more compact integer representations, for example quantising the range (-1.0, ..., 1.0) to the integers (-127, -126, ..., 126, 127). The following links provide a good introduction to floating-point data types and quantisation techniques.
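As a toy sketch of one simple scheme consistent with the mapping described above (real methods such as the GGUF block quants, GPTQ or AWQ add per-block scales, zero-points and other refinements), the snippet below maps a handful of made-up float32 weights onto 8-bit integers and back, showing the rounding error that quantisation introduces.

```python
import numpy as np

# Toy symmetric 8-bit quantisation: map floats in (-max|x|, +max|x|) onto -127..127.
# The weight values are illustrative only.
weights = np.array([-0.82, -0.31, 0.04, 0.27, 0.95], dtype=np.float32)

scale = np.abs(weights).max() / 127.0                            # one float kept per block
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantised = q.astype(np.float32) * scale                       # approximate reconstruction
error = np.abs(weights - dequantised)

print(q)            # e.g. [-110  -41    5   36  127]
print(dequantised)  # close to the original weights
print(error.max())  # small but non-zero: the cost of the compression
```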

Quantisation methods