3. Setting up open source LLMs
It is possible to run LLMs locally on consumer CPUs or GPUs at usable inference speeds. The following (non-exhaustive) list covers projects which document the process of setting up a local open source LLM.
Projects

llama.cpp
- Converts LLMs to GGML format and quantises them (note: GGML has since been superseded by GGUF); a minimal loading sketch follows this entry
- Supports builds for many other LLMs
- Building from source requires additional software, e.g. Make, CMake, Zig or GMake
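As a sketch of what using a converted model looks like, the llama-cpp-python bindings (a separate pip package) can load a quantised GGUF file and run a prompt. The model path and filename below are placeholder assumptions:

```python
# Minimal sketch using the llama-cpp-python bindings for llama.cpp
# (pip install llama-cpp-python). The GGUF path is a placeholder; point
# it at any model produced by the convert/quantise steps above.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # assumed local file
    n_ctx=2048,  # context window size in tokens
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```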
Ollama
- Download and run pre-built LLMs locally from the ollama.ai library; a usage sketch follows this entry
- Supports LLMs not in its library by importing GGUF files
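Once the Ollama service is running and a model has been pulled, it can be queried over its local REST API. A minimal sketch, where the model name is an assumption:

```python
# Minimal sketch: query a locally running Ollama server over its REST API.
# Assumes the Ollama service is running and that a model has been pulled
# beforehand, e.g. with `ollama pull llama2` (model name is an assumption).
import json
import urllib.request

payload = json.dumps({
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return a single JSON object rather than a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```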
privateGPT: a production-ready AI project that allows you to ask questions about your documents.

GPT4All: LLMs that run locally on your CPU and nearly any GPU.

TheBloke: pre-built quantised LLMs on HuggingFace.

ExLlamaV2: an inference library for running local LLMs on modern consumer GPUs.
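Several of these projects also expose Python bindings. As one illustration, a minimal sketch with the gpt4all package; the model name below is an assumption, and the package downloads the model file on first use:

```python
# Minimal sketch using the gpt4all Python bindings (pip install gpt4all).
# The model name is an assumption; the package downloads it on first use.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
print(model.generate("The capital of France is", max_tokens=16))
```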
3.1 Quantisation
The ability to run LLMs on consumer-grade hardware has been achieved by quantisation, i.e. "rounding" of floating point data types. This is accomplished by mapping floating point ranges onto more compact integer representations, for example quantising the continuous range [-1.0, 1.0] onto the signed 8-bit values (-127, -126, ..., 126, 127). The following links provide a good introduction to floating point data types and quantisation techniques.
Good reads:
- Introduction to open source LLMs
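To make the mapping concrete, here is a minimal sketch of symmetric "absmax" quantisation to 8-bit integers, one common scheme; the values are illustrative:

```python
# Minimal sketch of symmetric 8-bit quantisation as described above:
# map floats onto the integer range [-127, 127] and back.
import numpy as np

def quantise(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantise floats to int8 using a single absmax scale factor."""
    scale = 127.0 / np.max(np.abs(x))          # one scale per tensor
    q = np.round(x * scale).astype(np.int8)    # the "rounding" step
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return q.astype(np.float32) / scale

weights = np.array([-1.0, -0.31, 0.02, 0.47, 1.0], dtype=np.float32)
q, scale = quantise(weights)
print(q)                     # [-127  -39    3   60  127]
print(dequantise(q, scale))  # close to, but not exactly, the originals
```

The rounding step is where precision is lost; in exchange, each value needs one byte instead of four, roughly a 4x memory saving over 32-bit floats.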