3. Setting up open source LLMs

It is possible to run LLMs locally on consumer-grade CPUs and GPUs with usable inference speed. The following table is a (non-exhaustive) list of projects that document the process of setting up a local open source LLM.

| Project | Notes | Tested on | Link |
| --- | --- | --- | --- |
| llama.cpp | Convert LLMs to GGML format and quantise them (note: GGML has since been superseded by GGUF); supports builds for many other LLMs; building from source requires additional software, e.g. Make, CMake, Zig or GMake; see also the project's Manifesto | M1 MBP | GitHub |
| Ollama | Download and run pre-built LLMs locally from the ollama.ai library; supports LLMs not in the library by importing GGUF files | M1 MBP | GitHub |
| PrivateGPT | Production-ready AI project that allows you to ask questions about your documents | | GitHub |
| GPT4All | LLMs that run locally on your CPU and nearly any GPU | | GitHub |
| TheBloke | Pre-built quantised LLMs on Hugging Face | | Hugging Face 🤗 |
| ExLlamaV2 | Inference library for running local LLMs on modern consumer GPUs | | GitHub |
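As a rough illustration of the workflow these projects enable, the sketch below loads a locally downloaded, quantised GGUF model using the llama-cpp-python bindings for llama.cpp. The package must be installed separately (e.g. `pip install llama-cpp-python`), and the model file name and prompt are placeholders, not part of any of the projects above.

```python
# Minimal sketch: run a quantised GGUF model on CPU with llama-cpp-python.
# The model path is a placeholder -- substitute any GGUF file you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,    # context window size in tokens
    n_threads=8,   # CPU threads to use for inference
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],   # stop generating when the model starts a new question
)

print(output["choices"][0]["text"])
```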

3.1 Quantisation

The ability to run LLMs on consumer-grade hardware has been achieved through quantisation, i.e. the "rounding" of floating-point data types. This is accomplished by mapping ranges of floating-point values onto more compact integer representations, for example quantising the range (-1.0, ..., 1.0) to the integers (-127, -126, ..., 126, 127). The following links provide a good introduction to floating-point data types and quantisation techniques.
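As a toy sketch of one simple scheme consistent with the mapping described above (real methods such as the GGUF block quants, GPTQ or AWQ add per-block scales, zero-points and other refinements), the snippet below maps a handful of made-up float32 weights onto 8-bit integers and back, showing the rounding error that quantisation introduces.

```python
import numpy as np

# Toy symmetric 8-bit quantisation: map floats in (-max|x|, +max|x|) onto -127..127.
# The weight values are illustrative only.
weights = np.array([-0.82, -0.31, 0.04, 0.27, 0.95], dtype=np.float32)

scale = np.abs(weights).max() / 127.0                            # one float kept per block
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantised = q.astype(np.float32) * scale                       # approximate reconstruction
error = np.abs(weights - dequantised)

print(q)            # e.g. [-110  -41    5   36  127]
print(dequantised)  # close to the original weights
print(error.max())  # small but non-zero: the cost of the compression
```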

Quantisation methods