7. Install UniversalNER Locally
Warning
Requires the quantised Universal-NER/UniNER-7B-type model to be placed in the ./models folder.
7.1 Downloading a Quantised version of the model from HuggingFace
This HuggingFace repository holds a range of quantised UniversalNER models. You can download any of them, but make sure you choose a model small enough to fit in your machine's RAM.
cd privfp-experiments
source .venv/bin/activate
huggingface-cli download yuuko-eth/UniNER-7B-all-GGUF UniversalNER-7B-all-Q4_0.gguf --local-dir ./models
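Once the download finishes, it is worth confirming that the file size of the quantised model is comfortably below your available memory. A quick sanity check using standard shell commands (nothing specific to this repo; free -h is Linux-only):
ls -lh ./models/UniversalNER-7B-all-Q4_0.gguf # a Q4_0 7B model is roughly 4 GB
free -h # compare the file size against the "available" memory column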
Warning
You may need to change the universal_ner_path set in the ./src/config.py file.
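If you saved the model under a different name or location, point the config at it. As a sketch, assuming universal_ner_path is a plain string assignment in ./src/config.py (the exact layout of that file may differ), you can locate and update it like this:
grep -n "universal_ner_path" ./src/config.py # find the current setting
# then edit the value in your editor of choice, e.g.:
# universal_ner_path = "./models/UniversalNER-7B-all-Q4_0.gguf"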
7.2 Quantising the UniversalNER model yourself
The quantised model was created by cloning the llama.cpp repository and quantising Universal-NER/UniNER-7B-type locally to q4_1 GGUF format.
First, clone and build the llama.cpp repository. Running make builds the binaries used in the steps below.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make # if you are running on CPU only
make LLAMA_CUBLAS=1 # if you have an NVIDIA GPU
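If the build succeeded, the compiled binaries should sit in the repository root (this is true of the older, Makefile-based llama.cpp versions these steps assume; newer releases build with CMake and place binaries elsewhere). A quick check:
ls -lh ./main ./quantize # both binaries are needed for the steps below
./quantize # running it with no arguments prints usage, including the supported quantisation types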
Inside the llama.cpp repository, create a Python environment and install the necessary requirements.
cd llama.cpp
python3.9 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
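A quick import check confirms the environment is usable, assuming the requirements file pins numpy and sentencepiece, which the conversion script relies on:
python -c "import numpy, sentencepiece; print('conversion dependencies OK')"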
Next you need to download the UniversalNER model.
pip install transformers datasets sentencepiece
huggingface-cli download Universal-NER/UniNER-7B-type --local-dir models
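This places the original Hugging Face checkpoint (weights plus tokenizer files) under models. Before converting, it is worth confirming the expected files arrived:
ls models/ # expect config.json, tokenizer files, and the *.bin or *.safetensors weight shards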
Next, convert the downloaded model to a 32-bit floating-point GGUF file.
python convert.py ./models --outtype f32
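This writes a GGUF file into the models folder. A 7B model stored at f32 is large (roughly 25-30 GB, since each parameter takes 4 bytes), so check you have the disk space and that the file was actually written:
ls -lh models/ggml-model-f32.gguf # expect a file in the tens of gigabytes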
Then quantise the f32 model down to q4_1 precision.
./quantize models/ggml-model-f32.gguf models/quantized_q4_1.gguf q4_1
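Before moving the quantised file anywhere, a short generation run is a cheap way to confirm it loads and produces tokens (the prompt and token count here are arbitrary):
./main -m models/quantized_q4_1.gguf -p "Hello" -n 16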
The quantised model is written to the models folder inside llama.cpp. It can then be transferred to this repo's ./models folder.
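Assuming the two repositories sit side by side on disk (adjust the relative path to match your layout), the transfer is a single copy; remember the earlier warning about updating universal_ner_path, since this filename differs from the pre-quantised download:
cp models/quantized_q4_1.gguf ../privfp-experiments/models/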
The steps provided above were adapted from a Medium article.