Deploying a large language model (LLM) on my local machine


I am planning to deploy a large language model (LLM) on my local machine (AMD-ITX). The machine is just a personal computer running Ubuntu, with an 8-core CPU (AMD Ryzen 7 5700X), 32 GB of RAM, and an RTX 4060 Ti GPU with 16 GB of VRAM, so only small LLMs are suitable. Here I record the process of deploying an LLM on this machine, for ease of future maintenance.

Some important references:

  1. an online tutorial in Chinese
  2. another tutorial in English
  3. Docker
  4. NVidia Container Runtime for Docker
  5. Cloudflared

Install Docker

Install it with the convenience script:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
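
The script should leave the Docker daemon running; a quick check that it is up (the hello-world container is just a smoke test):

sudo systemctl status docker --no-pager
sudo docker run --rm hello-world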

NVidia Container Runtime for Docker

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker # The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.
sudo systemctl restart docker
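
To confirm that containers can actually see the GPU, run nvidia-smi inside a throwaway container (any image works because the runtime injects the driver; plain ubuntu is used here as an example):

sudo docker run --rm --gpus all ubuntu nvidia-smi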

Install Open WebUI

With an NVIDIA GPU and CUDA support, run the following command so the container can use the GPU:

docker run -d -p 4000:8080 \
        --gpus all \
        --add-host=host.docker.internal:host-gateway \
        -v /DATA/open-webui:/app/backend/data \
        --name open-webui \
        --restart always \
        ghcr.io/open-webui/open-webui:cuda

# or run it as a systemd service
sudo nano /etc/systemd/system/open-webui.service # create the unit file (a sketch is given below)
sudo systemctl daemon-reload
sudo systemctl enable open-webui
sudo systemctl start open-webui
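
A minimal sketch of the unit file, assuming the open-webui container was already created by the docker run command above (if systemd is going to manage restarts, you may want to drop --restart always from the container so the two mechanisms do not overlap):

[Unit]
Description=Open WebUI container
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStart=/usr/bin/docker start -a open-webui
ExecStop=/usr/bin/docker stop -t 10 open-webui

[Install]
WantedBy=multi-user.target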

Verify the container is running:

sudo docker ps

Open WebUI is now running on the local machine. You can access it at http://localhost:4000 in your browser.

Install and start Ollama

curl -fsSL https://ollama.com/install.sh | sh # install
ollama serve # start the server once in the foreground
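
A quick sanity check that the client and server are reachable:

ollama --version
ollama list # prints the (initially empty) table of installed models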

Start Ollama as a service

First, set OLLAMA_HOST to 0.0.0.0 so that Ollama listens on all interfaces (otherwise the Dockerized Open WebUI cannot reach it through host.docker.internal), following https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
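
Per the FAQ linked above, the variable is set through a systemd override; a minimal sketch:

sudo systemctl edit ollama.service
# then add the following two lines in the editor:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama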

Then enable and start the service:

sudo systemctl enable ollama
sudo systemctl start ollama

Expose the Open WebUI to the Internet via cloudflared

  1. Log in to Zero Trust and go to Networks > Tunnels (installing the cloudflared connector on the machine is sketched after this list).
  2. Select the tunnel (AMD-ITX here) and add a public hostname (e.g., “local_llm”), pointing it at the local service on the appropriate port (4000, as above).
  3. Under Access > Applications, add an application for the hostname and attach the appropriate access policy.
  4. Now visit https://local_llm.guhaogao.com and you are good to go.
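
For reference, the connector side of step 1 looks roughly like the following; the .deb download is one common install route, and the token placeholder must be replaced with the exact command shown in the Zero Trust dashboard:

# install cloudflared (Debian/Ubuntu package)
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
sudo dpkg -i cloudflared.deb
# register this machine as the tunnel's connector, using the token from the dashboard
sudo cloudflared service install <TUNNEL_TOKEN>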

Downloading Ollama Models

  1. Qwen2.5-14B-Instruct-Q4_K_M
  2. Qwen2.5-14B-Instruct-Q6_K_L
  3. Qwen2.5-32B-Instruct-Q3_K_M
  4. Qwen2.5-32B-Instruct-IQ4_XS
  5. llama3.2-vision:11b
  6. Mistral-Small-Instruct-2409-Q5_K_S
  7. bge-m3:latest
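
llama3.2-vision:11b and bge-m3 come straight from the Ollama library; the quantized Qwen/Mistral GGUF builds can be pulled from Hugging Face with the hf.co/ prefix. The repository name below is an assumption (a typical GGUF mirror); adjust it to wherever the quantized files actually live:

ollama pull llama3.2-vision:11b
ollama pull bge-m3:latest
# pulling a specific GGUF quantization directly from Hugging Face (repository is an assumption)
ollama pull hf.co/bartowski/Qwen2.5-14B-Instruct-GGUF:Q4_K_M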

For RAG, bge-m3:latest serves as the embedding model and BAAI/bge-reranker-v2-m3 handles reranking, with Top K set to 5.

Web search is handled by SearXNG, also installed with Docker.

docker run -d --name searxng \
        -p 8081:8080 \
        -v /home/hggu/software_480G/searxng:/etc/searxng \
        --restart always \
        searxng/searxng:latest
# or run it as a systemd service on port 8081 (the unit file follows the same pattern as the Open WebUI one above)
sudo nano /etc/systemd/system/searxng.service
sudo systemctl daemon-reload
sudo systemctl enable searxng

Use http://host.docker.internal:8081/search?q=<query> as the SearXNG query URL in the Open WebUI web search settings (host.docker.internal resolves to the host from inside the Open WebUI container thanks to the --add-host flag above).
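
One gotcha: Open WebUI queries SearXNG's JSON output format, which is disabled by default. Enabling it in the mounted settings.yml (the path from the docker run above) is usually needed; a minimal sketch of the relevant section, followed by a restart:

# /home/hggu/software_480G/searxng/settings.yml
search:
  formats:
    - html
    - json

# apply the change
docker restart searxng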


