Deploying a large language model (LLM) on my local machine
I am planning to deploy a large language model (LLM) on my local machine (AMD-ITX). The machine is an ordinary personal computer running Ubuntu, with an 8-core CPU (AMD Ryzen 7 5700X), 32 GB of RAM, and an RTX 4060 Ti GPU with 16 GB of VRAM, so only small LLMs are suitable. Here I record the process of deploying an LLM on this machine, for ease of future maintenance.
Some important references:
- an online tutorial in Chinese
- another tutorial in English
- Docker
- NVidia Container Runtime for Docker
- Cloudflared
Install and start Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
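As a quick sanity check that the Docker Engine installed correctly, you can run the hello-world image (optional; this is the standard check from Docker's own docs):
sudo docker run hello-world # pulls a tiny test image and prints a confirmation message if the engine works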
NVidia Container Runtime for Docker
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker # The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.
sudo systemctl restart docker
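To confirm that containers can actually see the GPU, one option is to run nvidia-smi inside a CUDA base image (the exact image tag below is only an example; any recent CUDA base image should work):
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi # should list the RTX 4060 Ti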
Install Open WebUI
With an NVidia GPU and CUDA support, utilize GPU resources by running the following command:
docker run -d -p 4000:8080 \
--gpus all \
--add-host=host.docker.internal:host-gateway \
-v /DATA/open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:cuda
# or run as a system service
sudo nano /etc/systemd/system/open-webui.service
sudo systemctl daemon-reload
sudo systemctl enable open-webui
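The contents of the unit file edited above are not recorded here; a minimal sketch of what /etc/systemd/system/open-webui.service could look like, assuming it simply starts and stops the container created above, is:
[Unit]
Description=Open WebUI (Docker container)
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/docker start -a open-webui
ExecStop=/usr/bin/docker stop open-webui
Restart=always

[Install]
WantedBy=multi-user.target
Note that the container was already created with --restart always, so Docker will bring it back up by itself anyway; the unit is mainly convenient for managing it together with the other services. Running sudo systemctl start open-webui after enabling starts it immediately.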
Verify the container is running:
sudo docker ps
Now Open WebUI is running on your local machine. You can access it by visiting http://localhost:4000 in your browser.
Install and start Ollama
curl -fsSL https://ollama.com/install.sh | sh # install
ollama serve # start it once in the foreground
Start Ollama as a service
First change OLLAMA_HOST to 0.0.0.0 (so that Open WebUI, which runs inside Docker, can reach Ollama on the host), referring to: https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
then sudo systemctl enable ollama
and sudo systemctl start ollama
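The linked FAQ does this through a systemd override; roughly, the steps look like this:
sudo systemctl edit ollama.service
# in the editor that opens, add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload # reload the unit files
sudo systemctl restart ollama # restart so the new binding takes effect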
Expose the Open WebUI to the Internet via cloudflared
- Log in to Zero Trust, and go to Networks > Tunnels.
- Select the tunnel (AMD-ITX here) and add a public hostname (e.g., “local_llm”), pointing it at the appropriate local port (4000, as above).
- Under the “Access > Applications” tab, add an application and select the appropriate access policy.
- Now visit https://local_llm.guhaogao.com and you are good to go.
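For completeness, the cloudflared connector has to be installed and running on the machine before the tunnel comes up; when you create the tunnel, the dashboard shows a command along these lines (the token is a placeholder, and the .deb is the standard release asset for amd64):
curl -L -o cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb # download the package
sudo dpkg -i cloudflared.deb # install cloudflared
sudo cloudflared service install <TUNNEL_TOKEN> # register this machine as the tunnel connector, using the token from Zero Trust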
Downloading Ollama Models
- Qwen2.5-14B-Instruct-Q4_K_M
- Qwen2.5-14B-Instruct-Q6_K_L
- Qwen2.5-32B-Instruct-Q3_K_M
- Qwen2.5-32B-Instruct-IQ4_XS
- llama3.2-vision:11b
- Mistral-Small-Instruct-2409-Q5_K_S
- bge-m3:latest
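All of these are fetched with ollama pull. llama3.2-vision and bge-m3 come from the official Ollama library; the quantized Qwen/Mistral builds are GGUF files, which Ollama can pull directly from Hugging Face (the repo path below is an illustrative guess, not a record of the exact source I used):
ollama pull llama3.2-vision:11b # vision-capable model from the Ollama library
ollama pull bge-m3:latest # embedding model, used for RAG below
ollama pull hf.co/bartowski/Qwen2.5-14B-Instruct-GGUF:Q4_K_M # example of pulling a GGUF quantization from Hugging Face
ollama list # confirm everything downloaded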
Setting up RAG models and web search
RAG uses bge-m3:latest for embeddings and BAAI/bge-reranker-v2-m3 for reranking, with Top K set to 5.
Web search uses SearXNG, installed via Docker.
docker run -d --name searxng -p 8081:8080 -v /home/hggu/software_480G/searxng:/etc/searxng --restart always searxng/searxng:latest
# or running as a system service on port 8081.
sudo nano /etc/systemd/system/searxng.service
sudo systemctl daemon-reload
sudo systemctl enable searxng
Use http://host.docker.internal:8081/search?q=<query> as the query URL in the Open WebUI web search settings.
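One gotcha: Open WebUI expects JSON results from SearXNG, so the settings.yml under the mounted directory (/home/hggu/software_480G/searxng here) needs the json format enabled. A minimal sketch of the relevant excerpt, assuming the default settings layout:
# settings.yml (excerpt) -- enable JSON output so Open WebUI can parse the results
search:
  formats:
    - html
    - json
Restart the searxng container after editing for the change to take effect.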
Tags: LLM
Categories: Misc