I Self-Hosted OpenClaw on My Proxmox Homelab — Here's Everything I Learned
I started with Claude API for my AI agent integrations, hit the billing wall hard, and eventually migrated to a fully self-hosted Ollama stack on Proxmox. This is the honest story of that transition — the setup, the performance trade-offs, and why I still keep Claude as a fallback for the tasks that actually need it.
I'll be honest with you — I'm a little obsessed with self-hosting things. If there's an open-source alternative to a SaaS product I use, I've probably tried to run it on my homelab at some point. Pi-hole, Gitea, Nextcloud, Uptime Kuma, Grafana — they've all had a home on my Proxmox cluster at various points.
So when I started building Kumari.ai — my AI agent platform — the obvious first step was wiring everything up to the Claude API, with `claude-3-sonnet` as the workhorse model. And honestly? It was a great experience.

Then I got my first month's API bill.
$80. For my own development and testing. Not production traffic. Not paying users. Just me hammering the API while figuring out how agents should chain together. I did the math — at the rate I was iterating, I was looking at $150-200/month before the project even had a single user. That's when I started seriously looking for alternatives.
I tried OpenAI for a bit (similar story), then Groq (fast but still API-based), and eventually landed on OpenClaw with Ollama for the bulk of my dev work. The idea: run local models for 90% of the iteration and testing, keep Claude and Groq as fallbacks for the tasks that genuinely need frontier-model quality. Best of both worlds.
This post is the full story of that migration — how I set up the self-hosted stack on Proxmox, what the performance trade-offs actually look like, and why Claude still has a spot in the config even after all this.
Why Bother Self-Hosting? (And Why I Didn't Start Here)
A fair question — especially since I had working Claude and Groq integrations already. Let me be honest about the full picture, not just the "cloud bad, self-host good" narrative.
The Claude API is genuinely good. I want to say this clearly. When I first wired Kumari.ai into `claude-3-sonnet`, it just worked; the output quality was never the problem.

But the cost model breaks for development. The issue isn't production traffic — it's the 500 test calls you make figuring out the right prompt structure for an agent. At $3/million input tokens (Sonnet), those experiments add up. Local Ollama is $0/million tokens. For iteration speed during development, that's a huge deal.
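The arithmetic behind that estimate is simple enough to sketch. A quick Python back-of-envelope (the call volume and token counts here are illustrative guesses, not my measured usage; $3/$15 per million tokens are Claude 3 Sonnet's input/output prices):

```python
def monthly_cost_usd(calls_per_day, avg_in_tokens, avg_out_tokens,
                     in_price_per_m=3.0, out_price_per_m=15.0, days=30):
    """Rough monthly API bill for a steady daily call volume."""
    tokens_in = calls_per_day * avg_in_tokens * days
    tokens_out = calls_per_day * avg_out_tokens * days
    return (tokens_in / 1e6) * in_price_per_m + (tokens_out / 1e6) * out_price_per_m

# 500 dev/test calls a day at ~2k tokens in, ~500 tokens out
print(f"${monthly_cost_usd(500, 2000, 500):.2f}/month")  # $202.50/month
```

Tweak the numbers however you like; the point is that dev-loop volume alone lands you in the hundreds-of-dollars range, and the same calls against local Ollama cost nothing.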
Privacy at the edges. When I'm sketching out Kumari.ai's agent architecture, the prompts contain ideas I'm genuinely not ready to share. Local inference is a clean answer — nothing leaves the machine. I'm not paranoid about Anthropic specifically (their privacy policy is reasonable), but the habit of routing sensitive design decisions through external APIs felt sloppy.
Learning the infrastructure. There's a big difference between using an LLM API and understanding how inference actually works. After setting up Ollama, I understand model quantization, context length trade-offs, memory requirements, and batching in a way that makes me significantly better at designing systems around LLMs. That depth directly feeds back into Kumari.ai.
It's genuinely fun. Watching my $120 Dell OptiPlex generate coherent text at 18 tokens per second from a completely local model is one of the most satisfying things I've done in my homelab. No API key. No usage dashboard. Just silicon doing math.
The Migration: From Claude API to Ollama (And What I Kept)
Before the setup steps, here's how the actual transition went — because it wasn't a clean "rip out Claude, plug in Ollama" swap.
Phase 1 — Claude only. Every call in my early Kumari.ai prototype went to `claude-3-sonnet`.

Phase 2 — Groq as the primary, Claude as fallback. Groq runs llama3-70b on custom LPU hardware and returns responses in under a second. For quick tasks — classification, formatting, short summaries — it was a direct swap with zero quality loss. I routed anything needing deep reasoning or large context to Claude. Monthly cost dropped to ~$35.
Phase 3 — Ollama for dev, APIs for production. This is where OpenClaw came in. I set up Ollama locally to handle all development and testing. Production traffic that needed quality still went to Groq or Claude, but the hundreds of daily test calls during development became free. Monthly cost: ~$12 (just production traffic).
Phase 4 — Current state. Ollama handles ~70% of total calls (mostly dev, some lighter production tasks). Groq handles ~25% (fast production tasks where llama3-70b is sufficient). Claude handles ~5% — complex reasoning, nuanced agent behavior, anything where the output quality genuinely matters and a smaller model won't cut it.
Here's what my OpenClaw provider config looks like today:
```yaml
# Provider routing in OpenClaw config
providers:
  - name: local-ollama
    type: ollama
    base_url: http://ollama:11434
    priority: 1                  # First choice
    models:
      - llama3:8b                # Fast, everyday tasks
      - mistral:7b               # Quick queries
      - codellama:13b            # Code generation
      - deepseek-r1:7b           # Reasoning tasks

  - name: groq-cloud
    type: openai_compatible
    base_url: https://api.groq.com/openai/v1
    api_key: ${GROQ_API_KEY}
    priority: 2                  # Fallback for speed-sensitive tasks
    models:
      - llama3-70b-8192          # Fast, high quality
      - mixtral-8x7b             # Good for multi-step reasoning

  - name: anthropic-claude
    type: anthropic
    api_key: ${CLAUDE_API_KEY}
    priority: 3                  # Reserved for complex tasks
    models:
      - claude-3-sonnet-20240229   # Main workhorse
      - claude-3-haiku-20240307    # Fast Claude for lighter tasks

# Routing rules
routing:
  default: local-ollama
  rules:
    - condition: "task.requires_large_context"
      provider: anthropic-claude
    - condition: "task.type == 'code_review' AND task.complexity > 7"
      provider: anthropic-claude
    - condition: "task.latency_sensitive"
      provider: groq-cloud
```
This tiered approach is honestly the most practical setup for anyone building seriously with LLMs. Use local inference for the long tail of cheap tasks, use fast cloud inference for latency-sensitive production tasks, and keep the frontier model for the 5% that actually needs it. Your bill will thank you.
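For illustration, the routing rules above reduce to a few lines of logic. Here's a Python sketch (the `task` fields mirror the condition names in the config; this is not OpenClaw's actual rule evaluator):

```python
def pick_provider(task: dict) -> str:
    """Mirror the YAML routing rules: most specific conditions first,
    falling through to the local default."""
    if task.get("requires_large_context"):
        return "anthropic-claude"
    if task.get("type") == "code_review" and task.get("complexity", 0) > 7:
        return "anthropic-claude"
    if task.get("latency_sensitive"):
        return "groq-cloud"
    return "local-ollama"

print(pick_provider({"type": "code_review", "complexity": 9}))  # anthropic-claude
print(pick_provider({"latency_sensitive": True}))               # groq-cloud
print(pick_provider({}))                                        # local-ollama
```

The ordering matters: quality-sensitive conditions win over latency-sensitive ones, and everything else defaults to the free local tier.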
The Architecture
Here's the full picture of what we're building.
The key design decisions:
- OpenClaw runs as a Docker container inside a dedicated Proxmox VM
- Ollama runs alongside it in a separate container, serving models locally
- Nginx handles SSL termination and reverse proxying
- Cloudflare Tunnel gives me HTTPS access from anywhere without opening ports on my router
- PostgreSQL persists conversation history and session data
- External APIs (Claude, OpenAI, Groq) are configured as fallbacks for tasks that need more horsepower
Setting Up the Proxmox VM
I created a dedicated VM for OpenClaw rather than running it on my existing Docker host. The reason is isolation — I can snapshot it, migrate it, and experiment without worrying about affecting other services.
Here's the VM spec I settled on after some trial and error:
```
VM ID:   105
Name:    openclaw
Node:    pve1
OS:      Ubuntu 22.04.3 LTS (cloud-init image)
CPU:     4 vCPU (host passthrough for AVX2 support)
RAM:     8 GB
Disk:    80 GB (local-lvm thin)
Network: VLAN 20 (10.10.20.10)
```
Creating the VM from the command line (I keep a script for this):
```bash
#!/bin/bash
# create-openclaw-vm.sh
# Run this on the Proxmox node itself (pve1); expects VM_PASSWORD in the environment.

VMID=105
CLOUDINIT_IMG="/var/lib/vz/template/iso/ubuntu-22.04-cloudimg-amd64.img"

# Create VM (qm create creates on the node it runs on)
qm create $VMID \
  --name openclaw \
  --memory 8192 \
  --cores 4 \
  --cpu host \
  --net0 virtio,bridge=vmbr0,tag=20 \
  --scsihw virtio-scsi-single \
  --onboot 1 \
  --agent enabled=1

# Import cloud-init disk
qm importdisk $VMID $CLOUDINIT_IMG local-lvm

# Attach and resize disk
qm set $VMID --scsi0 local-lvm:vm-${VMID}-disk-0
qm resize $VMID scsi0 80G

# Cloud-init drive
qm set $VMID \
  --ide2 local-lvm:cloudinit \
  --boot order=scsi0 \
  --serial0 socket \
  --vga serial0

# Set cloud-init config
qm set $VMID \
  --ciuser resham \
  --cipassword "$(openssl passwd -6 "$VM_PASSWORD")" \
  --ipconfig0 ip=10.10.20.10/24,gw=10.10.20.1 \
  --nameserver 10.10.20.1 \
  --sshkeys ~/.ssh/id_ed25519.pub

qm start $VMID
echo "VM $VMID started. SSH: ssh resham@10.10.20.10"
```
Once the VM is up, SSH in and run the initial setup:
```bash
ssh resham@10.10.20.10

# System updates
sudo apt update && sudo apt full-upgrade -y

# Install Docker
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker resham

# Install Docker Compose plugin
sudo apt install -y docker-compose-plugin

# Log out and back in for group changes
exit
```
The Docker Compose Setup
Here's the full `docker-compose.yml`:

```yaml
# ~/openclaw-deploy/docker-compose.yml
version: '3.8'

services:
  openclaw:
    image: openclaw/openclaw:latest
    container_name: openclaw-app
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://openclaw:${POSTGRES_PASSWORD}@postgres:5432/openclaw
      - OLLAMA_BASE_URL=http://ollama:11434
      - DEFAULT_PROVIDER=ollama
      - DEFAULT_MODEL=llama3:8b
      - CLAUDE_API_KEY=${CLAUDE_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
      - SECRET_KEY=${SECRET_KEY}
      - ENABLE_SIGNUP=false
      - WEBUI_NAME="Resham's AI"
    depends_on:
      postgres:
        condition: service_healthy
      ollama:
        condition: service_started
    volumes:
      - openclaw_data:/app/backend/data
    networks:
      - openclaw_net

  ollama:
    image: ollama/ollama:latest
    container_name: openclaw-ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_MAX_LOADED_MODELS=2
    networks:
      - openclaw_net
    # Uncomment if you have a GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  postgres:
    image: postgres:16-alpine
    container_name: openclaw-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=openclaw
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=openclaw
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - openclaw_net
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U openclaw"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  openclaw_data:
  ollama_models:
  postgres_data:

networks:
  openclaw_net:
    driver: bridge
```
And the `.env` file alongside it:

```bash
# ~/openclaw-deploy/.env
# Note: Compose does not shell-evaluate .env files, so command substitution
# like $(openssl rand -hex 32) will not run here. Generate the key once with
# `openssl rand -hex 32` and paste the literal value.
POSTGRES_PASSWORD=your_strong_password_here
SECRET_KEY=your_generated_hex_key_here
CLAUDE_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
```
Start everything up:
```bash
cd ~/openclaw-deploy
docker compose up -d
```
Watch the containers come up, then pull the models:
```bash
# Pull models (this takes a while — get a coffee)
docker exec openclaw-ollama ollama pull llama3:8b
docker exec openclaw-ollama ollama pull mistral:7b
docker exec openclaw-ollama ollama pull codellama:13b
docker exec openclaw-ollama ollama pull deepseek-r1:7b

# Verify everything is up
docker exec openclaw-ollama ollama list
```
The first model pull will take a while depending on your internet speed. llama3:8b is 4.7GB, codellama:13b is 7.4GB. Go do something else. When you come back, you should see all models listed.
Pulling in the models took longer than I expected
I'll be transparent about one thing: the first time I pulled codellama:13b, it failed halfway through with a disk write error. Turns out I hadn't allocated enough disk space on the `ollama_models` volume. The fix:

- Stop the containers
- Resize the Proxmox VM disk: `qm resize 105 scsi0 +20G`
- Resize the partition inside the VM: `sudo growpart /dev/sda 1 && sudo resize2fs /dev/sda1`
- Restart and re-pull
Not a big deal once you know what happened, but it cost me 45 minutes of confused troubleshooting. The lesson: allocate more disk than you think you need upfront. Models are big.
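A pre-flight check would have saved me those 45 minutes. A small Python sketch (the sizes are the approximate on-disk figures from the pulls above; the 10 GB headroom is an arbitrary safety margin I'd pick, not a hard requirement):

```python
import shutil

# Approximate on-disk sizes (GB) for the models pulled earlier
MODEL_SIZES_GB = {
    "llama3:8b": 4.7,
    "mistral:7b": 4.1,
    "codellama:13b": 7.4,
    "deepseek-r1:7b": 4.7,
}

def check_space(path: str, models, headroom_gb: float = 10.0):
    """Return (ok, free_gb, needed_gb) for pulling `models` onto `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    needed_gb = sum(MODEL_SIZES_GB[m] for m in models) + headroom_gb
    return free_gb >= needed_gb, free_gb, needed_gb

ok, free, needed = check_space("/", MODEL_SIZES_GB)
print(f"need ~{needed:.1f} GB, have {free:.1f} GB free: {'OK' if ok else 'resize first'}")
```

Run it inside the VM (pointing `path` at wherever the Docker volumes live) before kicking off a multi-gigabyte pull.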
Nginx Reverse Proxy with SSL
I use a wildcard SSL cert from Let's Encrypt for all my homelab services, managed by a Certbot container on my Nginx VM. Here's the OpenClaw Nginx config:
```nginx
# /etc/nginx/conf.d/openclaw.conf

upstream openclaw_backend {
    server 10.10.20.10:3000;
    keepalive 32;
}

# HTTP → HTTPS redirect
server {
    listen 80;
    server_name ai.homelab.local ai.reshamk.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name ai.homelab.local ai.reshamk.com;

    ssl_certificate     /etc/letsencrypt/live/reshamk.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/reshamk.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;

    # Larger body size for file uploads to the agent
    client_max_body_size 50M;

    # WebSocket support (OpenClaw uses WS for streaming responses)
    location / {
        proxy_pass http://openclaw_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Streaming timeouts — important for LLM responses
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 10s;
    }

    # Rate limiting on the API endpoint
    # (requires a matching zone in the http{} context, e.g.
    #  limit_req_zone $binary_remote_addr zone=api_limit rate=10r/s;)
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        proxy_pass http://openclaw_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
Cloudflare Tunnel: Public Access Without Port Forwarding
My ISP doesn't give me a static IP and I'm on a shared network where port forwarding is a pain. Cloudflare Tunnel solves this beautifully: you install a lightweight `cloudflared` daemon on the VM, it maintains an outbound connection to Cloudflare's edge, and requests to your public hostname are routed back through that connection. No ports opened on the router, no exposed home IP.

Installing and configuring `cloudflared`:

```bash
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb \
  -o cloudflared.deb
sudo dpkg -i cloudflared.deb

# Authenticate with Cloudflare
cloudflared tunnel login

# Create the tunnel
cloudflared tunnel create openclaw

# Create DNS record
cloudflared tunnel route dns openclaw ai.reshamk.com

# Configure the tunnel
mkdir -p ~/.cloudflared
cat > ~/.cloudflared/config.yml << EOF
tunnel: <your-tunnel-id>
credentials-file: /home/resham/.cloudflared/<tunnel-id>.json

ingress:
  - hostname: ai.reshamk.com
    service: http://localhost:3000
  - service: http_status:404
EOF

# Install as a systemd service
sudo cloudflared service install
sudo systemctl enable --now cloudflared
```

That's it. Within a few seconds, `https://ai.reshamk.com` resolves to my OpenClaw instance from anywhere.

Real Performance on CPU-Only Inference
No GPU on my OptiPlexes. Everything runs on CPU. Here's what the real-world numbers look like after a few months of use:
The actual inference speeds on my i5-7500:
| Model | Size | Tokens/sec | First token latency | Good for |
|---|---|---|---|---|
| llama3:8b | 4.7 GB | ~18 t/s | ~1.2s | General chat, quick Q&A |
| mistral:7b | 4.1 GB | ~20 t/s | ~1.0s | Fast responses, coding help |
| codellama:13b | 7.4 GB | ~9 t/s | ~2.5s | Detailed code generation |
| deepseek-r1:7b | 4.7 GB | ~17 t/s | ~1.5s | Reasoning, step-by-step problems |
18 tokens/second for llama3:8b means a 200-word response (roughly 250-270 tokens, since a word averages a bit more than one token) takes about 14 seconds. That's not blazing fast, but it's genuinely usable for most tasks. For anything time-sensitive, I route to Groq (which runs llama3-70b at ~300 t/s via their LPU hardware). The model switching in OpenClaw makes this seamless — I just pick the model from a dropdown and the response speed changes accordingly.
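If you want to measure throughput yourself instead of eyeballing it, Ollama's `/api/generate` response (with `stream: false`) reports `eval_count` and `eval_duration`, the latter in nanoseconds. A minimal benchmark sketch; the URL and model name assume the compose setup above:

```python
import json
import urllib.request

def throughput(eval_count: int, eval_duration_ns: int) -> float:
    """Tokens per second from Ollama's reported counters."""
    return eval_count / (eval_duration_ns / 1e9)

def bench(model="llama3:8b", base_url="http://localhost:11434"):
    """One-shot generation against the local Ollama; returns measured tokens/sec."""
    payload = {"model": model,
               "prompt": "Explain TCP slow start in two sentences.",
               "stream": False}
    req = urllib.request.Request(f"{base_url}/api/generate",
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return throughput(body["eval_count"], body["eval_duration"])

# e.g. 200 tokens generated in 11.1 s of eval time is ~18 t/s
print(round(throughput(200, 11_100_000_000), 1))  # 18.0
```

Note that `eval_duration` covers generation only; add `prompt_eval_duration` if you also care about prompt-processing time.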
The Killer Feature: Agent Tasks
The part that initially attracted me to OpenClaw over something like plain Ollama + Open WebUI is the autonomous agent functionality. I can give it a task like "research the latest trends in LLM agent architectures and summarize the top 5 papers from the last 3 months" and it'll autonomously browse the web, read papers, and synthesize a summary — without me holding its hand through each step.
I wired it up to a few integrations for my workflow:
```yaml
# OpenClaw agent config (simplified)
integrations:
  - type: web_search
    provider: searxng
    url: http://10.10.20.15:8080   # Self-hosted SearXNG instance

  - type: code_execution
    provider: docker
    image: python:3.12-slim
    timeout: 30s

  - type: file_access
    paths:
      - /home/resham/notes
      - /home/resham/projects/kumari-ai/docs

  - type: slack
    webhook_url: ${SLACK_WEBHOOK_URL}
    channels:
      - "#personal-assistant"
```
What I Actually Use It For
After a few months, here's how OpenClaw has actually become part of my daily workflow:
Code review and rubber ducking. I paste code and ask it to critique it, suggest improvements, or explain what a function does. codellama:13b is surprisingly good at this for a 13B parameter model.
Research synthesis. When I'm reading about a new technology for Kumari.ai, I'll ask it to compare a few approaches and summarize the trade-offs. It uses the web search integration to pull current information rather than relying on training data.
Learning security concepts. As a cybersecurity student, I use it to explain vulnerability concepts, walk through CVE analyses, and help me understand attack patterns. Having this run locally means I can ask about sensitive topics (specific exploit techniques, malware behavior) without those queries going to an external API.
Generating boilerplate. Kubernetes YAML, Nginx configs, GitHub Actions workflows — anything repetitive that has a clear structure. I describe what I want and let it generate the first draft.
The One Thing That Annoyed Me
OpenClaw's model switching is seamless, but I kept running into an issue where the Ollama container would silently run out of memory (VRAM on a GPU box; plain system RAM in my case) when I had both llama3:8b and codellama:13b loaded simultaneously. The container would just... fail quietly and return empty responses.
The fix was setting two environment variables in the Ollama container:
```bash
OLLAMA_MAX_LOADED_MODELS=2   # Max simultaneously loaded models
OLLAMA_NUM_PARALLEL=2        # Max parallel inference requests
```
And making sure the container had enough memory allocated. I bumped the Proxmox VM from 6GB to 8GB RAM, and since then it's been solid.
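For sizing, I now use a rough rule of thumb (my own assumption, not a documented Ollama formula): a quantized model wants about its file size in RAM, plus some KV-cache overhead per parallel request slot. It makes the failure obvious in hindsight: llama3:8b plus codellama:13b is already ~12 GB of weights, more than the VM had, so Ollama has to evict one model rather than keep both resident.

```python
def ram_estimate_gb(model_file_sizes_gb, kv_gb_per_slot=0.7, parallel=2):
    """Very rough RAM estimate: weights + assumed KV-cache cost per parallel slot."""
    return sum(model_file_sizes_gb) + kv_gb_per_slot * parallel

# llama3:8b (4.7 GB) + codellama:13b (7.4 GB), two parallel requests
print(round(ram_estimate_gb([4.7, 7.4]), 1))  # 13.5
```

The `kv_gb_per_slot` figure is a placeholder; actual KV-cache size depends on context length and model architecture. The takeaway is just to budget noticeably more than the raw model files.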
Monitoring: Know Before It Breaks
I added OpenClaw to my existing monitoring stack — Prometheus + Grafana on VLAN 40, Uptime Kuma for availability checks:
```yaml
# Add to prometheus.yml scrape configs
- job_name: 'openclaw'
  static_configs:
    - targets: ['10.10.20.10:3000']
  metrics_path: /metrics

- job_name: 'ollama'
  static_configs:
    - targets: ['10.10.20.10:11434']
  metrics_path: /api/metrics
```
The Grafana dashboard I built tracks:
- Response time per model — helps me notice when inference is degrading
- Requests per hour — useful for understanding my own usage patterns
- Container RAM usage — the most important metric; if ollama is approaching its memory limit, it starts evicting models and performance tanks
- Uptime — Uptime Kuma pings the `/health` endpoint every 60 seconds and sends me a Slack message if it goes down
The monitoring paid for itself in the first week. I noticed that response times were spiking to 45+ seconds late at night even for simple queries. Turned out my Proxmox backup job was running at 2am and hammering the disk, starving the ollama container of I/O. Moving the backup job to 4am when I'm definitely not using the AI fixed it.
Automating the Setup for the Next Time
After going through this setup once, I turned the whole thing into an Ansible playbook so I can reproduce it in minutes:
```yaml
# roles/openclaw/tasks/main.yml
- name: Create openclaw directory
  file:
    path: "{{ openclaw_dir }}"
    state: directory
    owner: "{{ ansible_user }}"

- name: Deploy docker-compose.yml
  template:
    src: docker-compose.yml.j2
    dest: "{{ openclaw_dir }}/docker-compose.yml"

- name: Deploy .env file
  template:
    src: env.j2
    dest: "{{ openclaw_dir }}/.env"
    mode: '0600'

- name: Start OpenClaw stack
  community.docker.docker_compose_v2:
    project_src: "{{ openclaw_dir }}"
    state: present

- name: Pull default models
  community.docker.docker_container_exec:
    container: openclaw-ollama
    command: "ollama pull {{ item }}"
  loop: "{{ openclaw_models }}"
  async: 600
  poll: 30
```
With this playbook, setting up a fresh instance goes from two hours of manual commands to about 20 minutes, mostly spent waiting for model downloads.
Is It Worth It?
Three months in, the honest answer is: absolutely yes, with caveats.
The wins:
- Zero API costs for local models
- Complete privacy for anything sensitive
- Learned a ton about LLM inference and agent architecture
- Available even when my internet goes down (local AI is surprisingly useful)
- Satisfying in a way that purely cloud-based tools aren't
The caveats:
- CPU-only inference is slow for large models. If you want GPT-4 tier quality at scale, you need a GPU or cloud APIs
- Maintenance overhead is real. I've spent maybe 4-5 hours total on upkeep over three months — mostly on Ollama updates and the occasional container restart
- Model quality at the 7B-13B scale is noticeably below GPT-4 or Claude for complex reasoning. I use external API fallbacks for anything that requires serious thinking
If you're building AI applications, running a local stack like this is invaluable for development. You can experiment freely, understand the infrastructure deeply, and build intuition for how LLM systems actually work. That's been directly useful for my work on Kumari.ai.
And if you're a student or early-career engineer looking to stand out — having a self-hosted AI agent running on your homelab infrastructure is a great conversation piece in any interview. It shows you understand the whole stack, from the VM and network configuration to the inference engine to the reverse proxy and monitoring. Not many people can say they've done that.
If you're thinking about replicating this setup, feel free to reach out. I'm happy to share the Ansible playbooks or troubleshoot any of the steps above. The OpenClaw community Discord is also genuinely helpful — much better signal-to-noise ratio than most tech communities I've been part of.
Now if you'll excuse me, I need to go ask codellama to review the FastAPI endpoint I just wrote.