Jan 4, 2026 | 12 min read

Automating My Entire Homelab with Ansible and Terraform — So I Never Configure Anything Twice

How I went from SSH-ing into 30 machines manually to managing everything with Ansible playbooks and Terraform, including the mistakes that made me finally commit to infrastructure as code.

Ansible · Terraform · IaC · Homelab · DevOps · Proxmox · Automation · Linux

I want to tell you about the worst Saturday of my homelab career.

I was upgrading node_exporter on all my machines. Manually. Over SSH. One machine at a time. I was on the fourteenth machine when I realized I'd forgotten to restart the service on machine number three. So I went back, restarted it, then couldn't remember if I'd finished machine number nine. I SSH-ed into nine, checked, realized I'd upgraded the binary but hadn't updated the systemd service file, so the new version was running with old flags. Then I found the same issue on machines four through eight.

Three hours. Three hours to update a single binary across my homelab. And at the end, I still wasn't sure every machine was consistent.

That Sunday, I installed Ansible.

The Evolution

My homelab automation went through three distinct phases:

Phase 1: Bash scripts and SSH (months 1-6)

Bash
# The old way. Don't do this.
for host in pve1 pve2 pve3 r720 nas devbox; do
  ssh root@$host "apt update && apt upgrade -y"
done

# "It works" but:
# - No error handling
# - No idempotency (run twice, get different results)
# - No logging
# - No state tracking
# - No rollback

Phase 2: Ansible playbooks (months 6-14)
Configuration management for everything after the VM exists.

Phase 3: Terraform + Ansible (month 14+)
Terraform provisions the VMs, Ansible configures them. The full stack.

The Git Repository

Everything lives in a single Git repo:

Bash
resham@devbox:~/homelab-iac$ tree -L 3
.
├── README.md
├── ansible/
│   ├── ansible.cfg
│   ├── inventory/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   ├── playbooks/
│   │   ├── site.yml              # Master playbook
│   │   ├── common.yml            # Base config for ALL machines
│   │   ├── proxmox-nodes.yml     # Proxmox-specific
│   │   ├── docker-hosts.yml      # Docker setup
│   │   ├── monitoring.yml        # Prometheus + Grafana stack
│   │   ├── nas.yml               # ZFS NAS config
│   │   └── security-lab.yml      # Kali + targets
│   ├── roles/
│   │   ├── common/               # SSH, users, fail2ban, ntp
│   │   ├── docker/               # Docker Engine + Compose
│   │   ├── monitoring/           # node_exporter, promtail
│   │   ├── nginx/                # Reverse proxy + SSL
│   │   ├── backup/               # PBS client, rclone
│   │   ├── zfs/                  # ZFS tuning, scrub schedules
│   │   └── hardening/            # CIS benchmarks, audit rules
│   └── templates/
│       ├── sshd_config.j2
│       ├── prometheus.yml.j2
│       ├── node_exporter.service.j2
│       └── ...
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── provider.tf
│   ├── vms.tf
│   ├── lxc.tf
│   └── terraform.tfvars
└── scripts/
    ├── bootstrap.sh              # First-time setup
    ├── deploy.sh                 # Run terraform + ansible
    └── unlock-vault.sh           # Decrypt ansible-vault

[Figure: Infrastructure as Code architecture showing Git, Terraform, and Ansible flow]

Ansible: The Configuration Layer

Inventory

The inventory file maps every machine in my homelab:

YAML
# ansible/inventory/hosts.yml
all:
  children:
    proxmox_nodes:
      hosts:
        pve1:
          ansible_host: 10.10.10.11
        pve2:
          ansible_host: 10.10.10.12
        pve3:
          ansible_host: 10.10.10.13
        r720:
          ansible_host: 10.10.10.14

    nas:
      hosts:
        nas01:
          ansible_host: 10.10.10.20

    workstation:
      hosts:
        devbox:
          ansible_host: 10.10.50.1
          ansible_user: resham

    vms:
      children:
        docker_hosts:
          hosts:
            docker-host:
              ansible_host: 10.10.20.5
            openclaw:
              ansible_host: 10.10.20.10
        services:
          hosts:
            nginx-proxy:
              ansible_host: 10.10.20.2
            gitea:
              ansible_host: 10.10.20.3
            jenkins:
              ansible_host: 10.10.20.4

    security_lab:
      hosts:
        kali:
          ansible_host: 10.10.30.10
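Once hosts are grouped like this, every play and ad-hoc command can target a precise slice of the lab. A few illustrative commands (these exact invocations aren't in the post):

Bash
# Illustrative usage of the groups above
ansible proxmox_nodes -m shell -a "pveversion"    # ad-hoc command across the cluster
ansible-playbook ansible/playbooks/site.yml --limit vms
ansible-inventory -i ansible/inventory/hosts.yml --graph   # sanity-check the group tree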

The Common Role

This role runs on every single machine. It's the baseline that I never want to configure manually again:

YAML
# ansible/roles/common/tasks/main.yml
---
- name: Set timezone
  timezone:
    name: America/Chicago

- name: Install base packages
  package:
    name: "{{ common_packages }}"
    state: present

- name: Configure SSH
  template:
    src: sshd_config.j2
    dest: /etc/ssh/sshd_config
    validate: "sshd -t -f %s"
  notify: restart sshd

- name: Deploy SSH authorized keys
  authorized_key:
    user: "{{ ansible_user | default('root') }}"
    key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"
    exclusive: true

- name: Configure fail2ban
  template:
    src: jail.local.j2
    dest: /etc/fail2ban/jail.local
  notify: restart fail2ban

- name: Configure NTP
  template:
    src: chrony.conf.j2
    dest: "{{ chrony_config_path }}"
  notify: restart chrony

- name: Set up automatic security updates
  include_tasks: "auto-updates-{{ ansible_os_family | lower }}.yml"

- name: Install and configure node_exporter
  include_role:
    name: monitoring
    tasks_from: node_exporter
YAML
# ansible/roles/common/vars/Debian.yml
common_packages:
  - curl
  - wget
  - vim
  - htop
  - tmux
  - git
  - jq
  - unzip
  - fail2ban
  - chrony
  - ufw

chrony_config_path: /etc/chrony/chrony.conf

# ansible/roles/common/vars/Archlinux.yml
common_packages:
  - curl
  - wget
  - vim
  - htop
  - tmux
  - git
  - jq
  - unzip
  - fail2ban
  - chrony

chrony_config_path: /etc/chrony.conf
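One detail this layout depends on: Ansible only auto-loads vars/main.yml from a role, so something has to select Debian.yml or Archlinux.yml explicitly. The post doesn't show it, but a task like this near the top of the role is the usual pattern (ansible_os_family is "Debian" on Debian and Ubuntu, "Archlinux" on Arch):

YAML
# Sketch of the loader the vars/ layout above implies
- name: Load OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"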
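The notify lines in the tasks file point at handlers that the post never shows. A minimal sketch of what the role's handlers file presumably contains:

YAML
# ansible/roles/common/handlers/main.yml (sketch; this file isn't shown in the post)
---
- name: restart sshd
  service:
    # Debian calls the unit "ssh"; Arch and most others call it "sshd"
    name: "{{ 'ssh' if ansible_os_family == 'Debian' else 'sshd' }}"
    state: restarted

- name: restart fail2ban
  service:
    name: fail2ban
    state: restarted

- name: restart chrony
  service:
    # "chrony" on Debian, "chronyd" on Arch
    name: "{{ 'chrony' if ansible_os_family == 'Debian' else 'chronyd' }}"
    state: restarted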

The validate parameter on the SSH config template is critical — it runs sshd -t to syntax-check the config before deploying it. Without this, a typo in the SSH config will lock you out of the machine. I learned this the hard way when I accidentally deployed a config with PermitRootLogin misspelled and couldn't SSH into three Proxmox nodes.

[!WARNING] Always validate your SSH config before deploying it. A broken sshd_config means you lose SSH access. On a Proxmox node, you can recover via the web console. On a remote machine without out-of-band access, you're driving to the data center (or in my case, walking to the closet with a keyboard and monitor).
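A guardrail that pairs well with validate is rolling risky plays out one host at a time, so a bad change strands at most one machine before the run stops. This isn't in the repo as shown, but in a site.yml the play header would look something like:

YAML
# Hypothetical play header: apply the common role serially, abort on first failure
- hosts: all
  serial: 1
  max_fail_percentage: 0
  roles:
    - common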

The Docker Role

YAML
# ansible/roles/docker/tasks/main.yml
---
- name: Install Docker prerequisites
  apt:
    name:
      - ca-certificates
      - curl
      - gnupg
    state: present
  when: ansible_os_family == "Debian"

- name: Add Docker GPG key
  apt_key:
    url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
    state: present
  when: ansible_os_family == "Debian"

- name: Add Docker repository
  apt_repository:
    repo: "deb https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} stable"
    state: present
  when: ansible_os_family == "Debian"

- name: Install Docker
  package:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-compose-plugin
    state: present

- name: Add user to docker group
  user:
    name: "{{ ansible_user | default('resham') }}"
    groups: docker
    append: true

- name: Configure Docker daemon
  template:
    src: daemon.json.j2
    dest: /etc/docker/daemon.json
  notify: restart docker

- name: Enable Docker service
  systemd:
    name: docker
    enabled: true
    state: started
JSON
{# ansible/roles/docker/templates/daemon.json.j2 -- Jinja comment, so the rendered file stays valid JSON (// comments aren't) #}
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-address-pools": [
    {
      "base": "172.17.0.0/16",
      "size": 24
    }
  ],
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true
}

Secrets with Ansible Vault

All sensitive values (passwords, API keys, SSH keys) are encrypted with Ansible Vault:

Bash
# Create the vault
ansible-vault create ansible/inventory/group_vars/all/vault.yml

# Contents (encrypted at rest):
vault_grafana_password: "my-strong-password"
vault_postgres_password: "another-strong-password"
vault_slack_webhook: "https://hooks.slack.com/..."
vault_b2_app_key: "..."
vault_idrac_password: "..."
Bash
# Run playbook with vault decryption
ansible-playbook ansible/playbooks/site.yml --ask-vault-pass

# Or use a password file (for automation)
ansible-playbook ansible/playbooks/site.yml \
  --vault-password-file ~/.ansible-vault-password
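One convention worth pairing with this (a common Ansible pattern, though the post doesn't show it): keep only the vault_-prefixed variables in the encrypted file, and reference them from a plain-text vars file, so grep still finds every variable name without decrypting anything.

YAML
# ansible/inventory/group_vars/all/vars.yml (unencrypted) -- indirection layer
grafana_admin_password: "{{ vault_grafana_password }}"
postgres_password: "{{ vault_postgres_password }}"
slack_webhook_url: "{{ vault_slack_webhook }}"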

Terraform: The Infrastructure Layer

Ansible configures machines that already exist. But who creates the VMs in the first place? That used to be me, clicking through the Proxmox web UI. Now it's Terraform.

I use the bpg/proxmox Terraform provider, which talks to the Proxmox API:

HCL
# terraform/provider.tf
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = ">= 0.55.0"
    }
  }
}

provider "proxmox" {
  endpoint = "https://10.10.10.11:8006"
  username = "terraform@pam"
  password = var.proxmox_password
  insecure = true # Self-signed cert on homelab

  ssh {
    agent = true
  }
}
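One prerequisite the provider config glosses over: the terraform@pam user isn't a Proxmox default. It has to be created and granted privileges once, roughly like this (a sketch; the privilege list is illustrative, and the @pam realm authenticates against a matching Linux account):

Bash
# Run once on a Proxmox node. The @pam realm authenticates against local
# Linux accounts, so the OS user must exist first.
useradd -m terraform
passwd terraform

# Register the user with Proxmox, create a role, and grant it on /.
pveum user add terraform@pam
pveum role add TerraformProv -privs "VM.Allocate VM.Clone VM.Audit VM.Config.CPU VM.Config.Disk VM.Config.Memory VM.Config.Network VM.Config.Options Datastore.AllocateSpace Datastore.Audit Sys.Audit"
pveum aclmod / -user terraform@pam -role TerraformProv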

Defining VMs as Code

HCL
# terraform/vms.tf

# Cloud-init template (created once, used by all VMs)
resource "proxmox_virtual_environment_file" "ubuntu_cloud_image" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = "pve1"

  source_file {
    path      = "https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
    file_name = "ubuntu-24.04-cloud.img"
  }
}

# Development VM
resource "proxmox_virtual_environment_vm" "dev_ubuntu" {
  name      = "dev-ubuntu"
  node_name = "r720"
  vm_id     = 104

  agent {
    enabled = true
  }

  cpu {
    cores = 4
    type  = "host"
  }

  memory {
    dedicated = 16384
  }

  disk {
    datastore_id = "local-lvm"
    size         = 100
    interface    = "scsi0"
  }

  network_device {
    bridge  = "vmbr0"
    vlan_id = 20
  }

  initialization {
    ip_config {
      ipv4 {
        address = "10.10.20.5/24"
        gateway = "10.10.20.1"
      }
    }

    user_account {
      username = "resham"
      # pathexpand, because Terraform's file() doesn't expand ~
      keys = [file(pathexpand("~/.ssh/id_ed25519.pub"))]
    }
  }

  clone {
    vm_id = 9000 # Ubuntu 24.04 template
  }

  tags = ["docker", "development"]
}

# Jenkins VM
resource "proxmox_virtual_environment_vm" "jenkins" {
  name      = "jenkins"
  node_name = "pve2"
  vm_id     = 103

  cpu {
    cores = 4
    type  = "host"
  }

  memory {
    dedicated = 4096
  }

  disk {
    datastore_id = "local-lvm"
    size         = 50
    interface    = "scsi0"
  }

  network_device {
    bridge  = "vmbr0"
    vlan_id = 20
  }

  initialization {
    ip_config {
      ipv4 {
        address = "10.10.20.4/24"
        gateway = "10.10.20.1"
      }
    }

    user_account {
      username = "resham"
      keys     = [file(pathexpand("~/.ssh/id_ed25519.pub"))]
    }
  }

  clone {
    vm_id = 9000
  }

  tags = ["cicd", "services"]
}
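The two VM resources are nearly identical, and every new VM means another copy-pasted block. One direction to grow this (a hypothetical refactor, not the repo's actual code) is a single resource with for_each over a map:

HCL
# Hypothetical refactor: one resource, one map entry per VM
locals {
  vms = {
    "dev-ubuntu" = { node = "r720", vm_id = 104, cores = 4, memory = 16384, disk_gb = 100, ip = "10.10.20.5/24" }
    "jenkins"    = { node = "pve2", vm_id = 103, cores = 4, memory = 4096, disk_gb = 50, ip = "10.10.20.4/24" }
  }
}

resource "proxmox_virtual_environment_vm" "vm" {
  for_each  = local.vms
  name      = each.key
  node_name = each.value.node
  vm_id     = each.value.vm_id

  cpu {
    cores = each.value.cores
    type  = "host"
  }

  memory {
    dedicated = each.value.memory
  }

  disk {
    datastore_id = "local-lvm"
    size         = each.value.disk_gb
    interface    = "scsi0"
  }

  network_device {
    bridge  = "vmbr0"
    vlan_id = 20
  }

  initialization {
    ip_config {
      ipv4 {
        address = each.value.ip
        gateway = "10.10.20.1"
      }
    }

    user_account {
      username = "resham"
      keys     = [file(pathexpand("~/.ssh/id_ed25519.pub"))]
    }
  }

  clone {
    vm_id = 9000
  }
  # (per-VM tags, agent block, etc. elided for brevity)
}

One caveat with this refactor: existing VMs have to be moved into the new addresses with terraform state mv first, or Terraform will plan to destroy and recreate them.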

LXC Containers

HCL
# terraform/lxc.tf

resource "proxmox_virtual_environment_container" "prometheus" {
  description = "Prometheus monitoring server"
  node_name   = "pve1"
  vm_id       = 200

  initialization {
    hostname = "prometheus"

    ip_config {
      ipv4 {
        address = "10.10.40.10/24"
        gateway = "10.10.40.1"
      }
    }
  }

  cpu {
    cores = 2
  }

  memory {
    dedicated = 1024
    swap      = 0
  }

  disk {
    datastore_id = "local-lvm"
    size         = 16
  }

  network_interface {
    name    = "eth0"
    bridge  = "vmbr0"
    vlan_id = 40
  }

  operating_system {
    template_file_id = "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"
    type             = "ubuntu"
  }

  tags = ["monitoring"]
}
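The repo tree also lists terraform/outputs.tf, which never appears in the post. A plausible minimal version exposes the VM addresses so the Ansible side doesn't have to hard-code them (a sketch; ipv4_addresses is reported by the QEMU guest agent, so it's only populated on VMs running it):

HCL
# terraform/outputs.tf -- hypothetical sketch, not the actual file
output "vm_ips" {
  description = "IPv4 addresses of each managed VM"
  value = {
    dev-ubuntu = proxmox_virtual_environment_vm.dev_ubuntu.ipv4_addresses
    jenkins    = proxmox_virtual_environment_vm.jenkins.ipv4_addresses
  }
}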

The Deploy Script

The magic is in the deploy script that chains Terraform and Ansible together:

Bash
#!/bin/bash
# scripts/deploy.sh
set -euo pipefail

echo "=== Homelab IaC Deploy ==="
echo "Started at $(date)"

# Step 1: Terraform — provision infrastructure
echo ""
echo "[1/3] Running Terraform..."
cd terraform
terraform init -upgrade
terraform plan -out=plan.tfplan
terraform apply plan.tfplan
cd ..

# Step 2: Wait for VMs to boot and be reachable
echo ""
echo "[2/3] Waiting for machines to be reachable..."
ansible all -m ping --timeout 30 || {
    echo "Some hosts unreachable. Waiting 30s and retrying..."
    sleep 30
    ansible all -m ping --timeout 30
}

# Step 3: Ansible — configure everything
echo ""
echo "[3/3] Running Ansible..."
ansible-playbook ansible/playbooks/site.yml \
    --vault-password-file ~/.ansible-vault-password

echo ""
echo "=== Deploy complete at $(date) ==="
Bash
# Usage:
resham@devbox:~/homelab-iac$ ./scripts/deploy.sh

=== Homelab IaC Deploy ===
Started at Sun Jan 4 14:30:00 CST 2026

[1/3] Running Terraform...
proxmox_virtual_environment_vm.dev_ubuntu: Refreshing state...
proxmox_virtual_environment_vm.jenkins: Refreshing state...
# ... (no changes needed — all VMs already exist)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

[2/3] Waiting for machines to be reachable...
pve1 | SUCCESS
pve2 | SUCCESS
pve3 | SUCCESS
r720 | SUCCESS
nas01 | SUCCESS
devbox | SUCCESS
docker-host | SUCCESS
# ...

[3/3] Running Ansible...
PLAY [all] *****
TASK [common : Set timezone] ***
ok: [pve1]
ok: [pve2]
# ... (lots of "ok" because everything is already configured)

PLAY RECAP *****
pve1    : ok=24  changed=0  unreachable=0  failed=0
pve2    : ok=24  changed=0  unreachable=0  failed=0
pve3    : ok=24  changed=0  unreachable=0  failed=0
r720    : ok=24  changed=0  unreachable=0  failed=0
nas01   : ok=18  changed=0  unreachable=0  failed=0
devbox  : ok=15  changed=0  unreachable=0  failed=0

=== Deploy complete at Sun Jan 4 14:32:45 CST 2026 ===

Two minutes and 45 seconds to verify the entire homelab is in the desired state. Every machine, every package, every config file. And because Ansible is idempotent, running it when nothing has changed is essentially a no-op — it just verifies and moves on.

The Disaster Recovery Test

The real test of infrastructure as code is: can you rebuild everything from scratch?

I tested this by intentionally destroying a VM (the Docker host) and rebuilding it entirely through the IaC pipeline:

Bash
# Destroy the VM
terraform destroy -target=proxmox_virtual_environment_vm.dev_ubuntu

# Recreate it
terraform apply

# Configure it
ansible-playbook ansible/playbooks/docker-hosts.yml \
  --limit docker-host \
  --vault-password-file ~/.ansible-vault-password

# Time: 6 minutes 23 seconds from destroy to fully configured

Six minutes. From an empty VM to a fully configured Docker host with all services running. Before Ansible and Terraform, rebuilding this machine took me about three hours of manual configuration.

[!TIP] Test your IaC regularly. I destroy and rebuild a random VM every month as a drill. It catches drift (manual changes that aren't in the playbooks), broken templates, and assumptions that no longer hold. The confidence that "I can rebuild anything in minutes" is worth the 10 minutes it takes to verify.

What I'd Add Next

1. GitOps workflow. Right now I run deploy.sh manually. I want to set up a GitHub Actions workflow that runs terraform plan on every PR and terraform apply plus ansible-playbook on merge to main. The homelab would update itself when I push code. (A sketch of the plan half follows this list.)

2. Dynamic inventory. My Ansible inventory is static. The Proxmox API can generate inventory dynamically (list all VMs, their IPs, their tags), which would mean I never need to update the inventory file when I add a new VM. (Plugin config sketched below.)

3. Molecule testing. I want to test Ansible roles in CI before deploying them. Molecule can spin up Docker containers, run roles against them, and verify the result. Right now my testing is "run it and see if it breaks." (A minimal scaffold is sketched below.)

4. Vault integration. Ansible Vault is fine for a homelab, but HashiCorp Vault would give me dynamic secrets, auto-rotation, and audit logging. Overkill? Probably. But I want to learn it.
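To make the first three concrete, here is roughly what each piece could look like. The PR-side half of the GitOps workflow might be a plan-only job (a hypothetical workflow; it assumes a self-hosted runner inside the homelab, since a GitHub-hosted runner can't reach the Proxmox API):

YAML
# .github/workflows/terraform-plan.yml -- hypothetical sketch, not in the repo
name: terraform-plan
on:
  pull_request:
    paths:
      - "terraform/**"
jobs:
  plan:
    runs-on: self-hosted # needs line-of-sight to the Proxmox API
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: terraform plan
        working-directory: terraform
        env:
          TF_VAR_proxmox_password: ${{ secrets.PROXMOX_PASSWORD }}
        run: |
          terraform init
          terraform plan -no-color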
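For dynamic inventory, the community.general collection ships a Proxmox inventory plugin; a starting config might look like this (user, token, and file name are illustrative):

YAML
# ansible/inventory/proxmox.yml -- hypothetical plugin config
plugin: community.general.proxmox
url: https://10.10.10.11:8006
user: ansible@pve
token_id: inventory
token_secret: "..." # keep out of git: vault-encrypt this file or use an env var
validate_certs: false
want_facts: true
# Build Ansible groups from Proxmox tags: a VM tagged "docker" lands in tag_docker
keyed_groups:
  - key: proxmox_tags_parsed
    prefix: tag

Pointing ansible-playbook at that file with -i then targets whatever Proxmox says exists, so a VM added in Terraform shows up in inventory automatically.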
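And for Molecule, a minimal scaffold for the common role could start like this (assumes the Docker driver from the molecule-plugins package):

YAML
# ansible/roles/common/molecule/default/molecule.yml -- hypothetical scaffold
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: debian12
    image: debian:12
provisioner:
  name: ansible
verifier:
  name: ansible

Running molecule test from the role directory creates the container, applies the role, runs it a second time to verify idempotence, and tears everything down.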

The Numbers

| Metric | Before IaC | After IaC |
| --- | --- | --- |
| Time to update all machines | 2-3 hours (manual SSH) | 2 min 45 sec |
| Time to rebuild a VM | ~3 hours | ~6 minutes |
| Configuration drift | Constant (every machine slightly different) | Zero (Ansible enforces state) |
| "Did I update that machine?" uncertainty | Always | Never |
| Times I've been locked out by bad config | 3 | 0 (validate before deploy) |
| Hours spent on maintenance per month | ~8 hours | ~1 hour |

The ROI is absurd. I spent maybe 40 hours building the Ansible playbooks and Terraform configs. That investment saves me 7+ hours every month. It paid for itself in under six months, and now it's pure time savings.

If you're managing more than three machines manually, you need configuration management. It doesn't have to be Ansible — Chef, Puppet, Salt, and even shell scripts with a proper framework are all valid. But the core principle is the same: define your infrastructure in code, store it in git, and never configure anything by hand.

The Saturday I spent three hours updating node_exporter was the most expensive Saturday of my homelab career, measured in time wasted. The Sunday I spent setting up Ansible was the most productive. Every Saturday since then, my homelab takes care of itself while I do something I actually enjoy.

Like writing overly detailed blog posts about my homelab.