
How I Built a Zero-Downtime CI/CD Pipeline with Jenkins, Docker, and Nginx

A detailed walkthrough of building a production CI/CD pipeline with blue-green deployments, automated rollbacks, and zero downtime — from someone who learned the hard way after a Friday deploy took down the app for 40 minutes.


I still remember the Friday evening when I pushed a "small fix" directly to production. The app went down. For 40 minutes. On a Friday. My phone was blowing up with alerts and I was frantically SSH-ing into the server trying to figure out what went wrong. Turns out, the new container started before the old one fully stopped, there was a port conflict, and Nginx was routing traffic to a dead upstream.

That night I decided: never again. I was going to build a proper CI/CD pipeline with zero-downtime deployments, automated health checks, and instant rollbacks. This post is the result of about three weeks of iterations, late nights, and a lot of `docker logs` commands.

The Architecture

Before diving into the setup, here's the high-level view of what we're building:

[Diagram: CI/CD Pipeline Architecture]

The flow is straightforward: push to GitHub, Jenkins picks it up via webhook, builds a Docker image, runs tests, pushes to a registry, deploys to staging, runs health checks, and if everything passes — swaps traffic to the new container using Nginx. If anything fails, we roll back automatically.

The Stack

Here's what I'm working with:

| Component | Tool | Why |
|---|---|---|
| CI Server | Jenkins (self-hosted) | Free, extensible, I already had it running on my homelab |
| Containerization | Docker + Docker Compose | Consistent environments, easy rollbacks |
| Reverse Proxy | Nginx | Blue-green traffic switching, SSL termination |
| Source Control | GitHub | Webhook triggers on push |
| Monitoring | Prometheus + Grafana | Need to know when things break before users tell me |
| Notifications | Slack webhook | Because I check Slack more than email |

Setting Up Jenkins with Docker

I run Jenkins itself inside a Docker container on my Proxmox homelab. Here's the compose file:

```yaml
# docker-compose.jenkins.yml
version: '3.8'

services:
  jenkins:
    image: jenkins/jenkins:lts
    container_name: jenkins
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "50000:50000"
    volumes:
      - jenkins_home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - JAVA_OPTS=-Xmx512m
    networks:
      - cicd

volumes:
  jenkins_home:

networks:
  cicd:
    driver: bridge
```
> [!WARNING]
> Mounting the Docker socket (`/var/run/docker.sock`) gives Jenkins root-level access to the host's Docker daemon. In a production environment, you'd want to use something like Docker-in-Docker or a remote Docker host. For my homelab, I accepted the risk since the network is isolated.

After starting Jenkins, I installed these plugins:

  • Docker Pipeline
  • GitHub Integration
  • Slack Notification
  • Blue Ocean (for the nice UI)
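To make the plugin setup reproducible, the list can be baked into a custom Jenkins image with `jenkins-plugin-cli` (which ships in the official image) instead of being clicked through the UI. A sketch; the plugin IDs below are my mapping of the display names above:

```dockerfile
# Dockerfile.jenkins: pin the plugin set as code (illustrative)
FROM jenkins/jenkins:lts
RUN jenkins-plugin-cli --plugins \
    docker-workflow \
    github \
    slack \
    blueocean
```

Rebuilding this image after a Jenkins upgrade also surfaces plugin compatibility problems at build time rather than at 2am.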

The Jenkinsfile

This is where most of the magic happens. I went through probably 15 revisions of this file before I was happy with it:

```groovy
pipeline {
    agent any

    environment {
        DOCKER_REGISTRY = 'registry.homelab.local:5000'
        APP_NAME = 'myapp'
        SLACK_CHANNEL = '#deployments'
    }

    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_SHORT = sh(
                        script: 'git rev-parse --short HEAD',
                        returnStdout: true
                    ).trim()
                    env.IMAGE_TAG = "${env.APP_NAME}:${env.GIT_COMMIT_SHORT}"
                }
            }
        }

        stage('Build') {
            steps {
                sh """
                    docker build \
                        --build-arg BUILD_DATE=\$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
                        --build-arg VCS_REF=${env.GIT_COMMIT_SHORT} \
                        -t ${env.DOCKER_REGISTRY}/${env.IMAGE_TAG} \
                        -t ${env.DOCKER_REGISTRY}/${env.APP_NAME}:latest \
                        .
                """
            }
        }

        stage('Test') {
            steps {
                sh """
                    docker run --rm \
                        ${env.DOCKER_REGISTRY}/${env.IMAGE_TAG} \
                        python -m pytest tests/ -v --tb=short
                """
            }
        }

        stage('Push') {
            steps {
                sh """
                    docker push ${env.DOCKER_REGISTRY}/${env.IMAGE_TAG}
                    docker push ${env.DOCKER_REGISTRY}/${env.APP_NAME}:latest
                """
            }
        }

        stage('Deploy to Staging') {
            steps {
                sh './scripts/deploy.sh staging ${IMAGE_TAG}'
            }
        }

        stage('Health Check') {
            steps {
                script {
                    def healthy = false
                    for (int i = 0; i < 10; i++) {
                        def status = sh(
                            script: 'curl -s -o /dev/null -w "%{http_code}" http://staging.homelab.local/healthz',
                            returnStdout: true
                        ).trim()
                        if (status == '200') {
                            healthy = true
                            break
                        }
                        sleep(time: 3, unit: 'SECONDS')
                    }
                    if (!healthy) {
                        error('Health check failed after 30 seconds')
                    }
                }
            }
        }

        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                sh './scripts/deploy.sh production ${IMAGE_TAG}'
                sh './scripts/blue-green-swap.sh'
            }
        }
    }

    post {
        success {
            slackSend(
                channel: env.SLACK_CHANNEL,
                color: 'good',
                message: "Deployed ${env.IMAGE_TAG} to production"
            )
        }
        failure {
            sh './scripts/rollback.sh'
            slackSend(
                channel: env.SLACK_CHANNEL,
                color: 'danger',
                message: "Deploy FAILED for ${env.IMAGE_TAG} — rolled back"
            )
        }
    }
}
```

The Blue-Green Deployment Script

This was the trickiest part to get right. The idea is simple: run two copies of your app (blue and green), and swap Nginx's upstream to point at whichever one has the new code. Here's the swap script that runs after the deploy:

```bash
#!/bin/bash
# scripts/blue-green-swap.sh

set -euo pipefail

NGINX_CONF="/etc/nginx/conf.d/app.conf"
# Read the active color from the proxy_pass line, not from the
# upstream definitions (the config defines both colors)
CURRENT=$(grep -oP 'proxy_pass http://upstream_\K(blue|green)' "$NGINX_CONF" | head -1)

if [ "$CURRENT" = "blue" ]; then
    NEW="green"
    NEW_PORT=8002
else
    NEW="blue"
    NEW_PORT=8001
fi

echo "[$(date)] Swapping from $CURRENT to $NEW"

# Start the new container
docker compose -f docker-compose.prod.yml up -d "app_${NEW}"

# Wait for the new container to be healthy
echo "Waiting for app_${NEW} to pass health check..."
for i in $(seq 1 20); do
    if curl -sf "http://localhost:${NEW_PORT}/healthz" > /dev/null 2>&1; then
        echo "app_${NEW} is healthy after ${i} attempts"
        break
    fi
    if [ "$i" -eq 20 ]; then
        echo "FATAL: app_${NEW} failed health check"
        docker compose -f docker-compose.prod.yml stop "app_${NEW}"
        exit 1
    fi
    sleep 2
done

# Swap the Nginx upstream. Only rewrite proxy_pass lines so the
# upstream definitions themselves stay intact.
sed -i "s|proxy_pass http://upstream_${CURRENT}|proxy_pass http://upstream_${NEW}|g" "$NGINX_CONF"
nginx -t && nginx -s reload

echo "[$(date)] Traffic now routing to app_${NEW}"

# Keep old container running for 60s in case we need quick rollback
echo "Keeping app_${CURRENT} alive for 60s as rollback safety net..."
sleep 60
docker compose -f docker-compose.prod.yml stop "app_${CURRENT}"
echo "[$(date)] Stopped app_${CURRENT}. Deployment complete."
```

Note the `grep` reads the active color from the `proxy_pass` line and the `sed` only rewrites `proxy_pass`. An earlier version of mine matched any `upstream_blue`/`upstream_green`, which also renamed the upstream definitions and broke the config.

And the corresponding Nginx config:

```nginx
# /etc/nginx/conf.d/app.conf

upstream upstream_blue {
    server 127.0.0.1:8001;
}

upstream upstream_green {
    server 127.0.0.1:8002;
}

server {
    listen 80;
    server_name app.homelab.local;

    # Active upstream — swap this between blue/green
    location / {
        proxy_pass http://upstream_blue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
    }

    location /healthz {
        proxy_pass http://upstream_blue/healthz;
        access_log off;
    }
}
```
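The color-toggle logic the script implements can be sketched (and unit-tested) in a few lines of Python. This is purely illustrative; `active_color` and `swap_upstream` are my own helper names, not part of the repo:

```python
import re


def active_color(conf: str) -> str:
    """Return the color currently receiving traffic, read from proxy_pass."""
    match = re.search(r"proxy_pass http://upstream_(blue|green)", conf)
    if match is None:
        raise ValueError("no active upstream found in config")
    return match.group(1)


def swap_upstream(conf: str) -> str:
    """Rewrite proxy_pass lines to point at the other color,
    leaving the upstream definitions untouched."""
    current = active_color(conf)
    new = "green" if current == "blue" else "blue"
    return conf.replace(
        f"proxy_pass http://upstream_{current}",
        f"proxy_pass http://upstream_{new}",
    )


conf = """
upstream upstream_blue { server 127.0.0.1:8001; }
upstream upstream_green { server 127.0.0.1:8002; }
server {
    location / { proxy_pass http://upstream_blue; }
    location /healthz { proxy_pass http://upstream_blue/healthz; }
}
"""

swapped = swap_upstream(conf)
print(active_color(swapped))  # green
```

Having the toggle as a pure function like this makes the edge cases (no active upstream, both lines swapped together) easy to test without touching a live Nginx.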

The Rollback Script

Rollbacks should be boring. Fast and boring. Here's what mine looks like:

```bash
#!/bin/bash
# scripts/rollback.sh

set -euo pipefail

echo "[$(date)] ROLLBACK INITIATED"

NGINX_CONF="/etc/nginx/conf.d/app.conf"
# Same trick as the swap script: read the active color from proxy_pass
CURRENT=$(grep -oP 'proxy_pass http://upstream_\K(blue|green)' "$NGINX_CONF" | head -1)

if [ "$CURRENT" = "blue" ]; then
    ROLLBACK_TO="green"
else
    ROLLBACK_TO="blue"
fi

# Check if the previous container is still running
if docker ps --format '{{.Names}}' | grep -q "app_${ROLLBACK_TO}"; then
    echo "Previous container app_${ROLLBACK_TO} is still running. Quick rollback!"
    sed -i "s|proxy_pass http://upstream_${CURRENT}|proxy_pass http://upstream_${ROLLBACK_TO}|g" "$NGINX_CONF"
    nginx -t && nginx -s reload
    echo "[$(date)] Rolled back to app_${ROLLBACK_TO} in <5 seconds"
else
    echo "Previous container stopped. Starting from last known good image..."
    docker compose -f docker-compose.prod.yml up -d "app_${ROLLBACK_TO}"
    sleep 10
    sed -i "s|proxy_pass http://upstream_${CURRENT}|proxy_pass http://upstream_${ROLLBACK_TO}|g" "$NGINX_CONF"
    nginx -t && nginx -s reload
    echo "[$(date)] Rolled back to app_${ROLLBACK_TO} (cold start: ~15 seconds)"
fi
```
> [!TIP]
> The 60-second grace period in the deploy script is key. If the new deployment breaks within a minute, the rollback is nearly instant because the old container is still running. After that window, rollback takes about 15 seconds for a cold start. Both are way better than the 40 minutes I spent manually fixing things that Friday night.

The Health Check Endpoint

Don't skip this. Your health check should actually check things, not just return 200:

```python
# app/health.py
import os
from datetime import datetime

import psycopg2
import redis
from fastapi import APIRouter

router = APIRouter()


@router.get("/healthz")
async def health_check():
    checks = {}

    # Database check
    try:
        conn = psycopg2.connect(os.environ["DATABASE_URL"])
        conn.close()
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {str(e)}"

    # Redis check
    try:
        r = redis.from_url(os.environ["REDIS_URL"])
        r.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {str(e)}"

    all_healthy = all(v == "ok" for v in checks.values())

    return {
        "status": "healthy" if all_healthy else "degraded",
        "timestamp": datetime.utcnow().isoformat(),
        "checks": checks,
        "version": os.environ.get("APP_VERSION", "unknown")
    }
```

Here's what a successful health check response looks like:

```json
{
  "status": "healthy",
  "timestamp": "2024-06-14T22:15:03.441Z",
  "checks": {
    "database": "ok",
    "redis": "ok"
  },
  "version": "a3f82d1"
}
```
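On the consumer side, the Jenkinsfile's Health Check stage is just a bounded poll: try up to 10 times, 3 seconds apart, for a 30-second budget. The same logic as a Python sketch (`wait_for_healthy` is an illustrative helper of mine, not part of the pipeline):

```python
import time


def wait_for_healthy(check, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll `check` until it returns True or the attempt budget runs out.
    Mirrors the Jenkinsfile's Health Check stage: 10 tries x 3s = 30s."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False


# Example: a check that succeeds on the third attempt
responses = iter([False, False, True])
assert wait_for_healthy(lambda: next(responses), delay=0) is True
```

The important design choice is that the budget is bounded: a deploy that can't prove itself healthy inside the window fails loudly and triggers the rollback path instead of hanging.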

Monitoring the Pipeline

I set up a simple Grafana dashboard that tracks:

  • Build duration per stage
  • Deploy frequency (deploys per day/week)
  • Rollback rate
  • Container resource usage post-deploy

The Prometheus config scrapes both the app and Jenkins:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'jenkins'
    metrics_path: /prometheus
    static_configs:
      - targets: ['jenkins.homelab.local:8080']

  - job_name: 'app'
    static_configs:
      - targets: ['app.homelab.local:9090']
    scrape_interval: 15s
```

Results After Three Months

Here's what changed after implementing this pipeline:

| Metric | Before | After |
|---|---|---|
| Average deploy time | ~15 min (manual) | 3 min 20 s |
| Downtime per deploy | 30–90 seconds | 0 seconds |
| Rollback time | 5–40 min (manual SSH) | < 30 seconds |
| Failed deploys caught | After users complained | Before hitting production |
| Deploys per week | 2–3 (scared to deploy) | 8–12 (deploy with confidence) |

The biggest win isn't even the zero-downtime part. It's the confidence. When you know that a failed deploy will automatically roll back and alert you on Slack, you stop being afraid to push code. You deploy smaller changes more frequently, which means less risk per deploy.

Lessons Learned

1. Start simple, iterate. My first version of this pipeline was just "Jenkins builds Docker image and restarts the container." That's fine. I added blue-green deploys after the first incident, health checks after the second, and monitoring after I got tired of SSH-ing into the server to check if things were working.

2. Health checks need to check real dependencies. A `/healthz` endpoint that just returns 200 is useless. Mine checks the database connection and Redis. If either is down, the deploy won't proceed.

3. Keep rollback dead simple. The whole point of having a rollback script is that you'll use it at 2am when you're half asleep. It should be one command, no arguments, no confirmation prompts.

4. Docker image tagging matters. Always tag with the git commit hash, not just `latest`. When you need to roll back, you need to know exactly which version to go back to. `latest` tells you nothing.
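The tagging scheme from the Jenkinsfile reduces to one convention: `registry/app:shortsha`. A tiny Python helper (illustrative only, not from the repo) makes the rule explicit:

```python
def image_ref(registry: str, app: str, commit_sha: str) -> str:
    """Build an immutable image reference from a commit hash,
    matching the IMAGE_TAG convention in the Jenkinsfile."""
    # 7 chars is the usual output of `git rev-parse --short HEAD`,
    # though git may emit more in large repos
    short = commit_sha[:7]
    return f"{registry}/{app}:{short}"


ref = image_ref("registry.homelab.local:5000", "myapp", "a3f82d1c9e")
print(ref)  # registry.homelab.local:5000/myapp:a3f82d1
```

Because every deployed image is addressable by commit, "which version were we on before this broke?" becomes a registry lookup instead of archaeology.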

5. Monitor the pipeline itself. Jenkins going down silently is worse than no CI/CD at all, because you'll push code thinking it was tested and deployed when it wasn't.

What I'd Do Differently

If I were building this from scratch today, I'd probably use GitHub Actions instead of Jenkins. Jenkins is powerful but it's a lot of maintenance — Java updates, plugin compatibility issues, and it eats RAM like crazy. For a personal project or small team, GitHub Actions with self-hosted runners would give me 90% of the functionality with 10% of the maintenance.
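For a sense of scale, the build/test/push stages above might translate to something like this GitHub Actions workflow. This is a rough sketch, not a tested config; the file name and runner label are assumptions, and the deploy steps would still call the same scripts from a self-hosted runner with access to the registry:

```yaml
# .github/workflows/deploy.yml (hypothetical equivalent)
name: deploy
on:
  push:
    branches: [main]

jobs:
  build-test-deploy:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t "registry.homelab.local:5000/myapp:${GITHUB_SHA::7}" .
      - name: Run tests
        run: docker run --rm "registry.homelab.local:5000/myapp:${GITHUB_SHA::7}" python -m pytest tests/
      - name: Push and deploy
        run: |
          docker push "registry.homelab.local:5000/myapp:${GITHUB_SHA::7}"
          ./scripts/deploy.sh production "myapp:${GITHUB_SHA::7}"
          ./scripts/blue-green-swap.sh
```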

I'd also look into ArgoCD if I were running Kubernetes instead of plain Docker Compose. The GitOps approach of "the git repo is the source of truth for what's deployed" is really clean.

But honestly? This setup has been running for three months now without any major issues. Sometimes the boring, well-understood tools are the right choice.

If you're still doing manual deploys over SSH, I get it — I was there too. But even a basic Jenkins pipeline with Docker will save you hours of stress. Start with the simple version and add complexity only when you actually need it. Your Friday evenings will thank you.
