Running Jenkins in Kubernetes: Why We Left EC2 Behind
Scaling Jenkins agents dynamically in Kubernetes beats static EC2 instances. Here's what worked, what broke, and how we solved Docker-in-Docker nightmares with BuildKit.
March 24, 2021 • 4 min read • Updated July 30, 2023
We moved Jenkins from EC2 instances to Kubernetes. The promise was simple: spawn agents on demand, scale to zero when idle, stop paying for idle build capacity.
The reality was messier. But worth it.
Why Kubernetes Over EC2
Dynamic Agent Scaling: EC2 requires pre-provisioned instances. You pay for capacity whether builds are running or not. Kubernetes spawns agent pods on demand and terminates them when done. Actual usage determines cost.
Scale to Zero: No builds running? Agent pods go to zero. With EC2, you keep minimum instances running "just in case." That idle cost adds up fast.
Resource Efficiency: Kubernetes schedules pods across nodes intelligently. Multiple small builds pack onto the same node. Large builds get dedicated resources. EC2 forces you to guess instance sizes upfront.
Faster Agent Provisioning: Spinning up a pod takes seconds. Launching an EC2 instance takes minutes. When builds queue, speed matters.
No Instance Management: No SSH keys. No AMI updates. No security patches. Just container images. Jenkins handles the rest.
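The on-demand agents above are defined as pod templates consumed by the Jenkins Kubernetes plugin. A minimal sketch of one — the image name and resource figures are illustrative, not our production values:

```yaml
# Pod template for a Jenkins agent. The Kubernetes plugin injects
# the "jnlp" agent container automatically; this adds a build
# container alongside it.
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: build
      image: maven:3.8-openjdk-11   # illustrative build image
      command: ["sleep"]
      args: ["infinity"]            # keep the container alive for build steps
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          memory: 2Gi
```

The plugin creates a pod like this per build and deletes it when the build finishes — that is the whole scale-to-zero story.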
The Helm Chart Challenge
The official Jenkins Helm chart is comprehensive—and complicated. Configuration sprawls across values files. Getting basic Jenkins running is easy. Getting it production-ready with proper persistence, security, and networking requires understanding Kubernetes internals.
We spent more time than expected just configuring the chart correctly. The defaults aren't terrible, but they're not production-ready either.
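For reference, the knobs we had to pin down look roughly like this in the chart's values file. This is a sketch against the official jenkinsci/helm-charts layout — key names shift between chart versions, so check the values.yaml shipped with your chart:

```yaml
controller:
  # Run the controller unprivileged.
  containerSecurityContext:
    runAsUser: 1000
    allowPrivilegeEscalation: false
  resources:
    requests:
      cpu: "1"
      memory: 2Gi

persistence:
  enabled: true
  existingClaim: jenkins-home   # PVC backed by EFS
```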
The Docker-in-Docker Problem
Building Docker images inside Kubernetes pods breaks the simple case. Jenkins agents need to build images, but pods can't run Docker daemons without privileged access—a security nightmare.
Attempt 1: Docker-in-Docker (DinD)
We started with DinD. Ran a Docker daemon as a sidecar container. Agents connected to it to build images.
Why it failed: Requires privileged pods. Opens security holes. Slow—daemon startup adds overhead to every build. Caching is complicated. Doesn't play well with ephemeral pods.
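For context, the DinD sidecar looks something like this — note the `privileged: true`, which is exactly the part that makes security teams wince (a sketch, not our exact manifest):

```yaml
# Two containers in the agent pod: a Docker daemon sidecar
# and a build container that talks to it.
containers:
  - name: dind
    image: docker:dind
    securityContext:
      privileged: true          # required for DinD; the core problem
    env:
      - name: DOCKER_TLS_CERTDIR
        value: ""               # plain TCP for in-pod traffic (illustrative)
  - name: build
    image: docker:cli
    env:
      - name: DOCKER_HOST
        value: tcp://localhost:2375   # point the CLI at the sidecar daemon
```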
Attempt 2: Kaniko
Kaniko builds images without a Docker daemon. Runs unprivileged. Reads Dockerfiles, builds images, pushes to registries—all from within a standard pod.
Why we moved on: Worked well for simple Dockerfiles. Struggled with complex multi-stage builds. Cache management was awkward. Performance wasn't great for large builds. Felt like a workaround, not a solution.
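A Kaniko build runs as a one-shot pod: the executor reads the Dockerfile from a build context and pushes straight to a registry, no daemon involved. Roughly — the repo, registry, and secret names are placeholders:

```yaml
apiVersion: v1
kind: Pod
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --dockerfile=Dockerfile
        - --context=git://github.com/example/app.git    # placeholder repo
        - --destination=registry.example.com/app:latest # placeholder registry
        - --cache=true        # layer cache stored in the registry
      volumeMounts:
        - name: docker-config
          mountPath: /kaniko/.docker   # registry credentials
  volumes:
    - name: docker-config
      secret:
        secretName: regcred
```

Unprivileged and simple — until the Dockerfile isn't.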
Final Solution: BuildKit
BuildKit is Docker's next-gen build engine. Rootless mode, better caching, faster builds, proper multi-stage support.
Why it works: Runs rootless (no privileged pods). Cache management is excellent—persistent volumes or inline caching work smoothly. Handles complex builds without issues. Performance matches or beats Docker daemon builds.
The tradeoff: Slightly more complex setup than Kaniko. Requires configuring BuildKit daemon mode or using buildctl directly. Worth it for production use.
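Our shape of this is buildkitd running rootless as an in-cluster service that agents reach with buildctl. A sketch based on BuildKit's upstream rootless examples — annotations and flags track the BuildKit docs, and the image version is illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: buildkitd
spec:
  replicas: 1
  selector:
    matchLabels: {app: buildkitd}
  template:
    metadata:
      labels: {app: buildkitd}
      annotations:
        # Rootless mode needs an unconfined AppArmor profile on some distros
        container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
    spec:
      containers:
        - name: buildkitd
          image: moby/buildkit:v0.12.5-rootless   # illustrative version
          args:
            - --addr
            - tcp://0.0.0.0:1234
            - --oci-worker-no-process-sandbox
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
            seccompProfile:
              type: Unconfined
          volumeMounts:
            - name: cache
              mountPath: /home/user/.local/share/buildkit
      volumes:
        - name: cache
          persistentVolumeClaim:
            claimName: buildkit-cache   # persistent layer cache
```

Agents then build with something like `buildctl --addr tcp://buildkitd:1234 build --frontend dockerfile.v0 --local context=. --local dockerfile=.` — no Docker daemon, no privileged pods.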
Making Jenkins Ephemeral with EFS
Jenkins is stateful. Job configs, build history, plugins—all live on disk. Running stateful services in Kubernetes means handling persistence properly.
AWS EFS (Elastic File System) solved this. Network file system accessible from any node. Jenkins pod dies? Reschedules on another node with the same data. Mount EFS to /var/jenkins_home and the controller becomes truly ephemeral. Upgrade? Delete the pod. New one starts with identical state.
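The wiring is a static EFS-backed PersistentVolume via the AWS EFS CSI driver, claimed by the Jenkins controller. Sketched below — the filesystem ID is a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-home
spec:
  capacity:
    storage: 20Gi        # EFS is elastic; the field is required but nominal
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""   # bind to the static PV above, not a StorageClass
  volumeName: jenkins-home
  resources:
    requests:
      storage: 20Gi
```

Because EFS is ReadWriteMany and node-independent, the controller pod can land anywhere and still find /var/jenkins_home intact.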
What We Got
Cost Savings: Agents scale to zero overnight and on weekends. We cut build infrastructure costs by ~60%.
Better Resource Utilization: Multiple builds share nodes efficiently. No more wasted large instances for small builds.
Faster Feedback: Builds start in seconds. Kubernetes pod scheduling beats EC2 instance launches.
Simpler Operations: No instance fleet management. Container images replace SSH configuration.
The Real Costs
Initial Setup Complexity: Getting Jenkins + Kubernetes + BuildKit + EFS working took time. The Helm chart isn't plug-and-play.
Learning Curve: Debugging moved from SSH to pod logs and Kubernetes events.
Build Scripts Changed: Pipelines needed updates for BuildKit's rootless mode.
Monitoring: Prometheus and Grafana handle metrics and dashboards. Different from CloudWatch on EC2, but more flexible.
Worth It?
Yes. The initial complexity pays off in operational simplicity and cost savings.
If you're running Jenkins on static EC2 with variable build loads, Kubernetes makes sense. Scaling to zero alone justifies the migration.
Budget time for BuildKit configuration and pipeline testing before going to production.
---
Running Jenkins in Kubernetes? What's your build image strategy? Still fighting with DinD? Found a better approach than BuildKit?
This setup works for us, but CI/CD infrastructure is never truly finished.