
Securing Multi-Cloud Kubernetes: Talos, KubeSpan, and Tailscale

Deploy a production-ready multi-cloud Kubernetes cluster using Talos OS kexec hot-swap, KubeSpan encrypted mesh, and Tailscale-secured management.

Krishna C

September 23, 2025

5 min read

Running Kubernetes across different cloud providers usually means dealing with incompatible networks, manual OS installations, and exposed management APIs. You need a way to connect nodes securely, encrypt pod traffic across clouds, and lock down administrative access—all without drowning in VPN complexity.

Here's how to do it right.

The Stack

| Component | Purpose |
|-----------|---------|
| Talos OS | Immutable Kubernetes OS deployed via kexec hot-swap |
| KubeSpan | WireGuard mesh for encrypted pod-to-pod communication |
| Flannel | Lightweight CNI that works everywhere |
| Tailscale | Zero-trust network access for cluster management |
| Traefik | Gateway API controller for public traffic |

Infrastructure is managed with OpenTofu and Terragrunt, with Supabase PostgreSQL as the remote state backend.

Network Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                            Internet                             │
└────────────┬────────────────────────────────────┬───────────────┘
             │                                    │
     ┌───────▼─────────┐                  ┌───────▼────────┐
     │ VPS Provider A  │                  │ VPS Provider B │
     │  (Public IPs)   │                  │  (Public IPs)  │
     └───────┬─────────┘                  └───────┬────────┘
             │                                    │
     ┌───────▼────────────────────────────────────▼───────┐
     │          KubeSpan WireGuard Mesh (51820)           │
     │       Encrypted pod-to-pod across all nodes        │
     │       10.244.0.0/16 pod network via Flannel        │
     └───────┬────────────────────────────────────────────┘
             │
     ┌───────▼────────────────────────────────────────────┐
     │   Kubernetes Cluster (Control Plane + Workers)     │
     │   - DNS Round-Robin to CP nodes (6443)             │
     │   - Traefik on host network (80/443)               │
     │   - Tailscale operator for internal routes         │
     └───────┬────────────────────────────────────────────┘
             │
     ┌───────▼────────────────────────────────────────────┐
     │           Tailscale Management Network             │
     │   API access (50000, 6443) restricted to:          │
     │   - Tailscale network (100.64.0.0/10)              │
     │   - Firewall blocks public API access              │
     └────────────────────────────────────────────────────┘

Public Traffic: Internet → Public IP:80/443 → Traefik → Pods
Management:     Admin → Tailscale VPN → CP Tailscale IP:6443 → API
Pod-to-Pod:     Pod A → Flannel → KubeSpan → Internet → KubeSpan → Pod B
```

Talos OS: Hot-Swap Any VPS

Talos is an immutable, API-only OS built for Kubernetes. No SSH, no shell, no package manager. The killer feature: deploy via kexec without reinstalling the OS.

Your VPS boots whatever it came with. SSH in once, run a deployment script, and minutes later you're running Talos. The OS hot-swaps itself while running.

Works on any provider—Hetzner, DigitalOcean, Vultr. The deployment script auto-detects network configuration and handles the kexec boot.

The infrastructure code configures:

  • LUKS2 full disk encryption (state and ephemeral partitions)
  • Tailscale system extension for management access
  • KubeSpan WireGuard mesh for pod networking
  • Flannel CNI overlay

KubeSpan: Encrypted Pod Networking

KubeSpan is Talos's built-in WireGuard mesh connecting all cluster nodes. It creates an encrypted overlay for pod-to-pod communication across clouds.

Configuration is minimal:

```yaml
machine:
  network:
    kubespan:
      enabled: true
cluster:
  discovery:
    enabled: true
```

Nodes discover each other automatically and establish WireGuard tunnels. All pod traffic flows encrypted, even crossing the public internet between providers.

Flannel provides the CNI layer with VXLAN overlay on top of KubeSpan. The pod subnet (10.244.0.0/16) works seamlessly across all nodes regardless of location.

Why KubeSpan Over Tailscale for Pod Traffic?

I initially tried running pod networking over Tailscale IPs. Problems:

| Issue | Details |
|-------|---------|
| MTU limitations | Tailscale's 1280-byte MTU causes fragmentation with standard pod traffic |
| User-space overhead | Tailscale runs in user space, adding latency; KubeSpan uses kernel-space WireGuard |
| Routing complexity | Pod subnet routing through Tailscale requires additional configuration |
| Network stability | Providers vary in network configs; KubeSpan handles this transparently |

For management access (talosctl, kubectl), Tailscale is perfect. For pod networking at scale, KubeSpan's kernel-level mesh is the right tool.

Why Flannel Over Cilium?

I spent considerable time trying to make Cilium work as the CNI, L2 LoadBalancer, and Gateway API provider. The promise of an all-in-one solution was attractive.

The reality: debugging network issues across different cloud providers became a time sink. Each provider has different configurations—OVH uses /32 point-to-point, Hetzner uses standard subnets, some have strict MAC filtering.

Cilium's L2 announcements and socket-based load balancing kept breaking in subtle ways. Days troubleshooting why pods couldn't reach services on one provider but worked fine on another.

Flannel keeps it simple. VXLAN overlay, standard configuration, works the same everywhere. Combined with KubeSpan's encrypted mesh, it provides reliable networking across any VPS provider.

Sometimes boring technology wins.

Tailscale: Securing Management Access

Nodes join the Tailscale network as devices, but only for management access.

After Talos deploys, a post-deployment script:

  1. Waits for Tailscale to initialize on all nodes
  2. Applies firewall rules blocking API ports (50000, 6443) from public internet
  3. Allows these ports only from Tailscale network (100.64.0.0/10)
  4. Updates kubeconfig and talosconfig to use Tailscale endpoints
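After step 4, the client configs reference only tailnet addresses. A hypothetical talosconfig fragment (the context name and the 100.64.x.x addresses are placeholders):

```yaml
context: multicloud
contexts:
  multicloud:
    endpoints:      # Talos API, reachable only over Tailscale
      - 100.64.0.11
      - 100.64.0.12
    nodes:
      - 100.64.0.11
```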

Control plane firewall configuration, expressed as Talos ingress firewall documents applied alongside the machine config:

```yaml
# Default-deny for ingress; the rules below punch specific holes.
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
# KubeSpan WireGuard must stay reachable from peers on the public internet.
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kubespan-wireguard
portSelector:
  ports:
    - 51820
  protocol: udp
ingress:
  - subnet: 0.0.0.0/0
---
# Talos API only from the tailnet.
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: talos-apid
portSelector:
  ports:
    - 50000
  protocol: tcp
ingress:
  - subnet: 100.64.0.0/10
---
# Kubernetes API only from the tailnet.
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: kube-apiserver
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 100.64.0.0/10
```

Now kubectl and talosctl only work through Tailscale:

```shell
kubectl cluster-info    # uses Tailscale endpoint
talosctl version        # uses Tailscale endpoint

curl https://public-ip:6443    # connection refused
```
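The Tailscale operator mentioned in the architecture diagram covers the complementary case: internal services (dashboards, metrics) reachable over the tailnet with no public ingress at all. A sketch, assuming the operator is installed and `grafana` is a hypothetical internal app:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  annotations:
    tailscale.com/expose: "true"   # operator creates a tailnet proxy for this Service
spec:
  selector:
    app: grafana
  ports:
    - port: 80
      targetPort: 3000
```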

Public Ingress with Traefik Gateway API

Traefik runs on host network mode to accept public traffic on ports 80 and 443. It implements Kubernetes Gateway API instead of traditional Ingress.

Gateway API provides:

  • Better separation between infrastructure and routing configuration
  • More expressive routing rules (header matching, path rewrites)
  • Cleaner multi-tenant support

Host network means direct binding to public IPs—no NodePort, no LoadBalancer services, no extra NAT layer.
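A sketch of the two Gateway API halves — a shared Gateway owned by the platform, and an HTTPRoute owned by an app team. The hostname, namespace, Service name, and certificate Secret are placeholders, and the GatewayClass name depends on how Traefik is installed:

```yaml
# Infrastructure side: one Gateway terminating TLS on the public listener.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-web
  namespace: traefik
spec:
  gatewayClassName: traefik
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        certificateRefs:
          - name: wildcard-cert    # hypothetical TLS Secret
      allowedRoutes:
        namespaces:
          from: All
---
# Routing side: app teams attach routes to the shared Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app
  namespace: default
spec:
  parentRefs:
    - name: public-web
      namespace: traefik
  hostnames:
    - app.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: app-svc    # hypothetical Service
          port: 80
```

This split is the "separation between infrastructure and routing configuration" mentioned above: the listener and certificate live with the platform, while routes live with the workloads they serve.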

What This Gives You

Security by Default

  • Disk encryption with LUKS2
  • Pod traffic encrypted via KubeSpan WireGuard
  • Management APIs locked to Tailscale VPN only

Multi-Cloud Freedom

  • Deploy on any VPS provider
  • KubeSpan connects nodes across clouds transparently
  • Add capacity anywhere in minutes

Operational Simplicity

  • Infrastructure as code with OpenTofu
  • GitOps for all applications via ArgoCD
  • DNS round-robin for control plane HA

Cost Efficiency

  • Use cheap VPS instances from any provider
  • No expensive load balancers or cloud networking services

In Practice

When you need more capacity, provision a VPS anywhere, run the deployment script, and it joins automatically:

  1. OpenTofu hot-swaps the OS to Talos via kexec
  2. Node joins KubeSpan mesh and gets encrypted pod networking
  3. Joins Tailscale for management access
  4. Firewall rules lock down public API access
  5. Ready to serve traffic in under 10 minutes

No VPN certificates to manage, no complex networking configs, no exposed management APIs. Just secure, simple, multi-cloud Kubernetes.

---

Interested in learning more? Reach out at [email protected]

#kubernetes
