You know that feeling when a service crashes at 3am and you have to SSH in, restart it, and hope it doesn't happen again tomorrow? I got so tired of being my server's babysitter that I built something better: a home server cluster that fixes itself.
Here's what it actually does: When my Jupyter notebook server crashes, it restarts automatically. When PostgreSQL runs out of memory, it gets more. When I deploy broken code, it rolls back to the last working version. All without me touching anything.
The magic comes from combining K3s (think "Kubernetes for humans" - all the self-healing powers without the enterprise complexity) with NixOS (a Linux system where every change can be undone). Together, they create infrastructure that's smarter than the sum of its parts.
Let me show you exactly how I built this, why it took 15 iterations to get right, and how you can avoid my mistakes.
What I Actually Built (And Why)
Before diving into the technical weeds, let me explain what this setup actually is. I built a 2-computer cluster where the computers work together like a team. If one gets sick, the other picks up the slack. If software breaks, it automatically reverts to the last working version.
The Tech Stack (In Plain English)
I chose K3s + NixOS because they solve two specific problems:
- K3s: It's Kubernetes stripped down to just the good parts. Think of regular Kubernetes as a semi-truck - powerful but complex. K3s is like a pickup truck - still hauls your stuff but way easier to park and maintain.
- NixOS: A Linux system where you describe what you want in a config file, and it builds that exact system every time. More importantly, when you mess up (and you will), you can roll back instantly.
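To make that concrete, here's a hypothetical fragment of the kind of declaration NixOS works from (option names are from the NixOS manual; the service choice is illustrative, not my exact config):

```nix
# Declare a service; NixOS builds a system that provides exactly this.
services.postgresql.enable = true;
services.postgresql.settings.shared_buffers = "2GB";

# Every `nixos-rebuild switch` creates a new "generation". If the result
# is broken, `nixos-rebuild switch --rollback` (or picking the previous
# generation in the boot menu) restores the last working system.
```

That rollback property is what makes the rest of this post possible: every experiment below was undoable.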
The initial architecture centered on two primary nodes:
- k3s-control: Control plane server (24-core Threadripper, 32GB RAM) running PostgreSQL analytics and cluster coordination
- k3s-worker: Worker node (4-core i3, 8GB RAM) configured as Tailscale exit node with hardened security policies
Network Architecture Evolution
The most critical design decision involved networking. My initial Tailscale mesh worked perfectly for SSH access and service discovery, but Kubernetes requires more sophisticated pod-to-pod communication that Tailscale couldn't reliably handle.
The solution: a dedicated Wireguard mesh for cluster traffic (10.200.0.0/24) while keeping Tailscale for management access. This separation of concerns proved essential for operational stability.
# Wireguard Configuration for Cluster Networking
networking.wireguard.interfaces.wg-k3s = {
  ips = [ "10.200.0.1/24" ]; # k3s-control
  listenPort = 51821;
  peers = [{
    publicKey = "EXAMPLE_PUBLIC_KEY_REDACTED_FOR_SECURITY";
    allowedIPs = [
      "10.200.0.2/32" # k3s-worker
      "10.42.0.0/16"  # k3s pod CIDR
      "10.43.0.0/16"  # k3s service CIDR
    ];
    endpoint = "k3s-worker.example.ts.net:51821";
    persistentKeepalive = 25;
  }];
};
Secrets Management Architecture
Production Kubernetes demands robust secrets management. I implemented agenix for encrypting cluster tokens, Wireguard keys, and service credentials directly in the NixOS configuration repository. This approach provides version control for secrets while maintaining security through age encryption.
# Secure Token Management
age.secrets.k3s-token = {
  file = ./secrets/k3s-token.age;
  owner = "root";
  mode = "0400";
};
services.k3s.tokenFile = config.age.secrets.k3s-token.path;
Implementation Journey & Problem Solving
Building production infrastructure means encountering edge cases that don't appear in tutorials. Each challenge taught valuable lessons about system design and operational excellence.
Challenge 1: The K3s v1.32.6 Kube-Proxy Bug
Problem: After upgrading to K3s v1.32.6, agents entered restart loops with logging configuration conflicts. The kube-proxy component couldn't reinitialize its logging system on restarts.
Investigation: Systemd logs revealed the specific error pattern, and GitHub issues confirmed this as a known regression in that version. The temporary workaround required disabling kube-proxy on agent nodes.
Solution: Modified agent configuration to disable the problematic component while maintaining cluster functionality:
extraFlags = [
  "--disable-kube-proxy"
  "--kubelet-arg=v=2" # pin kubelet log verbosity
];
Lesson Learned: Always test K3s upgrades in isolated environments first. Version-specific bugs can impact production stability, and having rollback procedures is essential.
Challenge 2: Systemd Service Conflicts
Problem: Custom systemd service configurations conflicted with NixOS module defaults, causing configuration validation errors.
Investigation: The issue stemmed from explicitly overriding restart policies that the K3s module already configured optimally. NixOS modules often include sophisticated default configurations that shouldn't be overridden without good reason.
Solution: Trust module defaults and only override when necessary:
# WRONG: overriding module defaults
systemd.services.k3s.serviceConfig = {
  Restart = "always";
  RestartSec = "5s";
};

# RIGHT: trust module defaults, add only what's needed
systemd.services.k3s = {
  wants = [ "network-online.target" "k3s-wireguard-ready.service" ];
  after = [ "network-online.target" "k3s-wireguard-ready.service" ];
};
Lesson Learned: NixOS modules encode best practices from the community. Understanding what modules already provide prevents configuration conflicts and reduces maintenance overhead.
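When an override genuinely is needed, Nix's priority helpers make the intent explicit instead of fighting the module. A sketch (the specific values are illustrative; `lib` must be in the module's argument list):

```nix
{ lib, ... }:
{
  # mkForce overrides whatever the K3s module sets for this field...
  systemd.services.k3s.serviceConfig.TimeoutStartSec = lib.mkForce "180s";
  # ...while mkDefault supplies a value only if nothing else defines one.
  networking.firewall.enable = lib.mkDefault true;
}
```

The advantage over a bare assignment is that the evaluation error you'd otherwise get ("conflicting definitions") disappears, and the override is self-documenting.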
Challenge 3: Network Configuration Auto-Detection
Problem: Hard-coding node IPs caused issues when Tailscale addresses changed or when hostname resolution failed during bootstrap.
Investigation: K3s supports automatic IP detection through interface selection, which works more reliably than explicit IP configuration in dynamic environments.
Solution: Bind flannel to the Wireguard interface and pin the node IP to its static address, rather than relying on hostname resolution or dynamic Tailscale IPs:
# Specify the Wireguard interface for flannel networking
extraFlags = [
  "--flannel-iface wg-k3s"
  "--node-ip 10.200.0.1"           # explicit for consistency
  "--advertise-address 10.200.0.1"
];
Lesson Learned: Kubernetes networking benefits from explicit configuration in production environments, but the approach should accommodate network changes gracefully.
Production Migration: Tailscale to Wireguard
The most significant operational challenge involved migrating from Tailscale overlay networking to dedicated Wireguard for cluster communication. This migration required careful coordination to avoid cluster downtime.
Why the Migration Was Necessary
Tailscale excels at secure device connectivity and service discovery, but Kubernetes introduces networking requirements that conflict with Tailscale's design assumptions:
- Pod-to-pod communication needs direct Layer 3 routing that Tailscale's NAT traversal interferes with
- Service discovery requires consistent IP addressing that Tailscale's dynamic allocation complicates
- Network policies and security controls work better with dedicated VPN infrastructure
Migration Execution Strategy
The migration followed a systematic approach to minimize risk:
Phase 1: Parallel Network Setup
I deployed Wireguard alongside the existing Tailscale mesh, creating dual connectivity without disrupting running services. The new network used 10.200.0.0/24 addressing with static assignments for predictable routing.
Phase 2: K3s Configuration Updates
Updated both server and agent configurations to use Wireguard interfaces while maintaining backward compatibility. The key insight was using flannel interface selection rather than hardcoded endpoint addresses.
Phase 3: Controlled Cutover
Deployed changes using NixOS boot mode first, then executed coordinated reboots to activate the new network configuration. This approach avoided the race conditions that could occur with live service restarts.
Operational Challenges During Migration
The migration revealed several operational insights:
Service Ordering Dependencies: K3s must start after Wireguard interface initialization. I implemented a custom systemd service to ensure proper dependency management:
systemd.services.k3s-wireguard-ready = {
  description = "Wait for Wireguard interface to be ready for k3s";
  wantedBy = [ "multi-user.target" ];
  before = [ "k3s.service" ];
  script = ''
    timeout=60
    while [ $timeout -gt 0 ]; do
      if ip link show wg-k3s >/dev/null 2>&1; then
        echo "Wireguard interface wg-k3s is ready"
        exit 0
      fi
      sleep 2
      timeout=$((timeout - 2))
    done
    exit 1
  '';
};
etcd Cluster Membership: Changing the advertise address required etcd cluster reset, which meant temporary loss of cluster state. This reinforced the importance of external backup procedures for critical cluster data.
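One mitigation worth noting: with embedded etcd, k3s takes scheduled snapshots out of the box, and the behavior is tunable through server flags (flag names from the k3s server docs — verify them against your version). A sketch of how that slots into the same configuration:

```nix
services.k3s.extraFlags = [
  # Keep more snapshots than the default before rotation kicks in
  "--etcd-snapshot-retention=14"
  # Snapshots land under /var/lib/rancher/k3s/server/db/snapshots;
  # copy them off-node (restic, rsync, etc.) for real disaster recovery.
];
```

On-node snapshots alone wouldn't have helped here, since the reset was deliberate, but they make the "temporary loss of cluster state" recoverable instead of permanent.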
Real Production Lessons
Operating Kubernetes in production, even at small scale, teaches lessons that apply directly to enterprise environments. Here are the key insights that transformed my operational approach.
Hardware-First Analysis for Resource Planning
The Core m3-6Y30 processor in nixbase operates under strict thermal constraints that traditional server hardware doesn't face. This limitation taught me to approach resource planning from hardware capabilities upward, not workload requirements downward.
Interactive Hardware Testing Process:
I developed a real-time testing methodology using evtest to correlate user interactions with system performance:
# Monitor thermal throttling during workload execution
watch -n 1 'grep "MHz" /proc/cpuinfo'
# Correlate with interactive testing
evtest /dev/input/event0 # Monitor actual user input lag
This approach revealed that sustained CPU usage above 60% caused interactive lag, leading to workload scheduling policies that reserve capacity for human interaction.
Buffer Optimization for Thermal Constraints
Low-power hardware requires different operational patterns than datacenter equipment. Instead of maximizing utilization, production requires buffer management:
- CPU reservation: 40% capacity buffer for thermal management
- Memory allocation: Conservative limits prevent swap thrashing
- I/O throttling: Background processes yield to interactive workloads
These constraints forced me to design more efficient workload scheduling and resource allocation policies than I would have developed with unlimited resources.
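The CPU buffer translates directly into Kubernetes resource limits. A quick sketch of the arithmetic in pure shell (the 24-thread count and 40% buffer are the numbers from this cluster; the helper itself is illustrative):

```shell
# Compute a cluster-wide CPU limit that leaves a 40% thermal buffer.
threads=24                      # Threadripper 2920X: 12 cores / 24 threads
buffer_pct=40
total_m=$(( threads * 1000 ))   # total capacity in millicores
limit_m=$(( total_m * (100 - buffer_pct) / 100 ))
echo "schedulable CPU: ${limit_m}m of ${total_m}m"
```

The resulting value becomes the ceiling for the sum of pod CPU limits on that node, leaving the remaining ~9.6 cores for interactive use and thermal headroom.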
Systematic Problem Analysis with 5 Whys
Example: Agent Join Failures
- Why do agents fail to join the cluster? → Network connectivity issues
- Why are there network connectivity issues? → Wireguard tunnel not establishing
- Why won't the Wireguard tunnel establish? → Service starts before interface is ready
- Why does the service start too early? → systemd dependencies incomplete
- Why are dependencies incomplete? → K3s module doesn't know about custom Wireguard setup
Root Cause: Need explicit service ordering for custom network dependencies.
This systematic approach prevented quick fixes that mask underlying system design issues.
What This Actually Solves: Real Production Benefits
After 15+ days of continuous operation, this K3s setup has transformed how I approach infrastructure problems. Here's what it actually delivers in practice:
Development Velocity: From Hours to Minutes
Before: Setting up a development environment for analytics work meant manually configuring PostgreSQL, installing dependencies, fighting with port conflicts, and spending 2-3 hours just to start coding.
After: kubectl apply -f dev-env.yaml and I have an isolated development environment in 30 seconds. Need GPU access? Another 30 seconds. Need a specific Python version? It's containerized and ready.
Evidence: Current workloads running seamlessly include Jupyter notebooks, n8n automation workflows, and PostgreSQL analytics - all deployed in minutes, not hours.
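For context, a dev-env manifest in this style might look like the following sketch. The image, names, and numbers are illustrative, not the exact file from my repo; the `nvidia.com/gpu` line assumes the NVIDIA device plugin shown later:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev-jupyter
spec:
  replicas: 1
  selector:
    matchLabels: { app: dev-jupyter }
  template:
    metadata:
      labels: { app: dev-jupyter }
    spec:
      containers:
        - name: notebook
          image: jupyter/scipy-notebook:latest
          ports:
            - containerPort: 8888
          resources:
            requests: { cpu: "1", memory: 2Gi }
            limits:
              cpu: "4"    # stays inside the thermal buffer
              memory: 8Gi
              # nvidia.com/gpu: 1   # uncomment for GPU access
```

`kubectl apply -f` that file and delete it when done; the cluster handles scheduling, restarts, and cleanup.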
Infrastructure Reliability: Zero Surprise Downtime
Before: Manual service management meant forgetting to restart services after reboots, configuration drift between environments, and "works on my machine" debugging sessions that lasted hours.
After: NixOS declarative configuration means identical deployments every time. Services automatically restart after system updates. Zero configuration drift because the configuration IS the documentation.
Evidence: Control plane uptime of 15+ days through multiple NixOS updates, automatic service recovery, and consistent behavior across both nodes.
Resource Optimization: Maximum Efficiency from Limited Hardware
Before: Running analytics workloads on the Threadripper meant either maxing out CPU and making the system unusable, or manually babysitting resource allocation.
After: Kubernetes resource limits and thermal-aware scheduling mean I can run intensive workloads while keeping the system responsive for interactive use. GPU workloads automatically get dedicated resources without manual intervention.
Evidence: Successfully running 6 production pods (Jupyter, CoreDNS, metrics-server, NVIDIA plugin, n8n, resume-analytics) with stable resource utilization and responsive interactive performance.
Security Without Complexity
Before: Managing firewall rules, SSH access, and service exposure across multiple machines meant either leaving things too open or spending hours debugging connectivity issues.
After: Dual-network architecture provides automatic security. Cluster traffic isolated on Wireguard. Management access through Tailscale. agenix handles secrets without me thinking about encryption keys.
Evidence: Zero security incidents. Cluster traffic properly isolated on the 10.200.0.x network. Management access seamless through the Tailscale mesh.
Current Production Reality
The cluster currently handles real production workloads with zero manual intervention:
- Analytics Pipeline: Daily PostgreSQL data processing with automatic scaling
- Development Environments: On-demand Jupyter notebooks with GPU access
- Automation Services: n8n workflows processing resume data
- Infrastructure Services: DNS, metrics collection, device plugins
Resource utilization stays comfortably within thermal limits while providing room for growth - exactly what production infrastructure should do.
Security and Access Control
The dual-network approach provides layered security:
- Tailscale mesh: Administrative access and service discovery
- Wireguard cluster network: Isolated Kubernetes communication
- agenix secrets management: Encrypted credentials with proper rotation
- NixOS firewall: Declarative port management and network policies
Integration with Broader Infrastructure
The K3s cluster integrates seamlessly with existing services:
- PostgreSQL analytics: Shared data access for processing workloads
- File sharing: Distributed storage for application data
- Backup systems: Automated backup of both cluster state and application data
- Monitoring stack: Prometheus and Grafana deployment planned for comprehensive observability
Technical Roadmap & Enterprise Lessons
This foundation enables several next-level improvements that mirror enterprise Kubernetes adoption patterns:
Immediate Next Steps
- Longhorn distributed storage: Replace local storage with resilient, replicated volumes
- MetalLB load balancing: Proper service exposure without NodePort limitations
- cert-manager: Automated TLS certificate management for internal services
- Prometheus + Grafana: Production monitoring and alerting infrastructure
Lessons Applicable to Enterprise Environments
The operational patterns developed here translate directly to larger deployments:
Infrastructure as Code: NixOS configuration management provides the same benefits as Terraform + Ansible at enterprise scale, with better reproducibility guarantees.
Network Segmentation: The dual-network approach (management + cluster) mirrors enterprise patterns of separating control plane from data plane traffic.
Secrets Management: agenix demonstrates how declarative configuration can include secure secrets handling without compromising operational simplicity.
Gradual Migration: The Tailscale → Wireguard migration approach applies to any infrastructure change requiring zero-downtime transitions.
Scaling Considerations
This architecture supports horizontal scaling through additional worker nodes without fundamental changes to the networking or security model. The Wireguard mesh can accommodate dozens of nodes with the same operational characteristics.
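That headroom claim is easy to sanity-check with address arithmetic (pure shell):

```shell
# How many peers can the /24 Wireguard mesh address?
prefix=24
usable=$(( (1 << (32 - prefix)) - 2 ))   # minus network and broadcast
echo "usable addresses in 10.200.0.0/${prefix}: ${usable}"
```

254 usable addresses against 2 nodes in use: the addressing plan isn't the scaling bottleneck; key distribution and etcd quorum are.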
Resource constraints that drove optimization decisions here—thermal management, memory pressure, I/O limitations—exist in enterprise environments too, just at different scales. The monitoring and reservation strategies transfer directly.
Want to Build This Yourself? Essential Questions for Implementation
This infrastructure solves real operational problems, but implementation requires careful planning. If you're working with another agent or consultant to build something similar, here are the critical questions they need answered:
Infrastructure Assessment Questions
- Hardware Inventory: What specific hardware do you have available? (CPU cores, RAM, storage type, network connectivity)
- Network Environment: Do you have static IPs, dynamic DNS, or need to work with NAT/firewall restrictions?
- Current NixOS Experience: Are you already running NixOS, or starting from another Linux distribution?
- Security Requirements: What level of network isolation do you need? Any compliance requirements?
- Workload Planning: What applications will you run? Do you need GPU access, persistent storage, or specific resource limits?
Implementation Approach Questions
- Secrets Management: Are you comfortable with agenix/age encryption, or prefer another secrets solution?
- Network Strategy: Will you use Tailscale for management, or need a different VPN solution?
- Monitoring Needs: What level of observability do you need from day one?
- Backup Strategy: How will you handle etcd backups and persistent volume data?
- Deployment Timeline: Are you doing this incrementally or need everything working immediately?
Key Technical Implementation Points
The agent implementing this needs to understand these critical architecture decisions:
Dual Network Architecture: Separate Tailscale (management) from Wireguard (cluster traffic) networks. This prevents Kubernetes networking conflicts while maintaining secure access.
NixOS + agenix: Declarative configuration with encrypted secrets. The entire cluster configuration should be reproducible from git repository.
Service Dependencies: K3s must start after Wireguard interface is ready. Systemd service ordering prevents race conditions during boot.
Resource Planning: Hardware constraints drive scheduling policies. Low-power systems need CPU reservation; high-performance systems can maximize utilization.
Complete Implementation Guide: Build This Yourself
Here's the actual NixOS configuration and implementation steps to build this exact infrastructure. This is the 5% technical detail that makes the 95% operational benefits possible.
Network Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│                     Production K3s Cluster                      │
│                       Dual Network Design                       │
└─────────────────────────────────────────────────────────────────┘

 Management Network (Tailscale)         Cluster Network (Wireguard)
   ┌─────────────────────┐                ┌─────────────────────┐
   │     SSH & Admin     │                │     K8s Traffic     │
   │    100.x.x.x/32     │                │    10.200.0.0/24    │
   └─────────────────────┘                └─────────────────────┘
              │                                      │
              ▼                                      ▼
   ┌─────────────────┐    Tailscale Mesh    ┌─────────────────┐
   │   k3s-control   │◄──── (SSH/Mgmt) ────►│   k3s-worker    │
   │   Threadripper  │                      │    Intel i3     │
   │    32GB RAM     │    Wireguard Mesh    │     8GB RAM     │
   │  10.200.0.1/24  │◄──── (K8s Only) ────►│  10.200.0.2/24  │
   └─────────────────┘                      └─────────────────┘
           │                                        │
     ┌─────▼────┐                             ┌─────▼────┐
     │ k3s API  │                             │ kubelet  │
     │   etcd   │                             │ flannel  │
     │PostgreSQL│                             │containers│
     └──────────┘                             └──────────┘

 Pod Network: 10.42.0.0/16         Service Network: 10.43.0.0/16
Complete NixOS Configuration
Flake-based System Structure
# flake.nix - root flake configuration
{
  description = "K3s cluster infrastructure";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    agenix.url = "github:ryantm/agenix";
    agenix.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, agenix }: {
    nixosConfigurations = {
      k3s-control = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          agenix.nixosModules.default
          ./hosts/k3s-control/configuration.nix
          ./hosts/k3s-control/hardware-configuration.nix
        ];
      };
      k3s-worker = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          agenix.nixosModules.default
          ./hosts/k3s-worker/configuration.nix
          ./hosts/k3s-worker/hardware-configuration.nix
        ];
      };
    };
  };
}
Control Plane Configuration (hosts/k3s-control/configuration.nix)
# hosts/k3s-control/configuration.nix
{ config, pkgs, ... }:
{
  # NOTE: secrets/secrets.nix is consumed by the agenix CLI only;
  # it is not imported as a NixOS module.

  # Basic system configuration
  networking.hostName = "k3s-control";
  networking.networkmanager.enable = true;

  # Enable flakes
  nix.settings.experimental-features = [ "nix-command" "flakes" ];

  # Wireguard cluster network configuration
  networking.wireguard.interfaces.wg-k3s = {
    ips = [ "10.200.0.1/24" ];
    listenPort = 51821;
    privateKeyFile = config.age.secrets.wg-control-private.path;
    peers = [{
      # Worker node public key (generated during setup)
      publicKey = "WORKER_WG_PUBLIC_KEY_HERE";
      allowedIPs = [
        "10.200.0.2/32" # worker node IP
        "10.42.0.0/16"  # K3s pod CIDR
        "10.43.0.0/16"  # K3s service CIDR
      ];
      endpoint = "k3s-worker.your-tailnet.ts.net:51821";
      persistentKeepalive = 25;
    }];
  };

  # K3s server configuration
  services.k3s = {
    enable = true;
    role = "server";
    tokenFile = config.age.secrets.k3s-token.path;
    extraFlags = [
      # Use the Wireguard interface for cluster networking
      "--flannel-iface wg-k3s"
      "--node-ip 10.200.0.1"
      "--advertise-address 10.200.0.1"
      # Disable the built-in load balancer (use MetalLB instead)
      "--disable servicelb"
      # Cluster networking configuration
      "--cluster-cidr 10.42.0.0/16"
      "--service-cidr 10.43.0.0/16"
      # Logging and audit settings
      "--kubelet-arg=v=2"
      "--kube-apiserver-arg=audit-log-maxage=7"
    ];
  };

  # agenix secrets (paths are relative to this file; secrets/ lives at the repo root)
  age.secrets = {
    k3s-token = {
      file = ../../secrets/k3s-token.age;
      owner = "root";
      group = "root";
      mode = "0400";
    };
    wg-control-private = {
      file = ../../secrets/wg-control-private.age;
      owner = "root";
      group = "root";
      mode = "0400";
    };
  };

  # Firewall configuration
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [
      22    # SSH
      6443  # K3s API server
      10250 # kubelet
    ];
    allowedUDPPorts = [
      51821 # Wireguard
    ];
    # Allow cluster traffic on the Wireguard interface
    interfaces.wg-k3s.allowedTCPPorts = [
      6443  # K3s API
      10250 # kubelet
      2379  # etcd client
      2380  # etcd peer
    ];
  };

  # Tailscale for management access
  services.tailscale.enable = true;

  # System packages
  environment.systemPackages = with pkgs; [
    wireguard-tools
    kubectl
    kubernetes-helm
    htop
    curl
  ];

  # Service dependencies and ordering
  systemd.services.k3s = {
    wants = [ "network-online.target" "wireguard-wg-k3s.service" ];
    after = [ "network-online.target" "wireguard-wg-k3s.service" ];
  };
}
Worker Node Configuration (hosts/k3s-worker/configuration.nix)
# hosts/k3s-worker/configuration.nix
{ config, pkgs, ... }:
{
  # Basic system configuration
  nix.settings.experimental-features = [ "nix-command" "flakes" ];
  networking.hostName = "k3s-worker";
  networking.networkmanager.enable = true;

  # Wireguard cluster network
  networking.wireguard.interfaces.wg-k3s = {
    ips = [ "10.200.0.2/24" ];
    listenPort = 51821;
    privateKeyFile = config.age.secrets.wg-worker-private.path;
    peers = [{
      # Control plane public key
      publicKey = "CONTROL_WG_PUBLIC_KEY_HERE";
      allowedIPs = [
        "10.200.0.1/32" # control plane IP
        "10.42.0.0/16"  # pod network
        "10.43.0.0/16"  # service network
      ];
      endpoint = "k3s-control.your-tailnet.ts.net:51821";
      persistentKeepalive = 25;
    }];
  };

  # K3s agent configuration
  services.k3s = {
    enable = true;
    role = "agent";
    serverAddr = "https://10.200.0.1:6443";
    tokenFile = config.age.secrets.k3s-token.path;
    extraFlags = [
      "--flannel-iface wg-k3s"
      "--node-ip 10.200.0.2"
      "--kubelet-arg=v=2"
    ];
  };

  # GPU support (if applicable; hardware.opengl was renamed to
  # hardware.graphics in recent NixOS releases)
  hardware.graphics.enable = true;
  hardware.nvidia-container-toolkit.enable = true;

  # agenix secrets (secrets/ lives at the repo root)
  age.secrets = {
    k3s-token = {
      file = ../../secrets/k3s-token.age;
      owner = "root";
      mode = "0400";
    };
    wg-worker-private = {
      file = ../../secrets/wg-worker-private.age;
      owner = "root";
      mode = "0400";
    };
  };

  # Firewall for the worker node
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 22 10250 ];
    allowedUDPPorts = [ 51821 ];
    interfaces.wg-k3s.allowedTCPPorts = [ 10250 ];
  };

  # Tailscale and essential packages
  services.tailscale.enable = true;
  environment.systemPackages = with pkgs; [
    wireguard-tools
    kubectl
    htop
  ];

  # Ensure Wireguard starts before K3s
  systemd.services.k3s = {
    wants = [ "wireguard-wg-k3s.service" ];
    after = [ "wireguard-wg-k3s.service" ];
  };
}
Secrets Management with agenix
Generate and Encrypt Secrets
# 1. Generate the K3s cluster token
openssl rand -hex 32 > k3s-token-plaintext

# 2. Generate Wireguard keypairs
wg genkey > control-private.key
wg pubkey < control-private.key > control-public.key
wg genkey > worker-private.key
wg pubkey < worker-private.key > worker-public.key

# 3. Set up the agenix secrets.nix
cat > secrets/secrets.nix << 'EOF'
let
  control-key = "ssh-ed25519 YOUR_CONTROL_SSH_KEY";
  worker-key = "ssh-ed25519 YOUR_WORKER_SSH_KEY";
  user-key = "ssh-ed25519 YOUR_USER_SSH_KEY";
  keys = [ control-key worker-key user-key ];
in
{
  "k3s-token.age".publicKeys = keys;
  "wg-control-private.age".publicKeys = keys;
  "wg-worker-private.age".publicKeys = keys;
}
EOF

# 4. Encrypt secrets with agenix (run from the directory containing secrets.nix)
cd secrets
agenix -e k3s-token.age
agenix -e wg-control-private.age
agenix -e wg-worker-private.age
Deployment Steps
Step 1: Prepare Both Nodes
# On both nodes, enable Tailscale first
sudo tailscale up
sudo tailscale status # Verify connectivity
# Install agenix
nix-env -iA nixpkgs.agenix
Directory Structure
k3s-infrastructure/
├── flake.nix
├── flake.lock
├── hosts/
│   ├── k3s-control/
│   │   ├── configuration.nix
│   │   └── hardware-configuration.nix
│   └── k3s-worker/
│       ├── configuration.nix
│       └── hardware-configuration.nix
└── secrets/
    ├── secrets.nix
    ├── k3s-token.age
    ├── wg-control-private.age
    └── wg-worker-private.age
Step 2: Deploy Control Plane
# On control node, from flake directory
sudo nixos-rebuild switch --flake .#k3s-control
# Verify Wireguard interface
ip addr show wg-k3s
wg show
# Check K3s server status
sudo systemctl status k3s
sudo k3s kubectl get nodes
Step 3: Deploy Worker Node
# On worker node, from flake directory
sudo nixos-rebuild switch --flake .#k3s-worker
# Test Wireguard connectivity to control plane
ping 10.200.0.1
# Verify K3s agent joined cluster
sudo systemctl status k3s
# On control plane, verify worker joined
sudo k3s kubectl get nodes -o wide
Production Cluster Evidence
Control Plane Node (k3s-control)
justin@k3s-control
--------------
OS: NixOS 25.05.20250729.1f08a4d (Warbler) x86_64
Host: X399 AORUS XTREME
Kernel: 6.12.40
Uptime: 15 days, 21 hours, 53 mins
Packages: 2590 (nix-system), 916 (nix-user)
Shell: fish 4.0.2
CPU: AMD Ryzen Threadripper 2920X (24) @ 3.500GHz
GPU: NVIDIA GeForce RTX 2070 SUPER
Memory: 2.9 GiB / 32.0 GiB (9%)
Worker Node (k3s-worker)
justin@k3s-worker
--------------
OS: NixOS 25.05.20250729.1f08a4d (Warbler) x86_64
Host: 10MQS37000 ThinkCentre M710q
Kernel: 6.12.40
Uptime: 13 days, 21 hours, 1 min
Packages: 1489 (nix-system), 1248 (nix-user)
Shell: fish 4.0.2
CPU: Intel i3-7100T (4) @ 3.4GHz
GPU: Intel HD Graphics 630
Memory: 1.3 GiB / 7.7 GiB (16%)
Network: 1 Gbps
Cluster Status
# Active K3s cluster with 6 production workloads
NAME          STATUS   ROLES                       AGE   VERSION        INTERNAL-IP
k3s-control   Ready    control-plane,etcd,master   15d   v1.32.6+k3s1   10.200.0.1
k3s-worker    Ready    &lt;none&gt;                      13d   v1.32.6+k3s1   10.200.0.2
NAMESPACE          NAME                                   READY   STATUS    RESTARTS
jupyter            jupyter-notebook-84d944f9d9-jxmnz      1/1     Running   1 (54m)
kube-system        coredns-5688667fd4-xm9nf               1/1     Running   3 (54m)
kube-system        metrics-server-6f4c6675d5-hnf9b        1/1     Running   3 (54m)
kube-system        nvidia-device-plugin-daemonset-5lh9n   1/1     Running   0 (54m)
n8n                n8n-899444cb6-cfwjs                    1/1     Running   1 (54m)
resume-analytics   resume-analytics-568ddc5994-6m4fw      1/1     Running   1 (54m)
Verification Commands
# Network connectivity tests
ping 10.200.0.1 # Control plane from worker
ping 10.200.0.2 # Worker from control plane
# Cluster health checks
k3s kubectl get nodes
k3s kubectl get pods -A
k3s kubectl cluster-info
# Deploy test workload
k3s kubectl create deployment nginx --image=nginx
k3s kubectl expose deployment nginx --port=80 --type=NodePort
k3s kubectl get services
# Resource monitoring
htop
k3s kubectl top nodes
k3s kubectl top pods -A
The Real Value: Infrastructure That Just Works
This isn't about building impressive technical architecture—it's about eliminating the operational overhead that prevents you from building valuable applications. When infrastructure deployment is atomic and reversible, when services restart reliably after updates, when development environments provision in seconds instead of hours, you can focus on solving actual business problems instead of fighting with servers.
The combination of NixOS declarative configuration with K3s lightweight orchestration creates infrastructure that gets out of your way. That's the real competitive advantage: not the technology stack, but the operational velocity it enables.
Ready to eliminate infrastructure friction from your development workflow? The questions above will get another agent started on building something similar for your specific environment.
Photo by GuerrillaBuzz on Unsplash
Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.