Local AI Made Simple: NixOS Ollama Flake for Private LLM Hosting

By Justin Quaintance on Jan 6, 2025

I Got Tired of Sending My Code to Corporate APIs

Look, I love AI tools, but I was getting uncomfortable with where my data was going. Every time I asked an AI to help debug code or review a document, that data went to external servers. My personal projects, client work, random thoughts at 2am - all flowing through corporate APIs.

Plus the costs were adding up, and I kept hitting rate limits right when I needed help most.

Setting up local AI traditionally sucks though. You spend forever wrestling with Python environments, CUDA drivers, model formats. I'm a Nix fanboy, so obviously I built a flake to solve this, but the same concept works other ways too.

My Nix Solution (Takes About 10 Minutes)

I built a NixOS Ollama Flake that gives you a complete local AI setup. Nix isn't for everyone, I know, but if you use it, this will save you hours of setup.

Add this to your flake and you get a production-ready Ollama service with GPU acceleration, API access, and proper security isolation.

What You Get Instantly

  • Ollama Service: Professionally configured LLM server
  • GPU Acceleration: Automatic CUDA/ROCm detection and setup
  • REST API: Full API access on localhost:11434 for integration
  • Model Management: Easy download and management of AI models
  • Security Isolation: Proper user permissions and firewall rules
  • Resource Control: Memory and CPU limits prevent system overload
  • Zero Dependencies: No Python environments or manual driver setup

How This Actually Works

Instead of following endless installation guides, you add this to your flake:

{
  inputs.ollama-ai.url = "github:yourusername/nixos-ollama-flake";

  # then, inside the module list of your NixOS configuration:
  imports = [ ollama-ai.nixosModules.default ];
}

Run nixos-rebuild switch and you get:

  • Ollama service running on localhost:11434
  • GPU acceleration automatically configured (if you have a GPU)
  • Secure firewall rules and user isolation
  • CLI tools for model management
  • API ready for any programming language

Don't use Nix? You can get the same result with Docker or by installing Ollama directly. The setup takes longer but works the same way.
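For the Docker route, the upstream image gets you the same endpoint on the same port (image name and flags are from Ollama's own Docker instructions; the volume name is just a convention):

```shell
# CPU-only; add --gpus=all for NVIDIA cards (needs the NVIDIA Container Toolkit)
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# then pull a model inside the container
docker exec -it ollama ollama pull llama3.2:3b
```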

Real-World Usage

Instant AI Chat

# Download and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

# Start chatting immediately
> Explain quantum computing in simple terms
> Write a Python function for binary search
> Help me debug this JavaScript code
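Those same prompts work over the REST API, so anything that can POST JSON can use the model. A minimal curl call (endpoint and fields per Ollama's documented API; `"stream": false` asks for one complete JSON reply instead of a token stream):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain binary search in one sentence",
  "stream": false
}'
```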

Programming Integration

Your local AI works with any programming language:

Python Integration:

import requests

def ask_local_ai(prompt):
    # "stream": False makes Ollama return one JSON object
    # instead of its default newline-delimited JSON stream
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

# Use AI in your applications
code_review = ask_local_ai("Review this Python function for bugs...")
documentation = ask_local_ai("Write docs for this API endpoint...")
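If you want tokens as they arrive instead of waiting for the full reply, note that `/api/generate` streams newline-delimited JSON objects unless you pass `"stream": false`. A sketch of consuming that stream (the chunk shape follows Ollama's documented API; the helper name is my own):

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from Ollama's NDJSON stream.

    Each line is a JSON object like {"response": "...", "done": false};
    the final chunk carries "done": true.
    """
    parts = []
    for raw in lines:
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests you would iterate the live stream, e.g.:
# resp = requests.post("http://localhost:11434/api/generate",
#                      json={"model": "llama3.2:3b", "prompt": "hi"},
#                      stream=True)
# text = collect_stream(resp.iter_lines())
```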

JavaScript/Node.js:

async function codeAssist(prompt) {
  // stream: false returns a single JSON object rather than an NDJSON stream
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: "llama3.2:3b", prompt, stream: false })
  });
  return (await response.json()).response;
}

// AI-powered development
const explanation = await codeAssist("Explain this React component...");
const tests = await codeAssist("Write unit tests for this function...");

Development Workflow Integration

  • VS Code: Install Ollama extension for code completion
  • Vim/Neovim: Use ollama.nvim for AI-assisted editing
  • Emacs: Ellama package provides comprehensive AI integration
  • Terminal: Direct CLI access for quick questions

Model Ecosystem

Choose models for your specific needs:

Fast Development Models

  • llama3.2:1b - Lightning fast, basic quality
  • llama3.2:3b - Best balance for development work
  • qwen2.5:7b - Excellent coding and reasoning abilities

Specialized Models

  • deepseek-r1:8b - Advanced reasoning and problem solving
  • codellama:7b - Code generation and debugging focus
  • mistral:7b - Strong instruction following

Resource Requirements

  • 1B-3B models: 4GB RAM, run on most systems
  • 7B models: 8GB RAM, better quality
  • GPU acceleration: 4GB+ VRAM dramatically improves speed
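As a rough rule of thumb (my own back-of-envelope, not an official figure), a 4-bit quantized model needs about half a byte per parameter plus some working overhead:

```python
def approx_model_ram_gb(params_billions, bytes_per_param=0.5, overhead_gb=1.0):
    """Very rough RAM estimate for a quantized model.

    bytes_per_param: ~0.5 for 4-bit quantization, ~2.0 for fp16.
    overhead_gb covers the KV cache and runtime.
    These are ballpark assumptions, not measurements.
    """
    return params_billions * bytes_per_param + overhead_gb

# e.g. a 7B model at 4-bit lands around 4.5 GB,
# which fits comfortably in the 8GB tier above
```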

Privacy and Control

Complete Data Privacy

  • All inference happens locally
  • No data ever leaves your machine
  • No API keys, accounts, or external dependencies
  • Full control over model versions and updates

Security Features

  • Ollama runs as isolated system user
  • Proper firewall configuration
  • Resource limits prevent system impact
  • Models stored with secure permissions

Offline Capability

  • Works completely offline
  • No internet required after model download
  • Perfect for sensitive environments
  • Reliable when external services are down

Performance and Efficiency

GPU Acceleration

The flake automatically detects and configures:

  • NVIDIA CUDA: For GeForce and RTX cards
  • AMD ROCm: For Radeon graphics
  • CPU Fallback: Works without GPU

Resource Management

# Fine-tune for your system
services.ollama.environmentVariables = {
  OLLAMA_NUM_PARALLEL = "4";       # concurrent requests
  OLLAMA_MAX_LOADED_MODELS = "2";  # memory management
};

Model Loading Strategy

  • Models load on first use
  • Multiple models can run simultaneously
  • Automatic memory management
  • Fast model switching
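You can watch all of this from the CLI (both commands ship with stock Ollama):

```shell
ollama list   # models downloaded to disk
ollama ps     # models currently loaded in memory, with expiry times
```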

Integration Possibilities

Web Interfaces

  • Open WebUI: Full web interface for Ollama
  • Anything LLM: Document chat and RAG
  • Custom dashboards: Build your own UI

Document Analysis

  • PDF Processing: Analyze documents privately
  • Code Reviews: AI-powered code analysis
  • Research Assistant: Query local knowledge bases

Creative Applications

  • Writing Assistant: Fiction, technical writing, editing
  • Code Generation: Prototyping, boilerplate, documentation
  • Learning Aid: Explanations, tutorials, practice problems

Why This Matters

This flake represents a fundamental shift toward AI sovereignty:

  1. Privacy First: Your data never leaves your control
  2. Cost Effective: No per-token charges or API subscriptions
  3. Always Available: No rate limits or service outages
  4. Reproducible: Identical setup across all systems
  5. Integrated: Works seamlessly with NixOS ecosystem
  6. Scalable: From laptop development to server deployment

The Bigger Picture

Local AI is becoming essential for:

  • Developers: Code assistance without sharing proprietary code
  • Researchers: Document analysis with full privacy
  • Writers: Creative assistance without data exposure
  • Businesses: AI capabilities without external dependencies
  • Students: Learning aid without academic integrity concerns

This flake makes local AI as easy as any other NixOS service. No more choosing between AI capability and data privacy.


Get started today: Add the flake to your configuration, run ollama pull llama3.2:3b, and start your first private AI conversation. Your data stays yours, your AI stays local, and your privacy stays intact.

The future of AI is local, private, and declarative.

Photo by Markus Spiske on Unsplash

Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.