Local AI Made Simple: NixOS Ollama Flake for Private LLM Hosting

By Justin Quaintance on Jan 6, 2025

I Got Tired of Sending My Code to Corporate APIs

Look, I love AI tools, but I was getting uncomfortable with where my data was going. Every time I asked an AI to help debug code or review a document, that data went to external servers. My personal projects, client work, random thoughts at 2am - all flowing through corporate APIs.

Plus the costs were adding up, and I kept hitting rate limits right when I needed help most.

Setting up local AI traditionally sucks though. You spend forever wrestling with Python environments, CUDA drivers, model formats. I'm a Nix fanboy, so obviously I built a flake to solve this, but the same concept works other ways too.

My Nix Solution (Takes About 10 Minutes)

I built a NixOS Ollama Flake that gives you a complete local AI setup. Nix isn't for everyone, I know, but if you use it, this will save you hours of setup.

Add this to your flake and you get a production-ready Ollama service with GPU acceleration, API access, and proper security isolation.

What You Get Instantly

  • Ollama Service: Professionally configured LLM server
  • GPU Acceleration: Automatic CUDA/ROCm detection and setup
  • REST API: Full API access on localhost:11434 for integration
  • Model Management: Easy download and management of AI models
  • Security Isolation: Proper user permissions and firewall rules
  • Resource Control: Memory and CPU limits prevent system overload
  • Zero Dependencies: No Python environments or manual driver setup

How This Actually Works

Instead of following endless installation guides, you add this to your flake:

{
  inputs.ollama-ai.url = "github:yourusername/nixos-ollama-flake";

  # then, inside the module list of your NixOS configuration:
  imports = [ ollama-ai.nixosModules.default ];
}

Run nixos-rebuild switch and you get:

  • Ollama service running on localhost:11434
  • GPU acceleration automatically configured (if you have a GPU)
  • Secure firewall rules and user isolation
  • CLI tools for model management
  • API ready for any programming language

Don't use Nix? You can get the same result with Docker or by installing Ollama directly. The setup takes longer but works the same way.
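For the Docker route, the upstream image gets you the same endpoint on the same port (image name and flags are from Ollama's own Docker instructions; the volume name is just a convention):

```shell
# CPU-only; add --gpus=all for NVIDIA cards (needs the NVIDIA Container Toolkit)
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# then pull a model inside the container
docker exec -it ollama ollama pull llama3.2:3b
```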

Real-World Usage

Instant AI Chat

# Download and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b

# Start chatting immediately
> Explain quantum computing in simple terms
> Write a Python function for binary search
> Help me debug this JavaScript code
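Those same prompts work over the REST API, so anything that can POST JSON can use the model. A minimal curl call (endpoint and fields per Ollama's documented API; `"stream": false` asks for one complete JSON reply instead of a token stream):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain binary search in one sentence",
  "stream": false
}'
```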

Programming Integration

Your local AI works with any programming language:

Python Integration:

import requests

def ask_local_ai(prompt):
    # "stream": False makes Ollama return one JSON object
    # instead of its default newline-delimited JSON stream
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

# Use AI in your applications
code_review = ask_local_ai("Review this Python function for bugs...")
documentation = ask_local_ai("Write docs for this API endpoint...")
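If you want tokens as they arrive instead of waiting for the full reply, note that `/api/generate` streams newline-delimited JSON objects unless you pass `"stream": false`. A sketch of consuming that stream (the chunk shape follows Ollama's documented API; the helper name is my own):

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from Ollama's NDJSON stream.

    Each line is a JSON object like {"response": "...", "done": false};
    the final chunk carries "done": true.
    """
    parts = []
    for raw in lines:
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests you would iterate the live stream, e.g.:
# resp = requests.post("http://localhost:11434/api/generate",
#                      json={"model": "llama3.2:3b", "prompt": "hi"},
#                      stream=True)
# text = collect_stream(resp.iter_lines())
```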

JavaScript/Node.js:

async function codeAssist(prompt) {
  // stream: false returns a single JSON object rather than an NDJSON stream
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: "llama3.2:3b", prompt, stream: false })
  });
  return (await response.json()).response;
}

// AI-powered development
const explanation = await codeAssist("Explain this React component...");
const tests = await codeAssist("Write unit tests for this function...");

Development Workflow Integration

  • VS Code: Install Ollama extension for code completion
  • Vim/Neovim: Use ollama.nvim for AI-assisted editing
  • Emacs: Ellama package provides comprehensive AI integration
  • Terminal: Direct CLI access for quick questions

Model Ecosystem

Choose models for your specific needs:

Fast Development Models

  • llama3.2:1b - Lightning fast, basic quality
  • llama3.2:3b - Best balance for development work
  • qwen2.5:7b - Excellent coding and reasoning abilities

Specialized Models

  • deepseek-r1:8b - Advanced reasoning and problem solving
  • codellama:7b - Code generation and debugging focus
  • mistral:7b - Strong instruction following

Resource Requirements

  • 1B-3B models: 4GB RAM, run on most systems
  • 7B models: 8GB RAM, better quality
  • GPU acceleration: 4GB+ VRAM dramatically improves speed
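As a rough rule of thumb (my own back-of-envelope, not an official figure), a 4-bit quantized model needs about half a byte per parameter plus some working overhead:

```python
def approx_model_ram_gb(params_billions, bytes_per_param=0.5, overhead_gb=1.0):
    """Very rough RAM estimate for a quantized model.

    bytes_per_param: ~0.5 for 4-bit quantization, ~2.0 for fp16.
    overhead_gb covers the KV cache and runtime.
    These are ballpark assumptions, not measurements.
    """
    return params_billions * bytes_per_param + overhead_gb

# e.g. a 7B model at 4-bit lands around 4.5 GB,
# which fits comfortably in the 8GB tier above
```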

Privacy and Control

Complete Data Privacy

  • All inference happens locally
  • No data ever leaves your machine
  • No API keys, accounts, or external dependencies
  • Full control over model versions and updates

Security Features

  • Ollama runs as isolated system user
  • Proper firewall configuration
  • Resource limits prevent system impact
  • Models stored with secure permissions

Offline Capability

  • Works completely offline
  • No internet required after model download
  • Perfect for sensitive environments
  • Reliable when external services are down

Performance and Efficiency

GPU Acceleration

The flake automatically detects and configures:

  • NVIDIA CUDA: For GeForce and RTX cards
  • AMD ROCm: For Radeon graphics
  • CPU Fallback: Works without GPU

Resource Management

# Fine-tune for your system
services.ollama.environmentVariables = {
  OLLAMA_NUM_PARALLEL = "4";       # concurrent requests
  OLLAMA_MAX_LOADED_MODELS = "2";  # memory management
};

Model Loading Strategy

  • Models load on first use
  • Multiple models can run simultaneously
  • Automatic memory management
  • Fast model switching
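You can watch all of this from the CLI (both commands ship with stock Ollama):

```shell
ollama list   # models downloaded to disk
ollama ps     # models currently loaded in memory, with expiry times
```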

Integration Possibilities

Web Interfaces

  • Open WebUI: Full web interface for Ollama
  • Anything LLM: Document chat and RAG
  • Custom dashboards: Build your own UI

Document Analysis

  • PDF Processing: Analyze documents privately
  • Code Reviews: AI-powered code analysis
  • Research Assistant: Query local knowledge bases

Creative Applications

  • Writing Assistant: Fiction, technical writing, editing
  • Code Generation: Prototyping, boilerplate, documentation
  • Learning Aid: Explanations, tutorials, practice problems

Why This Matters

This flake represents a fundamental shift toward AI sovereignty:

  1. Privacy First: Your data never leaves your control
  2. Cost Effective: No per-token charges or API subscriptions
  3. Always Available: No rate limits or service outages
  4. Reproducible: Identical setup across all systems
  5. Integrated: Works seamlessly with NixOS ecosystem
  6. Scalable: From laptop development to server deployment

The Bigger Picture

Local AI is becoming essential for:

  • Developers: Code assistance without sharing proprietary code
  • Researchers: Document analysis with full privacy
  • Writers: Creative assistance without data exposure
  • Businesses: AI capabilities without external dependencies
  • Students: Learning aid without academic integrity concerns

This flake makes local AI as easy as any other NixOS service. No more choosing between AI capability and data privacy.


Get started today: Add the flake to your configuration, run ollama pull llama3.2:3b, and start your first private AI conversation. Your data stays yours, your AI stays local, and your privacy stays intact.

The future of AI is local, private, and declarative.

Photo by Markus Spiske on Unsplash

Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.