
I Got Tired of Sending My Code to Corporate APIs
Look, I love AI tools, but I was getting uncomfortable with where my data was going. Every time I asked an AI to help debug code or review a document, that data went to external servers. My personal projects, client work, random thoughts at 2am - all flowing through corporate APIs.
Plus the costs were adding up, and I kept hitting rate limits right when I needed help most.
Setting up local AI traditionally sucks though. You spend forever wrestling with Python environments, CUDA drivers, and model formats. I'm a Nix fanboy, so obviously I built a flake to solve this, but the same concept works other ways too.
My Nix Solution (Takes About 10 Minutes)
I built a NixOS Ollama Flake that gives you a complete local AI setup. I know Nix isn't for everyone, but if you use it, this will save you hours of setup.
Add this to your flake and you get a production-ready Ollama service with GPU acceleration, API access, and proper security isolation.
What You Get Instantly
- Ollama Service: Professionally configured LLM server
- GPU Acceleration: Automatic CUDA/ROCm detection and setup
- REST API: Full API access on localhost:11434 for integration
- Model Management: Easy download and management of AI models
- Security Isolation: Proper user permissions and firewall rules
- Resource Control: Memory and CPU limits prevent system overload
- Zero Dependencies: No Python environments or manual driver setup
How This Actually Works
Instead of following endless installation guides, you add this to your flake:
{
  inputs.ollama-ai.url = "github:yourusername/nixos-ollama-flake";
  outputs = { nixpkgs, ollama-ai, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [ ollama-ai.nixosModules.default ./configuration.nix ];
    };
  };
}
Run nixos-rebuild switch and you get:
- Ollama service running on localhost:11434
- GPU acceleration automatically configured (if you have a GPU)
- Secure firewall rules and user isolation
- CLI tools for model management
- API ready for any programming language
Don't use Nix? You can get the same result with Docker or by installing Ollama directly. The setup takes longer but works the same way.
Real-World Usage
Instant AI Chat
# Download and run a model
ollama pull llama3.2:3b
ollama run llama3.2:3b
# Start chatting immediately
> Explain quantum computing in simple terms
> Write a Python function for binary search
> Help me debug this JavaScript code
Programming Integration
Your local AI works with any programming language:
Python Integration:
import requests

def ask_local_ai(prompt):
    # /api/generate streams by default; stream=False returns one JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:3b", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]
# Use AI in your applications
code_review = ask_local_ai("Review this Python function for bugs...")
documentation = ask_local_ai("Write docs for this API endpoint...")
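The example above disables streaming for simplicity. By default, /api/generate streams its answer as newline-delimited JSON chunks, each carrying a piece of the text in a "response" field and a "done" flag on the last chunk. A minimal sketch of assembling that stream (the sample chunks here are illustrative, but they follow Ollama's documented chunk format):

```python
import json

def join_stream(chunks):
    """Combine the 'response' fields from Ollama's newline-delimited
    JSON stream into a single answer string."""
    parts = []
    for line in chunks:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With requests you would pass stream=True and feed resp.iter_lines()
# into join_stream; here we use sample chunks to show the format:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": true}',
]
print(join_stream(sample))  # Hello world
```

Streaming is worth the extra code in interactive tools, since you can show tokens as they arrive instead of waiting for the full answer.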
JavaScript/Node.js:
async function codeAssist(prompt) {
  // /api/generate streams by default; stream: false returns one JSON object
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: "llama3.2:3b", prompt, stream: false })
  });
  return (await response.json()).response;
}
// AI-powered development
const explanation = await codeAssist("Explain this React component...");
const tests = await codeAssist("Write unit tests for this function...");
Development Workflow Integration
- VS Code: Install an Ollama-compatible extension for code completion
- Vim/Neovim: Use ollama.nvim for AI-assisted editing
- Emacs: Ellama package provides comprehensive AI integration
- Terminal: Direct CLI access for quick questions
Model Ecosystem
Choose models for your specific needs:
Fast Development Models
- llama3.2:1b - Lightning fast, basic quality
- llama3.2:3b - Best balance for development work
- qwen2.5:7b - Excellent coding and reasoning abilities
Specialized Models
- deepseek-r1:8b - Advanced reasoning and problem solving
- codellama:7b - Code generation and debugging focus
- mistral:7b - Strong instruction following
Resource Requirements
- 1B-3B models: 4GB RAM, run on most systems
- 7B models: 8GB RAM, better quality
- GPU acceleration: 4GB+ VRAM dramatically improves speed
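The RAM figures above follow roughly from parameter count times bytes per weight. Here is a back-of-the-envelope estimator; the 0.5 bytes-per-parameter figure (4-bit quantization) and the 20% overhead factor are my own rough assumptions, not Ollama's numbers:

```python
def estimate_model_ram_gb(params_billion, bytes_per_param=0.5, overhead=1.2):
    """Rough RAM estimate for a quantized model.

    bytes_per_param defaults to 0.5 (4-bit quantization);
    overhead approximates KV cache and runtime buffers.
    """
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

print(round(estimate_model_ram_gb(3), 1))  # ~1.8 GB for llama3.2:3b
print(round(estimate_model_ram_gb(7), 1))  # ~4.2 GB for a 7B model
```

That is why 7B models fit comfortably in 8GB of RAM with room left for your OS, while anything larger starts to squeeze a typical laptop.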
Privacy and Control
Complete Data Privacy
- All inference happens locally
- No data ever leaves your machine
- No API keys, accounts, or external dependencies
- Full control over model versions and updates
Security Features
- Ollama runs as isolated system user
- Proper firewall configuration
- Resource limits prevent system impact
- Models stored with secure permissions
Offline Capability
- Works completely offline
- No internet required after model download
- Perfect for sensitive environments
- Reliable when external services are down
Performance and Efficiency
GPU Acceleration
The flake automatically detects and configures:
- NVIDIA CUDA: For GeForce and RTX cards
- AMD ROCm: For Radeon graphics
- CPU Fallback: Works without GPU
Resource Management
# Fine-tune for your system
services.ollama.environmentVariables = {
  OLLAMA_NUM_PARALLEL = "4";       # Concurrent requests
  OLLAMA_MAX_LOADED_MODELS = "2";  # Memory management
};
Model Loading Strategy
- Models load on first use
- Multiple models can run simultaneously
- Automatic memory management
- Fast model switching
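You can also steer model residency per request with the API's keep_alive field, which controls how long a model stays in memory after answering. A small sketch of building such a request body (the helper function is mine; keep_alive and stream are real request fields):

```python
def generate_payload(model, prompt, keep_alive="5m", stream=False):
    """Build a request body for POST /api/generate.

    keep_alive controls how long the model stays loaded after the
    request: a duration string like "10m", or 0 to unload immediately.
    """
    return {
        "model": model,
        "prompt": prompt,
        "keep_alive": keep_alive,
        "stream": stream,
    }

# Keep the model warm for 10 minutes between questions:
payload = generate_payload("llama3.2:3b", "hello", keep_alive="10m")
print(payload["keep_alive"])  # 10m
```

A longer keep_alive avoids reload latency for interactive sessions; setting it to 0 frees memory immediately after one-off batch jobs.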
Integration Possibilities
Web Interfaces
- Open WebUI: Full web interface for Ollama
- Anything LLM: Document chat and RAG
- Custom dashboards: Build your own UI
Document Analysis
- PDF Processing: Analyze documents privately
- Code Reviews: AI-powered code analysis
- Research Assistant: Query local knowledge bases
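The usual pattern for document analysis is to split the text into overlapping chunks that fit the model's context window, then send each chunk plus your question to the local API. A minimal chunker sketch (the sizes are arbitrary defaults, not tuned values):

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into overlapping chunks sized for an LLM context window.

    Overlap keeps sentences that straddle a boundary visible in
    both neighboring chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

# A 6000-character document yields a handful of overlapping chunks,
# each of which you would pass to ask_local_ai along with your question.
document = "lorem ipsum " * 500
print(len(chunk_text(document)))
```

For real RAG workflows you would add embeddings and retrieval on top, but plain chunk-and-ask already covers a lot of private document Q&A.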
Creative Applications
- Writing Assistant: Fiction, technical writing, editing
- Code Generation: Prototyping, boilerplate, documentation
- Learning Aid: Explanations, tutorials, practice problems
Why This Matters
This flake represents a fundamental shift toward AI sovereignty:
- Privacy First: Your data never leaves your control
- Cost Effective: No per-token charges or API subscriptions
- Always Available: No rate limits or service outages
- Reproducible: Identical setup across all systems
- Integrated: Works seamlessly with NixOS ecosystem
- Scalable: From laptop development to server deployment
The Bigger Picture
Local AI is becoming essential for:
- Developers: Code assistance without sharing proprietary code
- Researchers: Document analysis with full privacy
- Writers: Creative assistance without data exposure
- Businesses: AI capabilities without external dependencies
- Students: Learning aid without academic integrity concerns
This flake makes local AI as easy as any other NixOS service. No more choosing between AI capability and data privacy.
Get started today: Add the flake to your configuration, run ollama pull llama3.2:3b, and start your first private AI conversation. Your data stays yours, your AI stays local, and your privacy stays intact.
The future of AI is local, private, and declarative.
Photo by Markus Spiske on Unsplash
Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.