Every OpenClaw setup guide starts the same way: sign up for an API key, enter your credit card, watch the bill climb. But what if you could run the whole thing locally, on your own hardware, for exactly $0/month? That is what Ollama makes possible.
Why Run OpenClaw With Local Models
Three reasons founders go local:
1. Zero API costs. Claude, GPT, Gemini: they all charge per token. A busy OpenClaw agent running multiple sub-agents can burn through $50-150/month in API fees alone. Ollama models running on your machine cost nothing after the initial hardware.
2. Full privacy. Your conversations, business data, customer info, SOPs: none of it leaves your device. No third-party servers. No data retention policies to worry about. Everything stays on your SSD.
3. Offline capability. Internet goes down? Your agent keeps working. On a plane? Still running. This matters if you are building systems that need to be always available.
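That first point is easy to sanity-check with back-of-envelope math: messages per day × tokens per message × days × price per million tokens. The message volume and the $6-per-million blended rate below are illustrative assumptions, not any provider's actual pricing:

```python
def monthly_api_cost(msgs_per_day: int, tokens_per_msg: int,
                     price_per_mtok: float, days: int = 30) -> float:
    """Rough monthly API spend in dollars.

    price_per_mtok is a blended input+output price per million tokens
    (illustrative -- check your provider's current rate card).
    """
    total_tokens = msgs_per_day * tokens_per_msg * days
    return total_tokens / 1_000_000 * price_per_mtok

# A busy agent with sub-agents: 200 messages/day, ~3,000 tokens each,
# at an assumed $6 per million tokens:
print(f"${monthly_api_cost(200, 3000, 6.0):.0f}/month")  # lands inside the $50-150 range above
```

Halve or double the message volume and you slide along that $50-150 band, which is exactly the spend a local model eliminates.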
The tradeoff is real, though. Local models are smaller and less capable than frontier cloud models like Claude Opus or GPT-4.5. For simple tasks (summaries, formatting, calendar checks, file organization) they work great. For complex multi-step reasoning, you will notice the difference.
Install Ollama
Ollama runs on macOS, Linux, and Windows. One command to install:
macOS and Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
Download and run the installer from ollama.com/download.
The same page hosts installers for macOS and Linux too, if you would rather not pipe a script into your shell.
After installing, verify it works:
ollama --version
That is it. No accounts, no API keys, no credit cards.
Connect OpenClaw to Ollama
Ollama has a native OpenClaw integration. One command does everything:
ollama launch openclaw
This handles the full setup automatically:
- Installs OpenClaw if you do not have it
- Shows a model selector (local and cloud options)
- Configures the Ollama provider in your OpenClaw config
- Sets your chosen model as the primary
- Starts the gateway and opens the TUI
Already have OpenClaw installed? Use ollama launch openclaw --config to just change the model without starting the gateway. Or pass a specific model directly: ollama launch openclaw --model glm-4.7-flash
If the gateway is already running, it restarts automatically to pick up the new model. No manual config editing needed.
For manual configuration, add the Ollama provider to your OpenClaw config:
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/glm-4.7-flash"
      }
    }
  }
}
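If you edit the config by hand, a quick sanity check catches typos before you restart the gateway. This is a minimal sketch based on the snippet above; the field names come from that snippet, and the validation rules are my own:

```python
import json

REQUIRED_PROVIDER_KEYS = {"baseUrl", "apiKey", "api"}

def check_openclaw_config(raw: str) -> list[str]:
    """Return a list of problems found in an OpenClaw config string."""
    problems = []
    cfg = json.loads(raw)  # raises ValueError on malformed JSON

    # The Ollama provider needs all three connection fields.
    ollama = cfg.get("models", {}).get("providers", {}).get("ollama", {})
    missing = REQUIRED_PROVIDER_KEYS - ollama.keys()
    if missing:
        problems.append(f"ollama provider missing keys: {sorted(missing)}")

    # The primary model should actually route through that provider.
    primary = (cfg.get("agents", {}).get("defaults", {})
                  .get("model", {}).get("primary", ""))
    if not primary.startswith("ollama/"):
        problems.append(f"primary model {primary!r} does not use the ollama provider")
    return problems
```

Run it against your config file's contents; an empty list means the two things that most often break (provider fields and the primary model prefix) look right.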
Best Local Models for OpenClaw
OpenClaw needs models with at least 64k context window and solid tool-calling support. Not every model handles that well. Here are the ones that actually work, according to Ollama's official recommendations and community testing:
| Model | Size | Best For | VRAM Needed |
|---|---|---|---|
| qwen3-coder | Various | Coding tasks, tool calling | 8-16 GB |
| glm-4.7 | Large | General purpose, reasoning | ~25 GB |
| glm-4.7-flash | Medium | Balanced speed and quality | ~25 GB |
| gpt-oss:20b | 20B | Balanced performance | 12-16 GB |
| gpt-oss:120b | 120B | Higher capability | 48+ GB |
Community picks: The r/LocalLLaMA subreddit reports that qwen2.5:14b-instruct and mistral-nemo also handle OpenClaw's tool calls well if you want alternatives.
Pull any model with one command:
ollama pull glm-4.7-flash
Then set it as your OpenClaw primary:
ollama launch openclaw --model glm-4.7-flash
Cloud Models Through Ollama (Still Free)
Here is something most people miss: Ollama also offers cloud models that are free to start. These run on Ollama's infrastructure but connect through the same interface:
| Cloud Model | Description |
|---|---|
| kimi-k2.5 | 1T parameter model, multimodal reasoning with sub-agents |
| minimax-m2.1 | Multilingual capabilities, fast coding |
| glm-4.7 (cloud) | Strong general-purpose reasoning |
Use them the same way:
ollama launch openclaw --model kimi-k2.5:cloud
This gives you frontier-level performance without paying for Anthropic or OpenAI API keys. The catch: your data does leave your machine for these cloud models. Pick based on your privacy needs.
The Hybrid Setup: Best of Both Worlds
The setup I actually recommend for most founders: local model as your primary, cloud API as a fallback.
Why? Local models handle 80-90% of daily tasks perfectly fine. Summaries, file management, scheduling, simple queries. But when your agent hits something complex (multi-step research, long code generation, nuanced writing), the cloud model takes over.
In your OpenClaw config, set the primary model to a local Ollama model. Then configure a cloud provider (Anthropic, OpenAI, or Ollama cloud) as a secondary. Your agent uses the local model by default and only hits the API when it needs to.
Result: privacy and zero cost for the majority of your messages. API-quality responses when it actually matters.
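The local-first, cloud-fallback idea can be sketched as a simple router. The thresholds, keyword hints, and model identifiers here are placeholders of my own, not OpenClaw's actual routing rules:

```python
LOCAL_MODEL = "ollama/glm-4.7-flash"     # assumed local primary
CLOUD_MODEL = "anthropic/claude-sonnet"  # hypothetical fallback id

# Keywords that hint a task needs deeper reasoning (illustrative only).
COMPLEX_HINTS = ("research", "refactor", "multi-step", "analyze", "draft")

def pick_model(task: str, prompt_tokens: int,
               max_local_tokens: int = 8000) -> str:
    """Route a task to the local model unless it looks too heavy."""
    if prompt_tokens > max_local_tokens:
        return CLOUD_MODEL  # long contexts go to the stronger model
    if any(hint in task.lower() for hint in COMPLEX_HINTS):
        return CLOUD_MODEL  # complex work earns the API call
    return LOCAL_MODEL      # default: private, zero-cost local inference
```

In practice you let OpenClaw's primary/secondary config do the switching; the sketch just makes the decision logic explicit so you can see why 80-90% of traffic never leaves your machine.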
Pro tip: Start with a local-only setup for a week. Track which tasks the model struggles with. Then add a cloud fallback specifically for those use cases. You will be surprised how few tasks actually need frontier models.
Hardware Requirements
Local models need GPU memory (VRAM) or a lot of unified memory (Apple Silicon). Here is the reality:
| Hardware | What You Can Run |
|---|---|
| 8 GB RAM (M1/M2 MacBook) | 7B-8B models (qwen3:8b, phi-4-mini) |
| 16 GB RAM (M1/M2/M4 Pro) | 14B-20B models (gpt-oss:20b, qwen2.5:14b) |
| 32 GB RAM (M4 Pro/Max) | Most models including glm-4.7-flash |
| 64+ GB RAM (M4 Max/Ultra) | Everything, including 120B models |
| NVIDIA GPU 12 GB+ VRAM | 20B models comfortably |
| NVIDIA GPU 24 GB+ VRAM | Most recommended models |
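The numbers in this table follow a standard rule of thumb: quantized weights take roughly parameters × (bits ÷ 8) bytes, plus headroom for the KV cache and activations. Here is a rough estimator; the 30% overhead factor is an assumption, and real usage grows with context length:

```python
def estimate_vram_gb(params_billions: float, quant_bits: int = 4,
                     overhead: float = 1.3) -> float:
    """Rough VRAM needed to run a quantized model, in GB."""
    weight_gb = params_billions * quant_bits / 8  # 1B params at 4-bit ~ 0.5 GB
    return round(weight_gb * overhead, 1)

print(estimate_vram_gb(20))   # gpt-oss:20b at 4-bit: ~13 GB
print(estimate_vram_gb(120))  # gpt-oss:120b: far beyond consumer GPUs
```

A 20B model landing around 13 GB matches the 12-16 GB row above; the 120B estimate shows why that tier only makes sense on 64 GB+ unified memory or multi-GPU rigs.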
Apple Silicon is the sweet spot for local AI. The unified memory architecture lets you run bigger models than equivalent discrete GPUs. A Mac Mini M4 with 32 GB handles most OpenClaw workloads without breaking a sweat. I run my entire 13-agent setup on one.
Important: OpenClaw requires at least 64k token context window for proper operation. Make sure your chosen model supports this. Smaller context windows cause truncation errors and broken tool calls.
Troubleshooting Common Issues
Model not responding or tool calls failing: Small models (7B and under) often struggle with OpenClaw's tool-calling format. Upgrade to at least a 14B parameter model. The community on r/LocalLLaMA confirms that qwen2.5:14b-instruct and mistral-nemo handle tool calls significantly better than smaller options.
Out of memory errors: Your model is too large for your hardware. Either use a smaller model or close other applications to free memory. On macOS, check Activity Monitor for memory pressure.
Ollama not connecting: Make sure the Ollama service is running (ollama serve) and accessible at http://localhost:11434. Test with: curl http://localhost:11434/api/tags
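That /api/tags endpoint returns JSON describing every pulled model, so you can also verify programmatically that the model your OpenClaw config names is actually installed. A sketch that parses a response of that shape (the sample payload below is illustrative):

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Illustrative response shape; a real body comes from:
#   curl http://localhost:11434/api/tags
sample = '{"models": [{"name": "glm-4.7-flash:latest"}, {"name": "qwen3-coder:latest"}]}'
assert "glm-4.7-flash:latest" in installed_models(sample)
```

If your configured model is missing from the list, the fix is just `ollama pull <model>` before restarting the gateway.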
Slow responses: Expected with local models, especially on CPU-only machines. GPU acceleration makes a massive difference. On Apple Silicon, the GPU can address the full unified memory pool, so inference stays on the GPU instead of spilling over to the CPU.
Context window errors: Raise the context length explicitly when running the model: inside an ollama run glm-4.7-flash session, use /set parameter num_ctx 65536, or set num_ctx in the API request options. OpenClaw needs that 64k minimum.
For more OpenClaw errors and fixes, check the full troubleshooting guide.
Getting started with OpenClaw? Install it in under 5 minutes at installopenclawnow.com.
OpenClaw Lab is the #1 community for founders building AI agent systems. I share the exact playbooks, skill files, and workflows inside. Weekly lives, expert AMAs, and 265+ members building real systems.
Join OpenClaw Lab →