Every OpenClaw setup guide starts the same way: sign up for an API key, enter your credit card, watch the bill climb. But what if you could run the whole thing locally, on your own hardware, for exactly $0/month? That is what Ollama makes possible.

Why Run OpenClaw With Local Models

Three reasons founders go local:

1. Zero API costs. Claude, GPT, Gemini: they all charge per token. A busy OpenClaw agent running multiple sub-agents can burn through $50-150/month in API fees alone. Ollama models running on your machine cost nothing after the initial hardware.

2. Full privacy. Your conversations, business data, customer info, SOPs: none of it leaves your device. No third-party servers. No data retention policies to worry about. Everything stays on your SSD.

3. Offline capability. Internet goes down? Your agent keeps working. On a plane? Still running. This matters if you are building systems that need to be always available.

The tradeoff is real, though. Local models are smaller and less capable than frontier cloud models like Claude Opus or GPT-4.5. For simple tasks (summaries, formatting, calendar checks, file organization) they work great. For complex multi-step reasoning, you will notice the difference.

Install Ollama

Ollama runs on macOS, Linux, and Windows. One command to install:

macOS and Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows:

Download and run the installer from ollama.com/download. (The same page has installers for macOS and Linux too, if you would rather not pipe a script into your shell.)

After installing, verify it works:

ollama --version

That is it. No accounts, no API keys, no credit cards.

Connect OpenClaw to Ollama

Ollama has a native OpenClaw integration. One command does everything:

ollama launch openclaw

This handles the full setup automatically.

Already have OpenClaw installed? Use ollama launch openclaw --config to just change the model without starting the gateway. Or pass a specific model directly: ollama launch openclaw --model glm-4.7-flash

If the gateway is already running, it restarts automatically to pick up the new model. No manual config editing needed.

For manual configuration, add the Ollama provider to your OpenClaw config:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/glm-4.7-flash"
      }
    }
  }
}
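Before restarting the gateway, it is worth sanity-checking the config. The snippet below is my own sketch of a validator for the example config above, not an official OpenClaw tool; the field names are taken from the JSON shown:

```python
import json

def check_ollama_provider(config: dict) -> list[str]:
    """Return a list of problems found in the Ollama provider block."""
    problems = []
    provider = config.get("models", {}).get("providers", {}).get("ollama")
    if provider is None:
        return ["no models.providers.ollama block"]
    if not provider.get("baseUrl", "").endswith("/v1"):
        problems.append("baseUrl should point at Ollama's OpenAI-compatible /v1 endpoint")
    if provider.get("api") != "openai-completions":
        problems.append('api should be "openai-completions"')
    primary = (config.get("agents", {}).get("defaults", {})
                     .get("model", {}).get("primary", ""))
    if not primary.startswith("ollama/"):
        problems.append("agents.defaults.model.primary does not use the ollama/ prefix")
    return problems

config = json.loads("""
{
  "models": {"providers": {"ollama": {
    "baseUrl": "http://localhost:11434/v1",
    "apiKey": "ollama-local",
    "api": "openai-completions"}}},
  "agents": {"defaults": {"model": {"primary": "ollama/glm-4.7-flash"}}}
}
""")
print(check_ollama_provider(config))  # prints [] when the config looks sane
```

An empty list means the provider block and default model line up the way the example expects.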

Best Local Models for OpenClaw

OpenClaw needs models with at least 64k context window and solid tool-calling support. Not every model handles that well. Here are the ones that actually work, according to Ollama's official recommendations and community testing:

| Model | Size | Best For | VRAM Needed |
| --- | --- | --- | --- |
| qwen3-coder | Various | Coding tasks, tool calling | 8-16 GB |
| glm-4.7 | Large | General purpose, reasoning | ~25 GB |
| glm-4.7-flash | Medium | Balanced speed and quality | ~25 GB |
| gpt-oss:20b | 20B | Balanced performance | 12-16 GB |
| gpt-oss:120b | 120B | Higher capability | 48+ GB |

Community picks: The r/LocalLLaMA subreddit reports that qwen2.5:14b-instruct and mistral-nemo also handle OpenClaw's tool calls well if you want alternatives.

Pull any model with one command:

ollama pull glm-4.7-flash

Then set it as your OpenClaw primary:

ollama launch openclaw --model glm-4.7-flash

Cloud Models Through Ollama (Still Free)

Here is something most people miss: Ollama also offers cloud models that are free to start. These run on Ollama's infrastructure but connect through the same interface:

| Cloud Model | Description |
| --- | --- |
| kimi-k2.5 | 1T parameter model, multimodal reasoning with sub-agents |
| minimax-m2.1 | Multilingual capabilities, fast coding |
| glm-4.7 (cloud) | Strong general-purpose reasoning |

Use them the same way:

ollama launch openclaw --model kimi-k2.5:cloud

This gives you frontier-level performance without paying for Anthropic or OpenAI API keys. The catch: your data does leave your machine for these cloud models. Pick based on your privacy needs.

The Hybrid Setup: Best of Both Worlds

The setup I actually recommend for most founders: local model as your primary, cloud API as a fallback.

Why? Local models handle 80-90% of daily tasks perfectly fine. Summaries, file management, scheduling, simple queries. But when your agent hits something complex (multi-step research, long code generation, nuanced writing), the cloud model takes over.

In your OpenClaw config, set the primary model to a local Ollama model. Then configure a cloud provider (Anthropic, OpenAI, or Ollama cloud) as a secondary. Your agent uses the local model by default and only hits the API when it needs to.
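The routing idea can be sketched in a few lines. The heuristic below (keyword and length based) is my own illustration of the local-first, cloud-fallback pattern, not OpenClaw's actual fallback logic; the cloud model name is a placeholder:

```python
LOCAL_MODEL = "ollama/glm-4.7-flash"   # free, private, handles routine work
CLOUD_MODEL = "anthropic/claude-opus"  # paid fallback for hard tasks (name assumed)

# Rough signals that a task needs a frontier model.
COMPLEX_HINTS = ("research", "refactor", "architecture", "multi-step", "analyze")

def pick_model(prompt: str, max_local_chars: int = 4000) -> str:
    """Route a prompt: local by default, cloud when it looks complex."""
    lowered = prompt.lower()
    if len(prompt) > max_local_chars:
        return CLOUD_MODEL  # very long prompts tend to need more capability
    if any(hint in lowered for hint in COMPLEX_HINTS):
        return CLOUD_MODEL
    return LOCAL_MODEL

print(pick_model("Summarize today's calendar"))            # local model
print(pick_model("Research competitors and analyze pricing"))  # cloud model
```

In practice you would tune the hints and the length threshold against the tasks your agent actually sees, which is exactly what the week of local-only tracking below is for.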

Result: privacy and zero cost for the majority of your messages. API-quality responses when it actually matters.

Pro tip: Start with a local-only setup for a week. Track which tasks the model struggles with. Then add a cloud fallback specifically for those use cases. You will be surprised how few tasks actually need frontier models.

Hardware Requirements

Local models need GPU memory (VRAM) or a lot of unified memory (Apple Silicon). Here is the reality:

| Hardware | What You Can Run |
| --- | --- |
| 8 GB RAM (M1/M2 MacBook) | 7B-8B models (qwen3:8b, phi-4-mini) |
| 16 GB RAM (M1/M2/M4 Pro) | 14B-20B models (gpt-oss:20b, qwen2.5:14b) |
| 32 GB RAM (M4 Pro/Max) | Most models, including glm-4.7-flash |
| 64+ GB RAM (M4 Max/Ultra) | Everything, including 120B models |
| NVIDIA GPU, 12 GB+ VRAM | 20B models comfortably |
| NVIDIA GPU, 24 GB+ VRAM | Most recommended models |
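The figures above follow a common rule of thumb: a model's weights take roughly (parameters × bits per weight ÷ 8) bytes, plus headroom for the KV cache and runtime. A rough estimator, with a ~20% overhead multiplier that is my own assumption:

```python
def estimate_vram_gb(params_billion: float, quant_bits: int = 4,
                     overhead: float = 1.2) -> float:
    """Rule-of-thumb VRAM estimate for a quantized model.

    params_billion: parameter count in billions (e.g. 20 for gpt-oss:20b)
    quant_bits: bits per weight (Ollama commonly ships 4-bit quants)
    overhead: multiplier for KV cache and runtime buffers (assumption)
    """
    weight_gb = params_billion * quant_bits / 8
    return round(weight_gb * overhead, 1)

for name, billions in [("gpt-oss:20b", 20), ("qwen2.5:14b", 14)]:
    print(name, estimate_vram_gb(billions), "GB")
```

A 20B model at 4-bit comes out around 12 GB, which matches the table; actual usage varies with quantization and context length, so treat the output as a floor, not a guarantee.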

Apple Silicon is the sweet spot for local AI. The unified memory architecture lets you run bigger models than equivalent discrete GPUs. A Mac Mini M4 with 32 GB handles most OpenClaw workloads without breaking a sweat. I run my entire 13-agent setup on one.

Important: OpenClaw requires at least 64k token context window for proper operation. Make sure your chosen model supports this. Smaller context windows cause truncation errors and broken tool calls.

Troubleshooting Common Issues

Model not responding or tool calls failing: Small models (7B and under) often struggle with OpenClaw's tool-calling format. Upgrade to at least a 14B parameter model. The community on r/LocalLLaMA confirms that qwen2.5:14b-instruct and mistral-nemo handle tool calls significantly better than smaller options.

Out of memory errors: Your model is too large for your hardware. Either use a smaller model or close other applications to free memory. On macOS, check Activity Monitor for memory pressure.

Ollama not connecting: Make sure the Ollama service is running (ollama serve) and accessible at http://localhost:11434. Test with: curl http://localhost:11434/api/tags

Slow responses: Expected with local models, especially on CPU-only machines. GPU acceleration makes a massive difference. On Apple Silicon, responses are fast because the GPU and CPU share memory.

Context window errors: Ollama defaults to a small context window, so raise it explicitly. In an interactive ollama run session, use /set parameter num_ctx 65536; to change the server-wide default, start it with OLLAMA_CONTEXT_LENGTH=65536. OpenClaw needs that 64k minimum.
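A persistent way to bake the larger context in is a Modelfile, using Ollama's num_ctx parameter; the derived model name below is my own choice:

```
FROM glm-4.7-flash
PARAMETER num_ctx 65536
```

Build it once with ollama create glm-4.7-flash-64k -f Modelfile, then point OpenClaw at ollama/glm-4.7-flash-64k so the 64k window applies on every request.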

For more OpenClaw errors and fixes, check the full troubleshooting guide.

Getting started with OpenClaw? Install it in under 5 minutes at installopenclawnow.com.

OpenClaw Lab is the #1 community for founders building AI agent systems. I share the exact playbooks, skill files, and workflows inside. Weekly lives, expert AMAs, and 265+ members building real systems.

Join OpenClaw Lab →