Hey @shortsegments, great guide! I'm running OpenClaw on a box with 62GB RAM (DDR5) and have tested local models extensively.
For your readers considering local models on small RAM:
My setup: 62GB DDR5, tested Qwen 2.5 7B/14B, Llama 3.1 8B, and smaller coders.
Context window reality: with a 32k context window, the system prompt, memory files, and tool definitions eat a big chunk of it before the conversation even starts, and in my testing the runtime can easily claim 20-30GB for context (KV cache plus buffers). That doesn't leave much headroom for the model weights themselves.
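If anyone wants to sanity-check their own setup, the KV cache footprint is simple arithmetic: 2 (keys + values) × layers × KV heads × head dim × context length × bytes per element. A rough sketch, assuming Llama 3.1 8B's published config (32 layers, 8 KV heads via GQA, head dim 128) and an unquantized fp16 cache; actual runtimes add their own buffers on top:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for keys and values; fp16 = 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Llama 3.1 8B at 32k context
gib = kv_cache_bytes(32, 8, 128, 32768) / 2**30
print(f"{gib:.1f} GiB")  # 4.0 GiB
```

Note that GQA (8 KV heads instead of 32) is doing a lot of work here; an older full-attention model of the same size would need 4x the cache at the same context length.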
The problem: for complex tasks (reading multiple files, multi-step reasoning, code analysis), small local models (7-14B) with limited context hit a wall. One of three things happens:
- OOM when context fills up
- Start forgetting earlier context
- Can't hold enough code context to understand large repos
What works: Smaller, focused tasks. Summarizing a single file? Sure. Debugging a specific function? Maybe. Navigating a multi-file codebase and planning changes across 10+ files? No.
Verdict: 64GB is just enough for a 7B model with ~16k context if you're careful. 14B models with proper context will likely OOM on 64GB. For agentic work on large codebases, you really need 128GB+ or cloud.
Curious if anyone has had better luck with specific quantizations or model configurations though!
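On the quantization question, the weight footprint alone varies a lot by quant type. A ballpark sketch; the bits-per-weight figures for the GGUF quant types are approximate effective averages (an assumption, not exact spec values), and this ignores KV cache and runtime overhead entirely:

```python
def weight_gib(n_params_billion, bits_per_weight):
    # Weight-only footprint; excludes KV cache and runtime buffers
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

# Approximate effective bits per weight (assumption, ballpark only)
for name, bits in [("Q4_K_M", 4.8), ("Q8_0", 8.5), ("fp16", 16.0)]:
    print(f" 7B {name:6}: {weight_gib(7, bits):5.1f} GiB")
    print(f"14B {name:6}: {weight_gib(14, bits):5.1f} GiB")
```

Even at Q4, the gap between a 7B and a 14B is only a few GiB of weights, so in my experience the context/cache budget matters more than the quant choice once you're past Q8.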
Thanks for the feedback and tips! I'll share them with my son, who is running OpenClaw on a VPS inside a virtual machine, and it has been a challenge just keeping it running. Installing Caffeine helped, but now the real work begins: keeping it stable so it doesn't crash. I suspect one of his downloads is corrupt, but he'll need to figure that out.