OpenClaw & LLMs #192584
Replies: 11 comments 1 reply
---
Great question! We ran into the exact same issue when setting up our multi-agent team — local models (even Qwen 14B) are fine for chat but terrible at autonomous task execution. Here is what actually worked for us:

1. The hybrid model approach
2. Agent loop tuning
3. Cross-channel memory
4. Your hardware is fine

For ADHD/focus use cases specifically, I would suggest:

Hope this helps!
---
Two separate things going on, worth splitting them.

For the "tasks just stop" part with qwen2.5:14b, check your Ollama context window first. The default is only 2048 tokens, and an agent loop with tool results burns through that almost immediately. Then raise `num_ctx` (8192 or more) in your Modelfile or per-request options before testing again.

Second thing: qwen2.5:14b is on the weak end for tool calling. It technically supports tools but it's inconsistent and often skips the tool_call tokens entirely. You've got roughly 22GB VRAM across those two 1080 Tis, that's enough for qwen2.5-coder:32b at Q4 or mistral-small:24b, both noticeably more reliable for multi-step tool use. Make sure Ollama actually spreads across both cards (starting the server with `OLLAMA_SCHED_SPREAD=1` forces this), and verify with `ollama ps` and `nvidia-smi`: both GPUs should show memory in use.

For the Telegram vs terminal memory gap, that's expected, they run as separate sessions with independent state. The only way to share context is an external store (SQLite file, Redis, flat JSON, whatever) that both sessions read and write each turn; a sketch follows below. If OpenClaw has a persistent memory config option, enable it and point both channels at the same DB path.

If tasks still stall after bumping the context, run Ollama with `OLLAMA_DEBUG=1` and watch the server log for context truncation or aborted generations.
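To make the shared-store idea concrete, here is a minimal sketch of an external SQLite memory both sessions could read and write each turn. The path, table layout, and `remember`/`recall` names are my assumptions, not OpenClaw API; wire them into whatever pre/post-message hooks your setup exposes.

```python
import sqlite3
import time

DB_PATH = "/var/lib/openclaw/shared_memory.db"  # hypothetical location

def _connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""CREATE TABLE IF NOT EXISTS turns (
        ts REAL, channel TEXT, role TEXT, content TEXT)""")
    return conn

def remember(channel: str, role: str, content: str) -> None:
    # Call after every user/assistant turn, from any channel.
    conn = _connect()
    with conn:  # commits the insert
        conn.execute("INSERT INTO turns VALUES (?, ?, ?, ?)",
                     (time.time(), channel, role, content))
    conn.close()

def recall(limit: int = 20) -> list[tuple[str, str, str]]:
    # Call before generating, from any channel; splice into the prompt.
    conn = _connect()
    rows = conn.execute(
        "SELECT channel, role, content FROM turns ORDER BY ts DESC LIMIT ?",
        (limit,)).fetchall()
    conn.close()
    return rows[::-1]  # oldest first
```

Telegram and terminal then converge on the same history because both call `recall()` before generating and `remember()` afterwards.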
---
Some observations from running OpenClaw with local and hosted LLMs:

- Chat vs task execution: chat only needs text generation; task execution needs the whole tool pipeline to work end to end.
- Tool / REST execution requires a full feedback loop: the model emits a tool call, the runtime executes it, the result is fed back into the context, and the model continues from there. If any of those steps are missing, the agent can appear to silently stop even though nothing technically failed (see the sketch after this comment).
- Separate channels = separate context unless configured otherwise.
- Switching to hosted models doesn't necessarily fix this.
- OpenAI appearing to work better can be misleading.

Overall, it feels like OpenClaw is functioning correctly at a low level, but task-oriented agent workflows need more explicit configuration than chat does—especially with smaller or open-weight models. A write-up on configuring long-running agent loops would probably help a lot of people attempting similar setups.
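As a rough illustration of that loop (not OpenClaw's actual internals; `llm` and `run_tool` are placeholders, and the JSON tool-call convention is an assumption):

```python
import json

def agent_loop(task: str, llm, run_tool, max_steps: int = 10) -> str:
    """Drive the model until it answers in plain text or we hit max_steps."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages)                       # 1. model emits a turn
        messages.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)                # 2. parse a tool call...
        except json.JSONDecodeError:
            return reply                            # ...plain text = final answer
        if not isinstance(call, dict) or "tool" not in call:
            return reply
        result = run_tool(call["tool"], call.get("args", {}))  # 3. execute it
        messages.append({"role": "tool", "content": json.dumps(result)})
        # 4. result is fed back, and the loop re-invokes the model
    return "stopped: max_steps reached without a final answer"
```

Drop any one of steps 2 through 4 and you get exactly the silent stall described in the question.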
---
You’re not doing anything “wrong” — what you’re running into is a combination of (1) limitations of local LLMs for agent-style tasks and (2) how OpenClaw handles memory, tools, and execution. Let’s break your issues down:
This is expected with models like qwen2.5:14b running via Ollama. Chat = easy (pure text generation).
Most local models (even 14B) are not consistently reliable at:
- emitting well-formed tool calls every single turn
- chaining multi-step actions without drifting off task
- recovering when a step fails
That’s why it “stops” — it’s not actually an agent, just a text model without strong execution guarantees.
This is a design issue, not a bug. Each interface (Telegram, terminal, etc.) is its own session, with its own history and its own state.
Unless OpenClaw is explicitly configured with shared memory (like a database or vector store), the model has no way to see what was said on the other channel. So: what you tell it on Telegram simply does not exist in the terminal session, and vice versa.
This is the biggest clue. Even stronger models (like Minimax) will fail if nothing drives them: no loop re-invoking the model, no task state carried between turns, and no check that the work actually happened.
LLMs don't "keep working" on their own — they respond once per prompt.

WHAT'S ACTUALLY MISSING IN YOUR SETUP

Right now your system likely lacks:
- a driver loop that keeps re-invoking the model until the task is done
- shared, persistent memory across channels
- validation that a step actually succeeded before moving on

Without these, any model will behave exactly like you described.

HOW TO FIX IT (PRACTICAL STEPS)
Goal: both Telegram and terminal read/write the same memory store (the SQLite sketch in an earlier reply is one way to wire this).
Instead, require the model to respond with an explicit tool call every turn, and reject turns that only narrate. Example: see the sketch below. This reduces "thinking instead of doing".
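A minimal sketch of that requirement, assuming a JSON tool-call convention (the crude detector and the names here are illustrative glue code, not an OpenClaw feature):

```python
def require_action(llm, messages: list, retries: int = 2) -> str:
    """Re-prompt until the model acts instead of narrating."""
    for _ in range(retries + 1):
        reply = llm(messages)
        if '"tool"' in reply:   # crude check for a JSON tool call
            return reply
        messages.append({
            "role": "system",
            "content": "Respond with a tool call now, not a description of one.",
        })
    raise RuntimeError("model kept thinking instead of doing")
```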
Instead of handing the model one vague instruction ("moderate the spam"), give it small, verifiable steps and drive them one at a time. A pseudo-flow for the WordPress case is sketched after the next list.
"Moderate spam via REST API" = multi-step agent task: fetch the held comments, classify each one, update its status, then report back (sketched below).
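Here is one hedged pass over those steps using the standard WordPress REST API. `SITE`, the credentials, and `classify_with_llm` are assumptions you'd replace; authentication uses a WordPress application password.

```python
import requests

SITE = "https://example.com"                 # your WordPress site
AUTH = ("bot_user", "app-password-here")     # WP application password

def moderate_spam(classify_with_llm) -> int:
    # 1. fetch comments held for moderation
    r = requests.get(f"{SITE}/wp-json/wp/v2/comments",
                     params={"status": "hold", "per_page": 10}, auth=AUTH)
    r.raise_for_status()
    handled = 0
    for comment in r.json():
        # 2. classify each comment individually
        verdict = classify_with_llm(comment["content"]["rendered"])
        if verdict == "spam":
            # 3. update its status via the same API
            requests.post(f"{SITE}/wp-json/wp/v2/comments/{comment['id']}",
                          json={"status": "spam"}, auth=AUTH).raise_for_status()
            handled += 1
    return handled  # 4. report back how many were dealt with
```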
Test instead with a single atomic action first (e.g. "fetch the five most recent comments and print them"), and only chain steps once that works.
qwen2.5:14b is good, but for agents you'll get MUCH better results with qwen2.5-coder:32b or mistral-small:24b locally (both fit your 22GB at Q4), or a hosted API for the agent work.
Local models are still weak at long-horizon planning, consistent tool-call formatting, and recovering from failed steps.
You mentioned ADHD support — this is actually a perfect fit for an agent with follow-up behavior: reminders that repeat until confirmed, not one-shot answers. Add logic like the sketch below:
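For example, something like this nag-until-confirmed loop (entirely illustrative glue code; `send_message` and `is_done` would be your own Telegram hooks, not OpenClaw APIs):

```python
import time

def follow_up(send_message, is_done, task: str,
              interval_s: int = 1800, max_nags: int = 6) -> None:
    """Remind every interval_s seconds until the human confirms completion."""
    send_message(f"Reminder: {task}")
    for attempt in range(1, max_nags + 1):
        time.sleep(interval_s)
        if is_done(task):            # e.g. the user replied "done" on Telegram
            send_message(f"Nice, '{task}' is done.")
            return
        send_message(f"Still open (nudge {attempt}): {task}")
    send_message(f"Giving up on nagging about: {task}")
```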
This is what makes agents feel "alive".

REALITY CHECK (IMPORTANT)

What you're trying to build is not just a chatbot — it's an agent system. OpenClaw alone won't magically handle:
- task queues and scheduling
- retries and error recovery
- verifying that work actually happened
You need to build those layers around it.

SHORT SUMMARY

Your issues come from:
- a local model that's weak at tool calling
- no shared memory between channels
- no agent loop driving and validating the work
Fix those, and your setup will improve massively.
---
Everything said above is right. Let me add one lesson from production. We run 5 agents 24/7 operating miaoquai.com, and the biggest trap was not the model being too weak: it's that you think the agent is working when it's actually faking it.

Three classic signs of an agent faking work:
1. Silent failure (the most dangerous)
2. Self-deceiving output
3. Spiraling busywork

How we solved these (a sketch follows below):
- Post-execution validation (the most important piece). We call this "validation is delivery": the agent saying it's done doesn't count; it counts only once validation passes.
- Circuit breaker
- Cron health check

On model choice: someone above suggested a hosted API, and I agree 100%. But there is a middle-ground option:

Not every task needs AI. Sometimes a bash script is 100x more reliable than an agent 😂 Our full production experience: https://miaoquai.com/stories/agent-production-nightmare.html
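A minimal sketch of the validate-then-deliver pattern plus the circuit breaker. Names here are assumptions; the key point is that `verify` must check reality (e.g. re-query the WordPress API), never the agent's own claim.

```python
def run_validated(step, verify, max_failures: int = 3):
    """Retry a step until independent validation passes, or trip the breaker."""
    failures = 0
    while True:
        claimed = step()              # the agent says it did the work
        if verify(claimed):           # check against the real world
            return claimed            # done only when validation passes
        failures += 1
        if failures >= max_failures:  # circuit breaker: stop the spiral
            raise RuntimeError("circuit open: escalate to a human")
```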
---
Hey! Your setup looks solid — dual 1080 Ti is actually a nice config for local inference.

The "tasks just stop" problem is a classic one. Here are the most likely culprits:

**1. Token limit / context overflow**
qwen2.5:14b with Ollama has a limited context window. If the WordPress API returns a lot of data (spam comments, post content), it fills up the context and the model just... stops. No error, no explanation, just silence. This is infuriating and I feel your pain 😅

**Fix**: Add `num_ctx: 8192` (or higher if your VRAM allows) to your Ollama config. Also try `f16: true` for precision.

**2. Tool calling not working properly**
Local models with OpenClaw sometimes struggle with structured tool calls. The model might not be generating the right JSON format for the REST API tool.

**Fix**: Check OpenClaw logs (`openclaw logs --tail 50`) to see if it's actually calling tools or just thinking forever.

**3. The "ghost stop"**
This is where the model generates the EOS token mid-task. Common with Qwen models when they're uncertain about the next step.

**Fix**: Try adding a system prompt that explicitly says "Always complete the full task. Never stop mid-process."

**My recommended setup for your hardware:**
- Model: `qwen2.5:14b` is fine for chat, but for tool-heavy tasks try `qwen2.5:32b` (quantized to Q4, fits in 22GB VRAM)
- Or better yet: use a remote API (OpenAI/Anthropic) for agent tasks, keep local model for simple chat

I documented a bunch of OpenClaw + local model troubleshooting tips: https://miaoquai.com/stories/openclaw-troubleshooting-guide.html

TL;DR: It's probably context overflow or tool calling format issues, not your hardware. Check the logs first!
---
The context and model advice above is solid. One thing that might help you pinpoint the exact stopping point is enabling verbose logging in Ollama before you test again: start the server with `OLLAMA_DEBUG=1`. Watch that terminal while OpenClaw runs your WordPress task. You'll see exactly which tool call the model emits (or doesn't), whether context gets truncated mid-loop, and if the model hallucinates a completion signal before the task actually finishes. That output usually makes it obvious whether the issue is context overflow, a missing tool call in the model output, or a REST API response the agent doesn't know how to retry on.

On the num_ctx point: the default 2048 is really brutal for agent loops. A single WordPress REST API response can eat 800+ tokens, and the model just quietly bails when it hits the wall. Starting at 16k is a reasonable middle ground before jumping to 32k (see the sketch below), and it's easier on your 1080 Ti VRAM too compared to 32768.

For the Telegram vs terminal memory gap, that's expected behavior. Every connector starts a fresh context unless you wire up a shared memory backend. If OpenClaw exposes a persistent memory option, enable it and point both channels at the same store.
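For reference, context size can be set per request through the `options` field of Ollama's REST API; here is a sketch of a 16k request (the prompt is just an example):

```python
import requests

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5:14b",
    "messages": [{"role": "user",
                  "content": "List the held comments on my WordPress site."}],
    "options": {"num_ctx": 16384},  # default 2048 is far too small for agent loops
    "stream": False,
})
resp.raise_for_status()
print(resp.json()["message"]["content"])
```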
---
I built a small tool that wraps your agent in an environment with memory + feedback so it stops repeating mistakes. Takes ~30 seconds to try. Curious if this helps your case.
---
🏷️ Discussion Type
Question
Body
Hi one and all,
I've been playing with OpenClaw (latest version) on an Ubuntu box I built with some old hardware I had gathering dust (Core i9 CPU, 16GB RAM, two 1080 Ti GPUs with 11GB VRAM each). I'm using Ollama to run the model qwen2.5:14b (9GB). As a chat bot it runs amazingly well, very responsive. The roadblock is when I ask it to do tasks, i.e. moderating spam via REST API on a small WordPress site. It simply does not do the tasks; they just stop. I've switched to other models as a test and tried different tasks with similar workloads. No dice. I also noticed that chatting with it on Telegram vs the terminal means the bot has no idea what was said between the two channels. I've been stuck trying to figure it out for the last couple of weeks, and more intensely the last two days. Even when I connect it to a hosted model such as Minimax m2.7, I have to constantly chase and remind it to work.
I've tried reinstalling everything from scratch (OpenClaw) but keep running into the same problems. I played with a hosted version of OpenClaw before setting up my local one, and it worked really well with OpenAI. I'm on Ubuntu 22.04.5 LTS.
I would appreciate any help or recommendations from anyone who has managed to get OpenClaw working well with an LLM. I have a limited budget, and I am hoping to get this agent tuned to help my partner and me with our ADHD / forgetfulness haha.
Thanks