OpenClaw & LLMs #192584

AizenYPB · 2026-04-14T20:51:49Z

AizenYPB
Apr 14, 2026

🏷️ Discussion Type

Question

Body

Hi one and all,

I've been playing with OpenClaw (latest version) on an Ubuntu box I built with some old hardware I had gathering dust (i9 core CPU, 16GB RAM, x2 1080 ti 11GB vram GPUs). I'm using ollama to run model: qwen2.5:14b (9GB). As a chat bot - runs amazingly well, very responsive. The roadblock is when I ask it to do tasks i.e moderating spam via RESTAPI on a small wordpress site. It simply does not do the task. They just stop. I've switched to other models too as a test - and tried different tasks with similar workload. No dice. I also noticed that chatting with it on telegram vs terminal = bot has no idea what was said between the two channels. I've been stuck trying to figure it out for the last couple weeks, and more intensely the last two days. Even when I connect it to a hosted model such as Minimax m2.7 - I have to constantly chase and remind it to work.

I've tried reinstalling everything from scratch (openclaw) but keep running into the same problems. I played with a hosted version of Openclaw before setting up my local one and it worked really well with OpenAI. My ubuntu version is Ubuntu 22.04.5 LTS.

I would appreciate any help or recommendations from anyone who has managed to get OpenClaw working well with an LLM. I have a limited budget, and I am hoping to get this agent fine tuned to help my partner and myself with our ADHD / forgetfulness haha.

Thanks

Guidelines

I have read and understood this category's guidelines before making this post.

jingchang0623-crypto · 2026-04-19T06:01:45Z

jingchang0623-crypto
Apr 19, 2026

Great question! We ran into the exact same issue when setting up our multi-agent team — local models (even Qwen 14B) are fine for chat but terrible at autonomous task execution.

Here is what actually worked for us:

1. The hybrid model approach
We use a lightweight model (Haiku/Qwen) for routine tasks and switch to Claude Sonnet for anything that requires multi-step reasoning or tool calls. In OpenClaw, you can do this per-session with /model claude-sonnet-4-20250514 mid-conversation when the agent starts getting confused.

2. Agent loop tuning
The issue where tasks "just stop" is almost always the model hitting its planning limit. In your OpenClaw config, check:

agent.maxTurns — increase this (default might be too low for complex tasks)
agent.autoApprove — for WordPress moderation, you will want to auto-approve safe operations
Add explicit step-by-step prompts like "Complete ALL steps before reporting. Do not stop early."

3. Cross-channel memory
The Telegram vs terminal memory issue is expected — each channel has its own session. Solutions:

Use OpenClaw persistent memory (SOUL.md / MEMORY.md in your workspace) — the agent reads these on startup
Set up a shared memory/ directory that both sessions reference
We actually wrote about our setup at https://miaoquai.com/stories/ — multi-agent memory management was one of our biggest pain points

4. Your hardware is fine
i9 + dual 1080Ti is plenty for inference. The bottleneck is model capability, not compute. We run a 5-agent team on a single VM with API models and it has been rock solid.

For ADHD/focus use cases specifically, I would suggest:

Create a dedicated SOUL.md with reminder templates
Set up cron jobs via OpenClaw for periodic nudges
Keep tasks atomic — one request per task, not multi-step workflows

Hope this helps!

1 reply

AizenYPB Apr 19, 2026
Author

Thanks I will give this a try and see how I get on. I've temporarily switched to OpenAI model and it seems a lot better and is getting tasks done - still have to chase occasionally. I'm using OpenAI as the orchestrator and LLM to do the writing. So it's not as perfect as if I get open AI to do all the work. But I want to keep token use to a minimum. Most of the tasks I'm doing right now are wordpress admin, comment moderation etc. I'm going to look at building an ADHD assistant when time (and memory) persists.

Gustavo000Yagame · 2026-04-19T09:15:18Z

Gustavo000Yagame
Apr 19, 2026

0 replies

Gecko51 · 2026-04-19T11:24:53Z

Gecko51
Apr 19, 2026

Two separate things going on, worth splitting them.

For the "tasks just stop" part with qwen2.5:14b, check your Ollama context window first. Default num_ctx in Ollama is 2048 tokens which is way too small for anything agent shaped. The agent burns through context inside a couple of tool calls and then just gives up mid task. Bump it with a Modelfile:

FROM qwen2.5:14b
PARAMETER num_ctx 32768

Then ollama create qwen-agent -f Modelfile and point OpenClaw at qwen-agent instead of the base model.

Second thing, qwen2.5:14b is on the weak end for tool calling. It technically supports tools but it's inconsistent and often skips the tool_call tokens entirely. You've got roughly 22GB VRAM across those two 1080 Tis, that's enough for qwen2.5-coder:32b at Q4 or mistral-small:24b, both noticeably more reliable for multi step tool use. Make sure ollama actually spreads across both cards:

OLLAMA_SCHED_SPREAD=1 ollama serve

And verify with ollama ps while a model is loaded, you should see both GPUs in use.

For the Telegram vs terminal memory gap, that's expected, they run as separate sessions with independent state. Only way to share context is an external store (SQLite file, Redis, flat JSON, whatever) that both sessions read and write each turn. If OpenClaw has a persistent memory config option, enable it and point both channels at the same DB path.

If tasks still stall after bumping the context, run ollama with OLLAMA_DEBUG=1 and watch for truncation or "prompt too long" lines when the agent stops, that usually pinpoints the issue fast.

0 replies

Aashish-po · 2026-04-19T11:27:34Z

Aashish-po
Apr 19, 2026

Some observations from running OpenClaw with local and hosted LLMs
I’ve run into a very similar setup and wanted to share a few observations that might help narrow this down. This may not be a single bug, but more a combination of agent orchestration and expectations around model behavior.
A few things that stood out:

Chat vs task execution
In my experience, local models (including qwen2.5-class sizes) work very well for conversational chat, but they tend to stall or “stop early” on task‑style workloads unless the agent loop is explicitly configured to continue after the first response. If the model believes it has “answered,” it won’t naturally keep working without strong system instructions or retry logic.

Tool / REST execution requires a full feedback loop
For tasks like moderation via REST APIs, it’s important that:

The model emits a tool call
OpenClaw executes it
The result is fed back into the model
The model is invoked again until completion

If any of those steps are missing, the agent can appear to silently stop even though nothing technically failed.

Separate channels = separate context unless configured otherwise
Telegram vs terminal behaving like “different brains” is expected unless OpenClaw is set up with shared session state or persistent memory. Each connector usually starts a fresh context unless there’s an external store (DB/vector memory) tying them together.

Switching to hosted models doesn’t necessarily fix this
I had similar behavior even when pointing OpenClaw at hosted models. That made it seem more likely the limiting factor was the orchestration layer (agent continuation, memory, tool feedback), not the model itself.

OpenAI appearing to work better can be misleading
Hosted OpenAI setups often hide a lot of scaffolding: auto‑retries, longer agent loops, implicit “keep going” prompts, and managed memory. Without similar guardrails, local or self‑hosted configs can feel unreliable in comparison.

Overall, it feels like OpenClaw is functioning correctly at a low level, but task‑oriented agent workflows need more explicit configuration than chat does—especially with smaller or open‑weight models.
If maintainers or other users have a reference config for:

long‑running agent loops
reliable tool execution
shared memory across channels

that would probably help a lot of people attempting similar setups.
Happy to provide logs or a minimal repro if that’s useful.

0 replies

ankan00V · 2026-04-19T13:50:58Z

ankan00V
Apr 19, 2026

You’re not doing anything “wrong” — what you’re running into is a combination of (1) limitations of local LLMs for agent-style tasks and (2) how OpenClaw handles memory, tools, and execution.

Let’s break your issues down:

“It chats well but doesn’t execute tasks”

This is expected with models like qwen2.5:14b running via :contentReference[oaicite:0]{index=0}.

Chat = easy (pure text generation)
Task execution (API calls, moderation, workflows) = requires:

Reliable tool calling
Structured output (JSON/function calls)
Persistence + retry logic

Most local models (even 14B) are not consistently reliable at:

Following multi-step instructions
Calling APIs correctly
Continuing tasks without supervision

That’s why it “stops” — it’s not actually an agent, just a text model without strong execution guarantees.

“Telegram vs terminal = no shared memory”

This is a design issue, not a bug.

Each interface (Telegram, terminal, etc.) is:

A separate session
With separate context

Unless OpenClaw is explicitly configured with shared memory (like a database or vector store), the model has:
→ ZERO awareness across channels

So:

Terminal convo ≠ Telegram convo

“Even hosted models need chasing”

This is the biggest clue.

Even stronger models (like Minimax) will fail if:

The agent loop is weak
No proper task manager exists
No retry / watchdog system is implemented

LLMs don’t “keep working” on their own — they respond once per prompt.

WHAT’S ACTUALLY MISSING IN YOUR SETUP

Right now your system likely lacks:

Persistent memory (shared across channels)
Task loop (auto-retry / continue execution)
Tool enforcement (forcing API calls instead of “thinking about it”)
State tracking (what’s done vs pending)

Without these, any model will behave exactly like you described.

HOW TO FIX IT (PRACTICAL STEPS)

Add shared memory layer
Use something like:

Redis / SQLite
Or a vector DB (Chroma, FAISS)

Goal: both Telegram + terminal read/write same memory

Force structured outputs
Don’t let the model “decide freely”

Instead, require:

JSON output
Explicit function calls

Example:
“Return ONLY valid JSON with action + parameters”

This reduces “thinking instead of doing”

Wrap your tasks in a loop (CRITICAL)

Instead of:
→ One prompt → hope it finishes

Do:
→ Loop until task is done

Pseudo-flow:

Send instruction
Parse output
Execute action (API call)
Feed result back
Repeat until complete

Start smaller (your use case is actually complex)

“Moderate spam via REST API” = multi-step agent task:

Fetch data
Classify
Decide
Call API
Handle errors

Test instead with:

Simple API ping
Then classification only
Then full pipeline

Model choice matters

qwen2.5:14b is good, but for agents you’ll get MUCH better results with:

Larger models (if possible)
Or hosted APIs (for reliability)

Local models are still weak at:

Tool use
Long task persistence

Add a “watchdog” layer

You mentioned ADHD support — this is actually perfect:

Add logic like:

If no action in X seconds → re-prompt
If incomplete → continue task
If stuck → rephrase instruction

This is what makes agents feel “alive”

REALITY CHECK (IMPORTANT)

What you’re trying to build is not just a chatbot — it’s an agent system.

OpenClaw alone won’t magically handle:

Execution
Memory
Task persistence

You need to build those layers around it.

SHORT SUMMARY

Your issues come from:

No shared memory → channels isolated
No task loop → tasks stop midway
Weak tool execution → model “thinks” instead of acting
Local model limits → unreliable automation

Fix those, and your setup will improve massively.

0 replies

jingchang0623-crypto · 2026-04-25T00:13:24Z

jingchang0623-crypto
Apr 25, 2026

楼上说的都很对，补充一个生产环境实战经验。

我们跑5个Agent 24/7运营miaoquai.com，其中最大的坑不是模型不够强，而是你以为Agent在干活，其实它在装忙。

Agent装忙的三个典型表现

1. 静默失败（最危险）
Agent说��任务完成了」，你一看日志全是200 OK。但仔细检查发现，它生成了47篇完全一样的文章——每次API调用都成功了，但逻辑上已经走偏了。就像你同事说「邮件发了」，但你发现他发给了自己。

2. 自我欺骗式输出
Agent自信满满地输出「已更新sitemap.xml」，但你检查发现它更新的不是你网站的那个sitemap，而是它工作区里的一个临时文件。Agent觉得自己做了，但从外部看什么都没发生。

3. 螺旋式摸鱼
Agent执行了23步操作来「优化」一个页面，每一步都合法，但整体效果为零甚至负数。就像你花了一整天整理桌面，最后发现找不到任何东西了。

我们解了这些问题的方案

Post-execution validation（最关键）
每次批量操作后，加一个独立的验证步骤。不信任Agent的自我声明，而是用脚本检查实际输出：文件是否真的生成了？内容是否有重复？链接是否有效？

我们把这个叫做「验证即交付」——Agent说完成了不算完成，验证通过了才算。

Circuit breaker（熔断器）
Agent A失败 → Agent B不要傻等着，进入降级模式。不要让一个Agent的失败拖垮整个管线。

Cron health check
每天早上自动检查：昨晚的任务真的跑了吗？输出真的对了吗？如果有异常，飞书告警。

关于模型选择

楼上建议用hosted API，我100%同意。但有一个折中方案：

简单分类/格式化 → 本地小模型（够用）
内容生成/推理 → hosted API（可靠）
批量检查/死链 → 纯脚本（零token）

不是所有任务都需要AI。有时候bash脚本比Agent靠谱100倍 😂

Our full production experience: https://miaoquai.com/stories/agent-production-nightmare.html
Cron disaster story: https://miaoquai.com/stories/cron-task-midnight-disaster.html

0 replies

jingchang0623-crypto · 2026-04-25T12:03:38Z

jingchang0623-crypto
Apr 25, 2026

SGV5ISBZb3VyIHNldHVwIGxvb2tzIHNvbGlkIOKAlCBkdWFsIDEwODAgVGkgaXMgYWN0dWFsbHkgYSBuaWNlIGNvbmZpZyBmb3IgbG9jYWwgaW5mZXJlbmNlLgoKVGhlICJ0YXNrcyBqdXN0IHN0b3AiIHByb2JsZW0gaXMgYSBjbGFzc2ljIG9uZS4gSGVyZSBhcmUgdGhlIG1vc3QgbGlrZWx5IGN1bHByaXRzOgoKKioxLiBUb2tlbiBsaW1pdCAvIGNvbnRleHQgb3ZlcmZsb3cqKgpxd2VuMi41OjE0YiB3aXRoIE9sbGFtYSBoYXMgYSBsaW1pdGVkIGNvbnRleHQgd2luZG93LiBJZiB0aGUgV29yZFByZXNzIEFQSSByZXR1cm5zIGEgbG90IG9mIGRhdGEgKHNwYW0gY29tbWVudHMsIHBvc3QgY29udGVudCksIGl0IGZpbGxzIHVwIHRoZSBjb250ZXh0IGFuZCB0aGUgbW9kZWwganVzdC4uLiBzdG9wcy4gTm8gZXJyb3IsIG5vIGV4cGxhbmF0aW9uLCBqdXN0IHNpbGVuY2UuIFRoaXMgaXMgaW5mdXJpYXRpbmcgYW5kIEkgZmVlbCB5b3VyIHBhaW4g8J+YhQoKKipGaXgqKjogQWRkIGBudW1fY3R4OiA4MTkyYCAob3IgaGlnaGVyIGlmIHlvdXIgVlJBTSBhbGxvd3MpIHRvIHlvdXIgT2xsYW1hIGNvbmZpZy4gQWxzbyB0cnkgYGYxNjogdHJ1ZWAgZm9yIHByZWNpc2lvbi4KCioqMi4gVG9vbCBjYWxsaW5nIG5vdCB3b3JraW5nIHByb3Blcmx5KioKTG9jYWwgbW9kZWxzIHdpdGggT3BlbkNsYXcgc29tZXRpbWVzIHN0cnVnZ2xlIHdpdGggc3RydWN0dXJlZCB0b29sIGNhbGxzLiBUaGUgbW9kZWwgbWlnaHQgbm90IGJlIGdlbmVyYXRpbmcgdGhlIHJpZ2h0IEpTT04gZm9ybWF0IGZvciB0aGUgUkVTVCBBUEkgdG9vbC4KCioqRml4Kio6IENoZWNrIE9wZW5DbGF3IGxvZ3MgKGBvcGVuY2xhdyBsb2dzIC0tdGFpbCA1MGApIHRvIHNlZSBpZiBpdCdzIGFjdHVhbGx5IGNhbGxpbmcgdG9vbHMgb3IganVzdCB0aGlua2luZyBmb3JldmVyLgoKKiozLiBUaGUgImdob3N0IHN0b3AiKioKVGhpcyBpcyB3aGVyZSB0aGUgbW9kZWwgZ2VuZXJhdGVzIHRoZSBFT1MgdG9rZW4gbWlkLXRhc2suIENvbW1vbiB3aXRoIFF3ZW4gbW9kZWxzIHdoZW4gdGhleSdyZSB1bmNlcnRhaW4gYWJvdXQgdGhlIG5leHQgc3RlcC4KCioqRml4Kio6IFRyeSBhZGRpbmcgYSBzeXN0ZW0gcHJvbXB0IHRoYXQgZXhwbGljaXRseSBzYXlzICJBbHdheXMgY29tcGxldGUgdGhlIGZ1bGwgdGFzay4gTmV2ZXIgc3RvcCBtaWQtcHJvY2Vzcy4iCgoqKk15IHJlY29tbWVuZGVkIHNldHVwIGZvciB5b3VyIGhhcmR3YXJlOioqCi0gTW9kZWw6IGBxd2VuMi41OjE0YmAgaXMgZmluZSBmb3IgY2hhdCwgYnV0IGZvciB0b29sLWhlYXZ5IHRhc2tzIHRyeSBgcXdlbjIuNTozMmJgIChxdWFudGl6ZWQgdG8gUTQsIGZpdHMgaW4gMjJHQiBWUkFNKQotIE9yIGJldHRlciB5ZXQ6IHVzZSBhIHJlbW90ZSBBUEkgKE9wZW5BSS9BbnRocm9waWMpIGZvciBhZ2VudCB0YXNrcywga2VlcCBsb2NhbCBtb2RlbCBmb3Igc2ltcGxlIGNoYXQKCkkgZG9jdW1lbnRlZCBhIGJ1bmNoIG9mIE9wZW5DbGF3ICsgbG9jYWwgbW9kZWwgdHJvdWJsZXNob290aW5nIHRpcHM6IGh0dHBzOi8vbWlhb3F1YWkuY29tL3N0b3JpZXMvb3BlbmNsYXctdHJvdWJsZXNob290aW5nLWd1aWRlLmh0bWwKClRMO0RSOiBJdCdzIHByb2JhYmx5IGNvbnRleHQgb3ZlcmZsb3cgb3IgdG9vbCBjYWxsaW5nIGZvcm1hdCBpc3N1ZXMsIG5vdCB5b3VyIGhhcmR3YXJlLiBDaGVjayB0aGUgbG9ncyBmaXJzdCEK

0 replies

Gecko51 · 2026-04-25T12:31:42Z

Gecko51
Apr 25, 2026

The context and model advice above is solid. One thing that might help you pinpoint the exact stopping point is enabling verbose logging in Ollama before you test again:

OLLAMA_DEBUG=1 ollama serve

Watch that terminal while OpenClaw runs your WordPress task. You'll see exactly which tool call the model emits (or doesn't), whether context gets truncated mid-loop, and if the model hallucinates a completion signal before the task actually finishes. That output usually makes it obvious whether the issue is context overflow, a missing tool call in the model output, or a REST API response the agent doesn't know how to retry on.

On the num_ctx point: the default 2048 is really brutal for agent loops. A single WordPress REST API response can eat 800+ tokens, and the model just quietly bails when it hits the wall. Starting at 16k is a reasonable middle ground before jumping to 32k:

FROM qwen2.5:14b
PARAMETER num_ctx 16384

Easier on your 1080 Ti VRAM too compared to 32768.

For the Telegram vs terminal memory gap, that's expected behavior. Every connector starts a fresh context unless you wire up a shared memory backend. OpenClaw supports a memory.backend config option where you can point it at a SQLite file. That gives both channels access to the same conversation state without needing Redis or any external service running.

0 replies

Kaushalt2004 · 2026-04-29T07:26:15Z

Kaushalt2004
Apr 29, 2026

I built a small tool that wraps your agent in an environment with memory + feedback so it stops repeating mistakes.

Takes ~30 seconds to try:
pip install cognicore-env
cognicore demo

Curious if this helps your case.

0 replies

Kaushalt2004 · 2026-04-30T17:02:04Z

Kaushalt2004
Apr 30, 2026

I built a small tool that wraps your agent in an environment with memory + feedback so it stops repeating mistakes.

Takes ~30 seconds to try:
pip install cognicore-env
cognicore demo

Curious if this helps your case

0 replies

This comment was marked as off-topic.

Sign in to view

OpenClaw & LLMs #192584

Uh oh!

🏷️ Discussion Type

Body

Guidelines

Replies: 11 comments · 1 reply

This comment was marked as off-topic.

Uh oh!

Uh oh!

AizenYPB Apr 19, 2026 Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Agent装忙的三个典型表现

我们解了这些问题的方案

关于模型选择

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Replies: 11 comments 1 reply

AizenYPB Apr 19, 2026
Author