My Investigation into How OpenClaw Works

Digging into the hype around OpenClaw to understand how it actually works as an agent - skills, memory, and what makes it tick.

I had to look into the hype around OpenClaw (don't get me started on MoltBook). I wanted to get behind the noise and see how this thing actually works as an agent. Specifically, I was curious about:

  1. How does it 'generate skills automatically'? (Hint: it doesn't.)
  2. How does the memory system work? (Hint: fairly straightforward .md files + RAG.)

Note this is as of early Feb 2026 - this project moves fast, so all of this could become stale quickly. I mainly wanted to understand what this project does that is so novel.

Some Background

OpenClaw is built on top of Mario Zechner's pi framework - a minimalist agent harness with four core tools (read, write, edit, bash) and a tiny system prompt. Armin Ronacher wrote about why this approach works, arguing that LLMs are good at writing and running code, so you should just give them files and a shell and get out of the way.

The OpenClaw Magic

First we have to recognize that it's the model. Things like AutoGPT, Baby AGI, etc. were just ahead of their time - without Opus 4.5/GPT-5.2, this project isn't as successful. There is no sugarcoating it, and I doubt the authors of OpenClaw would dispute this. These models have only recently become capable enough to actually fulfil the idea of an effective agent.

On top of the model advancements, what makes OpenClaw work, in my opinion, are two things:

  1. Simplicity. This shows up in the architectural design, the clean .md files, and the choice to build on the solid foundation of the pi agent harness.
  2. Full access. The LLMs are remarkable at finding useful things when they have full access to systems and data.

The Details

I heard the stories - agents autonomously going through inboxes, finding API keys, creating accounts on websites, negotiating a better price for a car. I assumed there was some clever multi-agent design, quarterbacked by an orchestrator or planner agent. This doesn't seem to be the case - the massively powerful LLM acts as the orchestrator, planner, and executor all in one.

The agent has several instruction files (SOUL.md, AGENTS.md, MEMORY.md, USER.md, IDENTITY.md) which get injected into the system prompt (collated in src/agents/system-prompt.ts) every session. Among many other things, they tell the agent to be independent and attempt solutions before asking. I think this is where most of the 'magic' comes from - this mindset in combination with unrestricted tool access.
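
To make this concrete, here is a minimal sketch of what that collation could look like - the file names come from the post, but the helper itself is my own illustration, not the actual src/agents/system-prompt.ts:

```ts
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch: collate the instruction files into one system prompt.
// The real logic lives in src/agents/system-prompt.ts; this only shows the shape.
const INSTRUCTION_FILES = ["SOUL.md", "AGENTS.md", "MEMORY.md", "USER.md", "IDENTITY.md"];

function buildSystemPrompt(workspace: string): string {
  const sections = INSTRUCTION_FILES
    .map((name) => join(workspace, name))
    .filter((path) => existsSync(path))
    .map((path) => readFileSync(path, "utf8"));
  // Every session starts from the same concatenated instructions.
  return sections.join("\n\n---\n\n");
}
```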

For context, the key instruction is in SOUL.md (docs/reference/templates/SOUL.md):

'Be resourceful before asking. Try to figure it out. Read the file. Check the context. Search for it. Then ask if you're stuck. The goal is to come back with answers, not questions.'

'Earn trust through competence. Be careful with external actions (emails, tweets, anything public). Be bold with internal ones (reading, organizing, learning).'

It feels like the 'come back with answers, not questions' + full YOLO tool access (browser with no URL restrictions, shell, full filesystem, subagent spawning) = the primary secret. This is probably unfair and reductionist, but for my case it's sufficient.

As stated earlier, the LLM's own reasoning is the orchestrator, executor, and planner, using these .md files as its guide. The system can still spawn subagents (src/agents/tools/sessions-spawn-tool.ts) that run in isolation on separate 'lanes', but there is no separate evaluator or router agent - the main LLM is the decision-maker.
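
For illustration, the spawn tool's interface might look something like this - the field names and lane semantics are my assumptions, not the actual sessions-spawn-tool.ts definition:

```ts
import { randomUUID } from "node:crypto";

// Hypothetical shape of the spawn tool's input, inferred from the post.
interface SpawnSessionInput {
  task: string;  // what the subagent should work on
  lane?: string; // isolation lane; subagents run separately from the main session
}

interface SpawnSessionResult {
  sessionId: string; // handle the main agent can check on later
}

// The main LLM calls this like any other tool - no router decides for it.
async function spawnSession(input: SpawnSessionInput): Promise<SpawnSessionResult> {
  const sessionId = randomUUID();
  // ...start an isolated agent session with its own context here...
  return { sessionId };
}
```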

Of note, there's a heartbeat system (docs/gateway/heartbeat.md) that wakes the agent every 30 minutes. The default AGENTS.md template tells the agent to use heartbeats productively - check emails, calendar, mentions - so the agent can continue autonomous work between conversations without the user asking.
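
A minimal sketch of such a heartbeat loop, assuming a runAgentTurn entry point (hypothetical name; the real behaviour is documented in docs/gateway/heartbeat.md):

```ts
// Illustrative stand-in for whatever actually drives an agent turn.
declare function runAgentTurn(msg: { role: string; content: string }): Promise<void>;

const HEARTBEAT_INTERVAL_MS = 30 * 60 * 1000; // every 30 minutes

setInterval(() => {
  // Wake the agent with a synthetic turn; AGENTS.md tells it what a
  // productive heartbeat looks like (email, calendar, mentions).
  void runAgentTurn({ role: "system", content: "heartbeat" });
}, HEARTBEAT_INTERVAL_MS);
```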

Memory

Markdown files and a bit of RAG.

  • MEMORY.md (~/.openclaw/workspace/MEMORY.md) - a single curated file of long-term facts, decisions, preferences. The agent is expected to read and update this, but nothing enforces it. SOUL.md hints at it: 'Each session, you wake up fresh. These files are your memory. Read them. Update them. They're how you persist.'
  • Daily logs (memory/YYYY-MM-DD.md) - append-only notes the agent writes during conversation. Today's and yesterday's logs are auto-loaded at session start (see the sketch after this list). These are not context compression - they're running observations and facts.
  • Session transcripts (JSONL files) - full conversation history, optionally indexed for search.
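
As referenced above, here is a sketch of how the daily-log auto-load could work - the memory/YYYY-MM-DD.md convention is from the post, but the loader itself is my illustration:

```ts
import { readFile } from "node:fs/promises";
import { join } from "node:path";

function dateStamp(d: Date): string {
  return d.toISOString().slice(0, 10); // YYYY-MM-DD
}

// Load today's and yesterday's daily logs at session start.
async function loadRecentDailyLogs(workspace: string): Promise<string[]> {
  const today = new Date();
  const yesterday = new Date(today.getTime() - 24 * 60 * 60 * 1000);
  const logs: string[] = [];
  for (const day of [today, yesterday]) {
    try {
      logs.push(await readFile(join(workspace, "memory", `${dateStamp(day)}.md`), "utf8"));
    } catch {
      // No log for that day yet - that's fine.
    }
  }
  return logs;
}
```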

What happens when the model's context window fills up? Messages get summarized and stored in session JSONL files (~/.openclaw/agents/<id>/sessions/<id>.jsonl). Before compaction runs, a hidden 'memory flush' turn (src/auto-reply/reply/memory-flush.ts) gives the agent a chance to save anything important to the daily log before old messages get compressed away. The user never sees this turn.
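
My reconstruction of that flow as code - the 90% threshold and the helper names are assumptions, not the actual memory-flush.ts logic:

```ts
interface Session { tokenCount: number; contextLimit: number }
// Illustrative stand-ins for the real internals.
declare function runHiddenTurn(instruction: string): Promise<void>;
declare function summarizeAndCompact(session: Session): Promise<void>;

async function maybeCompact(session: Session): Promise<void> {
  if (session.tokenCount < session.contextLimit * 0.9) return; // threshold is an assumption

  // Hidden memory-flush turn: the user never sees this exchange.
  await runHiddenTurn("Context is nearly full. Store durable memories to the daily log now.");

  // Only after the agent had a chance to save notes does compression run.
  await summarizeAndCompact(session);
}
```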

RAG for Memory

The system prompt (src/agents/system-prompt.ts) has this instruction:

Before answering anything about prior work, decisions, dates, people, preferences, or todos: run memory_search

That's the standard list of trigger categories; the tool description repeats the same list so the model sees it twice. However, this is just an instruction - the LLM decides whether to comply. Crucially, there is no middleware automatically pre-fetching memory - the system relies on the agent voluntarily calling memory_search.

The default implementation uses BM25 (via SQLite FTS5) for keyword search and the sqlite-vec extension for vector similarity, combining them in a hybrid search (70% vector, 30% keyword by default).
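
A sketch of how such a hybrid merge could be scored - the 70/30 split is the stated default, but the max-normalization step is my assumption:

```ts
interface Scored { id: string; score: number }

// Normalize each ranker's scores to [0, 1], then blend 70% vector
// similarity with 30% BM25 keyword score.
function hybridMerge(vector: Scored[], keyword: Scored[], vectorWeight = 0.7): Scored[] {
  const normalize = (rows: Scored[]) => {
    const max = Math.max(...rows.map((r) => r.score), 1e-9);
    return new Map(rows.map((r) => [r.id, r.score / max]));
  };
  const v = normalize(vector);
  const k = normalize(keyword);
  const ids = new Set([...v.keys(), ...k.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + (1 - vectorWeight) * (k.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```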

Vector DB for Memory

Optionally there is the memory-lancedb plugin (extensions/memory-lancedb/index.ts:477-503), which has an autoRecall mode that hooks into before_agent_start, embeds the user's message, vector-searches the DB, and stuffs matching memories into context before the agent even sees the conversation. With this plugin, RAG happens on every request automatically. The default memory-core plugin does not do this.
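
A sketch of what an autoRecall hook could look like - the before_agent_start event and the inject-before-the-turn behaviour are from the post; the API names (plugin.on, embed, prependSystemNote) are stand-ins:

```ts
interface HookCtx { userMessage: string; prependSystemNote(note: string): void }
// Illustrative stand-ins for the real plugin API and embedding call.
declare const plugin: { on(event: string, fn: (ctx: HookCtx) => Promise<void>): void };
declare function embed(text: string): Promise<number[]>;
declare const memoriesTable: {
  search(vector: number[]): { limit(n: number): { toArray(): Promise<{ text: string }[]> } };
};

plugin.on("before_agent_start", async (ctx) => {
  const queryVector = await embed(ctx.userMessage);
  const matches = await memoriesTable.search(queryVector).limit(5).toArray();
  // Inject recalled memories before the model ever sees the turn.
  ctx.prependSystemNote("Relevant memories:\n" + matches.map((m) => `- ${m.text}`).join("\n"));
});
```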

Triggering Memory (Re)writes

There are two scenarios where memories are written:

  • Pre-compaction flush - when the context window is nearly full, a hidden turn fires with 'store durable memories now'. The agent writes to the daily log, then compaction proceeds (OpenClaw tracks token usage and uses its own 'hooks' to perform compaction at the right time). Note there is no 'hard rule' that the agent must use date-coded markdown files - this is a suggestion.
  • Normal conversation - the agent can write to MEMORY.md or daily logs anytime using standard write/edit tools. However, there is no system prompt instruction telling it when to do this - this is left to the model's judgment.
    • memory-lancedb can also capture memories via regex triggers (like|prefer|important) as another memory guard, but this is not the default behaviour (sketched below).
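
The regex trigger from that last bullet, sketched - the exact pattern and the storage call are assumptions:

```ts
// If a message matches preference-ish keywords, store it as a memory.
const MEMORY_TRIGGER = /\b(like|prefer|important)\b/i;

function maybeCapture(message: string, store: (text: string) => Promise<void>) {
  if (MEMORY_TRIGGER.test(message)) {
    return store(message); // persist verbatim; the agent can curate it later
  }
}
```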

The Temporal Problem

The issue I see with this memory solution, as good as it might be, is that there is no temporal awareness. Vector search returns results by semantic similarity, not recency. If you said 'I prefer React' on day 1 and 'actually I've switched to Svelte' on day 5, both have high similarity to 'what framework?' and the stale one could easily win. This is a known gap in basically every RAG-based memory system.

The dated filenames provide implicit temporal context, but only if the model reasons about them - the search itself is purely semantic.

Skills

Discussed here. More important for this discussion is how skills work in OpenClaw.

The system prompt (src/agents/system-prompt.ts) tells the agent:

Scan available skills before replying - if exactly one clearly applies, read its SKILL.md and follow it. If multiple could apply, pick the most specific. If none apply, skip.

The model makes the selection; there is no routing layer.
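
The mechanical half of this (getting the skill list in front of the model) might look like the sketch below - the directory layout is my assumption; the selection itself stays with the model:

```ts
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Enumerate skills so the model can scan them. Each skill is assumed to be a
// directory containing a SKILL.md; only names and first lines go in the
// prompt, and the model reads the full SKILL.md for the one it picks.
function listSkills(skillsDir: string): { name: string; firstLine: string }[] {
  if (!existsSync(skillsDir)) return [];
  return readdirSync(skillsDir, { withFileTypes: true })
    .filter((e) => e.isDirectory() && existsSync(join(skillsDir, e.name, "SKILL.md")))
    .map((e) => ({
      name: e.name,
      firstLine: readFileSync(join(skillsDir, e.name, "SKILL.md"), "utf8").split("\n")[0],
    }));
}
```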

No autonomous Skill creation

Contrary to what I thought, there is no prompt or functionality anywhere in the codebase that suggests it creates skills autonomously. Specifically, there is no evaluator agent that determines if this is a repeatable request; there is no nudging in the system prompt or any markdown file. The user must explicitly ask the assistant to build a new skill or plugin. Anything done ad-hoc may be partially captured in the memory system (not a guarantee), but even if it is captured there is no guarantee a repeatable process will be followed unless a skill or plugin is explicitly created.

Note there is a whole 'skill sharing' ecosystem, such as https://clawhub.com or the (in)famous MoltBook (which encourages skill sharing amongst bots). I'm not interested in the details of that at the moment.

Hot-reload

Say a new skill was created or added - how does the agent actually use it in the flow of an existing chat? There is a 'hot reload' feature (specifically a chokidar file watcher) so that newly acquired skills become usable in the current chat.
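
A sketch of that watcher - chokidar is what the post names, but the paths and the refresh function are my illustration:

```ts
import chokidar from "chokidar";
import { homedir } from "node:os";
import { join } from "node:path";

// Illustrative stand-in for whatever rebuilds the in-memory skill list.
declare function refreshSkillIndex(): void;

const skillsDir = join(homedir(), ".openclaw", "workspace", "skills");

// Watch for new or edited SKILL.md files and refresh mid-chat.
chokidar
  .watch(skillsDir, { ignoreInitial: true })
  .on("add", (path) => path.endsWith("SKILL.md") && refreshSkillIndex())
  .on("change", (path) => path.endsWith("SKILL.md") && refreshSkillIndex());
```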

Plugins

For functionality that needs actual executable code (not just markdown instructions), OpenClaw has a plugin system (OpenClaw's own implementation, not from pi). This is notable because it leans on pi's philosophy of letting the agent write code for itself, but plugins are not MCP servers or Skills. A plugin is a TypeScript module with a JSON manifest. For the sake of this post, it's enough to know this exists without diving deep - src/plugins/types.ts has the details if we're ever interested. The takeaway is that, once registered, plugins use the OpenClaw event system (hooks) to execute at specific times.
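
A hypothetical shape for such a plugin, inferred from the description above (the real types live in src/plugins/types.ts):

```ts
// All names here are illustrative, not OpenClaw's actual types.
interface PluginManifest {
  name: string;
  version: string;
  hooks: string[]; // events this plugin wants, e.g. "before_agent_start"
}

interface PluginContext {
  on(event: string, handler: (payload: unknown) => Promise<void> | void): void;
}

// A plugin is a TypeScript module: a manifest plus a register function
// that wires handlers into OpenClaw's event system.
export const manifest: PluginManifest = {
  name: "example-plugin",
  version: "0.1.0",
  hooks: ["before_agent_start"],
};

export function register(ctx: PluginContext): void {
  ctx.on("before_agent_start", async () => {
    // Runs at the hook point; a real plugin does its work here.
  });
}
```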

The Takeaway

Some clever engineering, a minimalist foundation in pi, and capable LLMs make this an interesting project.