How to Turn OpenClaw into a Personal Team That Multiplies Your Productivity — Principles
A lot of people install OpenClaw and split into two camps: one group claims their productivity has multiplied several times over; the other group thinks — that's it? How is this any different from Yuanbao or Doubao? Isn't it just a large model with search bolted on?
Both groups are using the exact same tool. So where does the gap come from?
The problem isn't the tool itself. It's that we carry our "chatbot" habits — unchanged — into something fundamentally different. We're used to typing a sentence, waiting for a reply, and if we don't like it, rewriting the prompt. But OpenClaw isn't a chat box. It's a system capable of housing an entire team. Operating it with a chatbot mindset is like getting a piano and poking at it with one finger — it makes sound, but it has nothing to do with music.
I've spent a lot of time recently playing with OpenClaw, going from an initial "that's it?" to genuinely experiencing a several-fold boost in productivity. To help others bootstrap their own effective Lobster setup, I've put together a simple tutorial series:
- Principles (this post): The foundational design principles for OpenClaw — these determine whether your Lobster actually works well
- Roles & Skills: Design patterns and approaches for common agent roles and skills
- Code Development: How to let your Lobster manage Claude Code and oversee projects
- Security: How to use OpenClaw safely
This post is about principles. None of these five principles are about prompt tricks. They all point to the same core insight: managing AI agents is a management problem, not a prompt engineering problem.
1. Establish Values Before Starting Work — An Agent's Character Determines Its Judgment
After installing OpenClaw, the first thing to do is not to start working — it's to design.
Most people's approach: keep the default configuration, answer a few guided questions, and immediately have the agent start executing tasks. This is like starting a company, grabbing a random person off the street, giving them no training, no explanation of what the company does, not even telling them who you are — and just throwing work at them.
The result is predictable: they can do the work, but they work like a temp. Every time you have to hand-hold them through instructions. Without your explicit direction, they don't know how to make judgment calls. Output quality is all over the place.
The real way to make an agent reliable is to treat it like a new hire going through onboarding. In my system, every agent has four core files:
SOUL.md — Values. Not work instructions, but values and character. My Director Agent's SOUL contains:
"Never assume for the user. One good question beats three paragraphs of wrong work."
"You own quality — ruthlessly... Before delivering anything, ask yourself: Would I be embarrassed if the user found an obvious problem in this?"
"Verify, don't guess. 'I think this might work' is not a solution; 'I tested this and it works' is."
Notice — these aren't task instructions. They're "values" used for making judgment calls when no explicit instruction covers the situation. For example, when the agent encounters a vague requirement, "Never assume" makes it choose to ask rather than guess. When it gets a result that barely passes muster, "You own quality — ruthlessly" makes it choose to redo the work rather than ship something sloppy. The essence of values is: when nobody tells you what to do, what do you base your decisions on?
AGENTS.md — The Job Description. Defines this agent's scope of responsibility, work methodology, and the skills and tools it can invoke. Think of it as the "responsibilities" and "requirements" sections on a job posting.
skills/ — Work Methodology. Specific, reusable capabilities. Things like how to write an article, how to do data analysis, how to run a code review. Think of it as the company's SOP manual.
USER.md — Know Your Boss. This is a profile about "you." Your preferences, your aesthetic sensibilities, mistakes you've been burned by. My USER.md contains entries like:
"Values depth over breadth — surface-level output is #1 frustration"
"Blog writing: Flow > rigid structure. Frameworks are thinking tools, not labels."
The agent reads this file every time it starts up. It's not executing a cold, impersonal task — it's working for a person it understands better and better over time.
2. Build a Team, Don't Pile On Capabilities — Let Different Agents Own Their Roles
Out of the box, OpenClaw is a single agent — one agent handling everything. That's like one person simultaneously being the CEO, programmer, accountant, and product manager.
Many people think multi-agent setups are just for show: if the underlying model is the same, what's the point of splitting into multiple agents?
The point isn't capability — it's cognitive mode.
What qualities does a competent data collector need? A systematic approach, fault tolerance, and an eye for detail. My Scout Agent's SOUL defines itself this way:
"I gather intelligence. My scripts are my deliverables."
"When a script breaks: debug until it runs."
It's an executor. Its methodology is "explore manually first, then script it, then maintain the script."
And what does a competent content analyst need? Judgment, synthesis, an obsession with quality. My Secretary Agent's SOUL reads:
"I'm not a formatter or task executor — I bring judgment."
"Synthesis over listing — connect dots, surface the big picture."
These two cognitive modes conflict. Ask one agent to maintain mechanical precision during data collection while also exercising flexible judgment during content analysis, and it ends up compromising between the two, mediocre at both.
Here's a real example: suppose the same agent first does data collection — strictly following scripts, not missing a single result — and then immediately does content analysis — which requires bold prioritization and distilling insights. You'll find that during analysis, it retains the "don't miss anything" mindset from the collection phase, producing a bloated play-by-play that mentions every data point instead of a sharp, opinionated analysis. Cognitive mode inertia is real.
My system is a six-person team:
```
User
  │
Director 🎯 (orchestrates everything, sole external interface)
  │
  ├── Scout 🔍 (data collection)
  ├── Secretary 📋 (analysis & synthesis)
  └── PM 📐 (requirements)
        │
        └── Developer 🏗️ (architecture & implementation)

Board 📊 (independent oversight, daily review of the Director)
```
There's one role worth highlighting: Board (the board of directors). It sits outside the entire execution chain as an independent overseer. Every day it reviews the Director's behavior, comparing the values the Director claims in its SOUL against its actual actions. The Director can't review itself — just as in corporate governance, audit can't audit itself.
Another easily overlooked benefit is experience isolation. Scout accumulates platform knowledge like "use old.reddit.com to bypass Reddit CAPTCHAs." Secretary accumulates taste judgments like "the user doesn't like it when articles force connections between unrelated topics." If you mix these together, the agent's memory becomes a jumbled mess where critical lessons get drowned in noise.
3. Equip Your Agents with the Right Tools — Their Ceiling Is Defined by Their Toolchain
No matter how smart an agent is, it's all talk without the right tools. Most complaints about "AI can't do X" really mean "AI doesn't have the tools to do X."
There's a key design principle here: CLI first.
Humans are used to GUI interactions — clicking, dragging, scrolling. AI can operate GUIs too (e.g., through browser automation), but it's far less effective than CLI. GUIs are designed for human eyes; when layouts shift or buttons move, automation breaks. CLI takes parameters in and outputs structured data — for AI, that's like reading and writing its native language.
My Scout has a complete command-line toolchain:
```shell
node twitter_feed.js --max-items 50
node twitter_search.js --topic "AI agents" --keywords "AI agent,multi-agent" --max-items 30
node query_data.js --platform twitter --since 2025-02-20 --min-score 5 --limit 20
```
Every script takes parameters, outputs structured JSON, and writes to a SQLite database. Secretary doesn't touch raw data files directly — it queries through query_data.js by conditions: platform, date range, score threshold, count limit — all in one command.
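The shape of such a wrapper is worth spelling out: flags in, filters applied, structured JSON out. Here is a minimal sketch of what a query_data.js-style tool does internally. The flag names match the post; the in-memory data layer is a stub standing in for the real SQLite database, so treat it as illustrative:

```javascript
// Sketch of a query_data.js-style CLI wrapper: parse flags,
// filter records, emit JSON. The real script reads SQLite;
// this stub uses an in-memory array so it runs anywhere.

function parseFlags(argv) {
  const flags = {};
  for (let i = 0; i < argv.length; i += 2) {
    flags[argv[i].replace(/^--/, "")] = argv[i + 1];
  }
  return flags;
}

function queryData(rows, flags) {
  let out = rows.filter((r) =>
    (!flags.platform || r.platform === flags.platform) &&
    (!flags.since || r.date >= flags.since) &&
    (!flags["min-score"] || r.score >= Number(flags["min-score"]))
  );
  if (flags.limit) out = out.slice(0, Number(flags.limit));
  return out;
}

// Stand-in data; the fields are invented for illustration.
const rows = [
  { platform: "twitter", date: "2025-02-21", score: 9, title: "Agent teams" },
  { platform: "twitter", date: "2025-02-10", score: 7, title: "Old post" },
  { platform: "reddit", date: "2025-02-22", score: 12, title: "CLI first" },
];
const flags = parseFlags(
  "--platform twitter --since 2025-02-20 --min-score 5 --limit 20".split(" ")
);
console.log(JSON.stringify(queryData(rows, flags)));
```

The payoff of this shape is exactly what the post describes: another agent composes one command with conditions and gets machine-readable output back, no screen-scraping involved.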
Developer is even more interesting: it doesn't write code itself. It does architecture design, then spawns an independent coding_agent session via tmux to execute the coding, while monitoring progress and quality. It's like a tech lead — draws the architecture diagrams, hands the coding to engineers, but code review must go through them.
So when you want an agent to do something, ask one question first: does it have the tools to do it? If you want it to scrape Twitter, give it a Twitter scraping script. If you want it to manage multiple Claude Code instances, give it the ability to launch and monitor Claude Code. Chrome MCP is one approach; writing your own CLI wrappers is another. Without the right tools, an agent's intelligence is purely theoretical — all strategy, no execution.
4. Accumulate Experience Instead of Starting from Scratch Every Time — The Playbook Mechanism
Last year, workflow tools like ComfyUI were all the rage — break tasks into nodes, wire them up, and let AI follow the pipeline. For example, an automated Twitter digest: run the collection script, filter for topics of interest, summarize with a large model.
This approach is cheap, stable, and predictable. The problem is it can only do what you've predefined. When conditions change, the pipeline has to be rebuilt.
The other extreme is letting the agent freely explore: you say "check out what's interesting on Twitter today," and the agent figures it out from scratch. It might eventually arrive at the exact same process you would have manually defined — but it burned tens of times more tokens, and next time it does the same task, it'll have to figure it out all over again.
Both approaches have fatal flaws: pipelines can't adapt to change; free exploration can't accumulate experience.
My solution is a four-tier experience accumulation system:
| Stage | Where It Lives | Trigger |
|---|---|---|
| Observation | Daily log | A pattern is noticed |
| Memory | MEMORY.md | The same pattern appears twice |
| Playbook | playbooks/ | Proven effective and likely to recur |
| Skill | skills/ | Stable enough to be scripted |
The key is: don't rush the promotion. There's a line in PRINCIPLE.md that I particularly like:
"Don't rush promotion. A memory note that works is better than a premature playbook. A playbook that works is better than a buggy skill."
This isn't bureaucracy; it's respect for how mature a piece of experience actually is.
How does it work in practice? The first time Scout scraped Twitter, it explored manually: opened a browser, tried different extraction methods, found workable DOM selectors. That process burned a lot of tokens. After exploration, I had it write the process into a playbook. Next time it scraped Twitter, it checked the playbooks first for an existing procedure. If one existed, it followed the procedure. If not, it went the exploration route. If it discovered a better approach while executing the procedure, it updated the playbook on the spot.
Every playbook starts with the header "CURRENT BEST PRACTICE — Subject to change." It's not scripture — it's a recipe written in pencil, where you can cross out a line and scribble a note in the margin anytime.
5. Gets Better with Use — Let the Agent Evolve on Its Own
The first four principles address "how to make agents work well." This one addresses "how to make them work better over time."
When most people think about AI productivity gains, they think about single interactions: write a better prompt, get a better output. More advanced users add persistent memory. But memory is passive — the agent remembers past mistakes, but does it proactively change its own behavior?
I've designed two self-improvement mechanisms.
The first is performance reviews. The Board Agent reviews the Director's work at a fixed time every day. Not a vague "how's it going" review, but a line-by-line comparison of every value the Director claims in its SOUL against the day's actual behavior. Was each value practiced? Was it violated? Is there evidence?
This mechanism genuinely catches problems. In one review, the Board discovered a persistent error pattern:
"Director does not follow through on action items from previous reviews. 2026-03-02 review had action item to add proactive digest monitoring. 2026-03-03: same issue repeated — Scout collected 2,611 items, no digest suggestion."
The Director talked a big game about valuing quality, but in practice it was ignoring the Board's improvement recommendations. This isn't a hypothetical scenario — it actually happened, captured automatically by the system. After receiving this review finding a second time, the Director investigated the scheduled tasks and discovered that the cron job for automatically generating the daily digest was misconfigured — and fixed the issue.
The second is tiered autonomy. Not all self-improvement requires human approval — but you can't let it run completely unsupervised either:
| Level | What Changes | Who Approves |
|---|---|---|
| Level 1 | Memory files | Automatic |
| Level 2 | Scoring weights, heuristic rules | Automatic + version logged |
| Level 3 | Scripts | Self-tested and passing |
| Level 4 | AGENTS.md / SOUL.md | Director review |
| Level 5 | Architecture-level changes | User approval |
Changing a memory file? Go ahead. Changing scoring rules? Make the change and log the version. Changing its own SOUL? That needs approval from above. Changing system architecture? The user must sign off. It's like a company's authorization framework — routine expenses get self-reimbursed; major purchases go through an approval process.
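The authorization framework is simple enough to write down as a lookup. The levels and approvers come from the table above; the change-kind names are my own labels for illustration:

```javascript
// Sketch of the tiered-autonomy gate. Levels and approvers follow
// the table; the change-kind keys are invented labels.
const APPROVAL = {
  memory:       { level: 1, approver: "automatic" },
  heuristics:   { level: 2, approver: "automatic + version log" },
  script:       { level: 3, approver: "self-test must pass" },
  soul:         { level: 4, approver: "Director review" },
  architecture: { level: 5, approver: "user approval" },
};

function gate(changeKind) {
  const rule = APPROVAL[changeKind];
  if (!rule) throw new Error(`unknown change kind: ${changeKind}`);
  return rule;
}

console.log(gate("memory").approver);    // automatic
console.log(gate("architecture").level); // 5
```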
These two mechanisms together form a closed loop: Board identifies problem -> Director receives feedback -> Director modifies its own behavioral guidelines or playbooks -> behavior improves on next execution -> Board verifies whether the improvement took effect.
This loop doesn't need your involvement. You only need to make decisions on Level 5 proposals — the rest, the system evolves on its own.
Conclusion: From Using a Tool to Building a Team
Looking back at these five principles — give your agents values, split them into a specialized team, equip them with the right tools, build a system for accumulating experience, and let the system evolve on its own — they share a common thread: stop treating AI as a tool and start running it as a team.
Tools are static. You buy a hammer, and ten years later it's still the same hammer. But a team learns, specializes, and improves. When you manage OpenClaw the way you'd manage a team, what you get isn't a tool with fixed capabilities, but a system that understands you better and grows more efficient over time.
Is this system perfect? Not even close. The Board genuinely caught the Director paying lip service to improvement suggestions. Some playbooks needed multiple iterations to get right. Some tools still need constant refinement. But that's exactly the point — this system can discover its own problems and attempt to fix them, rather than leaving all the discovery and repair burden on you.
There's an interesting historical analogy. In early 19th-century England, literacy rates were low. Machines like the sewing machine only required physical strength to operate, and the concept of "human capital" didn't yet exist. But in the latter half of the 19th century, large factories, steam engines, and complex accounting systems emerged — factories needed people who could manage, operate precision machinery, and keep books. Literacy and knowledge went from "nice to have" to "essential," and England's literacy rate skyrocketed as a result.
We're in a similar position with AI agents today. Right now, "managing AI" is a skill held by a few — how to divide labor, how to design incentives, how to make systems self-improve. Most people think it has nothing to do with them. But as AI agents permeate more and more work scenarios, these management skills will shift from "nice to have" to "fundamental," just as literacy eventually became a universal capability in modern society.
This is just the principles post. In upcoming posts, I'll cover specific patterns for role design, how to let agents manage Claude Code for development projects, and the security concerns you can't afford to ignore along the way.
The ability to bootstrap an effective agent team is, in the long run, a lever worth investing in. I hope these five principles help you avoid some detours along the way.