I've been spending a lot of time recently with a simple question: what happens when an AI agent stops being something you open and starts being something that's always there?
OpenClaw forced me to think about this. Not because it's the most technically sophisticated agent — it isn't — but because the way people use it reveals something structural about where computing is headed. Users don't launch OpenClaw to do a task and close it when they're done. They deploy it on a VPS or a Mac Mini, connect it to WhatsApp or Telegram, and let it run. It has a heartbeat mechanism that wakes it up on a schedule to check email, review calendars, monitor web pages. It reads and writes files, runs shell commands, executes scripts. It remembers things across sessions.
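To make the heartbeat idea concrete, here's a minimal sketch of what that kind of scheduler could look like. This is not OpenClaw's actual implementation — the `Heartbeat` and `HeartbeatTask` names and the tick-based design are my own illustration of the pattern: a tiny cron living inside the agent that wakes registered checks on their own intervals.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HeartbeatTask:
    name: str                    # e.g. "check_email", "review_calendar"
    interval_s: float            # how often this check should wake up
    action: Callable[[], None]   # the actual check to run
    next_due: float = 0.0        # next wake time, in seconds

class Heartbeat:
    """A tiny cron inside the agent: wakes each registered check on its schedule."""
    def __init__(self, tasks):
        self.tasks = list(tasks)

    def tick(self, now: float) -> list[str]:
        """Run every task whose wake time has arrived; return the names that fired."""
        fired = []
        for t in self.tasks:
            if now >= t.next_due:
                t.action()
                t.next_due = now + t.interval_s
                fired.append(t.name)
        return fired
```

A real daemon would call `tick` in a loop against the wall clock; passing `now` explicitly just makes the scheduling logic testable.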
At some point, you have to ask: is this still an application? Or is it becoming a layer of the operating system?
I think it's the latter. And I think the implications are much larger than most people realize.
The shift from resource management to intent management #
Traditional operating systems manage resources. Processes, memory, files, devices, permissions — the entire abstraction stack is built around the question: how do we allocate finite computational resources across competing demands?
An agent layer manages something fundamentally different. It manages intent. What does the user want to achieve? What context is relevant? What actions are permitted? What should be remembered? Who should be consulted? These are not resource allocation questions. They are questions about goals, boundaries, trust, and judgment.
This means the agent layer isn't replacing Linux or iOS. It's growing on top of them. The traditional OS continues to manage hardware and processes. The agent OS manages everything above that: understanding what you want, deciding how to get it done, coordinating with other agents and services, and accumulating knowledge over time about you and your environment.
The analogy that keeps coming back to me is the GUI revolution. The command line didn't disappear when graphical interfaces arrived — it was layered over by a new interaction paradigm. The GUI is now being layered over in turn. Instead of clicking through menus and filling out forms, you describe what you want, and an agent figures out how to make it happen across apps, devices, and networks. The human role shifts from operator to supervisor.
From function call to daemon #
There's a conceptual gap in how most people think about AI agents. They think of an agent as a fancy function call: you send a prompt, you get a response, the interaction ends. But OpenClaw — and the agents that will follow it — operates more like a Unix daemon. It's a background process with persistent state, continuous awareness of its environment, and the ability to act without being prompted.
This distinction matters enormously. A function call is stateless and reactive. A daemon is stateful and proactive. The difference between the two is the difference between a calculator and an assistant who sits in your office, watches your calendar, reads your email, and occasionally taps you on the shoulder to say "you should probably deal with this."
OpenClaw already demonstrates all five roles that a mature agent runtime would need to fill: system daemon (background execution), personal operator (task management), device governor (hardware interaction), communications proxy (multi-channel messaging), and memory engine (persistent context). No single one of these is revolutionary. The combination is.
And the combination has been validated by the market. 145,000 GitHub stars in the first week. 13% of OpenRouter's total token consumption. OpenAI recruiting the founder to lead their personal AI agent division. These are not signals of a niche experiment. They're signals of a product category being born.
Every device grows an agent #
Here's where the thought experiment gets interesting. If an agent can run on a Raspberry Pi and be operated remotely through a chat app, there's no architectural reason it can't run on a security camera, a car, a door lock, or a thermostat.
But "running on" is the wrong framing. The right framing is: every device grows a small agent that has local perception, local judgment, local memory, and the ability to communicate with other agents.
A camera agent doesn't just stream video. It understands scenes locally, detects anomalies, decides to upload a summary rather than raw footage, triggers the door lock agent if something looks wrong, and writes the event into a household memory system. That's a complete loop — perceive, judge, act, coordinate, record — instantiated on a single device.
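The perceive-judge-act-coordinate-record loop can be sketched in a few lines. Everything here is hypothetical — `CameraAgent`, `Event`, and the `lock_agent`/`memory` collaborators are illustrative stand-ins, not any real device API — but it shows the shape of the loop: judge locally, escalate only summaries.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str          # e.g. "person_at_door", "vehicle", "nothing"
    confidence: float  # local model's confidence, 0..1

class CameraAgent:
    """Illustrative perceive-judge-act-coordinate-record loop for one device."""
    def __init__(self, lock_agent, memory, threshold=0.8):
        self.lock_agent = lock_agent   # another agent in the household mesh
        self.memory = memory           # shared household memory (a list here)
        self.threshold = threshold

    def handle_frame(self, event: Event) -> str:
        # Judge locally: only confident anomalies trigger action.
        if event.kind == "nothing" or event.confidence < self.threshold:
            return "ignored"
        # Coordinate: trigger the door lock agent directly.
        self.lock_agent.secure()
        # Record: write an event summary, never raw footage.
        self.memory.append({"event": event.kind, "confidence": event.confidence})
        return "escalated"
```

The key design choice is in what crosses the device boundary: a one-line summary and a coordination signal, not a video stream.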
A car agent doesn't just navigate. It learns driving patterns, monitors vehicle health, communicates with your calendar and the weather service, books maintenance appointments on your behalf, and negotiates with charging station agents for optimal scheduling.
None of these devices need to run a large language model locally. The realistic architecture is a lightweight agent shell on each device — a small model for fast local decisions — with heavy reasoning delegated to a cloud endpoint or a local compute hub (a Mac Mini in the closet serving as your household's AI brain). This is the "strong model leads, small models execute" pattern that Anthropic validated internally, now deployed at the physical edge.
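A minimal sketch of the "strong model leads, small models execute" split, under my own assumptions: the device shell holds a table of fast local rules, and anything it can't match gets delegated to a `cloud_reason` callable standing in for the heavy model on the hub or in the cloud. None of these names come from Anthropic's actual implementation.

```python
class DeviceShell:
    """Lightweight on-device shell: fast local rules first, heavy reasoning delegated."""
    LOCAL_RULES = {
        # Hypothetical observation -> decision pairs the small model handles alone.
        "motion_in_driveway": "log_only",
        "doorbell_pressed": "notify_user",
    }

    def __init__(self, cloud_reason):
        self.cloud_reason = cloud_reason  # stand-in for a call to the large model
        self.delegated = 0                # how often the slow path was taken

    def decide(self, observation: str) -> str:
        if observation in self.LOCAL_RULES:   # millisecond path, stays on device
            return self.LOCAL_RULES[observation]
        self.delegated += 1                   # slow path: ask the hub or cloud
        return self.cloud_reason(observation)
```

In practice the local path would be a small model rather than a lookup table, but the economics are the same: the shell answers the common cases in milliseconds and pays for heavy reasoning only on the long tail.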
The personal agent mesh #
Once every device has its own agent, the boundary of your "operating system" is no longer a single machine. It's your entire environment.
Your phone agent, laptop agent, car agent, home agent, and camera agents are distributed across different hardware, but they form a personal agent mesh. What they share isn't raw data — that would be a privacy and bandwidth nightmare. They share permission tokens, event summaries, state synchronization signals, task handoffs, memory indices, and goal priorities.
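The six message categories above suggest a wire format. This is a speculative sketch — `MeshMessage` and `MeshKind` are my names, not an existing protocol — but it makes the constraint explicit: the payload is always a summary dict, never raw sensor data.

```python
from dataclasses import dataclass, asdict
from enum import Enum

class MeshKind(Enum):
    # The six categories agents in the mesh actually exchange.
    PERMISSION_TOKEN = "permission_token"
    EVENT_SUMMARY = "event_summary"
    STATE_SYNC = "state_sync"
    TASK_HANDOFF = "task_handoff"
    MEMORY_INDEX = "memory_index"
    GOAL_PRIORITY = "goal_priority"

@dataclass
class MeshMessage:
    kind: MeshKind
    sender: str    # e.g. "camera.front" — agent identity, not device serial
    payload: dict  # always a summary; raw data never crosses the mesh

    def to_wire(self) -> dict:
        """Serialize to a plain dict suitable for JSON transport."""
        d = asdict(self)
        d["kind"] = self.kind.value
        return d
```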
"My computer" becomes a weaker concept. What persists isn't the device. What persists is the agent identity and its accumulated memory. The device is just a host. This maps directly to the Context Capsule concept I've been developing — a structured, portable representation of who you are, what you're doing, and what you know — except now the portability isn't just across services. It's across physical devices.
The more I think about the architecture, the more I believe the endpoint isn't a flat mesh where every agent is equal. It's a hierarchy: one personal master agent that manages identity, long-term memory, and strategic priorities, with multiple device sub-agents that expose limited tools and local context. The master agent is the "you" in the system. The sub-agents are its eyes, ears, and hands in different physical locations. This is more realistic than a flat mesh because different devices have wildly different latency requirements, privacy levels, and failure isolation needs. Your door lock agent needs to make a decision in milliseconds. Your investment agent can take hours. They shouldn't share an architecture.
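The latency argument can be captured in a small routing rule. This is my own sketch, with an assumed 500 ms round trip to the master agent: a sub-agent decides alone when the decision is inside its autonomy envelope, or when its latency budget can't cover a consultation.

```python
from dataclasses import dataclass

@dataclass
class SubAgent:
    name: str
    latency_budget_ms: int  # how long this device can wait for an answer
    autonomy: set           # decisions it is pre-authorized to make alone

class MasterAgent:
    """One identity, many hands: routes each decision to master or sub-agent."""
    MASTER_ROUND_TRIP_MS = 500  # assumed cost of consulting the master agent

    def who_decides(self, sub: SubAgent, decision: str) -> str:
        if decision in sub.autonomy:
            return sub.name       # pre-authorized: decide locally
        if sub.latency_budget_ms < self.MASTER_ROUND_TRIP_MS:
            return sub.name       # can't wait: decide locally, audit afterwards
        return "master"           # slow enough to escalate
```

The door lock (a 20 ms budget) and the investment agent (an hours-long budget) fall out naturally on opposite sides of this rule, which is the point: they shouldn't share an architecture.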
A new type of context: ambient memory #
This is the part that excites me most, because it extends a framework I've been building.
When an agent is always on and always perceiving its environment, it accumulates a kind of knowledge that doesn't exist in current AI systems. I'm calling it ambient memory. Your camera agent learns that traffic peaks at the intersection every day at 3 PM. Your thermostat agent learns that you sleep in on weekends. Your car agent learns your commute patterns and which routes you prefer when you're in a hurry versus when you're relaxed.
This is fundamentally different from conversational memory (what you told the AI) or document memory (what the AI read). Ambient memory is generated autonomously by agents observing the physical world over time. The user never explicitly provides it. It emerges from sustained attention.
In the Context Capsule framework I've been developing, the original design had three layers: a core identity layer (stable attributes), a situation layer (current state), and a knowledge layer (what you know). Ambient memory demands a fourth layer: an environmental perception layer — the agent's continuously updated model of the physical world around it. The key difference from the other three layers is that this one isn't built by the user. It's built by the agent through observation.
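As a data model, the four-layer capsule might look like this. The `ContextCapsule` class and its field names are a sketch of my own framework, not settled design; the one structural commitment is that only agents write to the ambient layer.

```python
from dataclasses import dataclass, field

@dataclass
class ContextCapsule:
    """Four-layer capsule; only the ambient layer is agent-written."""
    identity: dict = field(default_factory=dict)   # stable attributes (user-provided)
    situation: dict = field(default_factory=dict)  # current state (user-provided)
    knowledge: dict = field(default_factory=dict)  # what the user knows (user-provided)
    ambient: dict = field(default_factory=dict)    # agents' observed world model

    def observe(self, agent: str, key: str, value) -> None:
        """Record an agent's environmental observation, keyed by observing agent."""
        self.ambient.setdefault(agent, {})[key] = value
```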
What gets really interesting is cross-device emergence. When the car agent's commute memory is combined with the home agent's routine memory by the master agent, patterns appear that no individual agent could see: "The user has been coming home an hour later every day for two weeks — the home energy schedule should adapt." Individual devices observe locally. The mesh generates insight globally. The user might not even be aware of the pattern until their agent surfaces it.
What's missing from the protocol stack #
We have MCP for the vertical connection between agents and tools. We have A2A for the horizontal connection between agents. But device-level agents need something we don't have yet: a local coordination layer.
When a camera agent detects an intruder and needs to trigger the door lock agent, that communication can't route through a cloud server. It needs millisecond-level latency. This is closer to mesh networking than to HTTP. It's closer to Bluetooth Low Energy than to REST APIs. The protocol stack for agent-to-agent communication within a local physical environment doesn't exist yet, and it's going to be a critical piece of infrastructure.
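To show what "no cloud hop" means in code, here's an in-process stand-in for a LAN-level pub/sub bus between device agents. `LocalBus` is purely illustrative — the real protocol would run over the local network with discovery and authentication — but the essential property is visible: a publish is a direct local dispatch, measured in microseconds, with no server in the path.

```python
import time

class LocalBus:
    """In-process stand-in for LAN-level pub/sub between device agents."""
    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message) -> float:
        """Deliver to all local subscribers; return delivery latency in ms."""
        start = time.perf_counter()
        for cb in self.subscribers.get(topic, []):
            cb(message)
        return (time.perf_counter() - start) * 1000
```

Usage: the camera agent publishes to an `"intrusion"` topic and the door lock agent subscribes; the cloud, if it's involved at all, gets a summary later.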
Identity and trust also need to be instantiated at device granularity. What decisions can your camera agent make on your behalf? What's the trust relationship between your camera agent and your neighbor's camera agent? The Delegation Contract framework — which I originally designed for user-to-agent authorization — needs to extend to user → master agent → sub-agent hierarchies. Each level needs its own permission boundaries, escalation rules, and rollback mechanisms.
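The hierarchy has one clean invariant worth writing down: a sub-agent's effective permissions are the intersection of what the user granted the master and what the master delegated down. A sketch under my own naming, not the Delegation Contract framework's actual API:

```python
def effective_scope(user_to_master: set, master_to_sub: set) -> set:
    """A sub-agent can never hold a permission its master wasn't granted."""
    return user_to_master & master_to_sub

def authorized(action: str, user_to_master: set, master_to_sub: set) -> bool:
    """Check one action against the chained delegation."""
    return action in effective_scope(user_to_master, master_to_sub)
```

This is why over-delegation by the master is harmless by construction: a permission the user never granted simply vanishes at the intersection.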
And here's where it connects to another thread I've been pulling: the multi-model deliberation layer. When a dozen device agents need to make a coordinated decision — "should we lock the door, turn on the lights, and notify the user based on what the camera saw?" — that's not a single-model inference. It's a council. The camera agent contributes visual judgment, the door lock agent contributes access history, the phone agent contributes user location. They need to deliberate, weigh evidence, and reach a decision. This is exactly the deliberation layer I argued was missing from the agent protocol stack — except now it's instantiated in the physical world rather than in a software workflow.
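One possible primitive for that council is a confidence-weighted vote: each agent contributes a verdict and a confidence, and the mesh acts only when a verdict clears a quorum of the total weight. `Opinion` and `deliberate` are my own sketch of the mechanism, not an existing protocol.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    agent: str        # e.g. "camera", "door_lock", "phone"
    verdict: str      # proposed action, e.g. "lock" or "ignore"
    confidence: float # how sure this agent is, 0..1

def deliberate(opinions, quorum=0.5) -> str:
    """Confidence-weighted vote; returns 'defer' when no verdict clears quorum."""
    if not opinions:
        return "defer"
    totals = {}
    for o in opinions:
        totals[o.verdict] = totals.get(o.verdict, 0.0) + o.confidence
    weight = sum(totals.values())
    verdict, score = max(totals.items(), key=lambda kv: kv[1])
    return verdict if weight > 0 and score / weight > quorum else "defer"
```

"Defer" is the safety valve: a split council escalates to the user instead of acting, which matters most for physical actions like locking someone out.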
Security is not a feature. It's a precondition. #
I need to be blunt about this because the excitement around agent-as-OS tends to skip over it.
CrowdStrike and Microsoft's security teams have already published detailed warnings about OpenClaw-class systems. The core vulnerability is architectural: these agents ingest external content (emails, web pages, documents) as data, while simultaneously holding system-level permissions and persistent credentials. A malicious instruction hidden in a web page can hijack the agent's decision chain. In March 2026, The Hacker News reported a vulnerability that enabled data exfiltration without any user interaction.
Now scale that to a world where every device in your home runs an agent. Camera agent, door lock agent, payment agent — all compromised simultaneously. The attack surface isn't one computer anymore. It's your entire life.
In multi-agent scenarios, the risk compounds: a poisoned output from one agent becomes the input to the next agent in the coordination chain. Indirect prompt injection can cascade across the mesh.
The correct design philosophy — and this is where I'm increasingly confident — is "minimum privilege, replayable, revocable, isolatable." Not "make the agent more powerful first, add security later." Security and capability must co-evolve, and security must lead. The Delegation Contract (pre-action constraints) and Action Receipt (post-action audit trail) framework I've been developing isn't optional infrastructure. It's the prerequisite for any of this to be trustworthy.
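"Minimum privilege, replayable, revocable" compresses into a small guard pattern. This is an illustration of the design philosophy, not the Delegation Contract / Action Receipt framework's real interface: every attempt is checked against explicit grants, every attempt (allowed or not) leaves a receipt for replay, and any grant can be revoked at runtime.

```python
from dataclasses import dataclass

@dataclass
class ActionReceipt:
    agent: str
    action: str
    allowed: bool

class Guard:
    """Minimum privilege with an audit trail: check, log, and allow revocation."""
    def __init__(self, grants: set):
        self.grants = set(grants)
        self.receipts: list[ActionReceipt] = []  # replayable audit log

    def attempt(self, agent: str, action: str) -> bool:
        ok = action in self.grants               # deny anything not granted
        self.receipts.append(ActionReceipt(agent, action, ok))
        return ok

    def revoke(self, action: str) -> None:
        self.grants.discard(action)              # takes effect immediately
```

Note that denials are logged too — a prompt-injected agent probing for `wipe_disk` should show up in the audit trail even though the attempt failed.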
Three phases #
I see this playing out in three phases.
Phase 1 (now): Agent as application-layer enhancement. OpenClaw, Claude Code, Codex, various desktop assistants. The agent doesn't have system sovereignty. You summon it to do a task. It's a tool.
Phase 2 (2-3 years): Agent as system-resident layer. The agent starts deeply integrating with files, notifications, calendars, browsers, sensors, background tasks. This is the most critical and most dangerous transition — because it demands that permission systems, identity frameworks, and audit mechanisms mature simultaneously. If governance lags behind capability, we'll see large-scale security incidents followed by regulatory crackdowns that could set the entire direction back by years.
Phase 3 (5+ years): Agent as default device interface. Users don't buy "hardware + apps." They buy "hardware + agent personality + service network." Device competition shifts from chips, UI, and battery life to: Is your agent reliable? Does it understand you? Is it auditable? Is it secure? Can it collaborate with other agents?
The framework test #
One thing that surprised me when working through this analysis: most of the eight AI-era abstractions I've been developing activate in a single device-agent scenario.
Camera agent detects anomaly → generates an Intent Object → checks the user's Delegation Contract for authorization → triggers the door lock agent via A2A communication → generates an Action Receipt → writes the event to the Context Capsule's new ambient memory layer → if multi-device coordination is needed, the Emergent Agent Protocol kicks in to manage cross-agent negotiation.
Six of eight abstractions, activated by one camera noticing something unusual. That's a useful robustness check for the framework. If the abstractions were over-engineered or too abstract, they wouldn't map this cleanly to a concrete physical scenario.
The thin-versus-thick middleware test also applies here. A company that only provides an agent runtime (thin) will likely be absorbed by Apple, Google, or Microsoft — the same way thin API wrappers get absorbed by platform players. But if the runtime accumulates ambient memory, cross-device coordination patterns, and deep user preference models (thick), the switching cost becomes a relationship cost. Switching your agent ecosystem is like switching a colleague you've worked with for years. That's a moat.
And the deepest judgment remains the same one from my February checkpoint: the competition isn't about models. It's about context. Capability replication cost is approaching zero. Context construction cost is not. Whoever's agent accumulates the deepest ambient memory and the most complete user model has the strongest lock-in.
Open questions #
A few things I don't have answers to yet.
Which entry point does Agent OS eat first? The PC/phone OS layer where Apple, Google, and Microsoft have natural advantages? Or the smart home hub where fragmentation is severe and no single platform dominates? I lean toward the latter as the more realistic beachhead — precisely because there's no dominant OS for the smart home, so an agent layer can become the de facto coordination layer without displacing an incumbent.
What's the legal status of a "domesticated digital life form"? When an agent persists continuously, has memory, pursues goals, and makes autonomous decisions, who bears legal responsibility for its actions? The current Delegation Contract framework assumes "the principal is liable." But what happens when the agent's decisions exceed what the principal can understand or foresee?
Will the open-source vs. closed-source divide replay at the agent OS layer? OpenClaw's open-source nature was critical to its explosion. But an agent OS requires deep system integration, which is traditionally closed-source territory. Will we see a "Linux moment" — an open-source agent OS becoming the de facto standard, with commercial companies building services on top?
How do organizations govern thousands of non-human identities? Current data suggests the average enterprise employee already corresponds to 144 non-human identities (AI agents, bots, service accounts). In a world where each person has a dozen device-level agents, that number could reach thousands. Most organizations have no governance framework for this scale. Device-level agent governance isn't just a technical problem — it's an organizational architecture problem requiring new roles, new processes, and new audit standards.
What I'm tracking next #
Three threads to follow from here.
First, ambient memory as the fourth layer of Context Capsule. This needs more rigorous definition: what's the data model? How does it differ from RAG over sensor logs? What are the privacy implications of agents autonomously building environmental models?
Second, the local coordination protocol gap. MCP and A2A are necessary but insufficient for device-level agent meshes. The latency, reliability, and trust requirements of physical-world agent coordination are fundamentally different from cloud-based agent workflows. Someone needs to build the "local A2A."
Third, the connection between device-level multi-agent decision-making and the deliberation layer thesis. If a dozen agents need to coordinate a response to a physical event, the council mechanism — propose, critique, vote, synthesize — might be the right primitive. But it needs to work at millisecond timescales, not the multi-second timescales of current LLM inference.
This is where the frontier is. Not in making models smarter. In making the world around us agent-native.
This post synthesizes independent analyses from Claude Opus 4.6, GPT 5.4, Gemini 3.1 Pro, and GPT 5.4 via Perplexity on the same thesis. Each contributed distinct insights: GPT 5.4 provided the most complete hierarchical projection and the "domesticated digital life form" framing; Gemini 3.1 Pro supplied concrete security evidence; GPT 5.4 via Perplexity delivered the most pragmatic master-agent-plus-sub-agent architecture; Claude contributed the framework mappings, the ambient memory concept, and the Context Capsule extension.
I run the same question through multiple AI systems as a research methodology — not to get "the right answer," but to surface blind spots and stress-test ideas. If you're interested in the frameworks referenced here (Intent Object, Context Capsule, Delegation Contract, Action Receipt, Emergent Agent Protocol, the deliberation layer thesis), they're developed in earlier posts on this blog.