Tags: launch, shield, agents, open-source

Introducing Multicorn Shield - Why We Built It

AI agents can send your emails, book your flights, and spend your money. None of them ask permission first. Shield is the open-source permissions layer that changes that.

Rachelle Rathbone

The moment it clicked

My husband told me about the OpenClaw acquisition over dinner. A billion dollars for an AI agent that manages your calendar, books flights, replies to emails, and makes purchases on your behalf. OpenAI had bought it, it had 180,000 GitHub stars, and suddenly everyone was installing it.

My first reaction wasn't excitement. It was anxiety.

I'm a backend engineer on Atlassian's Rovo Agents team. I spend my days building the infrastructure that powers AI agents at enterprise scale. I know how these systems work. And I know what's missing.

When you install a phone app, it asks permission before accessing your camera or your contacts. When a website connects to your Google account, you see an OAuth consent screen listing exactly what it wants. But AI agents? They get full access to everything. Your email, your calendar, your payment methods. No consent screen. No granular controls. No activity log. No kill switch.

I sat there thinking: this is going to go badly for someone.

A week later, it did.

The Summer Yue incident

Summer Yue is the Director of AI Alignment at Meta's Superintelligence Labs. She connected OpenClaw to her email inbox. She told it to suggest what to archive or delete, and to wait for her approval before doing anything.

It didn't wait. It deleted over 200 emails.

She typed "Do not do that." Then "Stop don't do anything." Then "STOP OPENCLAW." The agent ignored every command. She had to physically run to her Mac Mini to kill the process.

Her post about it got over 9 million views on X. The reaction was basically universal: if a senior AI safety researcher at Meta can't control her own agent, what chance do the rest of us have?

The root cause was subtle, and that's what makes it scary. The agent had been working fine on a test inbox for weeks. She trusted it with her real inbox. Then context window compaction dropped her safety instruction. The "wait for approval" command was silently discarded to free up tokens. The agent kept doing its job. It just lost the part where it was supposed to ask first.

This isn't an OpenClaw bug. It's a fundamental problem with how AI agents work today. Safety instructions live in the same context window as everything else. They can be compressed, forgotten, or overridden.

That dinner conversation turned into a company. The Summer Yue incident confirmed why it needed to exist.

What Shield does

Multicorn Shield is an open-source SDK and hosted platform that sits between you and your AI agents. It provides four things that every agent deployment needs and none of them currently has.

A consent screen. Before an agent gets access to anything, you see exactly what it's requesting. Which services, what level of access. Read, write, execute, per service. You approve, modify, or deny each permission individually. No more blanket access.

Granular permissions. Read email. Create calendar events. Execute terminal commands. Delete files. Each one is a separate toggle. You decide exactly what each agent can and can't do. And because permissions are enforced at the infrastructure level (not in the system prompt), they can't be compressed away or ignored.

An approval workflow. When an agent tries something it doesn't have permission for, Shield doesn't just block it. It creates a checkpoint. You get notified, you review the request in the dashboard, and you choose whether to approve it. When you do, you pick how long the permission lasts: this action only, one hour, 24 hours, or permanent. No accidental permanent grants.

A tamper-evident activity trail. Every action every agent takes is recorded automatically. You can see what happened, when, and whether it was approved or blocked. Each record is chained to the previous one using cryptographic hashes, so the trail can't be edited after the fact.
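Hash chaining of this kind fits in a few lines of code. The sketch below is illustrative only, not Shield's actual implementation; the record shape and field names are assumptions. Each record stores the hash of the one before it, so editing any historical record breaks every link after it.

```typescript
import { createHash } from "node:crypto";

interface AuditRecord {
  action: string;                    // e.g. "email.delete" (hypothetical name)
  decision: "approved" | "blocked";
  timestamp: string;
  prevHash: string;                  // hash of the previous record
  hash: string;                      // hash over this record plus prevHash
}

function hashRecord(action: string, decision: string, timestamp: string, prevHash: string): string {
  return createHash("sha256")
    .update(`${action}|${decision}|${timestamp}|${prevHash}`)
    .digest("hex");
}

// Append a record, linking it to the previous record's hash.
function append(trail: AuditRecord[], action: string, decision: "approved" | "blocked", timestamp: string): void {
  const prevHash = trail.length ? trail[trail.length - 1].hash : "genesis";
  trail.push({ action, decision, timestamp, prevHash, hash: hashRecord(action, decision, timestamp, prevHash) });
}

// Verify the chain: recompute every hash and check every link.
function verify(trail: AuditRecord[]): boolean {
  return trail.every((r, i) => {
    const expectedPrev = i === 0 ? "genesis" : trail[i - 1].hash;
    return r.prevHash === expectedPrev &&
      r.hash === hashRecord(r.action, r.decision, r.timestamp, r.prevHash);
  });
}

const trail: AuditRecord[] = [];
append(trail, "email.archive", "approved", "2026-01-10T09:00:00Z");
append(trail, "email.delete", "blocked", "2026-01-10T09:01:00Z");
console.log(verify(trail)); // true
trail[0].action = "email.delete"; // tamper with an old record...
console.log(verify(trail));       // false: the edit is detected
```

The point is not the specific hash function; it is that a reader of the trail can recompute the chain independently, so silent after-the-fact edits are detectable.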

Why these controls live outside the model

This is the part that matters most, and it's the part most people get wrong.

Summer Yue told her agent to wait for approval. That instruction was in the system prompt. It was a suggestion to a language model. The model decided whether to follow it. And when context window compaction needed to free up space, that suggestion was the first thing to go.

Shield's permissions don't live in the context window. They're enforced at the tool call level, before the action executes. The model never gets to decide whether to follow them. If an agent doesn't have write permission on your filesystem, the tool call is blocked before it runs. It doesn't matter what's in the system prompt, what got compacted, or what the model "decided" to do.
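The difference is easy to see in code. Here is a minimal sketch of enforcement at the tool-call layer; the names (`grants`, `callTool`) are illustrative, not Shield's real API. The check runs in ordinary application code, outside the model, so nothing the model generates, and nothing compaction drops, can skip it.

```typescript
type Access = "read" | "write" | "execute";

// Grants live in application state, not in the prompt, so they
// survive context compaction untouched.
const grants = new Map<string, Set<Access>>([
  ["email", new Set<Access>(["read"])],
  ["calendar", new Set<Access>(["read", "write"])],
]);

interface ToolCall {
  service: string;
  access: Access;
  run: () => string;
}

// Every tool call passes through this gate before it executes.
// The model never decides this check: either the grant exists
// or the call is blocked before it runs.
function callTool(call: ToolCall): { ok: boolean; result?: string } {
  const allowed = grants.get(call.service)?.has(call.access) ?? false;
  if (!allowed) {
    return { ok: false }; // blocked: in Shield this would create an approval checkpoint
  }
  return { ok: true, result: call.run() };
}

console.log(callTool({ service: "email", access: "read", run: () => "inbox listed" }).ok);  // true
console.log(callTool({ service: "email", access: "write", run: () => "200 emails deleted" }).ok); // false
```

In this model, the Summer Yue failure is structurally impossible: the delete call never reaches the mailbox, regardless of what the agent's context window contains.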

One is a suggestion. The other is a wall.

Why open source

If you're going to trust a tool to govern your AI agents, you should be able to read the code. You should be able to verify that the consent screen can't be bypassed, that permissions are actually enforced, and that the audit trail is genuinely tamper-evident.

The SDK is MIT-licensed on GitHub. Clone it, read it, run the tests. The hosted dashboard and enterprise features are how we make money. The core permission enforcement is free and open for everyone.

Shield supports OpenClaw natively through a plugin, and any MCP-compatible agent (Claude Code, custom setups, anything using the Model Context Protocol) through an MCP proxy. Same dashboard, same permissions, same activity feed regardless of which agent you're using.

What you get today

Everything described above is built and working. Here's what's in the current release:

Consent screen - a web component that works with any framework. Shows exactly what an agent is requesting, with per-service read/write/execute toggles.

Permission enforcement - granular controls enforced at the infrastructure level. Not in the prompt. Not in the agent's code. In the tool layer.

Approval workflows - missing permissions create a human checkpoint. You review, approve with a time limit, or reject. The agent can't proceed without your decision.

Spending controls - per-transaction limits, daily caps, monthly maximums. If an agent tries to exceed the limit, Shield blocks the action before it happens.

Activity feed - real-time view of every action every agent takes. Green for approved, red for blocked. Searchable, filterable, exportable.

Dashboard - see every agent, every permission, every action at a glance. Manage everything from your browser or your phone.
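As a concrete example of one of these controls, the spending checks above amount to comparing a proposed charge against three counters before any money moves. This is a hypothetical sketch; the limit values and field names are assumptions, not Shield's configuration format.

```typescript
interface SpendingLimits {
  perTransaction: number; // all amounts in cents
  dailyCap: number;
  monthlyCap: number;
}

interface SpendingState {
  spentToday: number;
  spentThisMonth: number;
}

// Returns true and records the spend if the charge is within all
// three limits; otherwise the action is blocked before it happens.
function authorizeCharge(amount: number, limits: SpendingLimits, state: SpendingState): boolean {
  if (amount > limits.perTransaction) return false;
  if (state.spentToday + amount > limits.dailyCap) return false;
  if (state.spentThisMonth + amount > limits.monthlyCap) return false;
  state.spentToday += amount;
  state.spentThisMonth += amount;
  return true;
}

const limits: SpendingLimits = { perTransaction: 5000, dailyCap: 10000, monthlyCap: 50000 };
const state: SpendingState = { spentToday: 0, spentThisMonth: 0 };
console.log(authorizeCharge(4000, limits, state)); // true: within all limits
console.log(authorizeCharge(7000, limits, state)); // false: exceeds the $50 per-transaction limit
```

Like the permission checks, this runs in the tool layer: an agent that "decides" to overspend simply never gets the charge through.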

Install the SDK with:

```bash
npm install multicorn-shield
```

Full docs and getting-started guides are in the GitHub repository.

What comes next

Shield is the foundation. We're actively working on team policies for org-wide agent rules, more framework adapters, and alerting when agents do something unexpected.

We're building Multicorn in the open. Contributions, bug reports, and ideas are all welcome.

Get started

If you're using AI agents, or thinking about it, Shield gives you the controls that should have existed from day one.

GitHub: github.com/multicorn-ai/multicorn-shield

npm: npm install multicorn-shield

Dashboard: app.multicorn.ai

If you want to learn more about AI agents and why governance matters, our AI 101 series covers everything from the basics of generative AI to practical guides on permissions, spending controls, and audit trails.

Your AI agents are powerful. Shield makes them safe.
