Shield threat model

Multicorn Shield is a control layer for AI agents. Where it sits in your stack decides what it can govern. This page explains the two integration shapes we ship today, what kinds of misuse Shield is designed to catch, and what sits outside the boundary so you are not surprised after you connect a host.

How Shield integrates

Native plugin

The plugin runs inside the host. It sees the same tool and action surface the host exposes to the model, not only traffic that happens to go out through MCP. Consent, permissions, spending checks, and activity records can apply to the full picture of what the agent tried to do.

Native plugin integrations today include OpenClaw, Claude Code, Windsurf, Cline, and Gemini CLI. When we say "full coverage" on this page, we mean this path.

Hosted MCP proxy

The host sends MCP tool calls through Shield's hosted proxy. Shield governs that MCP-shaped traffic: approvals, policy, logging, and spend rules attach to calls that flow through the proxy. Anything the host does without routing through MCP never touches Shield, by design.

This is the right mental model for Cursor, Kilo Code, GitHub Copilot, Continue, Goose, and other clients you connect through the proxy. Windsurf also supports hosted proxy mode alongside its native plugin. You get strong governance for MCP tools. You do not get visibility into built-in host capabilities that never pass through MCP.

Coverage by scenario

Read the table left to right: each row is a situation teams ask about. A check (✓) means Shield is positioned to detect, block, or record that class of behavior in that integration mode. An × means the behavior can happen without passing through Shield in hosted-proxy mode, so Shield does not see it.

Credential replay is a special case. For every action Shield logs, the activity trail is tamper-evident: entries are chained so a forged "past" record does not match the chain. That property holds for hosted-proxy mode as well as native plugin mode, as long as the action was one Shield recorded. Replaying credentials or rewriting history outside that recorded surface is a different problem, and the chain does not address it.

Scenario	Native plugin	Hosted proxy (MCP)
Rogue or malicious MCP server	✓	✓
Credential replay against recorded activity	✓	✓
Agent misuses approved MCP access	✓	✓
Runaway spending on governed actions	✓	✓
Misuse of the host app's built-in tools	✓	×
Direct external API calls that bypass MCP	✓	×
Native shell, file, or system access outside MCP	✓	×

What Shield is built to catch

Rogue or malicious MCP server. Shield sits on the path between your agent and MCP tools you configure. Traffic can be inspected, policy can require approval, and unexpected servers or tool shapes surface in activity rather than failing silently inside a long-running session.

Credential replay and audit integrity. When Shield writes an activity record, it becomes part of a hash-linked chain. An attacker who wants to pretend an action already happened, or to substitute a different past, breaks that chain. That is true in native plugin mode for the full action set, and in hosted-proxy mode for every MCP call Shield actually logged.
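To make the hash-linked chain idea concrete, here is a minimal sketch in Python. It is not Shield's implementation: the record fields, the `GENESIS` placeholder, the use of SHA-256, and the function names `append` and `verify` are all assumptions chosen for illustration. It only shows why editing or substituting a past record breaks verification.

```python
import hashlib
import json

# Hypothetical placeholder "previous hash" for the first entry.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a record whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(chain: list) -> bool:
    """Recompute every link; an edited or substituted record fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In this sketch, changing any field of an earlier payload makes `verify` return `False`, because the stored hash no longer matches the recomputed one. A chain like this cannot by itself detect that the most recent entries were dropped; that would need the latest hash to be anchored somewhere outside the log.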

An agent stretching past the access you granted. If the model keeps calling tools you did not approve, or probes for broader access, Shield can block, prompt, or flag those attempts depending on your policy. The important part is that the attempt is visible and attributable, not lost in console noise.
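The block, prompt, or allow behavior described above can be sketched as a small policy check. Everything here is hypothetical: `Policy`, `Decision`, `decide`, and the tool names are illustrative and are not Shield's API. The point is that every attempt, including a denied one, leaves an attributable record.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"  # ask a human before proceeding
    BLOCK = "block"


@dataclass
class Policy:
    # Hypothetical policy shape: tools granted outright, and tools
    # that always require an explicit approval.
    approved: set = field(default_factory=set)
    needs_approval: set = field(default_factory=set)


def decide(policy: Policy, agent_id: str, tool: str, log: list) -> Decision:
    """Classify a tool call and record the attempt, whatever the outcome."""
    if tool in policy.approved:
        decision = Decision.ALLOW
    elif tool in policy.needs_approval:
        decision = Decision.PROMPT
    else:
        decision = Decision.BLOCK
    # The attempt is logged with the agent's identity even when blocked.
    log.append({"agent": agent_id, "tool": tool, "decision": decision.value})
    return decision
```

A default of `BLOCK` for unknown tools is one possible design choice; a real policy could just as well default to prompting.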

Runaway spending on governed actions. Limits and alerts attach to the operations Shield sees. On the native plugin path, that covers the broadest set of spend. On the hosted-proxy path, it covers spend that flows through MCP under Shield. Both are real controls; the difference is how much of the host's behavior Shield can govern.
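As a rough illustration of a limit plus an alert threshold, consider the sketch below. `SpendGuard`, its parameters, and the 80% alert ratio are assumptions made for the example, not Shield settings. Note that it can only count charges that are passed to it, which mirrors the point above: a spend control governs only the operations it sees.

```python
from decimal import Decimal


class SpendGuard:
    """Hypothetical per-agent budget with an early-warning threshold."""

    def __init__(self, limit: Decimal, alert_ratio: Decimal = Decimal("0.8")):
        self.limit = limit
        self.alert_at = limit * alert_ratio
        self.spent = Decimal("0")

    def charge(self, amount: Decimal) -> str:
        """Return "ok", "alert", or "blocked"; blocked charges are not recorded."""
        if self.spent + amount > self.limit:
            return "blocked"
        self.spent += amount
        return "alert" if self.spent >= self.alert_at else "ok"
```

`Decimal` is used instead of floats so that currency amounts add up exactly.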

What hosted proxy does not see

Hosted MCP proxy mode is intentional about its boundary. The host application may ship its own tools: file pickers, terminals, browser automation, first-party APIs, and other shortcuts that never emit an MCP request through your Shield URL. Shield does not intercept those calls. They are not deliberately hidden; they simply travel over a different channel.

That shows up in three ways teams care about. Built-in tools can read, write, or call services without MCP. Direct external API usage from the host skips the proxy if the integration is not MCP-shaped. Native shell or file access on the machine runs under the host's own permissions. None of that traffic is automatically mirrored into Shield unless you route it there.

If your risk model assumes Shield "sees everything the agent can do," hosted proxy mode does not meet that bar. It meets the bar for "everything the agent does through MCP while pointed at Shield." That is a meaningful slice for many teams, but it is not the same as full host coverage.
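One way to reason about the boundary when planning controls is to write it down as a lookup. The sketch below encodes only what this page states: native plugin mode covers the full tool surface, and hosted proxy mode covers MCP traffic. The names `Mode`, `Channel`, `is_governed`, and `ungoverned_channels` are hypothetical and exist only for this example.

```python
from enum import Enum


class Mode(Enum):
    NATIVE_PLUGIN = "native_plugin"
    HOSTED_PROXY = "hosted_proxy"


class Channel(Enum):
    MCP = "mcp"
    BUILT_IN_TOOL = "built_in_tool"
    DIRECT_API = "direct_api"
    NATIVE_SHELL_OR_FILE = "native_shell_or_file"


def is_governed(mode: Mode, channel: Channel) -> bool:
    """True if an action on this channel passes through Shield in this mode."""
    if mode is Mode.NATIVE_PLUGIN:
        return True  # full tool surface, per the description on this page
    return channel is Channel.MCP  # hosted proxy sees MCP-routed calls only


def ungoverned_channels(mode: Mode) -> list:
    """Channels that need controls outside Shield for the given mode."""
    return [c for c in Channel if not is_governed(mode, c)]
```

For hosted proxy mode, `ungoverned_channels` returns the three non-MCP channels, which is the list to cover with other controls.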

When you need full coverage

For the strongest governance story, use a native plugin integration where your host supports one. You get consent, permissions, activity, and spending aligned to the full tool surface the model can reach, not only MCP-routed work. Start from the docs if you are wiring a new agent: Getting started.

If you stay on hosted proxy mode, treat MCP traffic as the governed zone and plan controls elsewhere for anything the host can still do on its own.