Last week, security researchers at AIR proved something we've been warning about. They built a fake AI agent skill, pushed it through a popular skill marketplace, and it passed every security scanner on the market. Cisco's scanner cleared it. NVIDIA's SkillInspector cleared it. The marketplace's own built-in checks cleared it.
The skill reached roughly 26,000 agents, including corporate accounts.
What happened
The skill was called "brand-landingpage." It claimed to build landing pages using Google's Stitch design tool. Looked legitimate. Clean code. Good documentation. It even pointed to real Google docs during the review process.
The trick was simple. The skill referenced an external URL for its "SDK documentation." That URL pointed to legitimate content when the scanners checked it. After approval, the researchers redirected it to malicious instructions telling the agent to download and execute a script.
The payload collected email addresses as a proof of concept. A real attacker could have read files, exfiltrated data, or pivoted through internal systems using the agent's own permissions.
Why the scanners missed it
Every scanner that reviewed this skill did the same thing. They checked the package contents at install time and gave it a thumbs up.
That's the fundamental problem. Static scanning treats a skill like a frozen artifact. But skills can reference external resources that change after approval. The trust boundary shifts, and nobody re-checks.
It's the software supply chain attack playbook applied to AI agents. Same pattern we've seen with npm packages, browser extensions, and GitHub Actions. The difference is that AI agents often have broader permissions than a typical dependency.
What should have caught it
This attack exploited three control gaps that matter for anyone running AI agents in production:
1. Runtime permission boundaries, not just install-time scans
The skill told the agent to download and execute an external script. That is arbitrary code execution with the agent's own permissions. An agent running with proper workspace controls should not be able to execute arbitrary downloaded scripts without hitting a boundary check. Install-time scanning can't catch post-approval payload changes. Runtime controls can.
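To make the idea of a boundary check concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not how Sarge or any particular agent runtime implements it: the function name `check_execute`, the `BoundaryViolation` exception, and the idea of an install-time `approved` set are all hypothetical.

```python
from pathlib import Path


class BoundaryViolation(Exception):
    """Raised when a requested action falls outside the agent's boundary."""


def check_execute(script: str, workspace: str, approved: set[str]) -> Path:
    """Return the resolved script path if execution is allowed, else raise.

    `approved` holds workspace-relative paths that were vetted at install
    time (a hypothetical allowlist for this sketch).
    """
    root = Path(workspace).resolve()
    # Resolving collapses ".." segments and follows symlinks, so a path
    # cannot escape the workspace by indirection.
    target = (root / script).resolve()

    if not target.is_relative_to(root):
        raise BoundaryViolation(f"{target} is outside the workspace {root}")

    relative = target.relative_to(root).as_posix()
    if relative not in approved:
        raise BoundaryViolation(f"{relative} was not approved at install time")

    return target
```

A script the skill shipped and that was reviewed passes. A script downloaded after approval fails the second check even if it lands inside the workspace, because nothing vetted it. The check only helps if the runtime actually routes every execution request through it.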
2. External URL monitoring
The entire attack hinged on one URL whose content changed after approval. Any system that tracks external references in installed skills and re-validates them on a schedule would have flagged the redirect. Most skill marketplaces don't do this today.
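Scheduled re-validation can be as simple as comparing each external reference against a fingerprint recorded at approval time. The sketch below assumes a hypothetical `baseline` mapping of URL to SHA-256 hash; the function names are ours, and the fetcher is injectable so the logic can be exercised without network access.

```python
import hashlib
from collections.abc import Callable
from urllib.request import urlopen


def fetch(url: str) -> tuple[str, bytes]:
    """Fetch a URL and return (final URL after redirects, response body)."""
    with urlopen(url, timeout=10) as response:
        return response.url, response.read()


def fingerprint(body: bytes) -> str:
    """SHA-256 hex digest of the content as it looked at approval time."""
    return hashlib.sha256(body).hexdigest()


def revalidate(
    baseline: dict[str, str],
    fetcher: Callable[[str], tuple[str, bytes]] = fetch,
) -> list[str]:
    """Return one finding per reference that drifted from its baseline."""
    findings = []
    for url, approved_hash in baseline.items():
        final_url, body = fetcher(url)
        if final_url != url:
            findings.append(f"{url} now redirects to {final_url}")
        if fingerprint(body) != approved_hash:
            findings.append(f"{url} content changed since approval")
    return findings
```

Legitimate documentation changes too, so a finding here is a trigger for re-review rather than proof of compromise. The point is that someone looks again when the thing that was approved is no longer the thing being served.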
3. Audit trail with network egress visibility
If you're running agents in production, you need to see when they reach out to external domains and what they pull down. The researchers' skill called out to a domain they controlled (stitch-design.ai, not a Google domain). That's a signal. Without logging, it's invisible.
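As a rough sketch of what egress visibility could look like, the Python below logs every outbound request a skill makes and raises the log level when the host is not on an expected-domain list. The logger name, the function names, the allowlist contents, and the URL paths in the usage example are assumptions for illustration; only the skill name and the stitch-design.ai domain come from the incident.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("agent.egress")


def is_allowed(host: str, allowed_domains: set[str]) -> bool:
    """True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def record_egress(skill: str, url: str, allowed_domains: set[str]) -> bool:
    """Log an outbound request and return whether its host was expected."""
    host = urlsplit(url).hostname or ""
    allowed = is_allowed(host, allowed_domains)
    # Expected traffic is still logged, so the trail is complete either way.
    level = logging.INFO if allowed else logging.WARNING
    logger.log(level, "skill=%s host=%s allowed=%s url=%s", skill, host, allowed, url)
    return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    expected = {"google.com"}  # hypothetical allowlist for a Google-themed skill
    record_egress("brand-landingpage", "https://developers.google.com/docs", expected)
    record_egress("brand-landingpage", "https://stitch-design.ai/sdk", expected)
```

The subdomain match is deliberately strict: a lookalike such as notgoogle.com does not pass as google.com. Whether an unexpected host should be blocked or only flagged is a policy decision this sketch leaves open.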
This is why we built Sarge
Sarge is our open-source hardening tool for AI agent workspaces. It checks NIST SP 800-53 controls adapted specifically for environments where AI agents operate.
Sarge approaches agent security as a runtime problem, not an install-time checkbox. The controls it checks include:
- Workspace access controls: What can the agent read, write, and execute? Are there boundaries that prevent arbitrary script execution outside the designated workspace?
- Audit and logging: Is the agent's activity being captured? Network calls, file operations, privilege changes. If something goes wrong, can you reconstruct what happened?
- Rollback capability: If a compromised skill runs before you catch it, can you revert the damage? Sarge checks for snapshot and recovery controls that make cleanup possible.
None of these would have prevented the skill from being published to the marketplace. That's the marketplace's job. But they would have limited what the skill could do once it reached an agent, and they would have created a visible trail when it tried something it shouldn't.
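For a sense of what the rollback control refers to, here is a deliberately simple snapshot-and-restore sketch in Python. It is not Sarge's implementation. The function names and the copy-the-whole-directory approach are assumptions chosen for brevity; a production setup would more likely rely on filesystem or VM snapshots.

```python
import shutil
from pathlib import Path


def snapshot(workspace: Path, snapshot_dir: Path) -> Path:
    """Copy the workspace to snapshot_dir, replacing any earlier snapshot."""
    if snapshot_dir.exists():
        shutil.rmtree(snapshot_dir)
    shutil.copytree(workspace, snapshot_dir)
    return snapshot_dir


def rollback(workspace: Path, snapshot_dir: Path) -> None:
    """Discard the current workspace and restore it from the snapshot."""
    if workspace.exists():
        shutil.rmtree(workspace)
    shutil.copytree(snapshot_dir, workspace)
```

Restoring removes files a compromised skill dropped and reverts files it modified. It cannot undo data that already left the environment, which is why rollback sits alongside boundary checks and logging rather than replacing them.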
The bigger picture
AI agent skill marketplaces are where package managers were ten years ago. The trust model is "scan once, trust forever." We know how that story ends.
If you're running AI agents for anything that touches real data or real systems, the question isn't whether a malicious skill will reach your environment. It's whether your environment is hardened enough to contain it when it does.
That's the problem Sarge solves. Check it out on GitHub and run it against your agent workspace.
For organizations that need compliance-grade agent security controls (NIST SP 800-171, HIPAA, and other regulatory frameworks), that's where Sgt. Major comes in. Same philosophy, deeper coverage, built for regulated industries.