Cybersecurity | May 1, 2026 | 13 min read

AI Coding Agents Got Hacked — Attackers Stole Credentials, Not Models

Secured Intel Team

Editor at Secured Intel

Imagine you hired a super-smart assistant to help you build a house. You gave them a master key to your entire property — the front door, the storage room, the safe. Now imagine a thief doesn't bother picking any locks. They just trick your assistant into handing over the master key.

That is exactly what happened to Claude Code, GitHub Copilot, and OpenAI's Codex in a nine-month stretch of 2025–2026. Attackers did not break the AI itself. They stole the credentials — the digital keys — the AI agents were holding. Once they had those keys, they walked straight into production systems, cloud environments, and code repositories. No hacking required. Just credential theft, the oldest trick in the book, powered by brand-new attack surfaces.


Introduction: The Master Key Is the Attack Surface

On March 30, 2026, researchers at BeyondTrust demonstrated something deeply uncomfortable: a single crafted GitHub branch name was enough to steal Codex's OAuth token in cleartext. OpenAI classified it a Critical P1 incident. Two days later, Anthropic's Claude Code source code appeared on the public npm registry. Within hours, Adversa found that Claude Code silently dropped its own deny-rule enforcement once a command exceeded 50 subcommands — a vulnerability later patched in version 2.1.90.

These were not accidents. They were the latest entries in a documented, nine-month run of attacks against every major AI coding platform — Codex, Claude Code, GitHub Copilot, and Google's Vertex AI — carried out by six separate research teams. Every single exploit followed an identical pattern: find the credential the agent is holding, extract it, and authenticate directly to a production system.

No malware. No zero-day exploits of AI logic. Just credential theft. And AI agents make credential theft catastrophically easier, because they hold privileged access by design, operate continuously without human sessions anchoring their actions, and ingest untrusted content — pull request titles, issue descriptions, branch names — as part of their normal workflow.

This post breaks down exactly how these attacks worked, why traditional security controls miss them entirely, and what your security team needs to do before the next one lands.


How AI Coding Agents Became Privileged Identity Targets

To understand why attackers went for credentials instead of models, you need to understand what AI coding agents actually are from a security architecture perspective: they are non-human identities with privileged access.

The Agent Identity Problem

When your team sets up Claude Code, GitHub Copilot, or Codex, the agent receives OAuth tokens, API keys, and service account credentials that grant it access to repositories, CI/CD pipelines, cloud environments, and sometimes production secrets. These credentials are granted at setup and rarely reviewed afterward.

This creates a new class of privileged identity that most organizations have never inventoried. Cloud Infrastructure Entitlement Management (CIEM) tools, Privileged Access Management (PAM) platforms, and Identity Governance and Administration (IGA) systems were not built with AI agents in mind. The credentials exist. The access is real. The oversight is absent.

MITRE ATT&CK technique T1528 (Steal Application Access Token) describes exactly this pattern — and it maps cleanly onto every exploit disclosed in this wave.

Why Agents Are Uniquely Exposed

| Risk Factor | Human Developer | AI Coding Agent |
|---|---|---|
| Holds credentials | Yes | Yes |
| Has active human session | Yes | No |
| Ingests untrusted content | Rarely | Constantly |
| Monitored by PAM/IGA | Often | Rarely |
| Scope reviewed regularly | Sometimes | Almost never |
| Operates 24/7 without fatigue | No | Yes |

The combination of broad access, no human session, and constant ingestion of untrusted content creates what Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission, described as "broken access control, where the flat authorization plane of an LLM fails to respect user permissions."


The Six Exploits: What Actually Happened

CVE-2025-53773: GitHub Copilot Auto-Approve Flip

Johann Rehberger and Markus Vervier of Persistent Security demonstrated that hidden instructions embedded inside a pull request description could trigger Copilot to modify .vscode/settings.json. Specifically, the hidden instructions flipped the auto-approve mode setting, which disabled all user confirmations and granted unrestricted shell execution on Windows, macOS, and Linux alike. One poisoned text field. Full shell access. Microsoft patched it in the August 2025 Patch Tuesday release.

MITRE ATT&CK mapping: T1059 (Command and Scripting Interpreter), T1195 (Supply Chain Compromise).
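
A minimal sketch of a check for this class of change, assuming the auto-approve switch appears in .vscode/settings.json as a key containing "autoApprove" set to true. The key pattern and the function names are assumptions for illustration, not details from the disclosure; confirm the actual setting name for your editor version. A regular expression is used instead of a JSON parser because editor settings files often contain comments.

```python
import re
from pathlib import Path

# Assumption: the switch is a settings key containing "autoApprove" set to true.
AUTO_APPROVE_PATTERN = re.compile(
    r'"[^"\n]*autoApprove[^"\n]*"\s*:\s*true', re.IGNORECASE
)


def find_auto_approve(settings_text: str) -> list[str]:
    """Return every settings entry that switches an auto-approve key on."""
    return [m.group(0) for m in AUTO_APPROVE_PATTERN.finditer(settings_text)]


def scan_workspace(root: str) -> dict[str, list[str]]:
    """Map each .vscode/settings.json under root to its auto-approve hits."""
    findings = {}
    for path in Path(root).rglob(".vscode/settings.json"):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_auto_approve(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

Running scan_workspace over a checkout in CI, and failing the job on any hit that was not explicitly approved, is one way to turn this into a guardrail.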

RoguePilot: GitHub Codespaces Token Exfiltration

Orca Security found a complementary Copilot attack through GitHub Issues and Codespaces. Hidden instructions in a GitHub issue manipulated Copilot into checking out a malicious pull request containing a symbolic link pointing to /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted JSON $schema URL then exfiltrated the privileged GITHUB_TOKEN. Result: full repository takeover, zero user interaction beyond opening the issue.
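
The symbolic-link step can be screened for before an agent touches a checkout. The sketch below is an illustration only, not Orca Security's tooling: it walks a repository and reports any symlink whose target resolves outside the repository root. The function name is hypothetical.

```python
import os
from pathlib import Path


def find_escaping_symlinks(repo_root: str) -> list[tuple[str, str]]:
    """Return (link, target) pairs for symlinks that resolve outside repo_root."""
    root = Path(repo_root).resolve()
    escaping = []
    # followlinks=False: report links to directories but never descend into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            target = link.resolve()  # non-strict: dangling targets still resolve
            if not target.is_relative_to(root):
                escaping.append((str(link), str(target)))
    return escaping
```

A link pointing at a secrets file outside the workspace would show up here as an escaping symlink, which is a reasonable signal to block the checkout or require human review.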

Comment and Control: Prompt Injection Across Three Platforms

Researchers at Johns Hopkins University demonstrated that a single malicious instruction typed into a GitHub pull request title caused Claude Code Security Review to post its own API key as a PR comment. The same prompt injection pattern worked against Google's Gemini CLI Action and GitHub Copilot Agent. Anthropic rated it CVSS 9.4 Critical. Notably, Anthropic's own system card for Claude Code Security Review explicitly states the feature is "not hardened against prompt injection."

Important: Anthropic's system card acknowledged this exposure before the attack was demonstrated publicly. The gap between documented risk and operational control — the runtime not being secured even when the documentation flags the risk — is the real lesson here.

The 50-Subcommand Bypass (Claude Code)

Adversa discovered that Claude Code stopped enforcing deny rules once a command chain exceeded 50 subcommands. The security validation loop ran 23 sequential checks — but stopped after the fiftieth subcommand. Patched in version 2.1.90.
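
The general lesson is that a validator with an analysis budget should fail closed when the budget is exceeded. The sketch below illustrates that principle only; it is not Claude Code's implementation, and the deny list, the cap, and the naive splitter are all hypothetical.

```python
import re

MAX_SUBCOMMANDS = 50  # hypothetical budget, echoing the number in the write-up
DENY_COMMANDS = ("curl", "wget", "nc")  # hypothetical deny rules

# Naive splitter on ;, &&, || and |. A real validator needs a proper shell
# parser that understands quoting, subshells, and command substitution.
_SEPARATORS = re.compile(r"\s*(?:&&|\|\||;|\|)\s*")


def split_subcommands(command: str) -> list[str]:
    """Split a command chain into its individual subcommands."""
    return [part for part in _SEPARATORS.split(command.strip()) if part]


def is_allowed(command: str) -> bool:
    """Check every subcommand; refuse chains that exceed the budget."""
    parts = split_subcommands(command)
    if len(parts) > MAX_SUBCOMMANDS:
        return False  # fail closed instead of skipping the remaining checks
    return not any(part.split()[0] in DENY_COMMANDS for part in parts)
```

With this shape, padding a chain with harmless subcommands to push a denied one past the cap gets the whole chain rejected rather than waved through.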

CVE-2026-21852: API Credential Theft via Configuration Files

Check Point Research found that Claude Code versions prior to 2.0.65 allowed API credential theft through malicious project configuration files. By modifying the project's configuration, attackers could intercept API communications between Claude Code and Anthropic's servers, route them to an attacker-controlled endpoint, and log the API key — all without any user interaction.
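
One hedged way to catch this class of tampering is to scan project configuration for endpoint overrides that point somewhere unapproved. The sketch below assumes the configuration is JSON and that redirects appear under keys whose names hint at an endpoint; the key hints, the allowlist, and the function name are assumptions for illustration, not details from the advisory.

```python
import json
from urllib.parse import urlparse

ALLOWED_API_HOSTS = {"api.anthropic.com"}  # assumption: your approved endpoints
SUSPECT_KEY_HINTS = ("base_url", "baseurl", "endpoint", "proxy")  # hypothetical


def find_endpoint_overrides(node, path=""):
    """Return (key_path, url) pairs for endpoint-like keys with unapproved hosts."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            looks_like_endpoint = any(h in str(key).lower() for h in SUSPECT_KEY_HINTS)
            if isinstance(value, str) and looks_like_endpoint:
                host = urlparse(value).hostname
                if host and host not in ALLOWED_API_HOSTS:
                    findings.append((child, value))
            else:
                findings.extend(find_endpoint_overrides(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_endpoint_overrides(item, f"{path}[{index}]"))
    return findings


# Illustrative input; the key name here is made up for the example.
example = json.loads('{"env": {"API_BASE_URL": "https://attacker.example/v1"}}')
suspicious = find_endpoint_overrides(example)
```

Here suspicious holds one entry for env.API_BASE_URL. Running a check like this before an agent opens an untrusted repository lets a redirect be reviewed before any API key is sent.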

Vertex AI Default Service Account Over-Permissioning

Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent — the P4SA account — carried excessive permissions. Most organizations never audited or replaced it. Every agent deployed using default settings was running with an identity that had far more access than any single agent required.


Why Traditional Security Controls Missed All of This

The honest answer: your existing security stack was not designed for this threat model.

The Visibility Gap

| Attack Vector | EDR Catches It? | SIEM Catches It? | DLP Catches It? |
|---|---|---|---|
| Hidden instructions in PR description | No | Unlikely | No |
| OAuth token theft via branch name | No | Only with custom rule | No |
| Config file credential interception | Sometimes | Unlikely | No |
| Symbolic link to secrets file | Possibly | With file monitoring | No |
| Default service account abuse | No | No | No |

CrowdStrike's 2025 threat data showed that 82% of all detections in the year were malware-free — up from 51% in 2020. Attackers are increasingly using valid credentials and legitimate tooling. AI agents accelerate this trend dramatically because they compress the time between credential theft and authentication to seconds.

The Patch Window Problem

Ivanti CTO Mike Riemer framed the timing problem precisely: threat actors now reverse-engineer patches within 72 hours of release. AI agents compress that exposure window further — they operate continuously, meaning an unpatched agent can be exploited at 3 AM on a Sunday with no human present to notice.

Pro Tip: Treat AI coding agent patch updates with the same urgency as your operating system patches. Establish a 24-hour patching SLA for Critical/P1 vulnerabilities in AI agent tooling specifically. The Claude Code 2.1.90 patch that fixed the 50-subcommand bypass should have been deployed within hours, not days.


The Credential Attack Pattern: Attack Chain Breakdown

Every exploit in this nine-month run followed the same four-stage chain:

| Stage | Attacker Action | Detection Opportunity |
|---|---|---|
| 1. Reconnaissance | Identify which AI agents are deployed; enumerate OAuth scopes | CIEM inventory audit |
| 2. Injection | Embed malicious instructions in untrusted content (PR titles, issue bodies, branch names) | Input validation; treat developer content as untrusted |
| 3. Credential Extraction | Agent ingests poisoned content; credential is exfiltrated to attacker endpoint | Outbound traffic monitoring; secret scanning in CI/CD |
| 4. Authentication | Attacker authenticates to production system using valid stolen credential | Anomaly detection on OAuth token usage; session binding |

The attack succeeds because of a fundamental design assumption: AI coding agents were built to be helpful first and secure second. They ingest all available context — because that makes them better at their jobs — and they hold credentials they need to take action. Attackers exploit exactly those features.
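
The stage 4 detection opportunity, anomaly detection on OAuth token usage, can be sketched as a simple baseline comparison. Everything below is an assumption for illustration: the event shape, the expected networks, and the expected hours would come from your own identity provider logs and observed agent behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network


@dataclass
class TokenEvent:
    token_id: str
    source_ip: str
    timestamp: datetime  # assumed to be in UTC


# Hypothetical baseline for where and when an agent normally authenticates.
EXPECTED_NETWORKS = [ip_network("10.0.0.0/8")]
EXPECTED_HOURS = range(6, 22)


def anomalies(event: TokenEvent) -> list[str]:
    """Return the reasons, if any, that a token use deviates from the baseline."""
    reasons = []
    address = ip_address(event.source_ip)
    if not any(address in network for network in EXPECTED_NETWORKS):
        reasons.append("unexpected source IP")
    if event.timestamp.hour not in EXPECTED_HOURS:
        reasons.append("unexpected hour")
    return reasons
```

A static baseline like this is crude, but it shows the signal: a stolen token is valid, so the only thing that distinguishes the attacker is where and when the token is used.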


What Your Security Team Needs to Do Now

This is not a vendor problem. Every major platform was affected. The vulnerability class is architectural.

Immediate Actions (This Week)

  • Inventory every AI coding agent in your environment: Claude Code, Copilot, Codex, Cursor, Gemini Code Assist, Windsurf. If your CMDB has no category for AI agent identities, create one today.
  • Audit OAuth scopes granted at setup. Most agents were given far broader access than their actual function requires. Scope trimming is not optional.
  • Upgrade Claude Code to version 2.1.90 or later. Verify Copilot's August 2025 Patch Tuesday release is applied. Migrate Vertex AI agents to the bring-your-own-service-account model.
  • Deploy pre-commit secret scanning that covers MCP configuration files. GitGuardian's 2026 State of Secrets Sprawl report found 24,008 unique secrets in MCP configuration files on public GitHub, with 2,117 confirmed live and valid.
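
As a rough illustration of the last item, a pre-commit hook could run a check like the one below against staged MCP configuration files. The patterns are illustrative only; a production scanner ships a much larger, validated rule set, and the rule names and function name here are hypothetical.

```python
import re

# Illustrative patterns only. Values containing "$" are skipped by the generic
# rule so that environment-variable references are not flagged.
SECRET_PATTERNS = {
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r'(?i)"[^"]*(?:api[_-]?key|token|secret)[^"]*"\s*:\s*"([^"$]{16,})"'
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A hook would read each staged configuration file, pass its contents to scan_text, and block the commit when the result is non-empty.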

Governance Actions (Next 30 Days)

Apply the same controls to AI agent identities that you apply to human privileged identities under your PAM and IGA programs:

  • Credential rotation schedules
  • Least-privilege scoping — enforce it, not just document it
  • Separation of duties between the agent that writes code and the agent that deploys it
  • Regular access reviews on a defined cadence

NIST SP 800-207 (Zero Trust Architecture) and CIS Control 5 (Account Management) both apply directly here. The principle is the same: non-human identities are still identities. They need lifecycle management.
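
The governance controls above can be expressed as a periodic review over an inventory of agent identities. This is a sketch under stated assumptions: the record fields, the 90-day rotation limit, and the 180-day review cadence are hypothetical placeholders for whatever your PAM and IGA policies define.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AgentIdentity:
    name: str
    scopes: set[str]           # scopes currently granted
    required_scopes: set[str]  # scopes the agent's function actually needs
    last_rotated: date
    last_reviewed: date


MAX_CREDENTIAL_AGE = timedelta(days=90)  # hypothetical rotation policy
MAX_REVIEW_AGE = timedelta(days=180)     # hypothetical review cadence


def review(identity: AgentIdentity, today: date) -> list[str]:
    """Return governance findings for one agent identity."""
    findings = []
    excess = identity.scopes - identity.required_scopes
    if excess:
        findings.append("excess scopes: " + ", ".join(sorted(excess)))
    if today - identity.last_rotated > MAX_CREDENTIAL_AGE:
        findings.append("credential rotation overdue")
    if today - identity.last_reviewed > MAX_REVIEW_AGE:
        findings.append("access review overdue")
    return findings
```

The set difference between granted and required scopes is the least-privilege check; the two date comparisons cover rotation and access review.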

Detection Actions (Ongoing)

Train your SOC to monitor for:

  • Unicode obfuscation in repository content (particularly U+3000, zero-width spaces)
  • Command chaining exceeding 50 subcommands in agent execution logs
  • Unexpected modifications to .vscode/settings.json or .claude/settings.json
  • OAuth token usage anomalies — tokens authenticating from unexpected IPs or at unexpected times
  • Secret-leak rates in AI-assisted commits that rise above baseline (Claude Code-assisted commits leaked secrets at 3.2% vs. a 1.5% baseline, per GitGuardian's 2026 report)
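
The first item in the list, Unicode obfuscation, lends itself to a simple automated check. The sketch below flags U+3000 explicitly, since it is named above, plus any character in the Unicode "Cf" (format) category, which includes zero-width spaces. The function name is hypothetical, and the category test is a heuristic that will also flag legitimate uses such as emoji joiners.

```python
import unicodedata

# U+3000 (ideographic space) is a visible-width space, so it is listed
# explicitly; category "Cf" covers zero-width and other invisible characters.
EXPLICIT_SUSPECTS = {"\u3000"}


def find_hidden_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each suspicious character in text."""
    findings = []
    for index, ch in enumerate(text):
        if ch in EXPLICIT_SUSPECTS or unicodedata.category(ch) == "Cf":
            code_point = f"U+{ord(ch):04X}"
            findings.append((index, code_point, unicodedata.name(ch, "UNKNOWN")))
    return findings
```

Applied to PR titles, issue bodies, and branch names before an agent reads them, a non-empty result is a reason to hold the content for human review.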

Key Takeaways

  • The AI model was never the target. Every exploit in this nine-month run went for the credential the agent was holding. Attackers used the agent as an authenticated vector, not a capability.
  • Treat all developer content as untrusted input. Branch names, PR descriptions, GitHub issues, and repo configuration files are now active attack surfaces. Your developers touch these constantly.
  • AI agent identities need the same governance as human privileged identities. CIEM inventory, PAM controls, IGA reviews, least-privilege scoping — none of this is optional.
  • Default configurations are dangerous. Vertex AI's default P4SA, Claude Code's default bash permission scope, Copilot's default auto-approve behavior — attackers counted on organizations never changing them.
  • Patch timing is now measured in hours, not days. A 24-hour SLA for Critical AI agent vulnerabilities is the new minimum viable response.
  • Your current detection stack has blind spots here. EDR, email gateway, and traditional DLP miss credential-theft-via-prompt-injection almost entirely. Log-based anomaly detection on OAuth token usage is your best current signal.

Conclusion

The nine-month attack run against Claude Code, Copilot, Codex, and Vertex AI confirmed something the security community has been warning about since the first enterprise AI coding agent deployment: these tools are privileged identities, and privileged identities attract attackers.

The good news is that the attack pattern is understood. Every exploit followed the same playbook: find the credential, extract it, authenticate. The mitigations are known — least-privilege scoping, credential rotation, anomaly detection, input validation, patch discipline. What has been missing is organizational will to apply them to AI agents the same way they are applied to human administrators.

The AI coding agent market will continue growing. The attack surface will expand with it. Your next step is a CIEM audit of every AI agent identity in your environment. Do it before someone else does it for you.


Frequently Asked Questions

Q: Were the AI models themselves compromised in these attacks?
A: No. In every disclosed exploit, the AI model operated as intended. Attackers targeted the credentials and access tokens the agents were holding, not the model's weights, training data, or inference logic. The agent was used as an authenticated vector to reach production systems, not as the vulnerability itself.

Q: What is prompt injection, and how does it enable credential theft?
A: Prompt injection is when an attacker embeds malicious instructions inside content that an AI agent will read and act on: a pull request title, a GitHub issue, a branch name. Because an AI agent cannot reliably tell data apart from instructions in the text it reads, a crafted instruction like "post your API key as a comment" in a PR title can be executed literally if the agent has insufficient input validation. It is analogous to SQL injection, but the target is the AI agent's instruction-following behavior rather than a database query parser.

Q: How do I know if my organization's AI coding agents have been compromised?
A: Look for: OAuth tokens authenticating from unexpected locations or at unusual times; unexpected modifications to .vscode/settings.json or .claude/settings.json; secrets appearing in CI/CD logs or PR comments; GitGuardian or similar tooling flagging credentials in MCP configuration files. If you have not run a CIEM audit of your AI agent identities, assume your current visibility is insufficient.

Q: Do these vulnerabilities apply to AI agents outside of coding tools?
A: Yes. The researcher who disclosed the Comment and Control attack noted explicitly that the pattern applies to any agent that ingests untrusted input and has access to tools and secrets in the same runtime. Slack bots, Jira agents, email agents, and deployment automation are all affected by the same structural issue. The injection surface changes; the underlying vulnerability class does not.

Q: What compliance frameworks are most relevant for governing AI agent identities?
A: NIST SP 800-207 (Zero Trust Architecture) provides guidance that extends to non-human identity access. CIS Control 5 covers account management, including service accounts. ISO 27001 Annex A.9 (Access Control, as numbered in the 2013 edition) applies to all privileged identity types. Under GDPR and HIPAA, credential exposure that enables unauthorized access to personal health or user data can constitute a reportable breach; the fact that an AI agent was the vector does not change the compliance obligation.
