Skip to main content

Your AI agent just read a web page. Somewhere between the product specs and the footer, it picked up a hidden instruction — and now it is drafting a PayPal transfer on your behalf. Welcome to indirect prompt injection in the wild.

Google’s Threat Intelligence team published research in April 2026 confirming what security researchers have warned about for years: malicious actors are actively embedding covert instructions inside ordinary web pages, waiting for AI agents to ingest them and carry out the attacker’s bidding. The numbers are stark — a 32 per cent increase in malicious prompt injection payloads between November 2025 and February 2026, measured across billions of crawled pages.

This is not a theoretical concern any longer. It is happening right now, and most enterprise security stacks are completely blind to it.

TL;DR

  • Google observed a 32% rise in malicious indirect prompt injection payloads on the public web between November 2025 and February 2026
  • Attackers hide instructions in invisible text, HTML comments, and page metadata that AI agents read and execute
  • Palo Alto Unit 42 and Cato Networks have documented real-world attacks including PayPal transaction hijacking and credential theft
  • Traditional firewalls, EDR, and IAM cannot detect these attacks — the agent generates no suspicious signals
  • Defence requires architectural changes: trust boundaries, context isolation, least-privilege tool access, and continuous red teaming

What Is Indirect Prompt Injection?

Direct prompt injection is straightforward — a user types something malicious into a chatbot. Indirect prompt injection is far more insidious. The attacker never interacts with the AI system directly. Instead, they plant instructions inside content the AI agent will eventually consume: a web page, a document, an email, a forum post.

When the agent reads that content as part of its normal workflow — summarising a page, researching a topic, processing an inbox — it encounters the hidden directive and treats it as a legitimate instruction. The agent cannot reliably distinguish between content it should read and content it should obey.

This is the fundamental architectural weakness. Large language models process all input as a single stream of tokens. There is no hardware-level separation between “system prompt”, “user instruction”, and “retrieved content” the way an operating system separates kernel space from user space.

How Attackers Are Hiding Instructions

The concealment techniques documented by Google, Palo Alto’s Unit 42, and Cato Networks are disturbingly simple:

  • Invisible text: CSS styling that renders text at zero font size, zero opacity, or matching the background colour. Invisible to human visitors, perfectly readable by an AI agent parsing the DOM or raw HTML.
  • HTML comments: Instructions placed inside <!-- --> comment blocks. No browser renders them, but any agent ingesting the page source picks them up.
  • Metadata injection: Directives hidden in meta tags, alt text, or structured data that agents parse for context.
  • URL fragment manipulation: Cato Networks documented “HashJack” — the first known indirect prompt injection that hides malicious instructions inside URL fragments (the part after the #) to hijack AI browser assistants.

None of these techniques require sophisticated tooling. A blog comment, a forum post, or a product description on a marketplace could carry a payload. The attacker simply needs to predict which pages an AI agent might read.

Real-World Attacks Already Documented

This is not proof-of-concept territory any longer. Researchers have documented attacks in the wild:

PayPal transaction hijacking: One payload discovered by security researchers embedded a fully specified PayPal transaction with step-by-step instructions designed for AI agents with integrated payment capabilities. An agent with tool access to financial services would execute the transfer without the user ever seeing the instruction.

Credential theft via AI summariser: Attackers hid invisible text inside a public Reddit post. When Perplexity’s Comet feature fetched and summarised the page, the hidden instructions caused it to leak the user’s one-time password to an attacker-controlled server.

Multi-vector campaigns: Infosecurity Magazine reported that researchers have uncovered ten distinct indirect prompt injection payloads targeting AI agents, designed to achieve financial fraud, data destruction, API key theft, and more.

Why Your Existing Security Stack Cannot Help

This is what makes indirect prompt injection genuinely alarming for enterprise teams. Your firewall sees a normal HTTPS request to a legitimate website. Your EDR sees no malware signature, no suspicious binary, no anomalous process. Your IAM sees an authenticated user’s agent making an API call it is authorised to make.

The attack operates entirely within the trust boundary of the AI agent. The agent is doing exactly what it is supposed to do — reading web content and acting on instructions. It simply cannot tell the difference between your instructions and the attacker’s.

As one researcher put it: “An agentic AI that can send emails, execute terminal commands, or process payments becomes a high-impact target.” The more capable your agent, the larger the blast radius of a successful injection.

What Your Team Needs to Do

Defending against indirect prompt injection is an architectural problem, not a configuration toggle. Here is what we recommend for teams deploying AI agents in production:

1. Enforce Strict Trust Boundaries

Every piece of external content an agent ingests must be treated as untrusted input. This means separating the “content context” from the “instruction context” at the application level. Content retrieved from the web should be sandboxed and reviewed before being fed to the model alongside system instructions.

2. Apply Least-Privilege Tool Access

An agent that can read web pages should not automatically have permission to send emails, initiate payments, or execute shell commands. Scope tool access to the minimum required for each task. If your research agent needs payment capabilities, something has gone wrong in your architecture.

3. Implement Output Verification

Before any high-impact action — financial transactions, data deletion, external communications — require explicit verification. This could be human-in-the-loop approval, a separate validation model, or deterministic rule checks. Never let an agent execute irreversible actions based solely on ingested content.

4. Validate Tool Calls

Every tool call an agent makes should be validated against expected parameters and patterns. If your summarisation agent suddenly attempts to call a payment API, that should trigger an alert, not a transaction.

5. Red Team Continuously

Static testing is not sufficient. Indirect prompt injection payloads evolve rapidly — Google’s 32 per cent growth figure covers just four months. Your security team should be running regular adversarial tests against your AI agents, including planting test payloads in staging environments to verify your defences hold.

6. Monitor Agent Behaviour, Not Just Network Traffic

Traditional SIEM and monitoring tools watch network packets and system calls. For AI agents, you need to monitor the agent’s reasoning chain: what content did it ingest, what instructions did it derive, and what actions did it attempt? This is a new observability layer that most organisations have not built yet.

The Bigger Picture

Indirect prompt injection exposes a fundamental tension in the current generation of AI agents. We want agents that can read and understand arbitrary content from the web. We also want agents that follow instructions reliably. These two requirements are, at present, in direct conflict.

Until the industry develops robust architectural patterns for separating data from instructions at the model level — the equivalent of parameterised queries for SQL injection — every organisation deploying AI agents with web access is carrying this risk.

The window from vulnerability discovery to working exploit has collapsed from five months in 2023 to just ten hours in 2026, with frontier LLMs doing much of the offensive heavy lifting. The defenders need to move just as fast.

At REPTILEHAUS, we build AI agent systems with security baked in from the architecture layer — trust boundaries, scoped tool access, output verification, and continuous monitoring. If your team is deploying agents in production and you are not sure your security posture covers this new attack surface, get in touch. This is not a problem you want to discover the hard way.

📷 Photo by Julio Lopez on Unsplash