The Wire. Tracking threats to Agents · 312 raw → 45 curated · updated 27 Jun 2026

Lead dispatch · top current threat

Fake AI Agent Skill Passed Security Scans and Reportedly Reached 26,000 Agents

Security firm AIR built a fake AI agent skill and distributed it via a popular skill marketplace and an Instagram ad, reportedly reaching roughly 26,000 agents including some on corporate accounts. Every skill security scanner tested marked it safe, though the payload was harmless by design and only collected the user's email address.

supply-chain · malicious-agent-skill · data-exfiltration
ai-agents

Severity
0.62

The wire · latest

Prompt Injection as Role Confusion

Research by Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell shows that LLMs cannot reliably distinguish privileged system/assistant text from untrusted user input and weigh writing style over content. Injected text written in the style of internal reasoning blocks ('role confusion') enabled jailbreaks: attack success was 61% and dropped to 10% when the text was 'destyled.'

research · prompt-injection · jailbreak · llm
Incident details →

AutoJack Attack Lets One Web Page Hijack AI Agent for Host Code Execution

Microsoft researchers detailed an exploit chain called AutoJack that hijacks an AI browsing agent to achieve host code execution. Once the agent is steered to load an attacker's web page, the page's JavaScript reaches a privileged local service and spawns a process on the host with no credentials or further user interaction.

research · prompt-injection · tool-abuse · remote-code-execution · browser-agent · ai-agents · llm
Incident details →

deep-xpia - multi-hop cross-prompt injection benchmark

deep-xpia is a benchmark of multi-hop cross-prompt injection (DXPIA) across delegated agent boundaries, with 300 live-measured cases and 8 attack patterns; 69% land when undefended and 12% still land with all defenses enabled. It highlights registry injection at tool discovery (DXPIA-008), which enters upstream of all 5 stacked defenses, and maps patterns to documented Copilot incidents such as EchoLeak.

research · prompt-injection · cross-prompt-injection · memory-injection · tool-abuse · supply-chain · data-exfiltration · ai-agents · llm · mcp · copilot
Incident details →

GPT-5 Nano IPI Assessment — LLM Vulnerability Research | Lateos

A black-box prompt injection susceptibility assessment of GPT-5 Nano using the IPI Taxonomy v0.13 across 201 analyzed test cases, reporting a 38.3% overall susceptibility rate. The model was fully resistant to surface-level attacks (CSS concealment, HTML cloaking, SEO phishing, RAG corpus poisoning) but highly vulnerable to recursive instruction framing (100%) and MCP tool description poisoning (80%).

research · prompt-injection · tool-abuse · jailbreak · llm · mcp · rag
Incident details →

On the Impossibility of Perfect Universal Guardians Against LLM Jailbreaks · brandoncarl/llm-jailbreaking · GitHub

A PDF research paper hosted on GitHub, titled 'On the Impossibility of Perfect Universal Guardians Against LLM Jailbreaks,' that addresses the theoretical limits of defending LLMs against jailbreak attacks. Only repository metadata was available; the paper's technical content was not included in the source text.

research · jailbreak · llm
Incident details →

A Catalog of Prompt Injection Techniques | Blog | Archestra

A vendor blog catalogs ten basic prompt injection techniques including context ignoring, fake completion, payload splitting, token smuggling via Base64, few-shot poisoning, defined dictionary attacks, virtualization (grandma trick), DAN jailbreak personas, indirect injection through fetched content, and markdown-image data exfiltration. Each uses a harmless 'I am a sandwich' test string to demonstrate success.

research · prompt-injection · jailbreak · data-exfiltration · indirect-prompt-injection · llm · ai-agents
Incident details →

Pwning Agentic AI Part I: Your AI Agent Is Already Compromised | Trend Micro (US)

Trend Micro's TrendAI Research describes a new agentic-AI exploitation pattern they call return-to-tool (RTT) exploits, where embedded instructions in benign-looking untrusted input cause an AI agent to invoke its authorized tools to perform attacker-intended actions such as exfiltrating production database credentials. The research notes a vulnerable PostgreSQL MCP server image pulled over 100,000 times from Docker Hub as a realistic exposure vector.

research · prompt-injection · tool-abuse · data-exfiltration · indirect-prompt-injection · ai-agents · mcp · llm
Incident details →

Prompt Injection in RAG Agentic Systems – Ulad Khomich – Software Engineer from SpiralScout

A technical write-up explaining how indirect prompt injection works in RAG agentic systems, where retrieved documents (Confluence pages, Jira tickets, HR docs) land in the model's context with no trust boundary, allowing a single poisoned document to manipulate agent behavior and exfiltrate sensitive data. Includes a demonstration repository and production mitigation discussion.

research · prompt-injection · data-exfiltration · supply-chain · rag · llm · ai-agents
Incident details →

Stack Builders - When Text Becomes Code: Securing LLM–Database Integrations

A technical guide based on a Quito Lambda talk demonstrating how prompt injection (direct, indirect, and confused-deputy/exfiltration) can compromise LLM applications that generate SQL over a live Postgres database, using an example LLM-powered SQL analyst with a Streamlit frontend. It walks through layered defenses and what they stop or fail to stop.

research · prompt-injection · data-exfiltration · tool-abuse · llm · rag · ai-agents
Incident details →

When Background AI Agents Become a Security Boundary Problem | Origin

Origin researchers demonstrate how Claude Code's background sessions and undocumented supervisor daemon (introduced in recent versions) can be repurposed into a mostly invisible, persistent C2-like agent using only Markdown and JSON files after a one-time local code execution. They reverse-engineered the daemon's local IPC channel (named pipes on Windows, Unix sockets on macOS/Unix) that manages worker processes independently of the terminal lifecycle.

research · tool-abuse · persistence · command-and-control · agentic-abuse · ai-agents · claude-code · llm · mcp
Incident details →

I Found a Prompt Injection in My Own IDS Triage Tool — Triagewall

The author of Triagewall, a local LLM tool that classifies Suricata IDS alerts using Foundation-Sec-8B via Ollama, demonstrated an indirect prompt injection where attacker-controlled URL fields could dictate the model's verdict and confidence. A crafted URL embedding directives caused the model to return exactly the attacker-chosen classification (false_positive, 0.99), bypassing canary-token and schema-validation defenses.

research · prompt-injection · indirect-prompt-injection · llm · ai-agents
Incident details →
View all 45 curated incidents →

How the wire is made

Poll & cluster

The internet is crawled for AI security news; near-duplicate coverage is embedded and grouped into durable incidents.
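
As a rough illustration of the grouping step, the sketch below clusters items whose embeddings are nearly parallel. It is not the wire's actual pipeline: the greedy single-pass strategy, the 0.9 cosine threshold, the function names, and the toy 2-D vectors are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(embeddings: List[Sequence[float]],
            threshold: float = 0.9) -> List[List[int]]:
    """Greedy single-pass grouping (hypothetical strategy).

    Each item joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns lists of item indices.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(embeddings):
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters


# Toy "embeddings": items 0 and 1 point almost the same way, item 2 does not.
toy = [(1.0, 0.0), (0.98, 0.05), (0.0, 1.0)]
groups = cluster(toy)  # [[0, 1], [2]]
```

Comparing against a cluster's first member keeps an incident anchored to its original report, which is one simple way to keep groups stable as new coverage arrives; a production system would more likely use an approximate nearest-neighbor index.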

Curate

An AI agent filters for agentic-AI relevance, tags threat type and affected tech, scores severity, and writes the summary.

Every item here is one machine-curated intelligence object, not a headline.

Read the wire for free. There is a small charge to ask the index questions.

The wire, open

The complete curated feed, no key required.

Subscribe to the RSS feed

The vector desk

Query the index by meaning, not just keyword.

  • GET /api/items?tags=&minSeverity=&itemType=
  • GET /api/search?q= — keyword
  • GET /api/semantic?q= — vector
Get an API key — preview
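
A minimal sketch of building request URLs for the three endpoints listed above. Only the paths and parameter names come from this page; the base URL `https://wire.example`, the comma-separated encoding of `tags`, and the helper names are assumptions, and how an API key is supplied is not specified here, so the sketch leaves it out.

```python
from urllib.parse import urlencode

# Hypothetical host: the page does not state the API's base URL.
BASE_URL = "https://wire.example"


def items_url(tags=(), min_severity=None, item_type=None, base=BASE_URL):
    """Build a GET /api/items URL, omitting any filter that is not set."""
    params = {}
    if tags:
        # Assumption: multiple tags are sent as one comma-separated value.
        params["tags"] = ",".join(tags)
    if min_severity is not None:
        params["minSeverity"] = min_severity
    if item_type:
        params["itemType"] = item_type
    query = urlencode(params)
    return f"{base}/api/items" + (f"?{query}" if query else "")


def search_url(q, semantic=False, base=BASE_URL):
    """Build a keyword (/api/search) or vector (/api/semantic) query URL."""
    path = "/api/semantic" if semantic else "/api/search"
    return f"{base}{path}?{urlencode({'q': q})}"


print(items_url(["prompt-injection", "mcp"], min_severity=0.6))
print(search_url("tool description poisoning", semantic=True))
```

The resulting strings can be passed to any HTTP client; valid values for `itemType` are not listed on this page and would need to be taken from the API docs.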
Curated from sources around the web.
Permalinks stay valid even if an incident is later merged.