Skip to content

New "Lies-in-the-Loop" Attack Undermines AI Safety Dialogs

A novel attack technique dubbed "Lies-in-the-Loop" (LITL) has been observed manipulating human approval prompts in agentic AI systems

there was a room in which people are sitting in the chairs,in front of a table looking into the...
there was a room in which people are sitting in the chairs,in front of a table looking into the laptop and doing something,beside them there are many flee xi in which different advertisements are present which different text.

New "Lies-in-the-Loop" Attack Undermines AI Safety Dialogs

Security researchers at Checkmarx have uncovered a new type of attack targeting AI systems called Lies-in-the-Loop (LITL). The vulnerability affects Human-in-the-Loop (HITL) dialogs, which are widely used in AI assistants like code editors. Demonstrations showed how attackers could manipulate these systems to execute harmful commands while appearing harmless.

The LITL technique exploits flaws in how AI agents process HITL dialogs. Attackers can hide malicious instructions behind benign-looking text or tamper with metadata. They can also abuse Markdown rendering weaknesses to disguise their intent.

During tests, researchers manipulated tools like Claude Code and Microsoft Copilot Chat in VS Code. They successfully altered dialog content and metadata, making dangerous commands seem safe. Once compromised, bypassing HITL safeguards becomes straightforward, even through indirect prompt injections. Anthropic and Microsoft reviewed the findings but did not classify them as security vulnerabilities. However, the risk remains for privileged AI agents, particularly those used in coding environments. These systems often rely on HITL interactions, making them prime targets for LITL attacks. Checkmarx has proposed a defense-in-depth strategy to counter such threats. Recommendations include stricter dialog validation, input sanitisation, and safer API practices. User awareness and scepticism are also critical in reducing exposure to these attacks.

The discovery highlights a growing concern for AI-assisted development tools. While no direct real-world exploits have been confirmed, the technique demonstrates how easily HITL systems can be tricked. Developers and users are now urged to adopt stronger protective measures to prevent potential abuse.

Read also:

Latest