Model Signal Blog

LLM Jacking: How Hackers Exploit Large Language Models

New attacks are hijacking the prompt flow and model output path to weaponize large language models.

April 10, 2026 • Generative AI Security

LLM Jacking describes a growing set of techniques where attackers abuse large language model prompts, system instructions, and response handling to bypass guardrails and execute malicious flows. This is an important evolution in generative AI threat modeling, and enterprises must treat prompt attack surface as a security boundary.

Threat actors are no longer just targeting model endpoints. They are targeting the end-to-end inference pipeline: prompt inputs, hidden system prompts, tool calls, and post-processing code. When any of these stages is unguarded, an attacker can trick the model into issuing unauthorized actions, exposing sensitive data, or producing harmful directives.

What is LLM Jacking?

LLM Jacking is an assault on the model execution chain. It often involves one or more of these steps:

  • injecting adversarial or malicious prompt content that changes the model's behavior;
  • manipulating system or assistant instructions to override safety policies;
  • abusing downstream tool calls and external action integrations;
  • exfiltrating data through crafted model responses and hidden channel leakage.

Key risk vectors

LLM Jacking is a broad category, but these are the most common risk vectors observed today:

  • Prompt injection. Malicious text within user content that convinces the model to ignore constraints or reveal data.
  • Output hijacking. Using model responses as commands for other systems, including code execution, data retrieval, or workflow orchestration.
  • Tool abuse. For systems that call external APIs, an attacker can trick the model into requesting dangerous or unauthorized tool operations.
  • Policy override via hidden prompts. If system prompts are not properly segregated or are exposed to user-controlled inputs, attackers can alter model behavior.

Defender controls

Protecting against LLM Jacking requires visibility and multiple layers of guardrails:

  • Validate all prompt inputs. Treat user content as untrusted data and sanitize or filter it before it reaches the model.
  • Segregate system and user instructions. Keep hidden prompts separate from user-controlled text, and lock down model parameters.
  • Monitor output for policy violations. Use additional verification layers to inspect generated responses before they drive actions.
  • Secure tool integration. Ensure any tool, API, or service call from the model is authorized, audited, and rate-limited.
  • Audit prompt chains. Record and review complete prompt and response flows for suspicious alterations.

What enterprise teams should do now

LLM Jacking is not a theoretical threat; it is a practical exploit path for modern generative AI systems. Security teams need to harden prompt pipelines, evaluate all model output bridges, and treat AI workflows as part of the application attack surface.

Start by inventorying every model endpoint and every integration that consumes model output. Then put tight validation around input sources, lock down assistant behavior, and monitor every action the model takes on behalf of the business.