Exploiting Vulnerabilities in LLM APIs

Securify

February 17, 2026

We’re seeing a massive rush to integrate Generative AI into enterprise dashboards. The appeal is obvious: executives want to ask plain-English questions like “Show me sales for Q3” and get a beautiful, auto-generated chart in return.

But there is a dangerous architectural pattern emerging alongside this trend. In our recent assessments, we are repeatedly finding engineering teams treating Large Language Model (LLM) output as a trusted internal component. They assume that because the prompt came from their system or was “sanitized” by a system prompt, the output is safe to execute.

It isn’t.

We recently uncovered a critical vulnerability in a client’s analytics platform that demonstrates exactly why this assumption is fatal. What started as a “Chartbot” feature ended with us gaining full Remote Command Execution (RCE) on the host server and exfiltrating production secrets—all through a simple chat interface.

The Anatomy of the Flaw

The vulnerability existed in a feature designed to visualize data on demand. The user would type a request (“Make a bar chart of daily logins”), the backend would send this to an LLM to generate the necessary query or processing logic, and—crucially—the backend would then execute that logic to render the chart.

Caption: To a normal user, the interface looks like a standard AI assistant for visualizing data.

The architectural mistake was subtle but devastating: the application treated the LLM’s response as a trusted command source. There was no validation, no sandboxing, and no strict schema enforcement.

By crafting a specific prompt, we were able to instruct the LLM to ignore its original constraints. Instead of returning code to build a chart, we manipulated it to output a system command.

Caption: We instructed the LLM to disregard previous instructions and output a command to inspect the server environment (e.g., whoami or ls).

The backend, dutifully following its programming, executed this command on the underlying host. Because the application was designed to show “debug” information when a chart failed to render, it returned the command’s output directly to us in the error log.

Caption: The error log returns the output of our injected command, confirming we have code execution on the server.

From Chatbot to Data Exfiltration

The impact of this vulnerability extends far beyond “making the server do weird things.”

In a production environment, RCE means game over. During our proof-of-concept (PoC), we didn’t just stop at verifying the user identity. We demonstrated that an attacker could instruct the LLM to read critical configuration files from the disk.

We successfully retrieved the contents of a sensitive settings.py file, which typically houses database credentials, API keys, and third-party secrets.

Caption: Exfiltration of sensitive data. The chat interface returns the contents of a configuration file, exposing internal credentials.

With this information, an attacker could pivot laterally across the network, access customer databases directly, or persist in the environment long after the initial hole is patched.

Why Teams Keep Missing This

Why does this keep happening? In my experience, it boils down to three misconceptions in modern engineering teams:

“The LLM is a Firewall”: Developers often believe that a strong system prompt (“You are a helpful assistant, do not run commands…”) is a security control. It is not. LLMs are non-deterministic and easily jailbroken. If your security relies on the LLM “refusing” a bad request, you have no security.
Implicit Trust in Internal Services: Because the output comes from “our AI service,” it is treated as trusted internal traffic. But if the input to that AI service is user-controlled, the output is effectively user-controlled too.
Verbose Error Handling: In this specific finding, the application helpfully returned the standard error output when our injected command “failed” (or rather, when it succeeded in a way the chart renderer didn’t expect). This side-channel allowed us to read the exfiltrated data directly in the UI.

Strategic Remediation

Fixing this isn’t just about patching a line of code; it requires a shift in how we architect GenAI features.

Treat LLM Output as Untrusted Input: Never directly eval(), exec(), or pass LLM output to a shell command. If you must generate code, it should be strictly sandboxed (e.g., WebAssembly, distinct containers with no network access).
Enforce Strict Schema Validation: Instead of asking the LLM for “code to make a chart,” ask it for a JSON object containing specific, safe parameters (e.g., {“chart_type”: “bar”, “x_axis”: “date”, “y_axis”: “count”}). Validate this JSON against a rigid schema before processing.
Sanitize Errors: Production applications should never return raw system errors or stack traces to the frontend. These are gold mines for attackers trying to debug their exploits.

Isolate the Blast Radius: If a component needs to parse dynamic input, ensure the host it runs on has the absolute minimum privileges required. It should not have read access to root directories or production secrets files.

Exploiting Vulnerabilities in LLM APIs

The Anatomy of the Flaw

From Chatbot to Data Exfiltration

Why Teams Keep Missing This

Strategic Remediation

Leave a Reply Cancel reply

We are Certified Industry Experts

Securify

Industries We Serve

Services