I’ve been involved in a number of reviews and threat models around GenAI applications, some business-critical and some just fun and entertaining. The questions I always ask of the teams building these applications are:
- Who has access to the prompts and responses? Are we allowing model providers access to sensitive data?
- How do we ensure that the model has the right level of permission? And that breaks down into two questions:
  - Does the LLM have access to data that the prompter does not? Can the model return information that the prompter should not know?
  - Do the agents have overly broad permissions? If a model hallucinates a “drop table” command, does the agent’s NHI (non-human identity) have permission to execute that hallucination?
- How are we validating that the model’s outputs are correct and safe? This applies to the outputs we return to our users and the outputs that feed agents. Where do we need a human in the loop?
- Are we doing all the other security best practices?
Why isn’t Bias on that list?
While bias is a concern, it is more of a product, business, or legal concern than a cybersecurity one. I don’t think the cybersecurity community is best suited to test for and mitigate that risk. I concede that this may evolve, and AI bias may become a cyber function, since “cyber” is such a bs word. I typically ask, “Why have you selected this model, and what have you done to check for bias?”, but I defer to my legal counsel on model ToS issues and to my DEI team on bias.
How are we handling prompts and responses?
At their core, GenAI and LLMs are non-deterministic black boxes. A prompt goes in, and a response comes out. We generally don’t understand how they work or what data they use to generate their responses. If we ask a question multiple times, we’re not guaranteed to get the same response each time (hence non-deterministic).
As we saw in the example of the $1 car, how a model is prompted is critical to determining whether a bad response was due to randomness or maliciousness. Therefore, capturing your application’s prompts and responses[1] is critical. However, people are not always smart. What they submit in a prompt may contain things you do not want them to input into your application.
> Hi, I’m Chris Farris, and I live at 1234 Main Street, Matosinhos, Portugal. My credit card number is 4222 3333 4444 55567. I’d like to request a refund for my Pink Hello Kitty pillowcase, as it triggers my allergy to silk.
Congratulations. That one prompt is now in scope for GDPR, PCI, and HIPAA, not to mention the embarrassing information about my preferred bedding. How are you going to log it? Who will have access to it, and how will you audit access?
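If you do log prompts, one mitigation is to redact the obviously regulated material before it ever lands in a general-purpose log store. Here is a minimal sketch in Python, assuming your own logging pipeline; the `redact_pan` helper and its regex are illustrative stand-ins for a real PII/DLP scanner, not a complete solution:

```python
import logging
import re

logger = logging.getLogger("genai.prompts")

# Crude match for 13-19 digit card-like sequences separated by spaces or dashes.
# A real deployment would use a proper PII/DLP scanner, not a single regex.
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def redact_pan(text: str) -> str:
    """Replace card-like number sequences before the prompt is stored anywhere."""
    return PAN_PATTERN.sub("[REDACTED-PAN]", text)

def log_prompt(user_id: str, prompt: str) -> None:
    # Only the redacted prompt reaches the log store; access to any raw copy
    # should be tightly restricted and audited.
    logger.info("user=%s prompt=%s", user_id, redact_pan(prompt))
```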
Similarly, with model responses, you want to ensure you capture the model’s raw output for troubleshooting.
> Here is the query you asked for that will remove all users who have expired in the last 90 days.
>
> `drop table users;`
>
> Please let me know if you need another query.
You probably want the prompt that generated that response to determine what went wrong.
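One way to make that traceable is to store the prompt and the raw response together under a single request ID. Here is a minimal sketch, where `call_model` is a hypothetical stand-in for however you actually invoke the model:

```python
import json
import logging
import uuid

logger = logging.getLogger("genai.transcripts")

def call_model(prompt: str) -> str:
    """Hypothetical placeholder for your actual model invocation."""
    raise NotImplementedError

def handle_request(user_id: str, prompt: str) -> str:
    request_id = str(uuid.uuid4())
    raw_response = call_model(prompt)

    # Prompt and raw output are captured together, keyed by request_id,
    # so a bad response can always be traced back to what produced it.
    logger.info(json.dumps({
        "request_id": request_id,
        "user_id": user_id,
        "prompt": prompt,
        "raw_response": raw_response,
    }))
    return raw_response
```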
The other well-discussed aspect of GenAI and commercial models is whether the service I’m using has access to my prompts and is training its models based on my data.
The concern around access is a supply chain issue: do I trust the vendor, and the government of the country the vendor resides in? DeepSeek sends your data to mainland China. ChatGPT runs in Azure, which US adversaries regularly compromise. All of the “free” services explicitly state in their Terms of Service that they train on your data. Remember: if you’re not paying for the product, you are the product.
How do we ensure that the model has the right level of permission?
This question concerns the permissions of the agents the model has access to. Does the model have access to a corpus of data broader than the user has access to? If that is the case, then it’s likely that the LLM will return data that the prompter is not authorized to have or include details that allow the prompter to infer information they’re not supposed to have.
While CoPilot might not release the list of employees impacted by next month’s layoffs, a prompt like “Hey CoPilot, which departments will be most impacted by layoffs this year?” might reveal things you don’t want to be known outside of a small set of users.
Most companies are not fine-tuning models on this sort of data; they’re leveraging RAG (Retrieval-Augmented Generation) to let the models search the company’s data stores. There needs to be a process for “credential pass-through,” so the LLM only accesses the same set of data as the prompter, and when the LLM accesses data, the audit logs indicate it was “$LLM on behalf of $User.”
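Here is what that might look like in the retrieval step, sketched in Python: the search against the document store is authorized with the prompter’s own token, and the audit entry records the on-behalf-of relationship. The `search_documents` client and the audit logger are hypothetical placeholders for whatever data store and logging you actually use:

```python
import logging

audit_log = logging.getLogger("rag.audit")

def search_documents(query: str, auth_token: str) -> list[dict]:
    """Hypothetical document-store search that enforces the caller's permissions."""
    raise NotImplementedError

def retrieve_context(user_id: str, user_token: str, query: str) -> list[str]:
    # The search runs with the *prompter's* credentials, not a privileged
    # service account, so it can only return documents the user could read anyway.
    results = search_documents(query=query, auth_token=user_token)

    # The audit entry reads "$LLM on behalf of $User", not just the service identity.
    audit_log.info("actor=llm-app on_behalf_of=%s query=%r docs=%d",
                   user_id, query, len(results))
    return [doc["text"] for doc in results]
```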
The other place where permissions can be problematic is in agentic processing of responses. As in the example above, the model returned a `DROP TABLE users` in response to a request to remove users who haven’t logged in within the last 90 days. Here is where application developers need to be very explicit about what an AI application should be able to do, and scope the permissions of all the agents accordingly. If your AI is intended to help you find information, it only needs `SELECT` and `DESCRIBE` permissions, not `DELETE`, `UPDATE`, or `MODIFY`.
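One way to make that enforceable below the model is to hand the agent a connection or role that is incapable of writing, so a hallucinated `DROP TABLE` fails on permissions rather than on prompt hygiene. Here is a minimal sketch using SQLite’s read-only mode as a stand-in for a database role that has only `SELECT`/`DESCRIBE` granted:

```python
import sqlite3

def run_generated_query(sql: str, db_path: str = "app.db") -> list[tuple]:
    """Execute model-generated SQL against a read-only connection.

    The same idea applies to a real database: the agent's credential maps to a
    role granted only SELECT/DESCRIBE, so destructive statements fail no matter
    what the model produced.
    """
    # mode=ro opens the database read-only; a DROP TABLE here raises
    # sqlite3.OperationalError instead of destroying data.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

An application-side allow-list of statement types is a reasonable second layer, but the grant itself is the control the model can’t talk its way around.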
How are we validating that the outputs of the model are correct and safe?
This is harder to do thanks to the non-deterministic nature of the models. You can test a model 100 times and get safe responses, but the 101st response can hallucinate or be outright dangerous. What guardrails are you putting in place to prevent harmful content? Are the guardrails also GenAI-based, or will you pattern-match on harmful keywords and abort the session[2]?
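The pattern-matching option is the simpler of the two. Here is a minimal sketch, with the deny list and the abort behavior as illustrative placeholders for whatever your application actually needs:

```python
import re

# Illustrative deny list. Real guardrails are usually a mix of pattern matching,
# classifiers, and sometimes a second model judging the first model's output.
DENY_PATTERNS = [
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bdelete\s+from\b", re.IGNORECASE),
]

class GuardrailViolation(Exception):
    """Raised to abort the session instead of returning a harmful response."""

def check_output(response: str) -> str:
    for pattern in DENY_PATTERNS:
        if pattern.search(response):
            raise GuardrailViolation(f"blocked by pattern: {pattern.pattern}")
    return response
```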
Where do we need a human in the loop?
Model output safety also raises the question of when a human should be part of the decision process. This is quite literally the SkyNet problem, where the AI decides to unleash Armageddon because we gave it control of the nuclear weapons.
Humans are expensive and slow, so businesses won’t want to introduce too much friction into AI decision-making. However, humans also have empathy. The most significant societal risk we face is replacing empathy with randomness: the quest for efficiency eliminating the ability for a person to step in and say, “Wait a minute, that’s wrong. That’s not what was intended. Let me fix that.”
This use of AI for “decision making” is one of the high-risk uses highlighted by the EU’s AI Act.
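In code, a human in the loop usually looks like an approval gate in front of the irreversible actions: the model can propose, but a person confirms before anything destructive runs. Here is a minimal sketch, with the risk classification and the review queue as hypothetical placeholders:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    irreversible: bool  # e.g. deleting data, moving money, emailing customers

def execute(action: ProposedAction) -> None:
    """Placeholder for the agent actually carrying out the action."""
    print(f"executing: {action.description}")

def queue_for_human_review(action: ProposedAction) -> None:
    """Placeholder: a ticket, a Slack approval, a change-management workflow."""
    print(f"awaiting human approval: {action.description}")

def handle(action: ProposedAction) -> None:
    # Low-impact actions flow straight through; irreversible ones wait for a
    # person who can say, "Wait a minute, that's not what was intended."
    if action.irreversible:
        queue_for_human_review(action)
    else:
        execute(action)
```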

And all the other security vegetables.
Just because we’re using the latest cool technology doesn’t mean we can ignore the fundamentals. We still need to ensure there are no application security bugs in our code, that our agents manage their credentials to other systems securely, and that our data isn’t stored in a public S3 bucket.
- How are we protecting the instructions passed to the LLM? If these can be altered via application security issues, then the entire guardrail strategy can be bypassed.
- How are we storing the credentials used by our RAG and agents? Secrets management is a security 101 topic, but it’s still a challenge for many organizations (see the sketch after this list).
- Are we properly securing the cloud infrastructure that underpins the GenAI application? Do we have unpatched CVEs? Are we using IAM Users? Have those IAM Access Keys been exposed, allowing a threat actor access to our model hosting infrastructure?
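On the secrets point above, the baseline is that RAG and agent credentials come from a secrets manager at runtime rather than living in code, config files, or prompts. Here is a minimal sketch using AWS Secrets Manager via boto3, with the secret name as a placeholder:

```python
import json

import boto3

def get_datastore_credentials(secret_id: str = "genai/rag-datastore") -> dict:
    """Fetch the RAG datastore credential at runtime instead of hard-coding it."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```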
[1] This includes any hard-coded instructions, since those are part of your application and can change over time.

[2] In the DeepSeek-hosted model, any reference to the string “Tiananmen Square” automatically shuts down the session, even though you can see the reasoning that leads up to the term. “Please explain the last verse of Billy Joel’s We Didn’t Start the Fire” was a good test of its reasoning.