Before you approve a vendor's AI agent, ask these seven questions

Jason Holloway

A useful guide circulating among developers walks through building an AI agent: goal setting, model selection, frameworks, memory, tool integration, context management and testing. It is well-structured and accurate.

It was written for the people deploying the agent, not the people accountable when it goes wrong. AI agent security is the discipline of closing that gap. This post covers the same seven stages from the other side of the table, because each one carries a governance decision your risk function should own. It is general guidance, not legal advice.

Stage 1: Goal definition is a governance question

Every responsible build starts with a precise statement of purpose: what the agent solves, what it can access, what it can trigger. The security question is whether that statement ever leaves the development team.

Does your organisation maintain a register of deployed AI agents, the scope each is authorised to operate within and a named individual accountable for each one? In most mid-sized organisations we speak to, it does not. Agents built on loose objectives accumulate permissions over time. What was scoped as “answer HR queries” quietly becomes “access the HR system, draft responses, send emails and log decisions.” Each step felt proportionate to the developer, and none of it reached your risk function. The accountability gap opens before the first line of code is written.
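
As a rough illustration of what such a register could hold, here is a minimal Python sketch. The `AgentRecord` fields, the permission strings, the owner title and the review date are all hypothetical, chosen only to mirror the HR example above; a real register would more likely live in your GRC or asset management tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentRecord:
    """One entry in a hypothetical register of deployed AI agents."""
    name: str
    purpose: str
    owner: str                  # named individual accountable for the agent
    authorised_scope: set[str]  # permissions signed off by the risk function
    last_reviewed: date


def scope_creep(record: AgentRecord, current_permissions: set[str]) -> set[str]:
    """Return permissions the agent holds today that were never authorised."""
    return current_permissions - record.authorised_scope


# Illustrative values only: the agent was scoped to answer HR queries.
hr_agent = AgentRecord(
    name="hr-helpdesk",
    purpose="Answer HR queries",
    owner="Head of People Operations",
    authorised_scope={"hr_kb:read"},
    last_reviewed=date(2025, 1, 15),
)

granted_today = {"hr_kb:read", "hr_system:read", "email:send", "decision_log:write"}
unauthorised = scope_creep(hr_agent, granted_today)
print(sorted(unauthorised))
```

Anything the comparison returns is scope the agent acquired without passing through the risk function, which is the conversation to have with the named owner.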

Stage 2: Model selection determines whose infrastructure your data touches

Developers choose between large reasoning models for complex tasks, general-purpose language models for conversation, and smaller models for routing and classification. For your risk function, the relevant questions are jurisdiction, data residency and contractual terms, not benchmark performance.

Where is inference happening? What retention policy applies to prompts and completions? Is your data used to train future model versions? These are the questions that appear in FCA operational resilience assessments, ICO investigations and supplier due diligence reviews. A vendor who needs a follow-up call with their legal team to answer them has just given you a finding.

Stage 3: The framework is your AI supply chain

Most agents are assembled using orchestration frameworks such as LangChain, CrewAI or n8n rather than built from first principles. These components, most of them open source or source-available, handle the scaffolding between model, memory and tools, and they bring dependencies, known vulnerabilities and update cycles with them.

This is your AI supply chain, and it usually arrives without a patching process attached. Ask what frameworks underpin the agent, what the patching cadence is for those components, whether a software composition analysis has been done and whether the infrastructure is documented well enough for your team to assess during an incident. Most organisations apply rigorous third-party assurance to SaaS suppliers. AI agent infrastructure rarely receives the same treatment, and it should.

Stage 4: Memory is a data store. Treat it like one.

This is the stage that surprises compliance teams most consistently. Modern agents use persistent memory: previous interactions, user preferences, historical decisions and accumulated context. Memory is what lets an agent personalise responses and complete long-running tasks, and it means the agent is storing data, potentially sensitive data, somewhere on your estate or your vendor’s.

Where is that memory held? Is it encrypted at rest and in transit? What is the retention period, and can records be deleted on request under UK GDPR Article 17? Does the store contain personal data, has a DPIA been completed, and is it recorded in your data asset inventory? When a vendor describes their agent as having “episodic memory” or “long-term context retention,” your DPO needs to be in that conversation. Memory is a data store, and data stores carry obligations.

Stage 5: Every tool integration is an attack surface

An agent without tool connections is functionally a chatbot. The capability, and the risk, arrives when it connects to CRM platforms, email, databases, APIs and analytics tools, the same exposure we cover for MCP server deployments.

OWASP lists prompt injection as LLM01:2025, the top-ranked vulnerability class in its Top 10 for LLM applications. In an agentic context, the attack works by hiding instructions in content the agent reads: a webpage, a document, an email. This is no longer theoretical. In December 2025, Palo Alto Networks Unit 42 reported the first confirmed real-world case of malicious indirect prompt injection designed to bypass an AI-based review system. Google, scanning billions of public web pages, recorded a 32 percent relative increase in malicious indirect prompt injection between November 2025 and February 2026, and some payloads found in the wild contained fully specified PayPal transaction instructions hidden in ordinary HTML, aimed at agents with payment capabilities.

A compromised agent can exfiltrate data, send unauthorised communications or manipulate records, with every action appearing to originate from a legitimate, credentialled system. The principle of least privilege applies exactly as it does to human users: an agent should hold access only to what its defined purpose strictly requires, not everything the developer found convenient to connect. Ask what tools the agent can reach, what permission scope each integration uses and whether access is logged and reviewed.
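
One way to picture least privilege for an agent is a deny-by-default gate in front of every tool call. The sketch below is an assumption about how such a gate might look, not a description of any particular framework: the tool names, the `ALLOWED_TOOLS` mapping and the `authorise_tool_call` function are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

# Hypothetical allowlist: tool name -> scopes approved for this agent's purpose.
ALLOWED_TOOLS = {
    "crm.lookup_contact": {"read"},
    "email.draft": {"write"},
}


class ToolNotPermitted(Exception):
    """Raised when the agent requests a tool or scope outside its allowlist."""


def authorise_tool_call(agent_id: str, tool: str, scope: str) -> None:
    """Deny by default, and log every decision so access can be reviewed."""
    permitted = scope in ALLOWED_TOOLS.get(tool, set())
    log.info("agent=%s tool=%s scope=%s permitted=%s", agent_id, tool, scope, permitted)
    if not permitted:
        raise ToolNotPermitted(f"{agent_id} may not use {tool} with scope {scope}")


# A call inside the approved scope passes silently.
authorise_tool_call("hr-helpdesk", "crm.lookup_contact", "read")

# A tool that was never approved is refused, even if an injected
# instruction in a document or email asks for it.
try:
    authorise_tool_call("hr-helpdesk", "email.send", "write")
except ToolNotPermitted as exc:
    print(f"blocked: {exc}")
```

The useful property is that the check sits outside the model. A prompt injection can change what the agent asks for, but not what the gate will grant, and the log gives reviewers the record the last question above asks about.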

Stage 6: Context management hides two risks most teams have not considered

Agents manage a context window, the information held in working memory at any moment. As tasks grow more complex, developers apply summarisation and compression to control cost and latency. Two risks follow, and neither is widely discussed in security circles.

The first is cross-session leakage: when context, cache or memory state crosses between user sessions, your authentication and authorisation controls stop being relevant, and an attacker can reach data they were never granted. In multi-tenant deployments where one agent serves many users, weak isolation can surface one user’s information inside another’s session. The output is accurate; it simply belongs to the wrong person.

The second is accuracy degradation. Aggressive summarisation strips nuance, and an agent reasoning over a compressed view of a complex situation can produce an incomplete recommendation. In regulated sectors, that output carries consequences. Neither risk argues against AI agents. Both argue for asking your vendor how sessions are isolated, how compression is calibrated and what testing has been done at the boundaries.
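
On the first of those risks, the sketch below shows one possible shape for session isolation: every read and write to agent memory is keyed by tenant and user, with no shared fallback. The `SessionScopedMemory` class and the identifiers are hypothetical, and a production system would enforce the same boundary in its cache and vector store as well.

```python
from collections import defaultdict


class SessionScopedMemory:
    """Hypothetical in-memory store that namespaces context by tenant and user."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], list[str]] = defaultdict(list)

    def remember(self, tenant_id: str, user_id: str, item: str) -> None:
        self._store[(tenant_id, user_id)].append(item)

    def recall(self, tenant_id: str, user_id: str) -> list[str]:
        # Look up the exact (tenant, user) key only; never fall back to a
        # shared or global key. Return a copy so callers cannot mutate state.
        return list(self._store.get((tenant_id, user_id), []))


memory = SessionScopedMemory()
memory.remember("tenant-a", "alice", "salary query for employee 1042")
memory.remember("tenant-b", "bob", "holiday balance request")

print(memory.recall("tenant-b", "bob"))    # bob sees only his own context
print(memory.recall("tenant-b", "alice"))  # same user name, different tenant: nothing
```

A boundary test worth asking a vendor for is the second call: a lookup that differs from a real session by a single identifier should return nothing at all.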

Stage 7: Testing is the stage most vendors rush

The developer guide behind this post calls testing “the most overlooked stage of AI development.” That matches what we see. The frameworks for rigorous AI evaluation are still maturing, and unit testing, adversarial testing, red-teaming, hallucination measurement and behaviour-at-scope-boundary testing are frequently shallower than a regulated organisation should accept before sign-off.

Ask for three things at minimum: evidence the agent has been tested against adversarial inputs, a documented hallucination rate under typical operating conditions, and a defined process for what happens when the agent behaves outside its expected parameters. A vendor who cannot produce those three is not ready for your environment.

What this means for your risk function

AI agent adoption is not going to slow down, so the real question is whether your governance keeps pace with deployment. The seven stages above are not only a developer’s checklist. They map to where your risk function needs to be present: scope definition, model due diligence, supply chain assurance, data protection, access control design, context architecture and ongoing evaluation. If those agents are being built inside your own organisation rather than bought in, the companion question is whether anyone can assure what your team has already built.

Questions compliance teams are asking us about AI agent deployments

If an AI agent makes a mistake that causes financial loss or a data breach, who owns that risk?

A named human role, assigned before deployment rather than identified after an incident. Vendors frequently sell agents as “autonomous” without defining a liability framework, which leaves the risk sitting nowhere until something goes wrong. Insist on an ownership map that ties every agent to a person responsible for its configuration, scope and approval. That accountability usually spans the CISO, CIO and compliance functions, so it has to be made explicit.

Do we have to approve an AI agent for full use on day one?

No, and you should not. The organisations deploying agents well treat approval as a progressive trust problem rather than a binary decision. Start with read-only actions against non-production data, confirm the credential handling, approval workflows and audit logging behave as expected, then expand scope on evidence rather than enthusiasm. Unlimited access on day one removes your ability to contain a problem before it reaches production systems.
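
Progressive trust can be expressed as a simple promotion rule. In the sketch below the first tier and the three evidence checks come from the answer above; the later tier names and the `next_tier` function are assumptions added for illustration.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    READ_ONLY_NON_PROD = 1   # starting point described above
    READ_ONLY_PROD = 2       # hypothetical next step
    WRITE_WITH_APPROVAL = 3  # hypothetical final step


# Checks that must be confirmed before the agent's scope is expanded.
REQUIRED_EVIDENCE = {"credential_handling", "approval_workflows", "audit_logging"}


def next_tier(current: TrustTier, evidence_confirmed: set[str]) -> TrustTier:
    """Promote one tier at a time, and only when all evidence is confirmed."""
    if not REQUIRED_EVIDENCE <= evidence_confirmed:
        return current
    return TrustTier(min(current + 1, max(TrustTier)))


# Only audit logging has been verified so far, so the agent stays where it is.
print(next_tier(TrustTier.READ_ONLY_NON_PROD, {"audit_logging"}).name)

# All three checks confirmed: the agent moves up exactly one tier.
print(next_tier(TrustTier.READ_ONLY_NON_PROD, set(REQUIRED_EVIDENCE)).name)
```

Promotion is never skipped and never automatic, which keeps each expansion of scope tied to a recorded decision.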

Does deploying an AI agent trigger a DPIA?

Almost certainly, if the agent processes personal data, makes or informs decisions affecting individuals, or operates at scale. The memory and tool integration stages in particular tend to introduce personal data processing that was not considered when the agent was first scoped. Treat the DPIA as a deployment gate, not a retrospective formality.

What is the single most commonly missed control when approving an AI agent?

Comparing the agent’s declared permissions against the permissions it actually uses. Engineering files a change request listing the tools and data sources the agent needs, security reviews that declared list, and nobody checks it against observed behaviour. The gap between the two is often a factor of ten or more, and every unused declared permission is dormant attack surface that widens the blast radius if the agent is compromised.
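
The comparison itself is a set difference once you have both lists. The sketch below assumes the declared permissions come from the change request and the observed ones from an audit log export; the permission names and the `permission_gap` function are hypothetical.

```python
def permission_gap(declared: set[str], observed: list[str]) -> dict[str, set[str]]:
    """Compare declared permissions with those seen in audit logs."""
    used = set(observed)
    return {
        "unused_declared": declared - used,   # dormant attack surface
        "used_undeclared": used - declared,   # access nobody reviewed
    }


# Illustrative data: what the change request listed...
declared = {"crm:read", "crm:write", "email:send", "hr_system:read", "files:delete"}
# ...and what the agent was actually observed doing over the review period.
observed = ["crm:read", "crm:read", "email:send", "crm:read"]

gap = permission_gap(declared, observed)
print(sorted(gap["unused_declared"]))
print(sorted(gap["used_undeclared"]))
```

Permissions in the first set are candidates for revocation. Anything in the second set means the agent reached something that was never declared, which is a finding in its own right.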

If your organisation is being asked to approve an AI agent deployment and you are not confident you hold the right questions, our vCAIRO service provides the fractional AI risk oversight to make that assessment properly. Speak to our team about how we work.