SaaS vs VPC vs on-prem for regulated GenAI: reference architecture & checklist

Most vendor security reviews of GenAI platforms stall in the same place. The CISO has a control questionnaire built for SaaS analytics tools. The platform engineering lead has a network diagram built for a private model gateway. The AI risk officer has a checklist mapped to ISO 42001 and the EU AI Act. Three artifacts, one workload, and no shared frame for deciding which deployment shape — multi-tenant SaaS, single-tenant SaaS, customer VPC, or on-prem — actually clears procurement.

We treat deployment shape as the upstream decision because everything downstream depends on it: where data physically sits, who holds the keys, whether the platform can enforce policy at the request boundary or only watch it, and which standards control families the vendor can actually evidence rather than gesture at. This guide walks the four shapes through residency, training-data posture, identity and key custody, audit retention, and runtime policy enforcement, and maps each procurement dimension to specific clauses in ISO/IEC 42001, the NIST AI Risk Management Framework, the EU AI Act, the HIPAA Security Rule, and the ENISA Multilayer Framework for AI. The closing procurement checklist is the artifact to run during the actual review.

TL;DR for a vendor security review

Multi-tenant SaaS is acceptable for GenAI workloads that do not touch regulated data, are not classified high-risk under EU AI Act Annex III intended-purpose categories, and can absorb the tenant-isolation boundary the vendor offers. Single-tenant SaaS clears more residency and isolation objections but leaves key custody and audit-log retention dependent on the vendor's control plane. Customer-VPC deployments move the data plane inside the buyer's network boundary, which is what makes runtime policy enforcement and BAA scope realistic. On-prem and air-gapped become the answer where residency, training-data posture, or contractual constraints rule out any vendor-operated plane.

Reviewers — CISOs, AI risk officers, platform engineers — should use this article alongside the procurement checklist further down. Every checklist row maps to a primary-source control family so the line item is defensible inside the buyer's own governance review.

Why deployment shape decides every other AI governance decision

The familiar framing — "pick the vendor with the best controls" — treats deployment shape as a footnote. In practice it is the upstream variable that determines which controls are even available to evaluate. A SaaS-only governance vendor can observe a model, score outputs, and emit telemetry, but if the data plane never enters the customer's network it cannot block a prompt-injection payload at the request boundary. OWASP's LLM application risk taxonomy defines prompt injection as a vulnerability where user prompts alter model behavior in unintended ways; mitigating it requires inline interception, which is a property of where the gateway runs, not of how good the dashboards look.

The same logic applies to data residency. The EU AI Act's data-governance framing for high-risk systems and the HIPAA Security Rule's general rules, which require a regulated entity to ensure the confidentiality, integrity, and availability of ePHI, both assume the regulated entity can demonstrate end-to-end custody of in-scope data. A vendor that proxies inference through a multi-tenant data plane in a region the buyer cannot constrain has, by construction, conceded part of that custody. No amount of monitoring closes the gap.

We see two failure modes repeat in regulated procurement. The first is SaaS-only governance vendors arriving with an ISO 42001 alignment story and a control questionnaire built for shared infrastructure, then losing the deal at the data-plane-residency line. The second is VPC-only gateways arriving with a runtime policy story and no governance loop, then losing the deal at the AI risk register line. The platforms that clear both lanes ship a runtime data plane the buyer can place inside their own boundary and a governance plane that maps to the buyer's standards stack — ABV's GenAI Risk Protection surface is positioned exactly there.

The four deployment shapes

We anchor the comparison on five dimensions: data-plane location, key custody, training-data posture, audit-log retention, and the realistic runtime control surface.

Multi-tenant SaaS

Multi-tenant SaaS runs every customer's inference traffic through shared infrastructure inside the vendor's environment. Tenant isolation is logical — namespace, IAM, encryption-context separation — and the vendor controls the data plane end to end. For non-regulated GenAI workloads with permissive training-on-customer-data defaults, this shape is the default and is fine.

It becomes disqualifying in two cases. First, when the workload handles ePHI: the HIPAA Security Rule places confidentiality, integrity, and availability of ePHI on the regulated entity, and a shared data plane without a BAA-bound region forces the buyer to inherit a tenancy boundary they cannot independently audit. Second, when the workload sits in an EU AI Act high-risk class. Recital 52 of the Act explains that stand-alone systems are treated as high-risk when, in light of their intended purpose, they pose a high risk of harm to health, safety, or fundamental rights; the operative classification runs through Article 6 and Annex III. That label triggers conformity-assessment obligations for the provider and corresponding obligations for the deployer, which a multi-tenant substrate complicates rather than satisfies.

Single-tenant SaaS

Single-tenant SaaS isolates the data plane to a customer-specific instance inside the vendor's cloud. It typically resolves region-of-residency objections, removes noisy-neighbor exposure on the inference path, and lets the vendor sign region-bound contracts. Key custody usually still sits with the vendor; audit-log retention is whatever the control plane offers; the runtime policy boundary is wherever the vendor places it.

This shape clears most procurement reviews that fail multi-tenant on residency or noisy-neighbor grounds. It does not by itself satisfy buyers who need BYO key custody, full audit-log export under their retention SLO, or runtime policy enforcement inside their network boundary. We treat single-tenant SaaS as the right shape when the buyer's gating constraints are residency and tenancy, not custody and runtime control.

Customer VPC

A customer-VPC deployment puts the data plane inside the buyer's cloud account. The vendor delivers the gateway, runtime policy engine, and observability collectors as workloads the buyer operates; the control plane (configuration, evaluation dashboards, policy authoring) stays with the vendor. This is the shape that makes BYO KMS keys realistic, lets the buyer set retention on inference logs to match their own SLOs, and puts runtime policy enforcement at the request boundary the buyer already controls.

The tradeoffs are operational. The buyer inherits patching cadence, capacity planning, and incident-response handoffs that a SaaS shape absorbs. Identity stops being "the vendor's SSO integration" and becomes the buyer's existing IAM, which removes a class of audit findings but adds a class of integration work. For regulated AI workloads that need a defensible answer to "where does our data physically live and who can touch it," customer VPC is usually the shape that survives the security review without compromise.

On-prem and air-gapped

On-prem deployment runs the entire stack — data plane, control plane, model weights — on customer-owned infrastructure inside the customer data center. Air-gapped variants remove the network bridge to the vendor entirely, with model updates and policy bundles arriving through controlled out-of-band channels.

This shape is correct when residency rules out any vendor-operated plane, when contractual constraints prohibit egress of inference traffic, or when the workload sits inside a classified or critical-infrastructure boundary that cannot transit a third-party network. It is not automatically safer. ISO/IEC 42001 is a management-system standard governing the Plan-Do-Check-Act loop around AI policies and procedures; it applies inside the customer perimeter exactly as it does inside a vendor's. The OWASP LLM risk taxonomy applies the same way. What changes is who operates the model-update story, who carries the cost of the GPU footprint, and who is on the hook when a runtime policy needs to be revised after a new attack pattern shows up in production. We treat on-prem as a control trade — residency and custody bought at the price of operating burden — not as a default.

Shape comparison at a glance

[Figure: Deployment shapes at a glance. A four-column comparison of multi-tenant SaaS, single-tenant SaaS, customer VPC, and on-prem GenAI deployment shapes showing where the data plane sits, who holds the keys, the training defaults, audit retention, and where the runtime policy boundary actually lives.]

Dimension	Multi-tenant SaaS	Single-tenant SaaS	Customer VPC	On-prem / air-gapped
Data-plane location	Vendor shared	Vendor isolated	Customer cloud account	Customer data center
Key custody	Vendor-managed	Vendor-managed (BYO key options)	Customer KMS realistic	Customer-only
Training on customer data	Vendor default	Vendor default (often disabled)	Buyer-set	Buyer-set
Audit-log retention	Vendor SLO	Vendor SLO	Buyer SLO	Buyer SLO
Runtime policy boundary	Vendor plane	Vendor plane	Buyer network boundary	Buyer perimeter
BAA realistic	Region-dependent	Region-dependent	Yes	Yes
EU AI Act high-risk fit	Limited	Conditional	Yes	Yes
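
The table above can also be read as a filter: state the gating constraints, then see which shapes survive. The following is a minimal sketch of that reading, not a decision engine. The `Shape` fields and the `candidate_shapes` helper are hypothetical names introduced here, and the boolean encoding is a simplification (for example, single-tenant SaaS is marked as vendor-held keys even though the table notes BYO key options may exist).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    data_plane: str
    customer_keys: bool          # table: "Customer KMS realistic" / "Customer-only"
    buyer_retention: bool        # table: audit-log retention on the buyer's SLO
    buyer_policy_boundary: bool  # table: runtime policy boundary inside the buyer's network


# Simplified, hypothetical encoding of the comparison table.
SHAPES = [
    Shape("multi-tenant SaaS", "vendor shared", False, False, False),
    Shape("single-tenant SaaS", "vendor isolated", False, False, False),
    Shape("customer VPC", "customer cloud account", True, True, True),
    Shape("on-prem / air-gapped", "customer data center", True, True, True),
]


def candidate_shapes(need_customer_keys=False,
                     need_buyer_retention=False,
                     need_buyer_policy_boundary=False):
    """Return the names of shapes that satisfy every stated gating constraint."""
    return [
        s.name
        for s in SHAPES
        if (s.customer_keys or not need_customer_keys)
        and (s.buyer_retention or not need_buyer_retention)
        and (s.buyer_policy_boundary or not need_buyer_policy_boundary)
    ]


print(candidate_shapes(need_customer_keys=True))
```

With no constraints set, all four shapes remain candidates; requiring customer-held keys narrows the list to customer VPC and on-prem. A real review would replace the booleans with the vendor's actual evidence per row.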

[Figure: Procurement dimensions mapped to their primary-source control families. A hierarchy diagram mapping six procurement dimensions (residency, training defaults, identity, audit, latency, operating model) to their ISO 42001, NIST AI RMF, EU AI Act, ENISA, and HIPAA control families.]

Procurement decision dimensions

Each dimension below is a column on the procurement checklist and a question the CISO should be able to defend inside their own governance review. We pair every column with the control family that justifies the question.

Data residency and sovereignty

The reviewer needs the regional location of the data plane, the control plane, and any sub-processors, plus the encryption-at-rest and encryption-in-transit posture. ABV documents regional deployment across US (Virginia), EU (Ireland), and dedicated HIPAA regions on its security and compliance overview, which is the surface to cite rather than a sales claim. The control families that anchor this question are EU AI Act Article 10 (data and data governance) for high-risk systems and the data-handling clauses inside ISO/IEC 42001's AIMS scope.
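
One way to make the residency question mechanical is to compare the vendor's written disclosure against the regions the buyer is allowed to use. The sketch below assumes a hypothetical disclosure format (a dict with `data_plane`, `control_plane`, and `sub_processors` keys) and hypothetical region labels; no vendor is known to publish disclosures in this shape.

```python
def residency_gaps(disclosure, allowed_regions):
    """List every place where the disclosure is missing or falls outside allowed regions."""
    gaps = []
    for plane in ("data_plane", "control_plane"):
        regions = disclosure.get(plane)
        if not regions:
            gaps.append(f"{plane}: no region disclosed")
            continue
        for region in regions:
            if region not in allowed_regions:
                gaps.append(f"{plane}: {region} outside allowed regions")

    sub_processors = disclosure.get("sub_processors")
    if sub_processors is None:
        # An absent list is itself a finding: the claim cannot be verified.
        gaps.append("sub_processors: list not disclosed")
    else:
        for sub in sub_processors:
            name = sub.get("name", "unnamed")
            region = sub.get("region")
            if region is None:
                gaps.append(f"sub_processor {name}: region not disclosed")
            elif region not in allowed_regions:
                gaps.append(f"sub_processor {name}: {region} outside allowed regions")
    return gaps


# Hypothetical disclosure for illustration only.
example = {
    "data_plane": ["eu-ireland"],
    "control_plane": ["us-virginia"],
    "sub_processors": [{"name": "LogCo", "region": "eu-ireland"}],
}
print(residency_gaps(example, {"eu-ireland"}))
```

An empty result means only that the written disclosure is internally consistent with the buyer's constraint; it does not replace checking the actual data flow per request.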

Training, fine-tuning, and prompt-retention defaults

"We do not train on customer data" is a binary the questionnaire should treat as a binary. The reviewer needs the default posture, the contractual override path, and the retention TTL for prompts and completions inside the inference logs. The NIST AI 600-1 Generative AI Profile is the document to anchor template language to, because its risk taxonomy formalizes generative-AI-specific concerns — data confabulation, training-data leakage, IP exposure — that the original AI RMF 1.0 does not enumerate at the same resolution.

Identity, SSO, RBAC, and tenancy boundary

The reviewer needs SSO via the buyer's IdP, RBAC granular enough to constrain which roles can author or override policies, and a clear answer on where the tenancy boundary sits. ISO/IEC 42001 covers the resource and supplier control families that anchor this dimension; NIST AI RMF's Govern function provides the vocabulary for role accountability across the AI lifecycle.

Audit, evidence, and incident response

The reviewer needs the audit-log retention duration, the export format, the immutability story, and the incident-response runbook. The ENISA Multilayer Framework is the cleanest anchor: its three-layer split (cybersecurity foundations, AI-specific controls, sector-specific overlays) lets the reviewer score which security controls each deployment shape actually carries forward. For a regulated workload we recommend retention SLOs of at least 12 months for inference logs, with longer durations where sector overlays (HIPAA, financial services records, public-sector records) apply.
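
The retention recommendation reduces to simple arithmetic: take the 12-month baseline, raise it to the longest applicable sector overlay, and compare against what the vendor offers. A minimal sketch follows; the overlay durations are inputs the buyer's counsel supplies, and the figure used in the example is a placeholder, not a statement of what any regulation requires.

```python
BASELINE_MONTHS = 12  # the minimum recommended above for inference logs


def required_retention_months(overlay_months):
    """Longest of the baseline and any sector overlays (dict of name -> months)."""
    return max([BASELINE_MONTHS, *overlay_months.values()])


def retention_gap(vendor_months, overlay_months):
    """Months by which the vendor's retention falls short of the requirement (0 if none)."""
    return max(0, required_retention_months(overlay_months) - vendor_months)


# Placeholder overlay value for illustration only.
overlays = {"example_sector_overlay": 84}
print(required_retention_months(overlays), retention_gap(24, overlays))
```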

Latency budgets and policy enforcement

This is the engineer-side dimension. The reviewer needs the gateway overhead per request, the policy decision latency, and the failure-mode behavior when the policy engine cannot evaluate in time. Sub-1ms gateway overhead is a useful target for inline gateways; what matters more is the failure mode: fail-closed against high-risk policies, fail-open against advisory ones, with the choice configurable per policy class. On its GenAI Risk Protection page, ABV positions runtime enforcement specifically against the surfaces traditional security tools miss.
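
The per-class failure mode is easier to review when it is written down as configuration plus one decision function. The sketch below is illustrative only: `FAIL_MODE`, `decide`, and the policy class names are hypothetical, the thread pool stands in for whatever evaluation mechanism a real gateway uses, and treating unknown classes as fail-closed is an assumption made here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-class configuration: "closed" blocks on failure, "open" allows.
FAIL_MODE = {
    "prompt_injection": "closed",
    "data_exfiltration": "closed",
    "style_advisory": "open",
}

_POOL = ThreadPoolExecutor(max_workers=4)


def decide(policy_class, check, prompt, budget_s):
    """Run check(prompt) within budget_s seconds.

    Returns (decision, degraded). degraded is True when the engine could not
    evaluate in time (or the check raised) and the fail mode was applied, so
    the caller can raise an alert instead of downgrading silently.
    """
    future = _POOL.submit(check, prompt)
    try:
        passed = future.result(timeout=budget_s)
        return ("allow" if passed else "block", False)
    except Exception:
        # Timeout or evaluator error. The check may still be running in its
        # worker thread; this sketch does not cancel it.
        mode = FAIL_MODE.get(policy_class, "closed")  # assumption: unknown -> closed
        return ("block" if mode == "closed" else "allow", True)


def slow_check(prompt):
    """Stand-in for a policy evaluation that overruns its budget."""
    time.sleep(0.3)
    return True


print(decide("prompt_injection", slow_check, "hello", 0.02))
print(decide("style_advisory", slow_check, "hello", 0.02))
```

Returning the `degraded` flag alongside the decision is one way to make the alerting path explicit; a production gateway would more likely emit a metric or trace event at that point.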

Operating model and SLA per shape

The SLA the vendor offers in SaaS is not the SLA the buyer operates in VPC or on-prem. The reviewer needs both the vendor's operational commitments (uptime, RTO/RPO on the control plane) and a clear delineation of which responsibilities cross the boundary at each shape. ISO/IEC 42001's clauses on resource controls and supplier obligations are the right anchor here; the IBM Cost of a Data Breach Report provides industry framing for what an ungoverned AI footprint costs when it fails.

[Figure: Customer-VPC reference flow. A five-plane process flow for a customer-VPC GenAI runtime: ingress gateway, runtime policy engine (allow, redact, block decisions), model endpoint routing, observability trace into the customer audit store, and egress allow-list enforcement.]

Reference architecture: runtime policy enforcement inside a customer VPC

The reference architecture for a customer-VPC deployment has five planes the reviewer needs to see drawn. Ingress lands at the gateway, which is where the runtime policy engine evaluates every request and response. The gateway authenticates against the buyer's IdP, routes to the appropriate model (first-party, third-party, or self-hosted), and emits a structured trace per request into the observability collector. The collector writes to a customer-controlled audit store with retention matched to the buyer's SLO. Egress controls — egress allow-lists, model-endpoint allow-lists, and sub-processor disclosure — close the loop.
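
The egress plane is the simplest of the five to illustrate. A minimal sketch of a model-endpoint allow-list check, using only the standard library, might look like the following; the hostnames and the `egress_allowed` helper are hypothetical, and a real deployment would normally enforce this at the network layer as well as in the gateway.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list of model endpoints the buyer has approved.
MODEL_ENDPOINT_ALLOWLIST = {
    "models.internal.example",
    "api.thirdparty.example",
}


def egress_allowed(url, allowlist=MODEL_ENDPOINT_ALLOWLIST):
    """Allow only HTTPS requests whose exact hostname is on the allow-list."""
    parts = urlsplit(url)
    # hostname is lowercased by urlsplit and is None when the URL has no host.
    return parts.scheme == "https" and parts.hostname in allowlist


print(egress_allowed("https://models.internal.example/v1/chat"))
print(egress_allowed("https://unlisted.example/v1/chat"))
```

Matching on the exact hostname rather than a suffix is a deliberate choice in this sketch: suffix matching would let a lookalike subdomain through.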

The trace the gateway emits per request should record the prompt class, the policy decisions taken (allow, redact, block), the model invocation metadata, the response class, and any human-in-the-loop interventions. A useful engineering acceptance criterion is sub-1ms median gateway overhead with policy decisions completing inside the gateway's request lifecycle for the policy classes the buyer marks fail-closed. Prompt-injection and data-exfiltration policies should fail closed, while advisory style checks can fail open; the engine should never silently downgrade fail-closed to fail-open without an alerting path. ABV's agent observability and GenAI guardrails surfaces describe the trace shape and policy authoring path in more detail.
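
To make the trace fields concrete, here is one possible record shape written as Python dataclasses. It is a sketch under stated assumptions, not ABV's schema: the class names, field names, and the block-over-redact-over-allow precedence are all introduced here for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

ACTIONS = ("allow", "redact", "block")
_PRECEDENCE = {action: rank for rank, action in enumerate(ACTIONS)}


@dataclass
class PolicyDecision:
    policy_class: str
    action: str  # one of ACTIONS

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


@dataclass
class GatewayTrace:
    request_id: str
    prompt_class: str
    decisions: list   # list of PolicyDecision
    model: dict       # model invocation metadata
    response_class: str
    human_interventions: list = field(default_factory=list)

    def final_action(self):
        """Most restrictive decision wins (assumed precedence: block > redact > allow)."""
        return max(
            (d.action for d in self.decisions),
            key=_PRECEDENCE.__getitem__,
            default="allow",
        )

    def to_json(self):
        """Serialize to a stable JSON line suitable for an append-only audit store."""
        return json.dumps(asdict(self), sort_keys=True)


trace = GatewayTrace(
    request_id="r1",
    prompt_class="support_query",
    decisions=[PolicyDecision("pii", "redact"), PolicyDecision("prompt_injection", "block")],
    model={"provider": "self-hosted", "name": "example-model"},
    response_class="refused",
)
print(trace.final_action())
print(trace.to_json())
```

Recording classes rather than raw prompt text keeps the trace itself out of scope for the data it describes; whether raw content is also retained is a separate retention decision.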

AI vendor procurement checklist

Use this section as the review artifact. Each row is a question, the evidence the vendor should provide, and the control family that justifies the question.

Dimension	Question for the vendor	Vendor evidence to require	Control family
Deployment shape	Where does this vendor run? SaaS / VPC / on-prem / hybrid?	Architecture diagram with data-plane and control-plane separation	ISO/IEC 42001 AIMS scope
Region of data plane	Which region holds inference traffic and inference logs?	Region list with sub-processor disclosure	EU AI Act Art. 10
Encryption	What is the in-transit and at-rest encryption posture?	Trust-center documentation; ABV documents TLS in transit and AES-256 at rest on its security overview	ISO/IEC 42001 data-handling
Training on customer data	What is the default posture and the override path?	Written contractual term	NIST AI 600-1
Prompt and completion retention	What is the retention TTL and the export format?	Documented retention SLO	NIST AI RMF Map
Key custody	Who holds the encryption keys; is BYO KMS supported?	Key-management architecture diagram	ISO/IEC 27001 cryptographic controls
Identity	SSO via which IdPs; RBAC granularity?	SSO matrix and role catalog	ISO/IEC 42001 resource controls
Audit-log retention	Retention duration, immutability, export format?	Retention SLO and export schema	ENISA Multilayer
Latency	Median and p99 gateway overhead per request?	Benchmark methodology and result set	Engineering SLO
Policy enforcement failure mode	What happens when the engine cannot evaluate in time?	Fail-closed and fail-open policy class definitions	OWASP LLM Top 10
DPA availability	Is a DPA on offer; what processor obligations does it cover?	Signed-ready DPA template	GDPR Art. 28
BAA availability	Is a BAA on offer; in which regions?	Signed-ready BAA template	HIPAA Security Rule
EU AI Act intended-purpose class	What is the workload's intended-purpose classification?	Written intended-purpose statement	EU AI Act Annex III
ISO 42001 alignment	What evidence of AIMS implementation is provided?	Internal-audit or third-party report; ABV documents ISO 42001-aligned posture	ISO/IEC 42001
GenAI Profile coverage	Which NIST AI 600-1 risks are mitigated and how?	Mapping of policies to NIST AI 600-1 risk taxonomy	NIST AI 600-1
Incident response	RTO, RPO, breach-notification window?	Incident-response runbook	ISO/IEC 42001 ops controls

We recommend running this checklist twice during a vendor security review — once at the questionnaire stage and once against the live demo environment. A vendor whose answers move between the two passes is signaling that the marketing surface and the runtime surface disagree.
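
Comparing the two passes can be as simple as a keyed diff over checklist rows. The sketch below assumes each pass is captured as a dict from checklist dimension to the recorded answer; the `moved_answers` helper and the sample answers are hypothetical.

```python
def _norm(answer):
    """Ignore case and whitespace differences so only substantive changes surface."""
    return None if answer is None else " ".join(str(answer).split()).lower()


def moved_answers(questionnaire, demo):
    """Return {dimension: (questionnaire_answer, demo_answer)} for rows that differ."""
    moved = {}
    for row in sorted(set(questionnaire) | set(demo)):
        before, after = questionnaire.get(row), demo.get(row)
        if _norm(before) != _norm(after):
            moved[row] = (before, after)
    return moved


# Hypothetical answers for illustration only.
pass_one = {"Region of data plane": "EU (Ireland)", "Key custody": "BYO KMS supported"}
pass_two = {"Region of data plane": "eu (ireland)", "Key custody": "Vendor-managed only"}
print(moved_answers(pass_one, pass_two))
```

Rows present in one pass but missing from the other are reported with `None` on the missing side, which is usually worth a follow-up question on its own.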

Where ABV fits without overclaiming

ABV documents regional deployment in US (Virginia), EU (Ireland), and dedicated HIPAA regions on its security and compliance overview, encryption of all data in transit (TLS) and at rest (AES-256), annual third-party audits of its ISO 27001 information security management system, and alignment with ISO/IEC 42001 — phrased as alignment rather than certification because that is what the source surface supports. The GenAI Risk Protection page frames the runtime differentiator as detection and policy enforcement against GenAI-specific threats that traditional security tools miss across third-party AI surfaces.

For regulated buyers, the practical reading is that ABV maps cleanly to the procurement-checklist columns for residency, encryption, ISO 42001 alignment, and runtime policy enforcement, and offers DPA and BAA paths per region. Buyers should still request the live trust-center artifacts under NDA — the audit report, the DPA template, the BAA template, the sub-processor list — rather than treat the public surfaces as the procurement evidence themselves.

FAQ

What is on-prem LLM deployment?

On-prem LLM deployment runs the model, gateway, and policy engine on customer-owned hardware inside the customer data center, so no inference traffic egresses the customer network boundary. It differs from VPC, where the data plane runs inside a customer-controlled cloud account, and from SaaS, where the data plane runs in the vendor's environment. The ENISA Multilayer Framework frames the boundary distinction across cybersecurity foundations, AI-specific controls, and sector overlays.

What is the difference between SaaS, VPC, and on-prem deployment for GenAI governance?

The differences lie in data-plane location, key custody, training-on-customer-data defaults, and audit-log retention. SaaS keeps the data plane and keys with the vendor; VPC moves the data plane into the customer's cloud account and makes BYO keys realistic; on-prem keeps everything in customer infrastructure. The Map and Govern functions of the NIST AI Risk Management Framework provide the common vocabulary across shapes, and ISO/IEC 42001 provides the supplier and resource control families.

Can a SaaS AI governance platform meet ISO 42001 requirements?

ISO/IEC 42001 is a management-system standard governing AI policies and procedures via the Plan-Do-Check-Act methodology rather than dictating any specific architecture, so SaaS can be ISO 42001-aligned. What constrains procurement is not ISO 42001 itself but the residency, data-handling, and control-effectiveness clauses that each deployment shape carries forward differently — multi-tenant SaaS may clear AIMS scope but still fail on data residency or BAA obligations.

What is a DPA and a BAA for AI vendors?

A DPA (Data Processing Agreement) is a contract under GDPR Article 28 that governs how a processor handles personal data on behalf of a controller. A BAA (Business Associate Agreement) is a contract required under the HIPAA Privacy and Security Rules that governs how a business associate handles PHI, including ePHI, on behalf of a covered entity. Both are contractual, not architectural, and both must be obtainable in the deployment shape the buyer selects, or the shape is non-viable for that regulated workload.

How do I evaluate AI vendor data residency claims?

The evidence chain is regional disclosure for the data plane and the control plane, an explicit sub-processor list, the encryption-in-transit and at-rest posture, and the actual data flow per request. The EU AI Act data-governance article and ISO/IEC 42001 data-lifecycle clauses anchor the question. The qualifying test is whether the vendor can name, in writing, the regions the data plane runs in and the sub-processors it transits — claims a buyer cannot verify against a documented surface should be treated as unverifiable.

Next steps

Run the procurement checklist against the vendor's questionnaire response and against a live demo environment. Where the answers diverge, ask why. For ABV-specific evidence, the trust center is the starting surface, with the GenAI Risk Protection, GenAI Guardrails, and Agent Observability pages covering the runtime-control surface in more detail. Bring the live audit report, DPA template, and BAA template into the review under NDA before the final shape decision is signed off.
