Private, sovereign AI — running inside your own cloud

Iftah deploys approved models, agents, and Arabic RAG inside your cloud, data center, or air-gapped network, across AWS, Azure, GCP, OCI, OpenShift, or bare metal, all governed from one central control plane. Sensitive workloads stay where the data must live; prompts, data, and models never leave your boundary.

Book a sovereign AI architecture review · See the platform

Built for regulated Gulf enterprises
On-prem · Sovereign cloud · Air-gapped
SAMA · CBUAE · PDPL · NCA review-ready

Every AI cluster pattern, one control plane

Multi-region, multi-cloud, hybrid, air-gapped. Governed as one.

Full Control

Your environment, your models, your keys.

In-Region Data

Residency and sovereignty, enforced by design.

Central Governance

Govern models, policies, and access from one plane.

Multi-Cloud & Hybrid

AWS, Azure, GCP, OCI, OpenShift, or on-prem.

Compliant by Design

Built for GDPR, HIPAA, PDPL, and local mandates.

Iftah Global Control Plane

KSA · UAE · Kuwait · Qatar · Bahrain · Oman

Customer AI Cluster in each region: Models; RAG · Vector DB; Workloads; Data Plane Agent

North America: AWS / Customer Account · United States · HIPAA, SOC 2
Europe: Azure / Customer Subscription · European Union · GDPR, ISO 27001
Australia: OCI / Customer Tenancy · Australia · IRAP, ISO 27001
Southeast Asia: GCP / Customer Project · Singapore · PDPA, ISO 27001
Africa: OpenShift / On-Prem or Cloud · South Africa · POPIA, ISO 27001

Control plane sees metadata only

✓ Model registrations
✓ Policies & guardrails
✓ Deployment configs
✓ Access & permissions
✓ Audit events & metrics

✕ Prompts / queries
✕ Customer data
✕ Vector databases
✕ Model weights
✕ Files & documents

Your data never leaves your boundary.

Customer prompts, data, vector stores, and model weights remain in-region. Iftah only governs deployment, policy, access, and observability across the fleet.
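One way to picture the metadata-only boundary is an allow-list applied before anything is reported upstream. The sketch below is illustrative only: the field names are hypothetical labels derived from the two lists above, and `to_control_plane_payload` is an assumed helper, not part of Iftah's actual interface.

```python
# Hypothetical field names based on the lists above; not Iftah's real schema.
ALLOWED_METADATA = {
    "model_registrations",
    "policies_and_guardrails",
    "deployment_configs",
    "access_and_permissions",
    "audit_events_and_metrics",
}
BLOCKED_CONTENT = {
    "prompts",
    "customer_data",
    "vector_databases",
    "model_weights",
    "files_and_documents",
}


def to_control_plane_payload(report: dict) -> dict:
    """Return only allow-listed metadata; refuse reports that carry content."""
    leaked = BLOCKED_CONTENT.intersection(report)
    if leaked:
        raise ValueError(f"content fields must stay in-region: {sorted(leaked)}")
    # Unknown fields are dropped rather than forwarded (deny by default).
    return {key: value for key, value in report.items() if key in ALLOWED_METADATA}
```

Dropping unknown fields by default means a newly added field stays in-region until someone explicitly allow-lists it.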

Sovereign by Design

AI runs where your data must stay.

Compliant Everywhere

Built to meet the local laws and industry mandates of each region.

One Platform

One governance model across every region.

Total Visibility

Observe, audit, and optimize the entire fleet.

Cost Efficient

Own the infrastructure. Control the spend.

Frontier-model performance, inside your boundary

Iftah serves models on your own accelerators with a disaggregated, cache-aware inference engine — the architecture hyperscalers use to run frontier models at scale. More tokens per GPU, faster first response, and longer context, with every prompt and token staying inside your network.

Request → Smart routing → Prefill → KV transfer → Decode → Stream

Prefill / decode disaggregation

Prompt processing and token generation run on separate, independently scaled pools, each tuned for the work it does. Neither phase starves the other, so GPUs spend less time idle.

Cache-aware routing

Requests route to the worker that already holds the relevant context. Repeated prompts and long conversations skip recomputation — faster first token, far less wasted compute.
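As a rough illustration of the idea, the sketch below routes each request to the worker holding the longest cached prefix of its tokens and breaks ties by load. It is a toy model under stated assumptions: the `Worker` class, the `BLOCK` size, and the `route` function are hypothetical names, and a real engine would track cache blocks very differently.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (illustrative value)


def prefix_blocks(tokens):
    """Cumulative prefixes at block boundaries: tokens[:4], tokens[:8], ..."""
    return [tuple(tokens[:end]) for end in range(BLOCK, len(tokens) + 1, BLOCK)]


@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached: set = field(default_factory=set)

    def matched_blocks(self, tokens):
        """Count leading blocks of this request already cached on the worker."""
        count = 0
        for prefix in prefix_blocks(tokens):
            if prefix not in self.cached:
                break
            count += 1
        return count


def route(tokens, workers):
    """Prefer the longest cached prefix; break ties by the lowest load."""
    best = max(workers, key=lambda w: (w.matched_blocks(tokens), -w.active_requests))
    best.cached.update(prefix_blocks(tokens))  # the worker now holds this context
    best.active_requests += 1
    return best
```

A follow-up turn in a conversation shares its prefix with the earlier turn, so it lands on the same worker, while an unrelated prompt goes to the least-loaded one.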

Tiered KV-cache offload

Hot context stays on the GPU; colder context tiers down to CPU memory, NVMe, and in-region storage. Serve longer contexts and more concurrent users without buying more GPUs.
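The tiering described here can be sketched as a chain of least-recently-used caches, where an entry evicted from one tier is demoted to the next colder one. This is a simplified model, not Iftah's implementation: the `TieredKVCache` class, the tier names, and the capacities (counted in entries rather than bytes) are all assumptions for illustration.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy multi-tier LRU cache: evictions demote entries to the next tier."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), hottest tier first
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # fell off the coldest tier; would need recomputation
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)
        if len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # least recently used
            self._insert(level + 1, old_key, old_value)

    def _remove(self, key):
        for _, _, store in self.tiers:
            if key in store:
                return store.pop(key)
        return None

    def put(self, key, value):
        self._remove(key)
        self._insert(0, key, value)

    def get(self, key):
        """Return the value and promote it to the hottest tier (None on a miss)."""
        value = self._remove(key)
        if value is not None:
            self._insert(0, key, value)
        return value

    def tier_of(self, key):
        for name, _, store in self.tiers:
            if key in store:
                return name
        return None
```

For example, with one slot per tier, three successive `put` calls leave the newest entry on the hottest tier and the oldest on the coldest; reading the oldest entry promotes it back to the top.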

Multi-modal serving

One serving plane for text, vision, audio, and document models — the same routing, batching, and residency guarantees across every modality.

Elastic GPU scheduling

Capacity shifts between prefill and decode as demand moves through the day. The cluster follows real load instead of a static, over-provisioned split.
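A minimal sketch of such a rebalancing rule, assuming demand can be estimated from queued prefill tokens and active decode sequences. The `split_gpus` function and the per-GPU capacity figures are hypothetical placeholders, not measured values or Iftah's scheduler.

```python
def split_gpus(total_gpus, prefill_queue_tokens, decode_active_sequences,
               prefill_tokens_per_gpu=8000, decode_sequences_per_gpu=32):
    """Split GPUs in proportion to estimated demand, keeping one per pool."""
    if total_gpus < 2:
        raise ValueError("need at least two GPUs to run both pools")
    # Estimated GPUs each phase would like, given assumed per-GPU capacity.
    prefill_need = prefill_queue_tokens / prefill_tokens_per_gpu
    decode_need = decode_active_sequences / decode_sequences_per_gpu
    total_need = prefill_need + decode_need
    share = 0.5 if total_need == 0 else prefill_need / total_need
    # Clamp so neither pool is ever left with zero GPUs.
    prefill = min(max(round(total_gpus * share), 1), total_gpus - 1)
    return prefill, total_gpus - prefill
```

Calling this on a timer with fresh queue statistics lets the split follow load; a production scheduler would also need to account for the cost of draining and reassigning a GPU.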

Engine-agnostic runtime

Run vLLM, TensorRT-LLM, SGLang, or your own engine behind one serving plane, and swap as the field moves — without re-architecting your stack.

The result: higher throughput per GPU, lower latency, and longer context — without a single prompt leaving your environment.

Built for Gulf security, data, and platform teams

CISO priority

Residency, access, and auditability

Prove that prompts, responses, embeddings, and logs stay inside the approved region or network boundary.

CIO priority

Repeatable operating model

Give data, platform, and app teams one governed access path instead of scattered AI experiments.

Business priority

Pilot decision evidence

Move the first use case from innovation lab to reviewable pilot with owners, evidence, and rollout criteria.

From your boundary to production, one governed path

Deploy in your environment

Install Iftah on any Kubernetes substrate (EKS, AKS, GKE, OKE, OpenShift, or bare metal) in your cloud, data center, or air-gapped network. The data plane never leaves your control.

Govern at the gateway

Route approved models, agents, and Arabic RAG through one policy-checked gateway — identity, classification, and guardrails on every request.
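To make the per-request check concrete, here is a small sketch covering the identity and classification parts of such a gateway decision (content guardrails are out of scope here). The `Request` and `ModelPolicy` types, the classification labels, and the `check` function are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed classification ladder, lowest sensitivity first.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class Request:
    user_roles: frozenset
    model: str
    classification: str


@dataclass(frozen=True)
class ModelPolicy:
    allowed_roles: frozenset
    max_classification: str


def check(request, policies):
    """Return (allowed, reason) for one request against the approved-model policies."""
    policy = policies.get(request.model)
    if policy is None:
        return False, "model not approved"
    if not (request.user_roles & policy.allowed_roles):
        return False, "identity not permitted"
    ceiling = CLASSIFICATION_RANK[policy.max_classification]
    if CLASSIFICATION_RANK[request.classification] > ceiling:
        return False, "classification exceeds model ceiling"
    return True, "allowed"
```

Returning a reason alongside the decision gives the audit trail something to record without storing the prompt itself.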

Observe and prove

Capture signed, content-free audit trails and runtime health that your security and compliance reviewers can export.
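A signed, content-free audit record can be sketched with standard-library primitives. The example below uses an HMAC-SHA256 tag as a stand-in for whatever signature scheme a real deployment uses, and the event fields are hypothetical; the point is that the record carries metadata about a request, never the prompt or response.

```python
import hashlib
import hmac
import json


def _canonical(event: dict) -> bytes:
    """Serialize deterministically so the same event always signs identically."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode()


def sign_audit_event(event: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag to a metadata-only event."""
    tag = hmac.new(key, _canonical(event), hashlib.sha256).hexdigest()
    return {"event": event, "signature": tag}


def verify_audit_event(record: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(record["event"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


# Hypothetical event: identifiers, decision, and counts only; no content.
example = sign_audit_event(
    {"request_id": "r-1", "model": "arabic-rag", "decision": "allowed", "tokens": 412},
    key=b"demo-key",
)
```

A reviewer holding the key can call `verify_audit_event(example, b"demo-key")` to confirm the record has not been altered since export.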

Scale pilot to production

Start with one workload and one boundary, then roll out the same operating model across clouds, on-prem sites, and sovereign regions — governed from one control plane.

Multi-cloud and hybrid, governed from one control plane

Run Iftah on any substrate your review process approves, with identity, policy, quota, and audit unified across every environment.

Public cloud

Run inside your own AWS, Azure, Google Cloud, or Oracle Cloud account and region, never a shared tenant.

Private cloud

Deploy on OpenShift or any CNCF Kubernetes substrate inside the private cloud your platform teams already operate.

On-premises & bare metal

Install in your own data center or accelerator cluster, with no public cloud dependency in the path.

Sovereign & air-gapped

Run fully disconnected with a client-local registry and signed offline updates; no prompts, content, or models leave the network.

Hybrid

Keep sensitive workloads on-premises and run others in public cloud, all governed from one control plane.

Multi-cloud

Govern AI across many clusters and clouds from one policy and audit plane, consistent and reviewable everywhere.

Where Iftah fits best

Banking

Public sector

Healthcare

Energy

Telecom

Arabic knowledge

Priority use cases for Gulf buyers

Private knowledge assistant

Search policy, procedures, contracts, and Arabic knowledge bases without sending sensitive prompts or embeddings to a public API.

Buying trigger

Reduce risky ad-hoc AI use

Regulated document review

Give legal, banking, health, or government teams a controlled workflow for summaries, extraction, review, and evidence retention.

Buying trigger

Create auditable AI workflows

Field and operations copilots

Run assistants for energy, telecom, and industrial teams near restricted systems while keeping rollout, access, and health visible.

Buying trigger

Deploy AI near operational data

Plan your use case

Why teams choose Iftah

Multi-cloud, hybrid, customer-controlled

Run on any Kubernetes substrate — public cloud, private cloud, on-prem, or air-gapped — and keep hybrid estates governed from one control plane. Your data plane stays under your control.

Evidence for security review

Gateway policies, request traces, retention choices, model inventory, and runtime health are visible before production expansion.

A path from pilot to production

Start with one workload, one boundary, and one decision report; then scale the same operating model across teams.

Know what you will receive

The first engagement is designed to give your stakeholders concrete artifacts, not vague AI strategy slides.

Architecture fit · Security evidence · Pilot plan

Start with the architecture, then the pilot

Tell us your country, first workload, target environment, and review constraints. We will map a practical private AI path with your technical owners.

Book a sovereign AI architecture review