Iftah deploys approved models, agents, and Arabic RAG inside your cloud, data center, or air-gapped network, on AWS, Azure, GCP, OCI, OpenShift, or bare metal, all governed from one central control plane. Sensitive workloads stay where the data must live; prompts, data, and models never leave your boundary.
Multi-region, multi-cloud, hybrid, air-gapped. Governed as one.
Full Control
Your environment, your models, your keys.
In-Region Data
Residency and sovereignty, enforced by design.
Central Governance
Govern models, policies, and access from one plane.
Multi-Cloud & Hybrid
AWS, Azure, GCP, OCI, OpenShift, or on-prem.
Compliant by Design
Built for GDPR, HIPAA, PDPL, and local mandates.
KSA · UAE · Kuwait · Qatar · Bahrain · Oman
Customer AI Cluster · AWS / Customer Account · United States · HIPAA, SOC 2
Customer AI Cluster · Azure / Customer Subscription · European Union · GDPR, ISO 27001
Customer AI Cluster · OCI / Customer Tenancy · Australia · IRAP, ISO 27001
Customer AI Cluster · GCP / Customer Project · Singapore · PDPA, ISO 27001
Customer AI Cluster · OpenShift / On-Prem or Cloud · South Africa · POPIA, ISO 27001
Customer prompts, data, vector stores, and model weights remain in-region. Iftah governs only deployment, policy, access, and observability across the fleet.
Sovereign by Design
AI runs where your data must stay.
Compliant Everywhere
Built to meet local laws and industry mandates in each region.
One Platform
One governance model across every region.
Total Visibility
Observe, audit, and optimize the entire fleet.
Cost Efficient
Own the infrastructure. Control the spend.
Iftah serves models on your own accelerators with a disaggregated, cache-aware inference engine — the architecture hyperscalers use to run frontier models at scale. More tokens per GPU, faster first response, and longer context, with every prompt and token staying inside your network.
Prompt processing (prefill) and token generation (decode) run on separate, independently scaled pools, each tuned for the work it does. Neither phase starves the other, so GPUs spend less time idle.
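To make the split concrete, here is a minimal sketch of the idea, not Iftah's implementation. The class names (`Request`, `PrefillPool`, `DecodePool`), the `serve` function, and the toy token arithmetic are all assumptions for illustration: a prefill pool processes each prompt once and hands a cache to a decode pool, which then generates tokens one step at a time.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Request:
    request_id: str
    prompt_tokens: List[int]
    max_new_tokens: int
    kv_cache: Optional[List[int]] = None  # stand-in for real key/value tensors
    output_tokens: List[int] = field(default_factory=list)


class PrefillPool:
    """Hypothetical pool that processes a whole prompt in one pass."""

    def __init__(self) -> None:
        self.tokens_processed = 0

    def run(self, req: Request) -> Request:
        req.kv_cache = list(req.prompt_tokens)  # toy "cache" built from the prompt
        self.tokens_processed += len(req.prompt_tokens)
        return req


class DecodePool:
    """Hypothetical pool that generates one token per step from the cache."""

    def __init__(self) -> None:
        self.tokens_generated = 0

    def step(self, req: Request) -> None:
        next_token = (sum(req.kv_cache) + len(req.output_tokens)) % 1000  # toy model
        req.output_tokens.append(next_token)
        req.kv_cache.append(next_token)
        self.tokens_generated += 1


def serve(requests: List[Request], prefill: PrefillPool, decode: DecodePool) -> List[Request]:
    # Phase 1: prefill every prompt, then hand the cache to the decode side.
    handoff = deque(prefill.run(r) for r in requests)
    finished = []
    # Phase 2: decode round-robin, one token per request per turn.
    while handoff:
        req = handoff.popleft()
        if len(req.output_tokens) >= req.max_new_tokens:
            finished.append(req)
            continue
        decode.step(req)
        handoff.append(req)
    return finished
```

Because the two pools share nothing but the handoff queue, each can be sized and tuned separately, which is the property the paragraph above describes.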
Requests route to the worker that already holds the relevant context. Repeated prompts and long conversations skip recomputation — faster first token, far less wasted compute.
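One common way to implement this kind of routing is prefix matching over hashed token blocks. The sketch below assumes that approach; the source does not say how Iftah routes. `BLOCK_SIZE`, `block_hashes`, and `Router` are hypothetical names, and the worker IDs are placeholders.

```python
import hashlib
from typing import Dict, List, Set, Tuple

BLOCK_SIZE = 4  # tokens per cache block (illustrative value)


def block_hashes(tokens: List[int]) -> List[str]:
    """Chained hashes: each block's hash depends on every token before it."""
    hashes = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        running.update((",".join(map(str, block)) + ";").encode())
        hashes.append(running.hexdigest())
    return hashes


class Router:
    """Hypothetical router: prefer the worker with the longest cached prefix."""

    def __init__(self, worker_ids: List[str]) -> None:
        self.cached: Dict[str, Set[str]] = {w: set() for w in worker_ids}
        self.load: Dict[str, int] = {w: 0 for w in worker_ids}

    def _hit_blocks(self, worker: str, hashes: List[str]) -> int:
        hits = 0
        for h in hashes:
            if h not in self.cached[worker]:
                break
            hits += 1
        return hits

    def route(self, tokens: List[int]) -> Tuple[str, int]:
        hashes = block_hashes(tokens)
        # Most cached prefix blocks wins; ties go to the least-loaded worker.
        best = max(self.cached, key=lambda w: (self._hit_blocks(w, hashes), -self.load[w]))
        reused_tokens = self._hit_blocks(best, hashes) * BLOCK_SIZE
        self.cached[best].update(hashes)
        self.load[best] += 1
        return best, reused_tokens
```

In this sketch, two requests that share an eight-token system prompt land on the same worker, and the second one reuses those eight tokens instead of recomputing them.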
Hot context stays on the GPU; colder context tiers down to CPU memory, NVMe, and in-region storage. Serve longer contexts and more concurrent users without buying more GPUs.
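The tiering described above can be sketched as a chain of least-recently-used caches, where an entry evicted from one tier is demoted to the next colder one and an access promotes it back to the hottest tier. This is an illustrative assumption, not Iftah's actual eviction policy; `TieredCache`, the tier names, and the capacities are hypothetical.

```python
from collections import OrderedDict
from typing import Any, List, Optional, Tuple


class TieredCache:
    """Hypothetical multi-tier LRU cache, hottest tier first."""

    def __init__(self, capacities: List[Tuple[str, int]]) -> None:
        # Each tier is (name, max entries, LRU-ordered store).
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level: int, key: str, value: Any) -> None:
        if level >= len(self.tiers):
            return  # fell off the coldest tier; the context would be recomputed
        _, cap, store = self.tiers[level]
        store[key] = value
        if len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # least recently used
            self._insert(level + 1, old_key, old_value)     # demote one tier down

    def put(self, key: str, value: Any) -> None:
        for _, _, store in self.tiers:
            store.pop(key, None)  # drop any stale copy first
        self._insert(0, key, value)

    def get(self, key: str) -> Tuple[Optional[Any], Optional[str]]:
        """Return (value, tier it was found in) and promote it to the hottest tier."""
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)
                return value, name
        return None, None

    def tier_of(self, key: str) -> Optional[str]:
        for name, _, store in self.tiers:
            if key in store:
                return name
        return None
```

With capacities such as `[("gpu", 2), ("cpu", 2), ("nvme", 2)]`, a third conversation pushes the oldest one from GPU to CPU memory rather than discarding it, and touching it again brings it back to the GPU tier.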
One serving plane for text, vision, audio, and document models — the same routing, batching, and residency guarantees across every modality.
Capacity shifts between prefill and decode as demand moves through the day. The cluster follows real load instead of a static, over-provisioned split.
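As a rough illustration of load-following capacity, the sketch below splits a fixed GPU count between the two pools in proportion to measured demand. The function name `split_pools`, the demand units (any consistent measure of pending work per interval), and the one-GPU minimum per pool are assumptions; the source does not describe the actual rebalancing policy.

```python
from typing import Tuple


def split_pools(total_gpus: int, prefill_demand: float, decode_demand: float,
                min_per_pool: int = 1) -> Tuple[int, int]:
    """Return (prefill_gpus, decode_gpus) proportional to demand."""
    if total_gpus < 2 * min_per_pool:
        raise ValueError("not enough GPUs to keep both pools alive")
    total_demand = prefill_demand + decode_demand
    if total_demand <= 0:
        prefill = total_gpus // 2  # no signal yet: split evenly
    else:
        prefill = round(total_gpus * prefill_demand / total_demand)
    # Clamp so neither pool drops below its minimum size.
    prefill = max(min_per_pool, min(total_gpus - min_per_pool, prefill))
    return prefill, total_gpus - prefill
```

For example, with 8 GPUs and demand of 30 units of prefill to 10 of decode, this sketch assigns 6 GPUs to prefill and 2 to decode; when demand is almost entirely decode, prefill keeps its minimum of 1.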
Run vLLM, TensorRT-LLM, SGLang, or your own engine behind one serving plane, and swap as the field moves — without re-architecting your stack.
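Swapping engines without touching callers usually comes down to a narrow adapter interface. The sketch below shows that pattern with two toy engines. It does not call vLLM, TensorRT-LLM, or SGLang, and the names `InferenceEngine`, `ServingPlane`, `UppercaseEngine`, and `ReverseEngine` are hypothetical.

```python
from typing import Dict, Protocol


class InferenceEngine(Protocol):
    """Hypothetical contract every engine adapter would satisfy."""
    name: str

    def generate(self, prompt: str, max_tokens: int) -> str: ...


class UppercaseEngine:
    name = "toy-upper"

    def generate(self, prompt: str, max_tokens: int) -> str:
        return prompt.upper()[:max_tokens]


class ReverseEngine:
    name = "toy-reverse"

    def generate(self, prompt: str, max_tokens: int) -> str:
        return prompt[::-1][:max_tokens]


class ServingPlane:
    """Callers address a model name; the binding to an engine can change."""

    def __init__(self) -> None:
        self._engines: Dict[str, InferenceEngine] = {}
        self._routes: Dict[str, str] = {}

    def register(self, engine: InferenceEngine) -> None:
        self._engines[engine.name] = engine

    def bind(self, model: str, engine_name: str) -> None:
        if engine_name not in self._engines:
            raise KeyError(f"unknown engine: {engine_name}")
        self._routes[model] = engine_name

    def generate(self, model: str, prompt: str, max_tokens: int = 64) -> str:
        return self._engines[self._routes[model]].generate(prompt, max_tokens)
```

Rebinding a model from one engine to another changes the output path while the caller's `generate("assistant", ...)` call stays the same.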
The result: higher throughput per GPU, lower latency, and longer context — without a single prompt leaving your environment.
CISO priority
Residency, access, and auditability
Prove that prompts, responses, embeddings, and logs stay inside the approved region or network boundary.
CIO priority
Repeatable operating model
Give data, platform, and app teams one governed access path instead of scattered AI experiments.
Business priority
Pilot decision evidence
Move the first use case from innovation lab to reviewable pilot with owners, evidence, and rollout criteria.
Install Iftah on any Kubernetes substrate — EKS, AKS, GKE, OKE, OpenShift, or bare-metal — in your cloud, data center, or air-gapped network. The data plane never leaves your control.
Route approved models, agents, and Arabic RAG through one policy-checked gateway — identity, classification, and guardrails on every request.
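A per-request policy check of this kind might look like the sketch below, which applies identity, classification, and guardrail checks in order and returns the first failure. The policy tables, role names, model names, classification levels, and blocked term are all invented for illustration and are not Iftah's policy schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical policy tables for illustration only.
MODEL_ROLES = {
    "arabic-rag": {"analyst", "legal"},
    "general-chat": {"analyst", "legal", "staff"},
}
MODEL_CLEARANCE = {"arabic-rag": "restricted", "general-chat": "internal"}
LEVELS = ["public", "internal", "restricted"]  # ascending sensitivity
BLOCKED_TERMS = ("BEGIN PRIVATE KEY",)


@dataclass(frozen=True)
class GatewayRequest:
    user: str
    roles: FrozenSet[str]
    model: str
    classification: str
    prompt: str


def check(req: GatewayRequest) -> Tuple[bool, str]:
    """Return (allowed, reason); checks run in a fixed order."""
    if req.model not in MODEL_ROLES:
        return False, "model not approved"
    if not (req.roles & MODEL_ROLES[req.model]):
        return False, "identity: role not permitted for model"
    if req.classification not in LEVELS:
        return False, "classification: unknown label"
    if LEVELS.index(req.classification) > LEVELS.index(MODEL_CLEARANCE[req.model]):
        return False, "classification: data too sensitive for model"
    if any(term in req.prompt for term in BLOCKED_TERMS):
        return False, "guardrail: blocked content"
    return True, "allowed"
```

Returning a reason string alongside the decision gives the audit trail something to record without storing the prompt itself.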
Capture signed, content-free audit trails and runtime health that your security and compliance reviewers can export.
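One way to read "signed, content-free" is a record that stores a hash and length of the prompt instead of the prompt, plus a signature over the record. The sketch below assumes that reading and uses an HMAC from the Python standard library for brevity. The source does not specify the signing scheme, and a production system might prefer asymmetric signatures so reviewers can verify without holding the signing key. Field names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(record: Dict[str, Any]) -> bytes:
    # Stable byte form so the same record always signs to the same value.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def audit_record(key: bytes, *, request_id: str, user: str, model: str,
                 decision: str, prompt: str, timestamp: str) -> Dict[str, Any]:
    """Build a record that contains no prompt text, then sign it."""
    record: Dict[str, Any] = {
        "request_id": request_id,
        "user": user,
        "model": model,
        "decision": decision,
        "timestamp": timestamp,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    record["signature"] = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify(key: bytes, record: Dict[str, Any]) -> bool:
    """Recompute the signature over every field except the signature itself."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, _canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(record.get("signature", "")))
```

A reviewer holding the key can confirm that an exported record was not altered, and the record reveals who called which model and what was decided without exposing what was asked.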
Start with one workload and one boundary, then roll out the same operating model across clouds, on-prem sites, and sovereign regions — governed from one control plane.
Run Iftah on any substrate your review process approves, with identity, policy, quota, and audit unified across every environment.
Run inside your own AWS, Azure, Google Cloud, or Oracle Cloud account and region, never a shared tenant.
Deploy on OpenShift or any CNCF-conformant Kubernetes distribution inside the private cloud your platform teams already operate.
Install in your own data center or accelerator cluster, with no public cloud dependency in the path.
Run fully disconnected with a client-local registry and signed offline updates; no prompts, content, or models leave the network.
Keep sensitive workloads on-premises and run others in public cloud, all governed from one control plane.
Govern AI across many clusters and clouds from one policy and audit plane, consistent and reviewable everywhere.
Private knowledge assistant
Search policies, procedures, contracts, and Arabic knowledge bases without sending sensitive prompts or embeddings to a public API.
Buying trigger
Reduce risky ad-hoc AI use
Regulated document review
Give legal, banking, health, or government teams a controlled workflow for summaries, extraction, review, and evidence retention.
Buying trigger
Create auditable AI workflows
Field and operations copilots
Run assistants for energy, telecom, and industrial teams near restricted systems while keeping rollout, access, and health visible.
Buying trigger
Deploy AI near operational data
Run on any Kubernetes substrate — public cloud, private cloud, on-prem, or air-gapped — and keep hybrid estates governed from one control plane. Your data plane stays under your control.
Gateway policies, request traces, retention choices, model inventory, and runtime health are visible before production expansion.
Start with one workload, one boundary, and one decision report; then scale the same operating model across teams.
The first engagement is designed to give your stakeholders concrete artifacts, not vague AI strategy slides.
Tell us your country, first workload, target environment, and review constraints. We will map a practical private AI path with your technical owners.
Book a sovereign AI architecture review