The views in this post are my own and do not represent my employer. Technical details are generalized and do not disclose customer, proprietary, or confidential information.
Why I am building PAIF instead of just calling an API
Most people talking about AI infrastructure today are either building SaaS apps on public cloud APIs or experimenting with consumer GPUs in a homelab.
I am somewhere in the middle.
I work heavily in private cloud infrastructure, specifically around VMware Cloud Foundation (VCF), VMware NSX, Kubernetes, and enterprise infrastructure architecture. Over the last year, I have been building and testing what I call a Private AI Foundation (PAIF) environment.
Not a toy demo. Not "run ChatGPT locally on a gaming PC."
The idea is an enterprise-style AI platform architecture designed around private infrastructure, GPU acceleration, Kubernetes, workload isolation, multi-tenancy, data locality, and operational reality.
The goal is not to compete with hyperscalers.
The goal is to answer a more important question:
What does AI infrastructure look like for organizations that cannot, or should not, send everything to a public LLM?
That is the problem space I am interested in.
The hardware changes the conversation
The environments I have been working with are GPU-enabled VMware Cloud Foundation platforms with enterprise virtualization, Kubernetes integration, software-defined networking through VMware NSX, and GPU-aware workload placement.
This matters because AI infrastructure changes the assumptions of traditional virtualization.
For years, enterprise infrastructure optimized around:
- CPU oversubscription
- VM density
- storage consolidation
- east/west VM traffic
- disaster recovery
AI changes the bottleneck.
Now the constraints become:
- GPU scheduling
- VRAM locality
- PCIe topology
- storage throughput for model loading
- inference latency
- container orchestration
- data gravity
- power and cooling
That is a completely different architecture conversation.
Why not just use public AI APIs?
I do use external APIs.
In many cases, using a managed service such as Azure OpenAI is the correct answer. But there are several problems enterprises are already running into.
Data sensitivity
Many organizations are uncomfortable sending financial data, healthcare records, operational documents, internal engineering data, or customer information into external AI systems.
Sometimes that concern is regulatory. Sometimes it is contractual. Sometimes it is simply risk management.
Cost at scale
Calling APIs is easy at small scale.
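But once organizations begin embedding documents, running large retrieval pipelines, building agent workflows, processing tickets, summarizing logs, or handling high request volumes, the economics start to change: usage-based pricing that was negligible in a pilot becomes a recurring cost that grows with every request.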
Inference becomes infrastructure.
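To make that shift concrete, here is a rough break-even sketch in Python. The price per million tokens and the monthly platform cost are hypothetical placeholders, not measurements from my environments or any vendor's price list, and the model deliberately ignores utilization, capacity limits, and staffing differences.

```python
def monthly_api_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Pay-per-token spend: grows linearly with volume."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def break_even_tokens(fixed_monthly_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume at which API spend equals a fixed platform cost."""
    return fixed_monthly_usd / usd_per_million_tokens * 1_000_000


# Hypothetical inputs, for illustration only.
API_PRICE = 2.00           # USD per million tokens (assumed)
PLATFORM_COST = 10_000.0   # USD per month: amortized hardware, power, operations (assumed)

threshold = break_even_tokens(PLATFORM_COST, API_PRICE)
print(f"Break-even volume: {threshold:,.0f} tokens per month")
```

Below the threshold the API is cheaper; above it, the fixed-cost platform starts to win, provided it can actually serve that volume. The real numbers will differ for every organization, which is exactly why this has to be treated as an infrastructure planning exercise.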
Latency and control
A private AI platform gives you workload control, model choice, networking control, observability, security boundaries, and integration flexibility.
That matters once AI becomes operational infrastructure instead of just an experiment.
The architecture direction
One of the biggest misconceptions I see is the idea that private AI means "run a giant frontier model locally."
That is usually unrealistic.
Even organizations with strong infrastructure teams often do not have enough GPU capacity to efficiently host the newest frontier-scale models internally.
Instead, I think the future looks hybrid.
- Local: embeddings, RAG, data ingestion, classification, sensitive processing
- Platform: Kubernetes, VCF, NSX, GPU scheduling, observability, workload boundaries
- External: frontier reasoning, large-scale generation, multimodal models, burst capacity
| Local infrastructure | External AI services |
|---|---|
| Embeddings | Frontier reasoning |
| RAG pipelines | Large-scale generation |
| Data ingestion | Advanced multimodal models |
| Classification | Massive context windows |
| Summarization | Specialized reasoning |
| Security-sensitive processing | Burst capacity |
In other words, the private infrastructure becomes the AI operating platform, not necessarily the location of every large model.
That distinction matters.
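One way to picture the platform role is as a routing decision in front of every request. The sketch below is a minimal illustration in Python, not how any particular product works: the `Request` type, the task names, and the `route` function are all hypothetical, and the local task set simply mirrors the local/external split in the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    EXTERNAL = "external"


# Task types drawn from the table above; the split itself is a policy choice.
LOCAL_TASKS = {"embedding", "rag", "ingestion", "classification", "summarization"}


@dataclass(frozen=True)
class Request:
    task: str
    sensitive: bool = False


def route(req: Request) -> Target:
    """Decide where a request runs under a simple, assumed policy."""
    # Sensitive data stays on private infrastructure, whatever the task.
    if req.sensitive:
        return Target.LOCAL
    if req.task in LOCAL_TASKS:
        return Target.LOCAL
    # Everything else may use an external service.
    return Target.EXTERNAL


print(route(Request("embedding")))
print(route(Request("frontier_reasoning")))
print(route(Request("frontier_reasoning", sensitive=True)))
```

In a real platform the sensitive-but-demanding case is the hard one: the local tier may need to reject the request, degrade to a smaller model, or redact data before sending anything out. The point of the sketch is only that the decision lives in infrastructure you control.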
Why Kubernetes matters here
Traditional virtualization alone is not enough for modern AI platforms. The orchestration layer matters.
- Classic private cloud: VM density, storage consolidation, DR, east/west VM traffic
- AI platform layer: containers, APIs, pipelines, model endpoints, vector databases, workers
- New constraints: GPU scheduling, VRAM locality, storage throughput, latency, power, cooling
That is why I have been spending time testing supervisor patterns, Kubernetes integration, containerized AI services, registry flows, networking models, GPU-aware scheduling, vGPU concepts, and workload isolation strategies inside VMware Cloud Foundation environments.
AI workloads increasingly behave more like cloud-native applications than traditional VMs.
The operational model shifts toward:
- containers
- pipelines
- APIs
- workers
- vector databases
- model endpoints
- orchestration systems
That is a very different mindset from classic virtualization.
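As a small illustration of that mindset, here is a Python sketch that builds a Kubernetes Pod manifest requesting a GPU. It assumes a cluster where the NVIDIA device plugin (or GPU Operator) exposes the `nvidia.com/gpu` extended resource; the pod name and image path are hypothetical, and a vGPU-backed environment may be configured differently.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that asks the scheduler for whole GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are declared under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical name and registry path, for illustration only.
manifest = gpu_pod_manifest(
    "embedding-worker", "registry.example.internal/embedder:latest"
)
print(json.dumps(manifest, indent=2))
```

The resulting JSON can be applied like any other manifest. What matters is the shift it represents: the GPU becomes a schedulable resource that a workload declares, rather than a device someone attaches to a VM by hand.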
What this series will cover
Over the next set of posts, I want to document both the technical side and the architectural thinking behind building a private AI platform.
Topics will likely include:
- designing AI infrastructure on VCF
- GPU architecture considerations
- private AI vs. public AI tradeoffs
- Kubernetes for AI workloads
- vGPU and workload partitioning
- AI networking and NSX implications
- storage considerations for AI
- RAG architecture patterns
- operational realities and failure modes
- what enterprises are getting wrong
- hybrid AI design patterns
- AI infrastructure economics
- where I think this is all heading
I am intentionally writing this from the perspective of a Senior Product Architect specializing in infrastructure, not an AI influencer, not a prompt engineering course seller, and not someone pretending enterprise AI is magically simple.
Because it is not.
But it is becoming one of the most important infrastructure transitions we have seen in a long time.
And I think the people who understand both infrastructure and AI systems are going to be in a very interesting position over the next decade.