Private AI Foundation · Part 1

Building a Private AI Platform in VMware Cloud Foundation

Why I am thinking about private AI as an infrastructure platform, not just another way to run a model.

By Brandon Quantz · May 2026 · 7 min read

The views in this post are my own and do not represent my employer. Technical details are generalized and do not disclose customer, proprietary, or confidential information.

Why I am building PAIF instead of just calling an API

Most people talking about AI infrastructure today are either building SaaS apps on public cloud APIs or experimenting with consumer GPUs in a homelab.

I am somewhere in the middle.

I work heavily in private cloud infrastructure, specifically around VMware Cloud Foundation, VMware NSX, Kubernetes, and enterprise infrastructure architecture. Over the last year, I have been building and testing what I call a Private AI Foundation environment.

Not a toy demo. Not "run ChatGPT locally on a gaming PC."

The idea is an enterprise-style AI platform architecture designed around private infrastructure, GPU acceleration, Kubernetes, workload isolation, multi-tenancy, data locality, and operational reality.

The goal is not to compete with hyperscalers.

The goal is to answer a more important question:

What does AI infrastructure look like for organizations that cannot, or should not, send everything to a public LLM?

That is the problem space I am interested in.

The hardware changes the conversation

The environments I have been working with are GPU-enabled VMware Cloud Foundation platforms with enterprise virtualization, Kubernetes integration, software-defined networking through VMware NSX, and GPU-aware workload placement.

This matters because AI infrastructure changes the assumptions of traditional virtualization.

For years, enterprise infrastructure was optimized around:

  • CPU oversubscription
  • VM density
  • storage consolidation
  • east/west VM traffic
  • disaster recovery

AI changes the bottleneck.

Now the constraints become:

  • GPU scheduling
  • VRAM locality
  • PCIe topology
  • storage throughput for model loading
  • inference latency
  • container orchestration
  • data gravity
  • power and cooling

That is a completely different architecture conversation.
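To make a few of these constraints concrete, here is a rough back-of-the-envelope sketch. It estimates the size of a model's weights, a lower bound on how long they take to load from storage, and whether they fit in a GPU's VRAM. The function names, the 20% headroom for KV cache and activations, and the example numbers are all illustrative assumptions, not measurements from any environment described in this post.

```python
GIB = 2**30  # bytes per GiB


def weight_footprint_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / GIB


def load_time_seconds(size_gib: float, throughput_gib_per_s: float) -> float:
    """Lower bound on the time to read the weights at a sustained throughput."""
    if throughput_gib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gib / throughput_gib_per_s


def fits_in_vram(size_gib: float, vram_gib: float, headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` (a fraction) of VRAM free.

    The headroom stands in for KV cache and activations; 0.2 is an assumed
    placeholder, not a recommendation.
    """
    return size_gib <= vram_gib * (1.0 - headroom)


if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model stored at 2 bytes per parameter.
    size = weight_footprint_gib(7, 2)
    print(f"weights: {size:.1f} GiB")
    print(f"load time at 2 GiB/s: {load_time_seconds(size, 2.0):.1f} s")
    print(f"fits in a 24 GiB GPU: {fits_in_vram(size, 24)}")
```

Even this crude arithmetic shows why storage throughput and VRAM capacity become first-order design inputs: doubling the parameter count doubles both the load time and the memory footprint.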

Why not just use public AI APIs?

I do use external APIs.

In many cases, using a managed service such as Azure OpenAI is the correct answer. But there are several problems enterprises are already running into.

Data sensitivity

Many organizations are uncomfortable sending financial data, healthcare records, operational documents, internal engineering data, or customer information into external AI systems.

Sometimes that concern is regulatory. Sometimes it is contractual. Sometimes it is simply risk management.

Cost at scale

Calling APIs is easy at small scale.

But once organizations begin embedding documents, running large retrieval pipelines, building agent workflows, processing tickets, summarizing logs, or handling high request volumes, the economics start changing.

Inference becomes infrastructure.
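A simple break-even calculation illustrates the shift. The sketch below compares usage-based API spend against a fixed monthly cost for private capacity. Every number is a hypothetical placeholder, and the model deliberately ignores real factors such as utilization, staffing, and the marginal cost of private inference.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Usage-based spend: grows linearly with token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(fixed_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which API spend equals a fixed private cost."""
    if price_per_million_tokens <= 0:
        raise ValueError("price must be positive")
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 2.00 per million tokens vs. 10,000 per month
    # of amortized private capacity. Neither figure comes from a real quote.
    price = 2.00
    fixed = 10_000.00
    volume = break_even_tokens(fixed, price)
    print(f"break-even volume: {volume:,.0f} tokens/month")
    print(f"API cost at that volume: {monthly_api_cost(volume, price):,.2f}")
```

Below the break-even volume the API is cheaper; above it, the fixed platform starts to win, provided the hardware is actually kept busy.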

Latency and control

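A private AI platform gives you workload control, model choice, networking control, observability, security boundaries, and integration flexibility. It also keeps inference traffic on your own network, so latency depends on infrastructure you operate rather than on an external endpoint.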

That matters once AI becomes operational infrastructure instead of just an experiment.

The architecture direction

One of the biggest misconceptions I see is the idea that private AI means "run a giant frontier model locally."

That is usually unrealistic.

Even organizations with strong infrastructure teams often do not have enough GPU capacity to efficiently host the newest frontier-scale models internally.

Instead, I think the future looks hybrid.

Private AI Foundation operating model:

  • Local: embeddings, RAG, data ingestion, classification, sensitive processing
  • Platform: Kubernetes, VCF, NSX, GPU scheduling, observability, workload boundaries
  • External: frontier reasoning, large-scale generation, multimodal models, burst capacity

Local infrastructure          | External AI services
Embeddings                    | Frontier reasoning
RAG pipelines                 | Large-scale generation
Data ingestion                | Advanced multimodal models
Classification                | Massive context windows
Summarization                 | Specialized reasoning
Security-sensitive processing | Burst capacity

In other words, the private infrastructure becomes the AI operating platform, not necessarily the location of every large model.

That distinction matters.
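One way to picture the hybrid split is as a routing policy that sits in front of both local and external models. The sketch below is a minimal illustration: the task names, the `Request` fields, and the policy itself (sensitive work never leaves, non-sensitive local work may burst out when local capacity is saturated, unknown tasks stay local) are assumptions made for the example, not a description of any specific product or of the environment in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    EXTERNAL = "external"


# Hypothetical task labels, loosely following the local/external split above.
LOCAL_TASKS = {"embedding", "rag", "ingestion", "classification", "summarization"}
EXTERNAL_TASKS = {
    "frontier_reasoning",
    "large_scale_generation",
    "multimodal",
    "long_context",
}


@dataclass(frozen=True)
class Request:
    task: str
    sensitive: bool = False
    local_capacity_available: bool = True


def route(req: Request) -> Target:
    """Pick a target for a request under the assumed policy."""
    # Sensitive data stays on private infrastructure, whatever the task.
    if req.sensitive:
        return Target.LOCAL
    # Work that needs frontier-scale models goes to an external service.
    if req.task in EXTERNAL_TASKS:
        return Target.EXTERNAL
    # Non-sensitive local work may burst out when local GPUs are saturated.
    if req.task in LOCAL_TASKS and not req.local_capacity_available:
        return Target.EXTERNAL
    # Everything else, including unknown tasks, defaults to local.
    return Target.LOCAL


if __name__ == "__main__":
    for r in (
        Request("embedding"),
        Request("frontier_reasoning"),
        Request("frontier_reasoning", sensitive=True),
        Request("summarization", local_capacity_available=False),
    ):
        print(r.task, r.sensitive, "->", route(r).value)
```

Checking sensitivity first is the key design choice here: the data boundary decides the route before capability or capacity does.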

Why Kubernetes matters here

Traditional virtualization alone is not enough for modern AI platforms. The orchestration layer matters.

Where the architecture shifts:

  • Classic private cloud: VM density, storage consolidation, DR, east/west VM traffic
  • AI platform layer: containers, APIs, pipelines, model endpoints, vector databases, workers
  • New constraints: GPU scheduling, VRAM locality, storage throughput, latency, power, cooling

That is why I have been spending time testing supervisor patterns, Kubernetes integration, containerized AI services, registry flows, networking models, GPU-aware scheduling, vGPU concepts, and workload isolation strategies inside VMware Cloud Foundation environments.

AI workloads increasingly behave like cloud-native applications rather than traditional VMs.

The operational model shifts toward:

  • containers
  • pipelines
  • APIs
  • workers
  • vector databases
  • model endpoints
  • orchestration systems

That is a very different mindset from classic virtualization.
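As a small illustration of that mindset, a model endpoint in Kubernetes is declared as a workload that requests GPUs, and the scheduler finds a node that can satisfy the request. The sketch below builds such a Pod manifest as a plain Python dictionary. It assumes the cluster exposes GPUs under the `nvidia.com/gpu` extended resource name, which is what NVIDIA's Kubernetes device plugin advertises; the resource name in a given environment may differ, and the pod name and image are hypothetical.

```python
import json


def gpu_pod_manifest(
    name: str,
    image: str,
    gpus: int = 1,
    gpu_resource: str = "nvidia.com/gpu",  # assumed resource name; varies by setup
) -> dict:
    """Build a minimal Pod manifest whose container requests whole GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are declared under limits.
                    "resources": {"limits": {gpu_resource: gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    manifest = gpu_pod_manifest(
        "embedding-endpoint",
        "registry.example.internal/embedding-server:1.0",
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` in a test cluster. A real endpoint would more likely be a Deployment with probes and a Service in front of it; a bare Pod keeps the example short.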

What this series will cover

Over the next set of posts, I want to document both the technical side and the architectural thinking behind building a private AI platform.

Topics will likely include:

  • designing AI infrastructure on VCF
  • GPU architecture considerations
  • private AI vs. public AI tradeoffs
  • Kubernetes for AI workloads
  • vGPU and workload partitioning
  • AI networking and NSX implications
  • storage considerations for AI
  • RAG architecture patterns
  • operational realities and failure modes
  • what enterprises are getting wrong
  • hybrid AI design patterns
  • AI infrastructure economics
  • where I think this is all heading

I am intentionally writing this from the perspective of a Senior Product Architect specializing in infrastructure, not an AI influencer, not a prompt engineering course seller, and not someone pretending enterprise AI is magically simple.

Because it is not.

But it is becoming one of the most important infrastructure transitions we have seen in a long time.

And I think the people who understand both infrastructure and AI systems are going to be in a very interesting position over the next decade.