The private AI question moves from model to boundary
The first two PAIF posts were about why private AI exists and how a Tier 3 private AI reference architecture can be designed. This third part is about the point where the architecture starts to act.
A private model that only answers questions is useful. A private model that proposes operational changes is much more useful. It is also much more dangerous if the output is treated as authority.
That distinction is the design center. In the PAIF pattern I am building toward, the model is not a privileged operator. It is an untrusted suggestion engine running inside the environment. The platform around it decides what the suggestion is allowed to become.
Why self-host the model at all
The reason to run the model in-lab is not that every local model is better than every hosted model. It usually is not. The reason is that infrastructure context is sensitive by default.
Inventory, device configuration, customer lab data, topology, alarms, service accounts, and operational history are not generic prompt material. If an application needs that context to be useful, the data boundary becomes a first-class architecture decision.
For PAIF, the model endpoint should sit inside the same trust envelope as the platform. Applications call an internal, authenticated inference endpoint. Data stays in the lab. The model can be smaller and slower if the use case is bounded enough. The architecture is the point, not a leaderboard score.
The output-to-action firewall
The core control is simple: model output never executes directly.
The model returns a structured proposal: action, target, parameters, reason, and expected effect. That proposal is data. It is parsed, checked against a schema, checked against policy, assigned a risk tier, and only then routed to an existing executor.
This matters because prompt injection is not theoretical in operations. Device output, logs, tickets, old configuration comments, and user text can all carry hostile or misleading instructions. If the model can choose its own authority, the system is already lost. If the model can only emit a proposal that policy must accept, the blast radius is controlled.
The handoff should look boring on purpose. The model does not emit a shell command. It emits a narrow object the platform can inspect.
{
"proposal_id": "prop_1047",
"action": "create_segment",
"target": "workload-network",
"parameters": {
"name": "customer-lab-web",
"cidr": "example-private-cidr",
"gateway": "example-private-gateway"
},
"reason": "new lab request needs isolated web tier",
"expected_effect": "segment exists, no firewall rule changed"
}
Risk comes from the action, not from the model
A common mistake is to treat every AI-driven action as equally risky because a model was involved. That is too blunt to operate.
The better rule is that the risk tier comes from what the action touches. A read-only explanation of a failed deployment is not the same class of event as rotating a credential, changing a firewall policy, powering off a host, or merging to a production branch.
Low-risk proposals can be allowed to proceed automatically after validation. Higher-risk proposals require an independent security review or explicit human approval. Irreversible or production-mutating actions stop for a human. The model does not get to downgrade that tier because it sounds confident.
| Proposal | Risk source | Result |
|---|---|---|
| Summarize failed deployment logs | Read-only diagnostic context | Auto-allow, audit only |
| Create a disposable lab segment | Network mutation, reversible | Policy allow + normal change record |
| Rotate a service credential | Identity and secret material | Human approval required |
| Delete production state | Irreversible operational impact | Denied unless separately authorized |
policy_result:
decision: deny
reason: "model proposed credential rotation"
required_gate: "human"
executor_called: false
The reusable pattern
The useful shape is model-agnostic. The application should not care whether the backend is a CPU pilot model, a GPU-hosted open model, or a different internal endpoint later.
The durable interface is an OpenAI-compatible internal endpoint plus a firewall component that every app uses before execution. That lets the platform test the action path before the final inference hardware is ready.
The first pilot does not need to prove model quality. It needs to prove the gate: no proposed action reaches execution without schema validation, policy validation, risk-tier assignment, audit logging, and the existing guardrailed executor.
What this unlocks
Once this boundary exists, the use cases get interesting. A provisioning system can draft a multi-host deployment plan. A lab builder can explain why a workload deployment failed. A network tool can propose the specific VLAN or segment changes needed for a new environment. A troubleshooting assistant can collect evidence and recommend the next bounded action.
The model supplies judgment and synthesis. The platform supplies identity, policy, state, approval, and execution.
That is the difference between putting a chatbot near infrastructure and building private AI into infrastructure operations.