Agentic Operations

Conversational Operations Needs a Platform, Not a Prompt

A useful AI operations layer is not a clever prompt on top of disconnected tools. It needs a source of truth, bounded services, durable telemetry, and a tool surface the agent can actually use.

By Brandon Quantz | June 2026 | 9 min read

The attractive demo is the easy part

It is easy to imagine the conversation: "What is wrong with the network?" "Provision four hosts." "Deploy the lab." "Tell me what is going end of life."

The hard part is not phrasing the request. The hard part is whether the platform underneath the request can answer it with evidence and act on it safely.

A conversational operations system cannot be built out of vibes. It needs inventory, telemetry, workflow engines, credentials, policy, audit, and rollback paths. The model is the front door. The platform is the building.

Disaggregated services are the shape

The pattern I keep coming back to is a disaggregated platform. Each service owns one bounded domain and exposes that domain through clean interfaces.

Inventory belongs in the source of truth. IP addresses and DNS belong in IPAM. Bare-metal installation belongs in the provisioning engine. Network discovery belongs in topology tooling. Workload deployment belongs in the lab builder. AI access belongs behind a tool surface, not scattered through one-off scripts.

That separation matters because no single service has to understand the whole world. It only has to keep its contract.
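To make "keep its contract" concrete, here is a minimal Python sketch, assuming a provisioning step that needs addresses from IPAM. The names IpamService, InMemoryIpam, and plan_host are hypothetical, and the in-memory allocator only stands in for a real IPAM and DNS service. The point is that plan_host depends on nothing but the narrow allocate contract, so the IPAM side can change without touching provisioning.

```python
# Hypothetical sketch: two bounded services meeting at a narrow contract.
# IpamService, InMemoryIpam, and plan_host are illustrative names, not a real API.
import ipaddress
from typing import Dict, Protocol, Set


class IpamService(Protocol):
    """The only IPAM behavior the provisioning workflow is allowed to depend on."""

    def allocate(self, prefix: str, hostname: str) -> str: ...


class InMemoryIpam:
    """Toy stand-in for IPAM/DNS: hands out the next free host address in a prefix."""

    def __init__(self) -> None:
        self._used: Dict[str, Set[str]] = {}
        self.records: Dict[str, str] = {}  # hostname -> address, standing in for DNS

    def allocate(self, prefix: str, hostname: str) -> str:
        if hostname in self.records:
            return self.records[hostname]  # idempotent: a retried request gets the same address
        used = self._used.setdefault(prefix, set())
        for host in ipaddress.ip_network(prefix).hosts():
            addr = str(host)
            if addr not in used:
                used.add(addr)
                self.records[hostname] = addr
                return addr
        raise RuntimeError(f"prefix {prefix} is exhausted")


def plan_host(ipam: IpamService, hostname: str, prefix: str) -> Dict[str, str]:
    """Provisioning step that knows nothing about how IPAM stores its state."""
    return {"hostname": hostname, "address": ipam.allocate(prefix, hostname)}


if __name__ == "__main__":
    ipam = InMemoryIpam()
    print(plan_host(ipam, "esx01", "10.0.10.0/29"))  # {'hostname': 'esx01', 'address': '10.0.10.1'}
    print(plan_host(ipam, "esx02", "10.0.10.0/29"))  # {'hostname': 'esx02', 'address': '10.0.10.2'}
```

Swapping InMemoryIpam for a client that talks to a real IPAM would not require any change to plan_host, which is the property the separation is meant to buy.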

Platform underneath the conversation

Source of truth: inventory and IP state. Devices, racks, interfaces, addresses, custom operational state.

Provisioning: bare metal to platform. Host selection, OS install, bring-up workflow, status writeback.

Operations: topology and telemetry. Observed state, drift, alarms, utilization, fabric health.

Agent surface: tools and policy. Read broadly, mutate narrowly, audit every meaningful action.

NetBox as the common language

The platform needs a shared language for physical reality. Devices, racks, roles, interfaces, addresses, cables, out-of-band endpoints, and custom operational state have to be somewhere all the other systems can read.

That is why a source of truth is not documentation in this design. It is an integration layer.

Provisioning reads from it. Discovery writes back to it. Hardware inventory enriches it. Topology validation compares observed state to declared state. The AI tool surface uses it to avoid guessing what exists.
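Topology validation is the easiest of these flows to sketch. Assuming declared links come from cables recorded in the source of truth and observed links come from discovery such as LLDP, drift reduces to two set differences. Everything below, including the device and interface names, is a hypothetical illustration and does not use NetBox's API.

```python
# Hypothetical sketch: topology drift as the difference between declared and observed links.
# Declared links would come from the source of truth, observed links from discovery (e.g. LLDP);
# here both are in-memory sets and every device/interface name is made up.
from typing import Dict, FrozenSet, NamedTuple, Set


class Endpoint(NamedTuple):
    device: str
    interface: str


Link = FrozenSet[Endpoint]  # unordered pair, so A-B and B-A compare equal


def link(a_dev: str, a_if: str, b_dev: str, b_if: str) -> Link:
    return frozenset({Endpoint(a_dev, a_if), Endpoint(b_dev, b_if)})


def topology_drift(declared: Set[Link], observed: Set[Link]) -> Dict[str, Set[Link]]:
    return {
        "missing": declared - observed,     # documented in the source of truth, not seen on the wire
        "unexpected": observed - declared,  # seen on the wire, not documented
    }


def describe(lk: Link) -> str:
    a, b = sorted(lk)
    return f"{a.device}:{a.interface} <-> {b.device}:{b.interface}"


if __name__ == "__main__":
    declared = {link("leaf1", "Eth1/49", "spine1", "Eth1/1"),
                link("leaf1", "Eth1/50", "spine2", "Eth1/1")}
    observed = {link("spine1", "Eth1/1", "leaf1", "Eth1/49"),  # same link, reversed order
                link("leaf1", "Eth1/50", "spine2", "Eth1/2")}  # cabled to the wrong port
    for kind, links in topology_drift(declared, observed).items():
        for lk in sorted(links, key=describe):
            print(kind, describe(lk))
```

Run as-is, this reports the miscabled uplink twice, once as "missing leaf1:Eth1/50 <-> spine2:Eth1/1" and once as "unexpected leaf1:Eth1/50 <-> spine2:Eth1/2", while the reversed-but-identical spine1 link produces no drift.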

The tool surface is where AI becomes operational

An agent cannot operate infrastructure just because it can reason about infrastructure. It needs tools.

The tool surface is where inventory reads, switch queries, virtualization checks, firewall inspection, storage lookups, DNS operations, and provisioning status all become callable in a consistent way.

That is also where governance can attach. Tool calls can be logged. High-risk tools can require approval. Read-only tools can be broadly available. Mutating tools can be constrained. The agent sees one operating surface; the platform still enforces boundaries.

The four flagship questions

A useful way to test the platform is to ask four representative questions.

"What is wrong with the network?" requires topology, drift detection, fabric health, firewall state, NSX alarms, recent changes, and eventually live link utilization. "Provision a new VCF instance" requires host selection, IP allocation, hypervisor install, installer deployment, bring-up, and verification. "Deploy a workload VM set" requires a target platform, templates, directory integration, Terraform, Ansible, and post-deploy checks. "What is going end of life?" requires complete inventory, hardware detail, lifecycle data, vendor EOL feeds, and impact ranking.

Those questions are not just demos. They are architecture tests. Every missing answer points at a missing integration.

Question | Minimum evidence path
--- | ---
What's wrong with the network? | topology drift + fabric health + alarms + recent change context
Provision four hosts | candidate selection + IP/DNS allocation + install workflow + validation
Deploy a workload set | template + target cluster + identity + Terraform/Ansible result
What is going end of life? | inventory + component detail + lifecycle source + impact ranking

Honesty is part of the design

The platform does not need to pretend everything is done. A good agentic operations layer should say which parts are live, which are partial, and which are blocked by missing data or missing tools.

That honesty is more useful than a polished demo. It turns the conversation into an operating roadmap.
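One way to make that honesty mechanical is to treat each row of the evidence-path table as a list of required integrations and report the weakest link. In this hypothetical sketch, an integration that has not been registered counts as blocked, and a question's status is the worst status along its path. The integration names and the example statuses are illustrative placeholders, not a real platform inventory.

```python
# Hypothetical sketch: derive live / partial / blocked status for each flagship question
# from the integrations on its evidence path. Integration names are illustrative placeholders.
from enum import IntEnum
from typing import Dict, List, Tuple


class Status(IntEnum):
    LIVE = 0
    PARTIAL = 1
    BLOCKED = 2  # required integration missing or not usable


EVIDENCE_PATHS: Dict[str, List[str]] = {
    "What's wrong with the network?": ["topology_drift", "fabric_health", "alarms", "change_context"],
    "Provision four hosts": ["candidate_selection", "ipam_dns", "install_workflow", "validation"],
    "Deploy a workload set": ["templates", "target_cluster", "identity", "terraform_ansible"],
    "What is going end of life?": ["inventory", "component_detail", "lifecycle_source", "impact_ranking"],
}


def readiness(paths: Dict[str, List[str]],
              integrations: Dict[str, Status]) -> Dict[str, Tuple[str, List[str]]]:
    report = {}
    for question, needed in paths.items():
        # An integration that has not been registered at all counts as blocked.
        statuses = {name: integrations.get(name, Status.BLOCKED) for name in needed}
        worst = max(statuses.values())  # the weakest link decides the answer
        gaps = sorted(name for name, s in statuses.items() if s is not Status.LIVE)
        report[question] = (worst.name.lower(), gaps)
    return report


if __name__ == "__main__":
    known = {name: Status.LIVE for name in
             ["topology_drift", "fabric_health", "alarms", "candidate_selection", "ipam_dns",
              "install_workflow", "validation", "inventory"]}
    known["change_context"] = Status.PARTIAL
    for question, (status, gaps) in readiness(EVIDENCE_PATHS, known).items():
        print(f"{status:8} {question}  gaps: {gaps}")
```

The gaps list is the roadmap: each entry names the integration that would move a question from blocked or partial toward live.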

Conversational operations becomes real when the answer can be traced back to state, when the proposed action passes through policy, and when the platform can verify what changed afterward.