Forge is an application I am actively developing. This post is the first in a series on the problem space, architecture, implementation direction, and lessons learned while building it.
The problem
Every infrastructure platform has an automation story. Terraform provisions cloud resources. Ansible configures running systems. vLCM manages ESXi at scale. Nutanix Foundation images nodes. VMware Cloud Foundation bring-up orchestrates a full SDDC stack.
Nearly every one of them assumes the operating system, or at least a reachable node, is already there.
That assumption is where lab engineers, small infrastructure teams, and anyone who has ever racked a physical server still spend hours they should not have to spend. Booting ISOs. Babysitting installers. Re-running kickstarts. Checking firmware by hand. Running post-install scripts from a terminal. Wondering which host is in which state after a rebuild gets interrupted.
The gap between a powered-off server and a running platform has never had a clean answer. That is the gap Forge is meant to own.
What Forge is
Forge is a bare-metal-to-platform automation system. It owns the physical layer: the part every other tool quietly assumes is already done.
Starting from a powered-off server with an out-of-band management interface, Forge can:
- Pull hardware inventory from the BMC over Redfish.
- Capture CPU, memory, storage topology, NIC state, and firmware details.
- Generate a host-specific OS installer configuration.
- Remaster a source ISO with that configuration embedded.
- Mount the ISO as virtual media.
- Force a one-time boot through the BIOS job queue.
- Track the install as a live job with streamed logs.
- Apply post-install configuration over SSH once the OS is up.
- Hand off to the next platform-specific workflow stage.
No PXE infrastructure. No DHCP reservations. No manually staging ISOs. No hoping a screen session survived. One form submission, one job, one audit trail: from powered-off metal to configured operating system.
- Input: BMC access, hardware inventory, OS media, host intent
- Forge work: install media, virtual media boot, job tracking, post-install config
- Output: a configured host ready for VCF, Nutanix, Proxmox, Azure Local, or Linux workflows
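To make "one job, one audit trail" concrete, here is a minimal Python sketch of a single host job that runs ordered stages and keeps a timestamped log. It is not Forge's code: `Job`, `run_job`, and the stage names are hypothetical, and the placeholder stages only write into a context dict instead of talking to a BMC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# A stage is a (name, function) pair. The function receives a shared context
# dict and raises an exception to signal failure. Names here are hypothetical.
Stage = tuple[str, Callable[[dict], None]]


@dataclass
class Job:
    host: str
    status: str = "pending"
    log: list[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        # Timestamp every line so the log can double as an audit trail.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.log.append(f"{stamp} [{self.host}] {message}")


def run_job(host: str, stages: list[Stage], ctx: dict) -> Job:
    """Run stages in order, logging each one and stopping at the first failure."""
    job = Job(host=host, status="running")
    for name, fn in stages:
        job.record(f"start {name}")
        try:
            fn(ctx)
        except Exception as exc:
            job.record(f"failed {name}: {exc}")
            job.status = "failed"
            return job
        job.record(f"done {name}")
    job.status = "succeeded"
    return job


# Usage with placeholder stages that only write into the context.
stages: list[Stage] = [
    ("inventory", lambda ctx: ctx.update(cpus=2)),
    ("kickstart", lambda ctx: ctx.update(ks="vmaccepteula\n")),
    ("boot-once", lambda ctx: ctx.update(booted=True)),
]
job = run_job("esxi-01", stages, {})
print(job.status)  # succeeded
for line in job.log:
    print(line)
```

Stopping at the first failed stage, rather than pressing on, means the last log line always names the step that needs attention.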
The workflow idea
An OS install is stage one. The larger idea is a workflow engine for platform standup: an ordered chain of jobs across multiple hosts with dependency tracking, retries, logs, and status rollup.
| Workflow | Stage 1 | Stage 2 | Stage 3 | End state |
|---|---|---|---|---|
| VMware VCF | ESXi on N hosts | Installer appliance | Bring-up API | SDDC ready |
| Nutanix AHV | Phoenix imaging | Cluster formation | Prism configuration | Cluster ready |
| Proxmox | Proxmox VE on N hosts | Cluster join | Ceph initialization | Cluster ready |
| Azure Local | Windows Server on N hosts | Failover cluster | S2D / Arc registration | HCI ready |
| Generic Linux | Distro install | Custom post-install | Optional handoff | Ready host |
Each workflow is a template. Assign hosts, provide platform credentials, submit, and let Forge handle sequencing. Every stage is a job. Every job has a log. The whole standup becomes something you can run again later without depending on memory, notes, or a terminal scrollback that disappeared three rebuilds ago.
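The sequencing in these templates can be modeled as a dependency graph. The sketch below, assuming Python 3.9+ for the standard-library `graphlib` module, runs jobs in topological order, retries each failing job up to a fixed number of attempts, and skips any job whose dependencies did not succeed. `run_workflow` and the job names are illustrative only, and jobs run one at a time here; a real engine would likely run independent hosts in parallel.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    deps: dict[str, set[str]],
    actions: dict[str, Callable[[], None]],
    max_attempts: int = 3,
) -> dict[str, str]:
    """Run jobs in dependency order with per-job retries.

    deps maps each job to the jobs it waits on. A job whose dependencies did
    not all succeed is marked "skipped" instead of being attempted.
    """
    status: dict[str, str] = {}
    for job in TopologicalSorter(deps).static_order():
        if any(status.get(dep) != "succeeded" for dep in deps.get(job, set())):
            status[job] = "skipped"
            continue
        status[job] = "failed"  # overwritten on the first successful attempt
        for _attempt in range(max_attempts):
            try:
                actions[job]()
            except Exception:
                continue  # retry until attempts are exhausted
            status[job] = "succeeded"
            break
    return status


# Usage: a VCF-shaped chain where one host install fails once, then succeeds.
calls = {"esxi-02": 0}


def flaky_install() -> None:
    calls["esxi-02"] += 1
    if calls["esxi-02"] == 1:
        raise TimeoutError("virtual media not ready")


deps = {
    "installer-appliance": {"esxi-01", "esxi-02"},
    "bring-up": {"installer-appliance"},
}
actions = {
    "esxi-01": lambda: None,
    "esxi-02": flaky_install,
    "installer-appliance": lambda: None,
    "bring-up": lambda: None,
}
result = run_workflow(deps, actions)
print(result["bring-up"])  # succeeded
```

Marking downstream jobs as skipped rather than failed keeps the status view honest about where the failure actually happened.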
- Host layer: inventory, virtual media, one-time boot, OS install, post-install config
- Job layer: state, logs, retries, dependencies, and failure reporting
- Platform layer: VCF, Nutanix, Proxmox, Azure Local, or Linux handoff workflows
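The job layer's state tracking and status rollup can be expressed as a small state machine plus a rollup rule. This is a hypothetical sketch: the state names, the allowed transitions (including `FAILED` back to `RUNNING` for a retry), and the "failure wins" rollup order are assumptions chosen for illustration.

```python
from enum import Enum
from typing import Iterable


class JobState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


# Legal transitions; FAILED -> RUNNING models a retry.
ALLOWED: dict[JobState, set[JobState]] = {
    JobState.PENDING: {JobState.RUNNING, JobState.SKIPPED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.FAILED: {JobState.RUNNING},
    JobState.SUCCEEDED: set(),
    JobState.SKIPPED: set(),
}


def transition(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def rollup(states: Iterable[JobState]) -> JobState:
    """Collapse per-job states into one workflow status (failure wins)."""
    states = list(states)
    if not states:
        return JobState.PENDING
    for worst in (JobState.FAILED, JobState.RUNNING, JobState.PENDING):
        if worst in states:
            return worst
    # Only SUCCEEDED and SKIPPED remain.
    return JobState.SUCCEEDED


# Usage: one failed attempt followed by a successful retry.
state = JobState.PENDING
for nxt in (JobState.RUNNING, JobState.FAILED, JobState.RUNNING, JobState.SUCCEEDED):
    state = transition(state, nxt)
print(state.value)  # succeeded
print(rollup([JobState.SUCCEEDED, JobState.RUNNING, JobState.PENDING]).value)  # running
```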
Why it is different
Forge starts from zero. Not from a running operating system. Not from a provisioned VM. Not from a node that a platform installer can already reach. From a powered-off physical server.
That is not a subtle distinction. It is where real infrastructure work begins, and it is where most higher-level automation tools intentionally stop caring.
Forge is also not trying to replace Terraform, Ansible, VCF bring-up, Nutanix Foundation, or platform-native automation. It runs before those tools. The goal is to put the machine into a known, configured, auditable state and then hand off cleanly to whatever comes next.
Hardware-aware by design
The inventory step matters. Forge reads real hardware state from the BMC before writing installer configuration: CPUs, memory, disks, controllers, NICs, link state, firmware versions, and management details.
That turns provisioning from "I hope this host looks like the spreadsheet" into something the system can reason about. If a NIC link is down, storage looks wrong, or firmware does not match the expected profile, that should be visible before the install starts, not discovered halfway through a platform bring-up.
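A preflight check of this kind reduces to comparing discovered state against an expected profile. The sketch below assumes the inventory has already been fetched and flattened into one dict; in Redfish these properties live on separate resources (`ProcessorSummary` and `MemorySummary` on the `ComputerSystem`, `LinkStatus` on each member of its `EthernetInterfaces` collection, and `Name`/`Version` under `UpdateService/FirmwareInventory`). The profile format, NIC IDs, and firmware versions are made up for illustration.

```python
def preflight(inventory: dict, profile: dict) -> list[str]:
    """Return human-readable problems; an empty list means the host passes."""
    problems: list[str] = []

    cpus = inventory["ProcessorSummary"]["Count"]
    if cpus < profile["min_cpus"]:
        problems.append(f"CPU count {cpus} is below {profile['min_cpus']}")

    mem = inventory["MemorySummary"]["TotalSystemMemoryGiB"]
    if mem < profile["min_memory_gib"]:
        problems.append(f"memory {mem} GiB is below {profile['min_memory_gib']} GiB")

    # Redfish LinkStatus values include "LinkUp", "LinkDown", and "NoLink".
    nics = {nic["Id"]: nic for nic in inventory["EthernetInterfaces"]}
    for nic_id in profile["required_nics"]:
        nic = nics.get(nic_id)
        if nic is None:
            problems.append(f"required NIC {nic_id} not found")
        elif nic.get("LinkStatus") != "LinkUp":
            problems.append(f"NIC {nic_id} link status is {nic.get('LinkStatus')}")

    installed = {fw["Name"]: fw["Version"] for fw in inventory["FirmwareInventory"]}
    for name, expected in profile["firmware"].items():
        actual = installed.get(name)
        if actual != expected:
            problems.append(f"{name} firmware is {actual}, expected {expected}")

    return problems


# Usage with made-up data: one NIC is down and the BIOS is behind the profile.
inventory = {
    "ProcessorSummary": {"Count": 2},
    "MemorySummary": {"TotalSystemMemoryGiB": 512},
    "EthernetInterfaces": [
        {"Id": "nic1", "LinkStatus": "LinkUp"},
        {"Id": "nic2", "LinkStatus": "LinkDown"},
    ],
    "FirmwareInventory": [{"Name": "BIOS", "Version": "1.2.3"}],
}
profile = {
    "min_cpus": 2,
    "min_memory_gib": 384,
    "required_nics": ["nic1", "nic2"],
    "firmware": {"BIOS": "1.4.0"},
}
problems = preflight(inventory, profile)
for problem in problems:
    print(problem)
```

Returning a list instead of raising on the first mismatch lets the UI show every problem with a host before anyone decides whether to proceed.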
Repeatability is the product
Labs get rebuilt. VCF domains get redeployed. Nutanix clusters get re-imaged. Proxmox clusters get torn down and built again. That is not an edge case. For the environments I work in, rebuilds are part of the operating model.
Forge treats repeatability as the product. Host definitions, installer configs, kickstart libraries, workflow templates, and job history all become part of the system. The tenth rebuild should be as boring as the first.
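One way to keep host definitions as the source of truth is to store them as structured data and render installer config from them. This sketch renders a minimal ESXi kickstart from a hypothetical `HostDefinition`, using standard ESXi kickstart commands (`vmaccepteula`, `rootpw`, `install`, `network`, `reboot`) and Python's `ipaddress` module to derive the netmask and check that the gateway sits on the management subnet. A real template would carry more options and should not hold the root password in plain text.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class HostDefinition:
    # Hypothetical host record; field names are illustrative.
    hostname: str
    mgmt_cidr: str  # management address with prefix, e.g. "10.0.0.11/24"
    gateway: str
    nameserver: str
    root_password: str  # placeholder only; keep real secrets out of templates
    vmnic: str = "vmnic0"


def render_esxi_kickstart(host: HostDefinition) -> str:
    """Render a minimal ESXi kickstart from a host definition."""
    iface = ipaddress.ip_interface(host.mgmt_cidr)
    gateway = ipaddress.ip_address(host.gateway)
    # Catch a common spreadsheet error before it becomes a failed install.
    if gateway not in iface.network:
        raise ValueError(f"gateway {gateway} is outside {iface.network}")
    network = (
        f"network --bootproto=static --device={host.vmnic}"
        f" --ip={iface.ip} --netmask={iface.netmask} --gateway={gateway}"
        f" --hostname={host.hostname} --nameserver={host.nameserver}"
    )
    lines = [
        "vmaccepteula",
        f"rootpw {host.root_password}",
        "install --firstdisk --overwritevmfs",
        network,
        "reboot",
    ]
    return "\n".join(lines) + "\n"


host = HostDefinition(
    hostname="esxi-01.lab.local",
    mgmt_cidr="10.0.0.11/24",
    gateway="10.0.0.1",
    nameserver="10.0.0.2",
    root_password="Example-Passw0rd!",
)
ks = render_esxi_kickstart(host)
print(ks)
```

Because the output is a pure function of the definition, the same host record produces the same kickstart on every rebuild.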
Why a web app
I want Forge to be visible and auditable from a browser. The work it performs is too important to hide inside a one-off terminal session.
A web app gives the full workflow a surface area: hosts, inventory, job state, live logs, notes, install history, workflow status, and failures that can be reviewed after the fact. If a host fails during install, the question should not be "who had the terminal open?" It should be "what does the job log say?"
Current state
Forge currently handles the full ESXi bare-metal provisioning pipeline on Dell iDRAC hardware:
- Redfish inventory and lifecycle actions.
- Virtual media mount and BIOS boot-once through the job queue.
- Kickstart generation and ISO remastering with host-specific config.
- Post-install SSH configuration for networking, hostname, DNS, NTP, and TLS regeneration.
- Live job tracking with streamed logs, host notes, and a kickstart library.
- Multiple ESXi hosts under management with active installs running in the lab.
VCF Installer appliance deployment is the next active stage, with Nutanix Phoenix ISO support designed in parallel.
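The virtual media and boot steps map onto a handful of Redfish calls. This sketch only builds the requests and does not send them. One substitution to note: the post describes setting the one-time boot through the Dell BIOS job queue, while this sketch uses the generic DMTF `Boot.BootSourceOverrideEnabled` and `BootSourceOverrideTarget` properties instead. The default IDs (`System.Embedded.1`, `iDRAC.Embedded.1`, media `CD`) are common Dell values but are assumptions here, and the virtual media path may differ between iDRAC firmware versions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RedfishCall:
    method: str
    path: str
    body: dict


def mount_and_boot_once(
    iso_url: str,
    system_id: str = "System.Embedded.1",  # assumed Dell default
    manager_id: str = "iDRAC.Embedded.1",  # assumed Dell default
    media_id: str = "CD",  # assumed; may vary by firmware
) -> list[RedfishCall]:
    """Build (not send) the calls to attach an ISO and boot from it once."""
    insert_media = RedfishCall(
        "POST",
        f"/redfish/v1/Managers/{manager_id}/VirtualMedia/{media_id}"
        "/Actions/VirtualMedia.InsertMedia",
        {"Image": iso_url, "Inserted": True, "WriteProtected": True},
    )
    # Generic DMTF one-time boot override, not the Dell BIOS job queue path.
    boot_once = RedfishCall(
        "PATCH",
        f"/redfish/v1/Systems/{system_id}",
        {"Boot": {"BootSourceOverrideEnabled": "Once",
                  "BootSourceOverrideTarget": "Cd"}},
    )
    # The server starts powered off, so "On" is the reset type to request.
    power_on = RedfishCall(
        "POST",
        f"/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset",
        {"ResetType": "On"},
    )
    return [insert_media, boot_once, power_on]


calls = mount_and_boot_once("https://repo.example.com/iso/esxi-01.iso")
for call in calls:
    print(call.method, call.path, json.dumps(call.body))
```

Each call would then be sent over HTTPS to the BMC with its credentials, using whatever HTTP client the project already relies on. Keeping request construction separate from transport also makes this layer easy to test without hardware.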
Where it goes
The foundation is in place. The Redfish layer is hardware-aware. The job engine exists. The host model is extensible. The next step is to move from individual host provisioning into full platform workflows.
- Appliance deployment with govc-based OVA workflows.
- Multi-host workflow chaining with dependency graphs and status rollup.
- OS pluggability for Phoenix, Windows unattended, Proxmox, and Linux installers.
- Workflow templates for VCF, Nutanix, Proxmox, and Azure Local.
- Workflow definitions as code: version-controlled, shareable, and executable from an API.
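Workflow definitions as code could start as plain, version-controlled JSON that is validated before anything touches hardware. The sketch below, again using `graphlib`, rejects duplicate stage IDs, references to unknown stages, and dependency cycles, and returns one valid execution order. The schema (`stages`, `id`, `kind`, `host`, `after`) is invented for illustration.

```python
import json
from graphlib import CycleError, TopologicalSorter

# Hypothetical template schema: each stage has an id, a kind, and optional
# "after" dependencies. It would live in version control like any other code.
PROXMOX_TEMPLATE = """
{
  "name": "proxmox-cluster",
  "stages": [
    {"id": "install-pve-01", "kind": "os_install", "host": "pve-01"},
    {"id": "install-pve-02", "kind": "os_install", "host": "pve-02"},
    {"id": "cluster-join", "kind": "cluster_join",
     "after": ["install-pve-01", "install-pve-02"]},
    {"id": "ceph-init", "kind": "ceph_init", "after": ["cluster-join"]}
  ]
}
"""


def load_workflow(text: str) -> list[str]:
    """Validate a workflow definition and return one valid stage order."""
    doc = json.loads(text)
    stages = {stage["id"]: stage for stage in doc["stages"]}
    if len(stages) != len(doc["stages"]):
        raise ValueError("duplicate stage id")

    graph: dict[str, set[str]] = {}
    for stage_id, stage in stages.items():
        deps = set(stage.get("after", []))
        unknown = deps - set(stages)
        if unknown:
            raise ValueError(f"{stage_id} depends on unknown stage(s) {sorted(unknown)}")
        graph[stage_id] = deps

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


order = load_workflow(PROXMOX_TEMPLATE)
print(order)
```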
The end state is simple to describe and hard to build: one self-hosted system that can take supported infrastructure from powered-off metal to a running platform with logs, state, and repeatability built in.
The opportunity
Every organization running on-premises infrastructure rebuilds environments. Every platform vendor has a bring-up process that assumes someone else handled the hardware. Every lab engineer has spent a weekend reimaging servers that a well-designed system should have handled in an afternoon.
Forge is my attempt to build that system.
The physical layer is the beginning. The workflow engine is next. The platforms are the payload.