# Clusters & Node Types

## What is a Cluster?

A cluster in VergeOS is a logical grouping of nodes with the same hardware characteristics, forming a resource pool presented as usable assets in the VergeOS user interface. Clusters enable efficient management, scaling, and high availability for virtualized workloads.
Every VergeOS system starts with at least one cluster — the initial two controller nodes form the first cluster during installation. From there, you can add nodes to the existing cluster or create additional clusters with different roles and hardware profiles.
## Why Clusters Matter

Clusters serve several purposes:
- Resource pooling — Nodes in a cluster share compute and/or storage resources, presented as a unified pool
- Workload placement — VMs and tenants are assigned to specific clusters (with optional failover clusters), ensuring workloads run on appropriate hardware
- Hardware optimization — Different clusters can have different hardware profiles: high-memory nodes for databases, GPU-equipped nodes for rendering, NVMe-dense nodes for storage-intensive workloads
- Independent scaling — Add capacity to specific resource pools without affecting others
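The resource-pooling idea above can be sketched as a small data model. This is an illustrative sketch, not the VergeOS implementation; the `Node` and `Cluster` classes and all hardware numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One physical server; storage_tb is 0 for compute-only nodes."""
    name: str
    cpu_cores: int
    ram_gb: int
    storage_tb: float

@dataclass
class Cluster:
    """A cluster pools the resources of its member nodes into one unit."""
    name: str
    nodes: List[Node] = field(default_factory=list)

    def pool(self) -> dict:
        """Present the cluster as a single aggregated resource pool."""
        return {
            "cores": sum(n.cpu_cores for n in self.nodes),
            "ram_gb": sum(n.ram_gb for n in self.nodes),
            "storage_tb": sum(n.storage_tb for n in self.nodes),
        }

# Hypothetical 2-node HCI cluster: two identical nodes, one pool
hci = Cluster("HCI", [Node("node1", 32, 256, 15.0),
                      Node("node2", 32, 256, 15.0)])
print(hci.pool())  # {'cores': 64, 'ram_gb': 512, 'storage_tb': 30.0}
```

Workload placement then becomes a matter of assigning a VM to whichever cluster's pool matches its needs, which is what the cluster types below make possible.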
## Cluster Types

VergeOS supports three distinct cluster types that can be mixed and matched within a single system:
| Cluster Type | Provides | vSAN Participation | Typical Use Case |
|---|---|---|---|
| Combined (HCI) | Compute + Storage | Yes — nodes contribute storage disks to vSAN tiers | General-purpose workloads, small-to-medium deployments |
| Storage-Only | Storage only | Yes — nodes contribute storage only | Dedicated storage expansion in UCI architectures |
| Compute-Only | Compute only | No — boot-only or PXE boot | High-compute workloads (ML, rendering, data analytics) |
## Node Types

Every physical or virtual server in a VergeOS system is a node. Nodes differ in how they join the system, what role they play, and which cluster they belong to. VergeOS defines four node types:
### Controller Nodes (Node 1 and Node 2)

The first two nodes in every VergeOS system are controller nodes. They are special because:
- Node 1 creates a brand-new VergeOS system. It initializes the vSAN, creates the first cluster, and runs post-install configuration (network setup, cluster creation for additional node types, etc.)
- Node 2 joins the system created by Node 1 as the second controller, providing redundancy for all system management functions
Controller nodes always belong to Cluster 1. In an HCI topology, they provide both compute and storage. In a UCI topology, they manage the system but delegate storage and compute to dedicated clusters.
The first cluster must include at least two nodes with Tier 0 storage (metadata drives) — this is a hard requirement because Tier 0 holds the vSAN filesystem index and must be redundant.
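The Tier 0 redundancy requirement can be expressed as a simple validation check. This is a hypothetical helper over a made-up inventory format, shown only to make the rule concrete:

```python
def tier0_redundant(nodes):
    """True if at least two nodes carry Tier 0 (metadata) drives.
    Tier 0 holds the vSAN filesystem index, so it must not live on one node."""
    return sum(1 for n in nodes if n["tier0"]) >= 2

# Hypothetical inventory for Cluster 1: both controllers carry Tier 0
cluster1 = [{"name": "node1", "tier0": True},
            {"name": "node2", "tier0": True}]
print(tier0_redundant(cluster1))                            # True
print(tier0_redundant([{"name": "node1", "tier0": True}]))  # False
```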
### Scale-Out Nodes

Scale-out nodes expand an existing HCI cluster by adding more compute and storage capacity. Key characteristics:
- Identical hardware to the controller nodes in the cluster they join (same CPU generation, similar storage layout, matching NIC configuration)
- Join the existing cluster automatically via network auto-detection — the node discovers the VergeOS system on the core fabric and joins without manual cluster assignment
- Disks integrate seamlessly into the existing vSAN tiers
- Contribute both compute (run VMs) and storage (vSAN participation)
Scale-out nodes are the simplest way to grow an HCI deployment — add a node and the cluster’s compute and storage capacity increases proportionally.
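Because scale-out nodes match the cluster's existing hardware profile, the proportional growth is simple arithmetic. The per-node spec below is hypothetical:

```python
# Hypothetical per-node spec; scale-out nodes match the existing profile
PER_NODE = {"cores": 32, "ram_gb": 256, "storage_tb": 15.0}

def cluster_capacity(node_count, spec=PER_NODE):
    """Total pool for a cluster of identical nodes."""
    return {k: v * node_count for k, v in spec.items()}

before = cluster_capacity(2)  # the original 2-node HCI cluster
after = cluster_capacity(3)   # after adding one scale-out node
print(before)  # {'cores': 64, 'ram_gb': 512, 'storage_tb': 30.0}
print(after)   # {'cores': 96, 'ram_gb': 768, 'storage_tb': 45.0}
```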
### Storage-Only Nodes

Storage-only nodes are dedicated exclusively to expanding vSAN capacity. They:
- Contribute disks to vSAN tiers but do not run VM workloads
- Belong to a storage-only cluster (e.g., Cluster 2)
- Require creating the storage cluster in the VergeOS UI before adding the first storage node
- Are used in UCI architectures where storage and compute scale independently
### Compute-Only Nodes

Compute-only nodes provide processing power without participating in vSAN storage. They:
- Run VM workloads but have no local vSAN storage (boot-only disk or PXE boot)
- Belong to a compute-only cluster (e.g., Cluster 3)
- Require creating the compute cluster in the VergeOS UI before adding the first compute node
- Access storage over the core fabric from nodes in HCI or storage-only clusters
Compute-only nodes are ideal for workloads that need high CPU/RAM/GPU density without proportional storage growth — machine learning, rendering, data analytics, or VDI.
## Node Type Summary

| Node Type | Role | Cluster | vSAN | Runs VMs | Join Method |
|---|---|---|---|---|---|
| Controller (Node 1) | Creates new system | Cluster 1 | Yes (Tier 0 + workload tiers) | Yes (HCI) or No (UCI) | New system creation |
| Controller (Node 2) | Joins as redundant controller | Cluster 1 | Yes (Tier 0 + workload tiers) | Yes (HCI) or No (UCI) | Joins Cluster 1 |
| Scale-out | Adds HCI capacity | Cluster 1 | Yes (workload tiers) | Yes | Auto-detect on core fabric |
| Storage-only | Dedicated storage expansion | Cluster 2+ | Yes (workload tiers) | No | Joins designated storage cluster |
| Compute-only | Dedicated compute expansion | Cluster 2+ | No (boot-only / PXE) | Yes | Joins designated compute cluster |
## How Nodes Join a System

The node joining process follows a strict sequence to prevent race conditions. Key rules:
- Node 1 must complete installation before Node 2 can join — Node 2 needs an existing system to connect to
- Nodes join sequentially within a cluster — Node 3 after Node 2, Node 4 after Node 3, etc. — to prevent race conditions during cluster membership changes
- Storage clusters must exist before storage nodes can join — create the cluster in the VergeOS UI first
- Compute clusters must exist before compute nodes can join — same prerequisite
- If deploying both storage and compute clusters, storage nodes should be added first so compute nodes can immediately access vSAN storage
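The rules above can be modeled as a pre-join check. This is an illustrative sketch of the sequencing logic, not VergeOS code; the function name and data shapes are assumptions:

```python
def can_join(existing_ids, clusters, node_id, target_cluster):
    """Apply the sequencing rules: nodes join one at a time in order,
    and the target cluster must already exist in the system."""
    next_expected = max(existing_ids, default=0) + 1
    if node_id != next_expected:
        return False, f"node {next_expected} must join before node {node_id}"
    if target_cluster not in clusters:
        return False, f"create cluster '{target_cluster}' in the UI first"
    return True, "ok"

# Node 2 may join once Node 1 has finished installing
print(can_join({1}, {"Cluster 1"}, 2, "Cluster 1"))  # (True, 'ok')
# A storage node cannot join before its cluster is created in the UI
print(can_join({1, 2}, {"Cluster 1"}, 3, "Storage"))
```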
## Cluster Numbering and Naming

Clusters are numbered starting from 1 and can be renamed in the VergeOS UI:
| Cluster Number | Default Role | Typical Name |
|---|---|---|
| Cluster 1 | HCI (controllers + optional scale-out) | “HCI”, “Default”, or “Controllers” |
| Cluster 2 | Storage-only (if UCI) or Compute-only (if hybrid) | “Storage” or “Compute” |
| Cluster 3 | Compute-only (in full UCI with 3 clusters) | “Compute” |
In a full UCI deployment with 3 clusters:
- Cluster 1: Controllers (system management, Tier 0 metadata)
- Cluster 2: Storage nodes (all vSAN workload storage)
- Cluster 3: Compute nodes (all VM execution)
## Minimum Requirements and High Availability

| Requirement | Detail |
|---|---|
| Minimum nodes per system | 2 (one controller pair) |
| Minimum nodes per cluster | 2 (for redundancy during maintenance or failure) |
| Controller nodes | Exactly 2 per system — must have Tier 0 storage for vSAN metadata |
| HA behavior | If one node fails, its workloads migrate to the surviving node(s) in the same cluster |
| Maintenance mode | Nodes can be placed in maintenance mode; workloads are live-migrated to other nodes in the cluster before maintenance begins |
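The HA behavior in the table can be sketched as a placement routine: when a node fails, its VMs are redistributed across the survivors in the same cluster. This is a simplified round-robin illustration; a real scheduler would weigh CPU and RAM headroom on each survivor:

```python
def failover(vm_hosts, failed_node, cluster_nodes):
    """Reassign VMs from a failed node to surviving nodes in the cluster.
    Illustrative round-robin placement, not the VergeOS scheduler."""
    survivors = [n for n in cluster_nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving nodes in cluster")
    orphans = sorted(vm for vm, host in vm_hosts.items()
                     if host == failed_node)
    return {vm: survivors[i % len(survivors)]
            for i, vm in enumerate(orphans)}

# Hypothetical 3-node cluster: node1 fails, its two VMs are spread out
vms = {"db": "node1", "web": "node1", "cache": "node2"}
print(failover(vms, "node1", ["node1", "node2", "node3"]))
# {'db': 'node2', 'web': 'node3'}
```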
## Scaling: 2 to 200+ Nodes

VergeOS systems scale from a minimum 2-node HCI cluster to large multi-cluster deployments with 200+ nodes. The scaling strategy depends on your architecture:
### HCI Scaling (Simple)

Add scale-out nodes to Cluster 1. Each node adds both compute and storage proportionally.
Best for: Balanced growth where compute and storage needs increase together.
### UCI Scaling (Independent)

Add nodes to specific clusters based on which resource is the bottleneck:
- Need more storage? Add nodes to the storage cluster
- Need more compute? Add nodes to the compute cluster
- Need more of both? Add to both clusters independently
Best for: Workloads with unbalanced resource demands (e.g., heavy storage with light compute, or GPU-dense compute with modest storage).
### Best Practices for Scaling

- Hardware consistency within clusters — Use the same hardware specs for all nodes in a cluster. Mixing different hardware within a cluster can cause performance and reliability issues.
- Plan for N+1 redundancy — Size each cluster so that losing one node still leaves enough capacity for all workloads
- Monitor before scaling — Use VergeOS dashboard metrics (CPU utilization, RAM usage, vSAN capacity) to identify which resource needs expansion
- Scale without downtime — New nodes can be added to a running system without interrupting existing workloads
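The N+1 sizing rule above reduces to one inequality: committed workloads must fit on the cluster minus one node. A minimal sketch, using RAM as the example resource and hypothetical numbers:

```python
def survives_one_loss(node_ram_gb, node_count, committed_ram_gb):
    """N+1 check: do committed workloads still fit after losing one node?"""
    return committed_ram_gb <= node_ram_gb * (node_count - 1)

# A 3-node cluster of 256 GB nodes leaves 512 GB after one failure
print(survives_one_loss(256, 3, 500))  # True  — 500 GB still fits
print(survives_one_loss(256, 3, 600))  # False — oversubscribed for N+1
```

The same check applies per resource (CPU, RAM, vSAN capacity); the tightest one determines when to add a node.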
## Deployment Topology Examples

The Terraform playground demonstrates four common topologies that map to real-world deployment patterns:
| Topology | Nodes | Clusters | Architecture | When to Use |
|---|---|---|---|---|
| 2-Node HCI | 2 controllers | 1 (HCI) | HCI | Small sites, edge, PoC, basic evaluation |
| HCI + Scale-Out | 2 controllers + N scale-out | 1 (HCI) | HCI | Growing HCI deployments needing balanced scaling |
| Hybrid (2 clusters) | 2 controllers + N compute | 2 (HCI + Compute) | Hybrid | Compute-heavy workloads with modest storage |
| UCI (3 clusters) | 2 controllers + N storage + M compute | 3 (Controller + Storage + Compute) | UCI | Large deployments needing independent compute/storage scaling |
## Key Takeaways

| Concept | Summary |
|---|---|
| Cluster | Logical grouping of nodes with same hardware, forming a resource pool |
| Three cluster types | HCI (compute + storage), Storage-only, Compute-only — mixable within one system |
| Four node types | Controller, Scale-out, Storage-only, Compute-only — each with a specific role and join method |
| Minimum 2 nodes | Per cluster for redundancy; controllers require Tier 0 storage |
| Sequential joining | Nodes join one at a time to prevent race conditions |
| Hardware consistency | All nodes in a cluster should have matching hardware specifications |
| Independent scaling | UCI architecture allows adding compute or storage capacity independently |
| 200+ nodes | Systems scale from 2-node HCI to large multi-cluster deployments |
## Next Steps

You now understand how VergeOS organizes nodes into clusters and how different node types serve different roles. In the hands-on lab, you will explore these concepts using the Terraform playground: Lab: Architecture Exploration →