Lab: Explore the Architecture
Lab Overview
In this lab, you will explore the VergeOS Terraform Playground — an open-source project that deploys virtual VergeOS systems using Terraform. By reading the code and documentation, you will reinforce the architecture concepts covered in this module: core fabric networking, vSAN storage tiers, cluster organization, and HCI vs UCI topologies.
What You Will Do
- Part 1 — Read the playground’s architecture documentation and Terraform code to identify how VergeOS concepts map to infrastructure-as-code
- Part 2 — Given a customer scenario, recommend and diagram a deployment topology
- Part 3 — Compare the four example deployment configurations and analyze their differences
Prerequisites
- A GitHub account (optional; cloning the public repository does not require one)
- Git installed on your workstation
- A text editor or IDE (VS Code recommended)
- No VergeOS system access is required — this lab is a reading and design exercise
Estimated Time
30 minutes
Part 1: Explore the Architecture
In this section, you will clone the Terraform playground repository and trace how VergeOS architecture concepts are expressed in infrastructure-as-code.
- Clone the repository

  ```sh
  git clone https://github.com/verge-io/vergeos-terraform-playground.git
  cd vergeos-terraform-playground
  ```
- Read the architecture documentation

  Open `docs/architecture.md` and read through the entire document. As you read, identify the answers to these questions:

  - What are the four deployment scenarios supported by the playground?
  - What is an install seed file, and how does it enable unattended installation?
  - What is the minimum deployment size?
- Examine the deployment scenario diagrams

  Open `docs/deployment-scenarios.md` and study the Mermaid topology diagrams for each scenario. For each one, note:

  - How many nodes are involved
  - How many clusters are created
  - Which node types appear (controller, scale-out, storage, compute)
  - How all nodes connect to the core fabric and external network
- Trace the core fabric in Terraform

  Open `main.tf` (the root module) and find the two core fabric network resources. Answer these questions:

  - What are the resources named? (`core_fabric_1` and `core_fabric_2`)
  - What MTU is configured? (9142 — jumbo frames for vSAN replication)
  - Is DHCP enabled on these networks? (No — `dhcp_enabled = false`)
  - What `ipaddress_type` is set? (`none` — these are Layer 2 transports)

  ```hcl
  # You should find resources like this in main.tf:
  resource "vergeio_network" "core_fabric_1" {
    name           = "${var.system_name}-core-fabric-1"
    enabled        = true
    dhcp_enabled   = false
    on_power_loss  = "power_on"
    mtu            = 9142
    ipaddress_type = "none"
  }
  ```
- Examine how Node 1 differs from Node 2

  Open `modules/controllers/main.tf` and compare `verge_node_1` and `verge_node_2`. Key differences to identify:

  - Cloud-init template — Node 1 uses `user-data-node1.yaml` (creates a new system with `YC_VSAN_NEW=1`). Node 2 uses `user-data-node2.yaml` (joins the existing system with `YC_VSAN_NEW=0`).
  - Post-install API setup — Node 1’s cloud-init includes a script that configures update sources, enables SSH, and optionally creates storage/compute clusters via the VergeOS API. Node 2 has no post-install script.
  - Dependency chain — Node 2 has a `depends_on` reference to Node 1, ensuring the system is fully initialized before the second controller attempts to join.

  Both nodes share the same VM structure: Linux OS family, nested virtualization enabled, three virtio NICs (external, core fabric 1, core fabric 2), CD-ROM with the VergeOS ISO, and a cloud-init nocloud datasource.
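That dependency chain can be sketched roughly as follows. This is an illustrative sketch only: the `vergeio_vm` resource type and its arguments are assumed names, not the playground's actual schema.

```hcl
# Illustrative sketch of the Node 1 to Node 2 ordering described above.
# "vergeio_vm" and its arguments are assumed, not the provider's real schema.
resource "vergeio_vm" "verge_node_1" {
  name = "${var.system_name}-node1"
  # cloud-init: user-data-node1.yaml (YC_VSAN_NEW=1 creates a new system)
}

resource "vergeio_vm" "verge_node_2" {
  name = "${var.system_name}-node2"
  # cloud-init: user-data-node2.yaml (YC_VSAN_NEW=0 joins the existing system)

  # Terraform will not begin creating Node 2 until Node 1 is complete,
  # so the VergeOS system exists before the second controller tries to join.
  depends_on = [vergeio_vm.verge_node_1]
}
```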
- Answer the comprehension questions

  Write your answers to the following (or discuss with your training partner):

  | # | Question | Expected Answer |
  |---|---|---|
  | 1 | Why does the core fabric use two separate switches? | Redundancy — if one switch or path fails, the other maintains inter-node connectivity |
  | 2 | Why is DHCP disabled on the core fabric networks? | The core fabric uses static IP addressing; the VergeOS installer configures addresses via the install seed |
  | 3 | Why must Node 2 wait for Node 1 to complete before starting? | Node 1 creates the VergeOS system; Node 2 needs an existing system to join |
  | 4 | What traffic types flow over the core fabric? | vSAN replication, cluster coordination, VM live migration, control plane communication |
  | 5 | Why is `quantity_tier_1_disks` set to 0 for controllers when storage nodes are enabled? | In UCI mode, dedicated storage nodes provide all tier-1 capacity; controllers only need tier-0 for metadata |
Part 2: Design Exercise
Now apply what you have learned. Given a customer scenario, recommend a deployment topology and justify your decision.
Customer Scenario
Midwest Manufacturing Co. is migrating from a VMware vSphere environment with 3 ESXi hosts. They currently run 50 VMs (mix of Windows and Linux), have ~10 TB of usable storage, and expect moderate growth over the next 2 years. They have a small IT team (2 people) and want to minimize operational complexity. Budget is constrained.
- Choose HCI or UCI

  Based on the customer profile, which deployment model do you recommend? Consider:

  - Team size — a 2-person IT team favors simplicity
  - Growth pattern — “moderate growth” suggests balanced compute/storage scaling
  - Budget — HCI requires fewer total nodes than UCI for the same capacity
  - Current environment — 3 ESXi hosts map well to a small HCI cluster
- Determine the node count and layout

  Sketch or describe your proposed topology:

  - How many controller nodes? (Minimum 2 for HA)
  - Do you need scale-out nodes? (Consider: 50 VMs on 2 nodes may be tight; 2 scale-out nodes give headroom)
  - How many clusters? (1 for HCI)
  - What about storage capacity? (10 TB usable means ~20 TB raw with replication across nodes)
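The capacity bullet can be sanity-checked with quick back-of-envelope math. The `locals` block below is purely illustrative (not from the playground) and assumes two-copy replication:

```hcl
# Back-of-envelope sizing sketch (illustrative values only)
locals {
  usable_tb     = 10                                     # customer requirement
  replica_count = 2                                      # two copies of each block
  raw_tb_needed = local.usable_tb * local.replica_count  # 20 TB raw
  node_count    = 4                                      # 2 controllers + 2 scale-out
  tb_per_node   = local.raw_tb_needed / local.node_count # 5 TB tier-1 per node
}
```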
A reasonable design: 2 controllers plus 2 scale-out nodes in a single HCI cluster, with enough tier-1 disk across the four nodes to deliver ~10 TB usable after replication.
- Map to a playground example

  Which Terraform playground example file most closely matches your design?
Part 3: Topology Comparison
Compare all four example `.tfvars` files from the `examples/` directory. Fill in the comparison table below.
Instructions
Open each file and identify the configuration values. Use the table to record your findings.
File: examples/2-node-hci.tfvars
- Scenario: 2-Node HCI (Single Cluster)
- Total nodes: 2
- Clusters: 1
- Node types: 2 controllers (storage + compute)
- Toggle variables: None (all defaults)
- Tier-1 disks on controllers: Yes (2 × 1000 GB each)
- Best for: Basic testing, evaluation, smallest possible deployment
File: examples/4-node-hci.tfvars
- Scenario: HCI + Scale-Out (Single Cluster)
- Total nodes: 4
- Clusters: 1
- Node types: 2 controllers + 2 scale-out
- Toggle variables: `create_scale_out_nodes = true`
- Tier-1 disks on controllers: Yes (2 × 1000 GB each)
- Best for: Larger HCI clusters, testing scale-out behavior, balanced growth
File: examples/4-node-hybrid-hci-2-cluster.tfvars
- Scenario: Hybrid HCI (2 Clusters)
- Total nodes: 4
- Clusters: 2
- Node types: 2 controllers (storage + compute) + 2 compute-only
- Toggle variables: `create_compute_nodes = true`
- Tier-1 disks on controllers: Yes (controllers provide all storage)
- Best for: Separating compute scaling from storage, adding compute burst capacity
File: examples/6-node-uci-3-cluster.tfvars
- Scenario: UCI (3 Clusters)
- Total nodes: 6
- Clusters: 3
- Node types: 2 controllers + 2 storage-only + 2 compute-only
- Toggle variables: `create_storage_nodes = true`, `create_compute_nodes = true`
- Tier-1 disks on controllers: No (storage nodes provide all tier-1 capacity)
- Best for: Production-like UCI, independent scaling of storage and compute, larger environments
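Putting the UCI toggles together, a variable file for this scenario might look like the sketch below. Only the two toggle names are taken from the summary above; `system_name` and its value are assumed placeholders.

```hcl
# Illustrative .tfvars sketch for the 6-node UCI scenario.
# Toggle names come from the summary above; other values are placeholders.
system_name          = "uci-lab"
create_storage_nodes = true  # adds the dedicated storage cluster (2 nodes)
create_compute_nodes = true  # adds the compute-only cluster (2 nodes)
```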
Comparison Summary Table
Complete this table as you review each file:
| Attribute | 2-Node HCI | 4-Node HCI | Hybrid 2-Cluster | UCI 3-Cluster |
|---|---|---|---|---|
| Total nodes | 2 | 4 | 4 | 6 |
| Clusters | 1 | 1 | 2 | 3 |
| Controller nodes | 2 | 2 | 2 | 2 |
| Scale-out nodes | 0 | 2 | 0 | 0 |
| Storage-only nodes | 0 | 0 | 0 | 2 |
| Compute-only nodes | 0 | 0 | 2 | 2 |
| Controllers have tier-1 disks? | Yes | Yes | Yes | No |
| Storage scaling | Add HCI nodes | Add HCI nodes | Add controllers | Add storage nodes |
| Compute scaling | Add HCI nodes | Add HCI nodes | Add compute nodes | Add compute nodes |
| Complexity | Low | Low | Medium | High |
| Ideal use case | Small / eval | Mid-size balanced | Compute burst | Large / independent scaling |
Analysis Questions
After completing the table, consider these questions:
- Why do controllers in the UCI scenario have zero tier-1 disks?

  In UCI, dedicated storage nodes provide all workload storage. Controllers only need tier-0 disks for vSAN metadata. This is visible in `main.tf`, where `quantity_tier_1_disks` is conditionally set to 0 when `create_storage_nodes = true`.
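A plausible shape for that conditional is shown below; the exact expression in the playground's `main.tf` may differ.

```hcl
# Illustrative only: controllers skip tier-1 disks when storage nodes exist.
quantity_tier_1_disks = var.create_storage_nodes ? 0 : 2
```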
- What is the dependency chain when both storage and compute nodes are enabled?

  Controllers → Storage nodes → Compute nodes. The compute module has an explicit `depends_on` to the storage module, ensuring the storage cluster exists before compute nodes attempt to join. This mirrors how VergeOS cluster creation works: storage must be available before compute workloads can run.
- How would you modify the 4-node HCI example to support 6 HCI nodes?

  Change `quantity_scale_out_nodes` from `2` to `4`. The Terraform module creates additional scale-out nodes sequentially, each joining the same HCI cluster. No additional toggle variables are needed.
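As a concrete sketch, the change amounts to one line in the example variable file (variable names are taken from this lab; the file layout is assumed):

```hcl
# In examples/4-node-hci.tfvars: scale from 2 to 4 scale-out nodes,
# giving 6 HCI nodes total (2 controllers + 4 scale-out).
create_scale_out_nodes   = true
quantity_scale_out_nodes = 4
```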
Key Takeaways
After completing this lab, you should be able to:
- ✅ Navigate the VergeOS Terraform playground and understand its structure
- ✅ Identify how core fabric networks, vSAN storage tiers, and node types are expressed in Terraform
- ✅ Explain the differences between the four deployment scenarios (2-node HCI, 4-node HCI, hybrid 2-cluster, UCI 3-cluster)
- ✅ Recommend an appropriate VergeOS topology for a given customer scenario
- ✅ Trace the dependency chain from controllers through optional node types
Next Steps
Proceed to Module 2: Sizing & Design to learn how to translate customer requirements into specific hardware configurations and deployment plans.