
vSAN / VergeFS: Software-Defined Storage

vSAN (Virtual Storage Area Network), also known as VergeFS, is the software-defined distributed storage system built into every VergeOS deployment. It pools the physical (or virtual) drives across all storage-participating nodes into a single, shared storage resource for the entire system.

There is no external SAN, NAS, or third-party storage software required. vSAN is integrated directly into the VergeOS kernel and operates at the block level, providing storage for all VM disks, snapshots, ISO images, and system metadata.

Key characteristics:

  • Block-level architecture — VM disks are divided into blocks, each identified by a cryptographic hash
  • Distributed across nodes — Data blocks are spread across all storage-participating nodes in the cluster
  • Tiered storage — Up to 6 tiers (0–5) let you match media type to workload requirements
  • Inline deduplication — Hash-based block identification enables automatic deduplication across all tiers
  • Self-healing — Automatic failure detection, failover, and data rebuild without manual intervention

VergeOS vSAN organizes drives into tiers numbered 0 through 5. Each tier is designed for a different class of storage media and workload profile. During installation, each physical drive is assigned to a specific tier, and that assignment determines how the drive is used by the system.

Tier 0 (metadata):

  • Hardware: High-endurance NVMe SSDs
  • Purpose: Stores the vSAN filesystem index and internal metadata exclusively
  • Key requirement: Every node that participates in storage must have at least one tier-0 drive
  • Best practice: Use the highest-endurance NVMe drives available; maintain at least 10% free space

Tier 0 is not used for workload data. It holds the hash map that tracks every data block’s location, redundancy state, and version. Because every read and write operation begins with a metadata lookup, tier-0 performance directly impacts overall system responsiveness.

| Tier | Hardware | Purpose | Typical Use Cases |
|------|----------|---------|-------------------|
| 1 | High-endurance NVMe SSDs | Write-intensive workloads | High-performance databases, transaction logs |
| 2 | Mid-range SSDs | Balanced read/write workloads | General-purpose VMs, mixed applications, dev environments |
| 3 | Read-optimized SSDs | Read-intensive workloads | Content delivery, application repos, reference data |
| 4 | High-capacity HDDs | Less frequently accessed data | File servers, backup targets |
| 5 | Archival-grade HDDs | Cold storage and long-term retention | Compliance archives, backup archives |

Not every deployment uses all five workload tiers. A common production configuration might use only tier 1 (NVMe for performance-sensitive workloads) and tier 4 (HDD for capacity). The Terraform playground uses tier 0 and tier 1 only.

vSAN uses a hash-based distribution algorithm to spread data blocks across all nodes in the cluster. Here is how it works:

  1. When a VM writes data, vSAN divides the write into data blocks
  2. Each block is assigned a cryptographic hash that serves as its unique identifier
  3. The hash determines the block’s storage location and also enables deduplication: if two blocks produce the same hash, only one copy is stored
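
The three steps above can be sketched as a toy content-addressed block store. SHA-256 stands in for the hash, and the 4 KiB block size and the `BlockStore` API are illustrative assumptions, not VergeOS internals:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not the actual vSAN block size

class BlockStore:
    """Toy content-addressed store: each block is keyed by its hash."""
    def __init__(self):
        self.blocks = {}  # hash -> block data, stored exactly once

    def write(self, data: bytes) -> list[str]:
        """Divide a write into blocks and store each under its hash."""
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # duplicate hash -> no second copy
            hashes.append(h)
        return hashes

store = BlockStore()
first = store.write(b"x" * 8192)   # two identical 4 KiB blocks
second = store.write(b"x" * 4096)  # same content again, e.g. from another VM
```

Because both writes produce the same hash, the store holds a single physical copy of the block, which is the essence of inline, hash-based deduplication.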

Data blocks are distributed across multiple nodes in the cluster rather than stored on a single node. This design provides:

  • Balanced performance — I/O load is spread across all storage-participating nodes
  • Fault tolerance — No single node holds all copies of any dataset
  • Efficient scaling — Adding a node automatically expands the storage pool and triggers rebalancing

Reads:

  • The system looks up the block’s location via the tier-0 hash map
  • Reads prioritize the primary copy for efficiency
  • If the VM is running on the same node as a redundant copy, vSAN reads the local copy to minimize network traffic
  • If the primary copy is slow or unresponsive, vSAN automatically fails over to the redundant copy

Writes:

  • New blocks are hashed and placed on the optimal node
  • Both the primary and redundant copies are written simultaneously
  • Write is only acknowledged after both copies are confirmed
  • The tier-0 metadata is updated to track the new block’s location
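
The read and write paths can be modeled together in a few lines. This is a sketch under simplifying assumptions (N+1 redundancy, a dictionary standing in for the tier-0 hash map, and a made-up placement policy), not the actual vSAN algorithm:

```python
import hashlib

class MiniVSAN:
    """Toy model: every block gets a primary and a redundant copy (N+1)."""
    def __init__(self, nodes):
        self.nodes = {n: {} for n in nodes}   # node -> {hash: block}
        self.meta = {}                        # tier-0 stand-in: hash -> [primary, redundant]

    def write(self, block: bytes) -> str:
        h = hashlib.sha256(block).hexdigest()
        # Illustrative placement: the two least-loaded nodes.
        primary, redundant = sorted(self.nodes, key=lambda n: len(self.nodes[n]))[:2]
        self.nodes[primary][h] = block
        self.nodes[redundant][h] = block      # both copies written before the ack
        self.meta[h] = [primary, redundant]   # metadata updated to track the block
        return h

    def read(self, h: str, local_node=None) -> bytes:
        locations = self.meta[h]              # every read starts with a metadata lookup
        if local_node in locations:           # prefer a copy on the VM's own node
            return self.nodes[local_node][h]
        return self.nodes[locations[0]][h]    # otherwise read the primary copy

vsan = MiniVSAN(["node1", "node2", "node3"])
h = vsan.write(b"hello")
```

Note how the metadata lookup gates every read, which is why tier-0 performance affects the whole system.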

vSAN maintains multiple copies of every data block to protect against hardware failures. The redundancy level is configured at the system level and applies per tier.

| Feature | N+1 (RF2) — Default | N+2 (RF3) |
|---------|---------------------|-----------|
| Copies of data | 2 | 3 |
| Simultaneous failures tolerated | 1 node | 2 nodes |
| Minimum controller nodes | 2 | 3 |
| Recommended nodes | 3 | 5 |
| Storage overhead (before dedup) | ~2× | ~3× |

  • N+1 (RF2) is the default and is suitable for most production environments
  • N+2 (RF3) is available for ultra-critical workloads or remote sites where hardware replacement is slow
  • Redundancy level is typically set during installation and applies system-wide
  • A failure only affects the tier where the failed drives reside — other tiers remain fully operational
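
The overhead figures translate directly into usable capacity. A quick back-of-the-envelope calculation (pre-deduplication; the 3-node, 10 TB-per-node cluster is a made-up example):

```python
def usable_capacity(raw_tb: float, copies: int) -> float:
    """Pre-deduplication usable capacity for a given number of data copies."""
    return raw_tb / copies

# Example: a 3-node cluster with 10 TB of raw tier-1 capacity per node.
raw = 3 * 10.0
rf2 = usable_capacity(raw, 2)   # N+1: 2 copies of every block
rf3 = usable_capacity(raw, 3)   # N+2: 3 copies of every block
```

With N+1 the example cluster yields 15 TB usable; moving to N+2 drops that to 10 TB, the price of tolerating a second simultaneous failure.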

When a node or drive fails, vSAN automatically begins recovery:

  1. Detection — vSAN detects the failure automatically
  2. Failover — Reads and writes are redirected to redundant copies with no VM downtime
  3. Rebuild — Missing data blocks are re-replicated to remaining healthy nodes
  4. Restoration — Full redundancy is restored without manual intervention
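
The rebuild step (3) amounts to finding every block that had a copy on the failed node and re-replicating it from the surviving copy. A minimal sketch, assuming N+1 and the same toy data structures as above (not the actual recovery code):

```python
def rebuild(meta, nodes, failed):
    """Re-replicate blocks that lost a copy on the failed node (toy N+1 sketch)."""
    healthy = [n for n in nodes if n != failed]
    for h, locations in meta.items():
        if failed in locations:
            survivor = next(n for n in locations if n != failed)
            # Pick a healthy node that doesn't already hold this block.
            target = next(n for n in healthy if n not in locations)
            nodes[target][h] = nodes[survivor][h]   # rebuild from the redundant copy
            meta[h] = [survivor, target]            # full redundancy restored

# Three nodes; one block replicated on node1 and node2; node2 fails.
nodes = {"node1": {"h1": b"data"}, "node2": {"h1": b"data"}, "node3": {}}
meta = {"h1": ["node1", "node2"]}
rebuild(meta, nodes, failed="node2")
```

During the rebuild, reads keep working because the surviving copy is still reachable through the metadata, which is what makes the failover in step 2 transparent to VMs.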

During VergeOS installation, each physical drive is assigned to a specific vSAN tier. The installer uses the YC_DRIVE_LIST and YC_VSAN_TIER_LIST variables (whether set interactively or via a seed file) to map drives to tiers.

  • Every storage-participating node needs at least one tier-0 drive for metadata
  • Drives within the same tier should be of similar type and performance characteristics
  • When scaling up (adding drives), add drives uniformly across all nodes in the cluster to maintain balanced distribution
  • When scaling out (adding nodes), new nodes must match the existing cluster’s drive configuration
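
The `YC_DRIVE_LIST` and `YC_VSAN_TIER_LIST` variable names come from the installer, but their value format is not documented here. Assuming simple shell-style assignments, a seed-file fragment for the two-drive layout described below might look like this (device names and the space-separated list syntax are illustrative assumptions, not confirmed VergeOS syntax):

```shell
# Hypothetical seed-file fragment — variable names are the installer's,
# but the value format shown is an assumption.
YC_DRIVE_LIST="nvme0n1 nvme1n1"   # small NVMe, large NVMe
YC_VSAN_TIER_LIST="0 1"           # tier 0 (metadata), tier 1 (workload data)
```

The two lists are positional: the first drive maps to the first tier entry, the second drive to the second, and so on.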

In the Terraform playground’s simplest deployment, each controller node has:

| Drive | Tier | Purpose |
|-------|------|---------|
| 1× NVMe (small) | Tier 0 | Metadata — vSAN hash map and filesystem index |
| 1× NVMe (large) | Tier 1 | Workload data — VM disks, snapshots, ISOs |

Both nodes contribute their drives to the same vSAN pool. With N+1 redundancy (default), every block written to tier 1 on node 1 has a redundant copy on node 2, and vice versa.

Because every data block is identified by its cryptographic hash, vSAN automatically detects duplicate blocks. If two VMs (or two regions within the same VM disk) write identical data, only one copy of that block is stored. This operates inline — during the write path — with no separate deduplication job or schedule.

vSAN supports AES-256 encryption at rest, configured during initial installation. Encryption keys can be stored on USB drives (plugged into the first two controller nodes) or entered manually at boot time. All data across all tiers is encrypted transparently.

vSAN’s block-level architecture enables space-efficient snapshots — a snapshot records the hash map state at a point in time rather than copying data blocks. Clones similarly reference existing blocks, only consuming additional space when data diverges.
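
This copy-on-write behavior can be sketched with a toy disk whose snapshot copies only the block map. The `Disk` class and its shared-store layout are illustrative assumptions, not the VergeFS on-disk format:

```python
import hashlib

class Disk:
    """Toy copy-on-write disk: a snapshot copies the block *map*, not the blocks."""
    def __init__(self, store):
        self.store = store          # shared hash -> block content store
        self.block_map = {}         # logical offset -> block hash

    def write(self, offset, block):
        h = hashlib.sha256(block).hexdigest()
        self.store.setdefault(h, block)
        self.block_map[offset] = h

    def snapshot(self):
        snap = Disk(self.store)                 # clone shares the same block store
        snap.block_map = dict(self.block_map)   # record the map state; no data copied
        return snap

store = {}
disk = Disk(store)
disk.write(0, b"original")
snap = disk.snapshot()          # instant: only the map is duplicated
disk.write(0, b"changed")       # divergence adds exactly one new block
```

After the diverging write, the store holds two blocks total: the snapshot still resolves offset 0 to the original block, while the live disk points to the new one.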

| Concept | Summary |
|---------|---------|
| vSAN / VergeFS | Built-in distributed storage — no external SAN/NAS required |
| Tier 0 | Metadata only (NVMe). Required on every storage node. |
| Tiers 1–5 | Workload data, from high-performance NVMe to archival HDD |
| Data distribution | Hash-based, spread across all storage nodes |
| Redundancy | N+1 (2 copies, default) or N+2 (3 copies) — system-wide per tier |
| Self-healing | Automatic failover and rebuild on failure |
| Deduplication | Inline, hash-based, across all tiers |
| Compression | Not at rest — only during site-sync replication |

Now that you understand how VergeOS stores data, the next topic covers the network fabric that connects all nodes and carries vSAN replication traffic: Core Fabric & Networking →