
Storage Monitoring & Troubleshooting

Storage is the foundation of every workload in VergeOS. A degraded drive, a full tier, or an unnoticed integrity error can cascade into VM performance issues, failed snapshots, and tenant complaints. VergeOS provides built-in diagnostic tools at two levels — vSAN diagnostics for the distributed block-level storage engine, and NAS diagnostics for each file-sharing service instance — so you can detect, diagnose, and resolve issues before they impact production.

The vSAN Diagnostics interface provides real-time access to the VergeFS storage engine’s internal state. Every vcmd operation can be executed through the UI or via SSH on a controller node.

  1. Navigate to System → vSAN Diagnostics from the top menu (or, from the Main Dashboard, click the vSAN Tiers count box, then vSAN Diagnostics in the left menu)
  2. Select a command from the Query dropdown, configure any parameters on the right, and click Send →

The vcmd utility is the CLI interface to the vSAN engine. Every command available in the UI dropdown maps to a vcmd invocation.

Cluster & Performance

Command | CLI Syntax | Purpose
Get Cluster Rates | vcmd cluster rates | Cluster-wide throughput and IOPS metrics
Get Cluster Usage | vcmd cluster usage | Overall storage utilization and capacity
Get Top Usage Rates | vcmd usage top-rates | Identify top consumers of storage I/O
Get Usage | vcmd usage show | Comprehensive vSAN usage statistics
Get Cache Info | vcmd cache info | Cache hit/miss ratios and memory usage
Get Read Ahead | vcmd readahead status | Read-ahead caching configuration and stats
Get Current Master | vcmd master current | Identify the current vSAN master node

Devices & Nodes

Command | CLI Syntax | Purpose
Get Device List | vcmd devices list | All storage devices in the vSAN
Get Device Status | vcmd device status [ID] | Health and state of a specific device
Get Device Usage | vcmd device usage [ID] | Per-device utilization and wear data
Get Node List | vcmd nodes list | All nodes participating in the vSAN
Get Node Info | vcmd node info [ID] | Detailed info for a specific node
Get Node Device List | vcmd node devices [ID] | Devices attached to a specific node

Tiers & Capacity

Command | CLI Syntax | Purpose
Get Tier Status | vcmd tier status | Health and capacity per storage tier
Get Tier Device Maps | vcmd tier device-maps | How devices map across tiers
Get Tier Node Maps | vcmd tier node-maps | How tiers distribute across nodes
Get Volume Usage | vcmd volume usage [ID] | Per-volume storage consumption
Summarize Disk Usage | vcmd usage summarize | Cluster-wide disk usage summary

Integrity & Repair

Command | CLI Syntax | Purpose
Integ Check | vcmd integcheck start | Start a full integrity check
Integ Check Device | vcmd integcheck device [ID] | Integrity check on a specific device
Get Integ Check Status | vcmd integcheck status | Progress/results of integrity checks
Get Repair Status | vcmd repair status | Active repair and rebuild operations
Get Sync List | vcmd sync list | Synchronization operation status

Advanced Diagnostics

Command | CLI Syntax | Purpose
Find Inode | vcmd find --inode=[NUM] | Locate a specific inode for analysis
Get Path from Inode | vcmd path from-inode [NUM] | Resolve an inode number to a path
Get File Status | vcmd file status [PATH] | Replication and integrity of a file
Get Fuse Info | vcmd fuse info | FUSE mount and operation details
Get Journal Status | vcmd journal status | Write-ahead journal state
Get Running Conf | vcmd config show | Current running vSAN configuration
Get Clients | vcmd clients list | Active vSAN client connections
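If you script these checks from a controller node, it helps to keep each UI query name next to its vcmd equivalent. The sketch below is a hypothetical Python helper, not part of VergeOS. It transcribes a subset of the tables above into argument lists. The `ssh_host` wrapper and any host name you pass to it are assumptions. The helper only builds commands and leaves execution to `subprocess.run`.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

# Hypothetical helper: a subset of the UI query -> vcmd mappings from the tables above.
# Brace placeholders stand in for the bracketed [ID], [NUM], and [PATH] parameters.
VCMD_COMMANDS: Dict[str, str] = {
    "Get Cluster Rates": "cluster rates",
    "Get Cluster Usage": "cluster usage",
    "Get Cache Info": "cache info",
    "Get Device Status": "device status {id}",
    "Get Node Info": "node info {id}",
    "Get Tier Status": "tier status",
    "Get Repair Status": "repair status",
    "Integ Check Device": "integcheck device {id}",
    "Find Inode": "find --inode={num}",
    "Get File Status": "file status {path}",
}


def build_vcmd(query: str, ssh_host: Optional[str] = None, **params: str) -> List[str]:
    """Return the argv for a UI query, optionally wrapped in an ssh invocation."""
    # Substitute parameters token by token so that a path containing spaces stays one argument.
    args = [token.format(**params) for token in VCMD_COMMANDS[query].split()]
    argv = ["vcmd", *args]
    if ssh_host:
        # ssh hands the remote command to a shell, so pass it as one safely quoted string.
        return ["ssh", ssh_host, shlex.join(argv)]
    return argv


def run_vcmd(argv: List[str]) -> str:
    """Run a command built by build_vcmd and return its standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_vcmd(build_vcmd("Get Tier Status", ssh_host="controller1"))` would run `vcmd tier status` on a node reachable as `controller1`. That host name is purely illustrative. A missing parameter, such as omitting `id` for Get Device Status, raises `KeyError` before anything is executed.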

Follow this systematic approach when investigating storage health — start broad and drill down to specific components.

  1. System Overview: Run Get Cluster Usage and Get Cluster Rates to understand overall health, capacity, and throughput
  2. Performance Analysis: Check Get Top Usage Rates to find hot volumes, then Get Cache Info for cache hit ratios
  3. Health Assessment: Verify that Get Repair Status shows all zeros (no active repairs), and check Get Integ Check Status for recent results
  4. Capacity Planning: Use Summarize Disk Usage for a cluster-wide view and Get Tier Status for per-tier capacity
  5. Targeted Troubleshooting: Drill into specific devices (Get Device Status) or nodes (Get Node Info) based on findings
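The broad-to-narrow order above can be captured as a dry-run plan. In this minimal sketch, the `Stage` type and `plan` function are hypothetical names. The vcmd arguments come from the command tables. Stage 5 is only populated with device and node IDs that you supply after reviewing the earlier stages.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Stage:
    name: str
    commands: List[str]  # vcmd arguments taken from the command tables


# Stages 1-4 of the workflow, from broad to narrow.
RUNBOOK = [
    Stage("System Overview", ["cluster usage", "cluster rates"]),
    Stage("Performance Analysis", ["usage top-rates", "cache info"]),
    Stage("Health Assessment", ["repair status", "integcheck status"]),
    Stage("Capacity Planning", ["usage summarize", "tier status"]),
]


def plan(suspect_devices: Iterable[str] = (), suspect_nodes: Iterable[str] = ()) -> List[str]:
    """Return the ordered vcmd command lines for one triage pass."""
    lines = [f"vcmd {cmd}" for stage in RUNBOOK for cmd in stage.commands]
    # Stage 5: drill into only the devices and nodes that earlier stages flagged.
    lines += [f"vcmd device status {dev}" for dev in suspect_devices]
    lines += [f"vcmd node info {node}" for node in suspect_nodes]
    return lines
```

Printing `plan(["7"], ["2"])` one line at a time gives a checklist. It ends with `vcmd device status 7` and `vcmd node info 2`, where the IDs are example values.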

Slow Performance

Symptoms: High VM latency, slow snapshot operations, tenant complaints about disk speed.

  1. Check Get Cluster Rates for aggregate throughput — are rates lower than baseline?
  2. Review Get Cache Info; low hit ratios indicate that the working set exceeds the available cache
  3. Examine Get Top Usage Rates to identify which volumes are consuming the most I/O
  4. Check Get Device Usage on individual drives for uneven load distribution
  5. Verify Get Journal Status — a backed-up journal indicates sustained write pressure
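The checks above come down to a few simple calculations on numbers you read from Get Cluster Rates, Get Cache Info, and Get Device Usage. This sketch does not parse vcmd output, because that format is not documented here. You enter the values yourself. The 20% baseline tolerance and the 1.5× hot-device factor are illustrative assumptions, not VergeOS guidance.

```python
from statistics import mean
from typing import Dict, List


def hit_ratio(hits: int, misses: int) -> float:
    """Cache hit ratio from raw counters; 0.0 when there has been no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def below_baseline(current: float, baseline: float, tolerance: float = 0.2) -> bool:
    """True when current throughput has dropped more than `tolerance` below the baseline."""
    return current < baseline * (1 - tolerance)


def hot_devices(load_by_device: Dict[str, float], factor: float = 1.5) -> List[str]:
    """Devices carrying more than `factor` times the average load (uneven distribution)."""
    if not load_by_device:
        return []
    average = mean(load_by_device.values())
    return sorted(dev for dev, load in load_by_device.items() if load > factor * average)
```

For example, with device loads of 100, 100, and 400, the average is 200. Only the device at 400 exceeds 1.5 × 200 = 300, so it is the only one reported.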

Capacity Issues

Symptoms: “Low space” alerts, inability to create snapshots, slow writes due to throttling.

  1. Run Get Cluster Usage and Summarize Disk Usage for overall space analysis
  2. Check Get Tier Status for per-tier capacity — a single full tier can cause issues even if others have space
  3. Use Get Volume Usage on the largest volumes to identify growth candidates
  4. Review snapshot retention policies — old snapshots referencing changed blocks consume significant space
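Step 2 warns that a single full tier can cause problems even when the cluster as a whole has space. The arithmetic below shows why. It is a sketch that assumes you have copied used and total bytes per tier from Get Tier Status. The 80% threshold is a hypothetical value, not a VergeOS limit.

```python
from typing import Dict, List, Tuple

# Tier number -> (used_bytes, total_bytes), as read from Get Tier Status.
TierUsage = Dict[int, Tuple[int, int]]


def fill_ratio(used: int, total: int) -> float:
    return used / total if total else 0.0


def cluster_fill(tiers: TierUsage) -> float:
    """Fill ratio of all tiers combined."""
    used = sum(u for u, _ in tiers.values())
    total = sum(t for _, t in tiers.values())
    return fill_ratio(used, total)


def full_tiers(tiers: TierUsage, threshold: float = 0.8) -> List[int]:
    """Tiers at or above `threshold`, regardless of how much space other tiers have."""
    return sorted(tier for tier, (used, total) in tiers.items()
                  if fill_ratio(used, total) >= threshold)
```

Take tier 0 at 90 of 100 units and tier 1 at 100 of 1000. The cluster is only about 17% full overall, yet `full_tiers` flags tier 0.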

Data Integrity Concerns

Symptoms: Checksum warnings in logs, suspected corruption after hardware events.

  1. Check Get Integ Check Status for recent integrity check results
  2. Review Get Repair Status for any active data reconstruction
  3. Use Get File Status on specific files to verify their replication state
  4. If needed, run Integ Check to initiate a full scan (schedule during maintenance windows)
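Because a full scan belongs in a maintenance window, a scheduled job should confirm the window before starting one. This is a minimal sketch. The function names and window times are hypothetical. The only VergeOS-specific part is the `vcmd integcheck start` command from the table. Windows that cross midnight (for example 22:00 to 04:00) are handled explicitly.

```python
from datetime import datetime, time
from typing import List, Optional


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end), including windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # Overnight window such as 22:00-04:00.
    return t >= start or t < end


def integcheck_command(now: datetime, start: time, end: time) -> Optional[List[str]]:
    """Return the full-scan command only inside the maintenance window, else None."""
    return ["vcmd", "integcheck", "start"] if in_window(now, start, end) else None
```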

Node & Cluster Issues

Symptoms: Node offline alerts, unexpected master failover, split-brain concerns.

  1. Verify Get Current Master to confirm which node leads the vSAN
  2. Check Get Node List and Get Node Info for each node’s status
  3. Review Get Sync List for out-of-sync replicas that indicate a node was temporarily disconnected
  4. Examine Get Clients for unexpected connection patterns

Drive Issues

Symptoms: SMART warnings, individual drive errors, uneven performance across nodes.

  1. Run Get Device List and Get Device Status to identify degraded or failed devices
  2. Check Get Node Device List for the affected node’s full device inventory
  3. Run Integ Check Device on suspected drives
  4. Cross-reference with SMART data in the System Diagnostics bundle (smart/*.txt files)
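For the cross-reference in step 4, a short script can list the drives whose SMART health check did not pass. This sketch makes two assumptions. First, it assumes you have extracted the System Diagnostics bundle to a local directory. Second, it assumes the `smart/*.txt` files contain `smartctl` output. It matches the "SMART overall-health self-assessment test result" line that `smartctl` prints for ATA and NVMe drives. SAS drives report health on a different line and would not be matched.

```python
import re
from pathlib import Path
from typing import Dict

# smartctl's health line for ATA/NVMe devices; the captured word is e.g. PASSED or FAILED!
HEALTH_RE = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")


def smart_failures(bundle_dir: str) -> Dict[str, str]:
    """Map each smart/*.txt file name to its health result, for results other than PASSED."""
    failures: Dict[str, str] = {}
    for path in sorted(Path(bundle_dir, "smart").glob("*.txt")):
        text = path.read_text(errors="replace")
        for result in HEALTH_RE.findall(text):
            if result != "PASSED":
                failures[path.name] = result
    return failures
```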

Proactive monitoring prevents surprises. Establish a regular cadence for these checks:

Daily

  • Review cluster usage and tier capacity
  • Check for active repairs (should be zero in steady state)
  • Verify no storage-related alerts in the dashboard

Weekly

  • Run cluster rates to establish performance baselines
  • Review top usage rates for growth trends
  • Check cache hit ratios and tune if needed

Monthly

  • Schedule integrity checks during maintenance windows
  • Review device health and SMART data
  • Analyze capacity trends for procurement planning
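One way to automate this cadence is a function that lists the vcmd checks due on a given day. The sketch makes several assumptions. The weekly checks run on Mondays and the monthly checks run on the 1st. The mapping of checklist items to specific vcmd commands is an illustrative choice. The dashboard-alert and SMART reviews remain manual steps.

```python
from datetime import date
from typing import Dict, List

# vcmd arguments for each cadence, loosely following the checklist (illustrative mapping).
CADENCE: Dict[str, List[str]] = {
    "daily": ["cluster usage", "tier status", "repair status"],
    "weekly": ["cluster rates", "usage top-rates", "cache info"],
    # The full scan itself (integcheck start) belongs in a maintenance window.
    "monthly": ["devices list", "usage summarize"],
}


def due(day: date, weekly_on: int = 0, monthly_on: int = 1) -> List[str]:
    """vcmd checks due on `day`; weekly_on uses date.weekday() (0 = Monday)."""
    checks = list(CADENCE["daily"])
    if day.weekday() == weekly_on:
        checks += CADENCE["weekly"]
    if day.day == monthly_on:
        checks += CADENCE["monthly"]
    return [f"vcmd {c}" for c in checks]
```

With these defaults, 1 January 2024 is both a Monday and the first of the month, so all eight checks are due on that date.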

VergeOS vSAN supports Fibre Channel (FC) LUNs as storage devices within its tiered architecture, enabling integration with existing SAN infrastructure.

  • FC LUNs are treated identically to physical disks — VergeOS makes no distinction once they are assigned to a tier
  • Each node must receive its own unique LUNs — do not present the same LUN to multiple nodes (unlike traditional shared-storage clustering)
  • Tier 0 still requires physical NVMe/SSD drives in the nodes for metadata storage
  • Disable RAID and automatic tiering on the SAN for LUNs used by VergeOS — vSAN handles redundancy natively
  • Multipath runs in active/passive mode with a 7-second failover timeout; VergeOS automatically selects the optimal path
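The rule that each node must receive its own unique LUNs is easy to verify before the LUNs are assigned to a tier. This sketch assumes you have collected, for each node, the list of LUN identifiers (for example WWIDs) that the SAN presents to it. The function name and input shape are hypothetical. Multipath makes the same LUN appear several times on one node. Sets collapse those duplicates, so only cross-node sharing is reported.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_luns(presentations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each LUN ID that is visible to more than one node, with those nodes."""
    # Sets absorb the repeated entries that multiple paths to one LUN produce on a single node.
    owners: Dict[str, Set[str]] = defaultdict(set)
    for node, luns in presentations.items():
        for lun in luns:
            owners[lun].add(node)
    return {lun: sorted(nodes) for lun, nodes in owners.items() if len(nodes) > 1}
```

An empty result means every LUN is presented to exactly one node. Any entry it returns identifies a LUN that should be unmapped from all but one node on the SAN.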

Verge recommends physical disks directly attached to nodes for most deployments. FC integration is appropriate when you have existing SAN investments or specific compliance requirements. Physical disks provide simpler configuration, better performance (no SAN overhead), fewer failure points, and lower cost.

Each NAS service instance has its own diagnostic interface, separate from vSAN diagnostics. NAS diagnostics focus on file-sharing protocols, network connectivity, and authentication.

  1. Navigate to NAS → List from the top menu
  2. Double-click the desired NAS service
  3. Click Diagnostics in the left menu
  4. Select a command from the Diagnostics Query dropdown and click Send →

Category | Commands | Purpose
SMB/CIFS | smbstatus, testparm, smbclient -L localhost | Active connections, config validation, share listing
NFS | exportfs -v, rpcinfo -p, showmount -e | Export list, RPC services, mount verification
Active Directory | wbinfo -t, wbinfo -u, wbinfo -g | Trust check, domain users, domain groups
Network | Ping, Trace Route, ARP Scan, DNS Lookup | Connectivity, routing, neighbor discovery
Performance | Top CPU Usage, Top Network Usage, TCP Dump | Resource utilization and traffic analysis
Logging | Logs (journalctl, samba logs) | Error analysis and event monitoring

Work through NAS issues in this order:

  1. Service Status — Check Services and protocol-specific status (Samba/NFS)
  2. Network Connectivity — Verify with Ping and interface configuration
  3. Authentication — Test Users, Groups, and Winbind (for AD environments)
  4. Performance — Monitor Top CPU Usage and Top Network Usage
  5. Logs — Review service logs for errors or warnings

Windows Cannot Access Guest Shares

Cause: Windows 10/11 and Server 2016+ disable insecure guest logons by default, blocking access to shares that allow anonymous access.

Fix: Enable insecure guest logons via Group Policy:

  1. Open gpedit.msc and go to Computer Configuration → Administrative Templates → Network → Lanman Workstation
  2. Set Enable insecure guest logons to Enabled
  3. Restart the Windows device

macOS Connection Issues

Cause: macOS client defaults may conflict with the server’s SMB protocol version, or the server may lack the Apple-specific SMB extensions that macOS expects.

Fix (client-side): Force SMB3 in /etc/nsmb.conf:

[default]
smb_neg=smb3_only

Clear the SMB cache (sudo rm -rf /var/db/samba/* /var/db/smb/*) and restart the Mac.
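If a Mac already has an `/etc/nsmb.conf`, merge the setting into it instead of overwriting the file. This is a minimal sketch using Python's standard `configparser`. The function name is hypothetical. Be aware that `configparser` drops comments from the file it rewrites. Writing the result back to `/etc/nsmb.conf` requires root.

```python
import configparser
import io


def with_smb3_only(existing: str) -> str:
    """Return nsmb.conf text with smb_neg=smb3_only set under [default], keeping other keys."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read_string(existing)
    if not cfg.has_section("default"):
        cfg.add_section("default")
    cfg.set("default", "smb_neg", "smb3_only")
    out = io.StringIO()
    # No spaces around "=", matching the nsmb.conf example above.
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()
```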

Fix (server-side): Add Apple/Samba fruit module settings under NAS → CIFS → Advanced Configuration Options:

ea support = yes
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
fruit:veto_appledouble = no
fruit:nfs_aces = no
fruit:posix_rename = yes
fruit:zero_file_id = yes
fruit:wipe_intentionally_left_blank_rfork = yes
fruit:delete_empty_adfiles = yes
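Before you paste these settings, you can check which ones are already present in a NAS service's Advanced Configuration Options. This sketch assumes the field holds smb.conf-style `key = value` lines. It lower-cases option names but compares values as exact text. For example, it would flag `vfs objects` if additional modules are listed, so review its output manually rather than applying it blindly.

```python
from typing import Dict, Optional

# The settings listed above, keyed by lower-cased option name.
RECOMMENDED: Dict[str, str] = {
    "ea support": "yes",
    "vfs objects": "fruit streams_xattr",
    "fruit:metadata": "stream",
    "fruit:model": "MacSamba",
    "fruit:veto_appledouble": "no",
    "fruit:nfs_aces": "no",
    "fruit:posix_rename": "yes",
    "fruit:zero_file_id": "yes",
    "fruit:wipe_intentionally_left_blank_rfork": "yes",
    "fruit:delete_empty_adfiles": "yes",
}


def settings_to_fix(advanced: str) -> Dict[str, Optional[str]]:
    """Map each recommended option that is absent or different to its current value (None if absent)."""
    current: Dict[str, str] = {}
    for raw in advanced.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue  # skip blanks, comments, and lines that are not key = value
        key, value = line.split("=", 1)
        current[key.strip().lower()] = value.strip()  # later lines override earlier ones
    return {key: current.get(key) for key, want in RECOMMENDED.items() if current.get(key) != want}
```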

Access Denied or Missing Shares

Cause: Incorrect user/group permissions or share configuration.

Fix:

  1. Verify that the user has access in the NAS service’s user list
  2. Check share settings: ensure Browseable is enabled and the user is in the Valid Users list
  3. If using Force User or Force Group options, confirm the forced identity has the correct filesystem permissions
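When a user is refused, it helps to reason through the Valid Users check explicitly. This is a simplified model that assumes the share's Valid Users field maps to Samba's `valid users` parameter. In Samba, an empty list allows anyone, and entries prefixed with `@`, `+`, or `&` name groups. The sketch does not model `invalid users`, quoted names, domain prefixes, or the different lookup rules behind each group prefix.

```python
from typing import Iterable


def may_connect(user: str, groups: Iterable[str], valid_users: str) -> bool:
    """Simplified model of Samba's `valid users` check for one share."""
    entries = valid_users.replace(",", " ").split()
    if not entries:
        return True  # an empty list places no restriction on who may connect
    member_of = set(groups)
    for entry in entries:
        if entry[0] in "@+&":
            # Group entry: Samba's prefixes differ in how the group is looked up;
            # this sketch treats them all as a plain group-membership test.
            if entry.lstrip("@+&") in member_of:
                return True
        elif entry == user:
            return True
    return False
```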

Slow SMB Performance

Cause: SMB protocol version mismatch, network issues, or suboptimal configuration.

Fix:

  1. Verify network stability with the NAS Diagnostics Ping and Trace Route commands
  2. Check the SMB protocol version under NAS → Volumes → Advanced Configuration Options — ensure SMB2 or SMB3 is in use
  3. Monitor with Top Network Usage in NAS Diagnostics for bandwidth saturation
  4. Contact VergeOS Support for advanced Samba tuning parameters if standard optimizations are insufficient
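To confirm that SMB2 or SMB3 is in use, check whether the configuration still permits SMB1-era dialects. This sketch reads Samba's `server min protocol` option (also accepted as `min protocol`) from the Advanced Configuration Options text. It returns `None` when the option is not set there, because the effective minimum then depends on the Samba version's built-in default.

```python
from typing import Optional

MIN_PROTOCOL_KEYS = {"server min protocol", "min protocol"}


def configured_min_protocol(advanced: str) -> Optional[str]:
    """Return the last server min protocol value set in the text, or None."""
    found = None
    for line in advanced.splitlines():
        key, sep, value = line.partition("=")
        # Normalize spacing and case so "Server  Min Protocol" still matches; comments do not.
        if sep and " ".join(key.lower().split()) in MIN_PROTOCOL_KEYS:
            found = value.strip().upper()
    return found


def allows_smb1(advanced: str) -> Optional[bool]:
    """True if the configured minimum is an SMB1-era dialect; None if not set here."""
    proto = configured_min_protocol(advanced)
    if proto is None:
        return None  # effective value falls back to the Samba version's default
    return not proto.startswith(("SMB2", "SMB3"))
```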

Explore vSAN Diagnostics

Navigate to System → vSAN Diagnostics, then run Get Cluster Usage and Get Cluster Rates to establish your baseline. Enable “Show Command” to learn the vcmd syntax.

Check NAS Health

Open each NAS service’s Diagnostics page. Run the Samba query to see active connections and the NFS query to verify exports. Review Logs for recent errors.