Storage Monitoring & Troubleshooting

Why Storage Monitoring Matters

Storage is the foundation of every workload in VergeOS. A degraded drive, a full tier, or an unnoticed integrity error can cascade into VM performance issues, failed snapshots, and tenant complaints. VergeOS provides built-in diagnostic tools at two levels — vSAN diagnostics for the distributed block-level storage engine, and NAS diagnostics for each file-sharing service instance — so you can detect, diagnose, and resolve issues before they impact production.

vSAN Diagnostics

The vSAN Diagnostics interface provides real-time access to the VergeFS storage engine’s internal state. Every vcmd operation can be executed through the UI or via SSH on a controller node.

Accessing vSAN Diagnostics

Navigate to System → vSAN Diagnostics from the top menu
Alternatively: from the Main Dashboard, click the vSAN Tiers count box → vSAN Diagnostics in the left menu
Select a command from the Query dropdown, configure parameters on the right, and click Send →

vcmd Command Reference

The vcmd utility is the command-line interface to the vSAN engine. Every command available in the UI dropdown maps to a vcmd invocation.

Cluster & Performance Commands

Command	CLI Syntax	Purpose
Get Cluster Rates	`vcmd cluster rates`	Cluster-wide throughput and IOPS metrics
Get Cluster Usage	`vcmd cluster usage`	Overall storage utilization and capacity
Get Top Usage Rates	`vcmd usage top-rates`	Identify top consumers of storage I/O
Get Usage	`vcmd usage show`	Comprehensive vSAN usage statistics
Get Cache Info	`vcmd cache info`	Cache hit/miss ratios and memory usage
Get Read Ahead	`vcmd readahead status`	Read-ahead caching configuration and stats
Get Current Master	`vcmd master current`	Identify the current vSAN master node

Device & Node Commands

Command	CLI Syntax	Purpose
Get Device List	`vcmd devices list`	All storage devices in the vSAN
Get Device Status	`vcmd device status [ID]`	Health and state of a specific device
Get Device Usage	`vcmd device usage [ID]`	Per-device utilization and wear data
Get Node List	`vcmd nodes list`	All nodes participating in the vSAN
Get Node Info	`vcmd node info [ID]`	Detailed info for a specific node
Get Node Device List	`vcmd node devices [ID]`	Devices attached to a specific node

Tier & Volume Commands

Command	CLI Syntax	Purpose
Get Tier Status	`vcmd tier status`	Health and capacity per storage tier
Get Tier Device Maps	`vcmd tier device-maps`	How devices map across tiers
Get Tier Node Maps	`vcmd tier node-maps`	How tiers distribute across nodes
Get Volume Usage	`vcmd volume usage [ID]`	Per-volume storage consumption
Summarize Disk Usage	`vcmd usage summarize`	Cluster-wide disk usage summary

Integrity & Repair Commands

Command	CLI Syntax	Purpose
Integ Check	`vcmd integcheck start`	Start a full integrity check
Integ Check Device	`vcmd integcheck device [ID]`	Integrity check on a specific device
Get Integ Check Status	`vcmd integcheck status`	Progress/results of integrity checks
Get Repair Status	`vcmd repair status`	Active repair and rebuild operations
Get Sync List	`vcmd sync list`	Synchronization operation status

File System & Configuration Commands

Command	CLI Syntax	Purpose
Find Inode	`vcmd find --inode=[NUM]`	Locate a specific inode for analysis
Get Path from Inode	`vcmd path from-inode [NUM]`	Resolve an inode number to a path
Get File Status	`vcmd file status [PATH]`	Replication and integrity of a file
Get Fuse Info	`vcmd fuse info`	FUSE mount and operation details
Get Journal Status	`vcmd journal status`	Write-ahead journal state
Get Running Conf	`vcmd config show`	Current running vSAN configuration
Get Clients	`vcmd clients list`	Active vSAN client connections

Health Monitoring Workflow

Follow this systematic approach when investigating storage health — start broad and drill down to specific components.

Step-by-Step Workflow

System Overview: Run Get Cluster Usage and Get Cluster Rates to understand overall health, capacity, and throughput
Performance Analysis: Check Get Top Usage Rates to find hot volumes, then Get Cache Info for cache hit ratios
Health Assessment: Verify that Get Repair Status shows all zeros (no active repairs), and check Get Integ Check Status for recent results
Capacity Planning: Use Summarize Disk Usage for a cluster-wide view and Get Tier Status for per-tier capacity
Targeted Troubleshooting: Drill into specific devices (Get Device Status) or nodes (Get Node Info) based on findings
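
The first four workflow steps can be sketched as a small script, assuming the commands are run over SSH on a controller node and that the CLI syntax matches the command reference tables above. The `WORKFLOW` list and `run_workflow` helper are illustrative names, not part of VergeOS. The script defaults to a dry run that only prints the commands in order.

```python
import shlex
import subprocess

# Workflow steps paired with the CLI syntax listed in the command reference tables.
# Device- and node-level commands take an ID, so they are left for manual follow-up.
WORKFLOW = [
    ("System Overview", ["vcmd cluster usage", "vcmd cluster rates"]),
    ("Performance Analysis", ["vcmd usage top-rates", "vcmd cache info"]),
    ("Health Assessment", ["vcmd repair status", "vcmd integcheck status"]),
    ("Capacity Planning", ["vcmd usage summarize", "vcmd tier status"]),
]


def run_workflow(dry_run=True):
    """Walk the steps in order and return (step, command, output) tuples."""
    results = []
    for step, commands in WORKFLOW:
        for command in commands:
            if dry_run:
                output = None  # nothing is executed in dry-run mode
            else:
                # Assumption: vcmd is on PATH, as in an SSH session on a controller node.
                proc = subprocess.run(
                    shlex.split(command), capture_output=True, text=True, check=False
                )
                output = proc.stdout
            results.append((step, command, output))
    return results


for step, command, _ in run_workflow(dry_run=True):
    print(f"{step}: {command}")
```

Calling `run_workflow(dry_run=False)` would execute each command and collect its raw output. The output format is not specified here, so the sketch returns it unparsed.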

Troubleshooting Patterns

Performance Issues

Symptoms: High VM latency, slow snapshot operations, tenant complaints about disk speed.

Check Get Cluster Rates for aggregate throughput — are rates lower than baseline?
Review Get Cache Info — low hit ratios indicate working set exceeds available cache
Examine Get Top Usage Rates to identify which volumes are consuming the most I/O
Check Get Device Usage on individual drives for uneven load distribution
Verify Get Journal Status — a backed-up journal indicates sustained write pressure
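
The baseline and cache checks above reduce to two small calculations. The sketch below is a minimal illustration: the hit and miss counts, the throughput figures, and the 20% tolerance are made-up values. It makes no assumption about the output format of Get Cache Info or Get Cluster Rates, so the numbers are entered by hand.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of cache lookups served from cache (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


def below_baseline(current, baseline, tolerance=0.20):
    """True if the current rate is more than `tolerance` below the recorded baseline."""
    return current < baseline * (1 - tolerance)


# Hypothetical readings copied by hand from the diagnostics output.
print(f"cache hit ratio: {cache_hit_ratio(hits=900, misses=100):.0%}")
print("throughput below baseline:", below_baseline(current=700, baseline=1000))
```

Recording the weekly cluster rates gives `below_baseline` something concrete to compare against. The tolerance should reflect how much your workload normally varies.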

Capacity Issues

Symptoms: “Low space” alerts, inability to create snapshots, slow writes due to throttling.

Run Get Cluster Usage and Summarize Disk Usage for overall space analysis
Check Get Tier Status for per-tier capacity — a single full tier can cause issues even if others have space
Use Get Volume Usage on the largest volumes to identify growth candidates
Review snapshot retention policies — old snapshots referencing changed blocks consume significant space
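
To illustrate why per-tier capacity matters, the sketch below flags tiers above a usage threshold even when the cluster-wide figure looks healthy. The tier names, the used and capacity numbers, and the 80% threshold are all hypothetical. In practice the values would come from Get Tier Status.

```python
def usage_pct(used, capacity):
    """Usage as a percentage of capacity."""
    return 100.0 * used / capacity


def tiers_over_threshold(tiers, threshold_pct=80.0):
    """Return (tier, percent used) pairs at or above the threshold, fullest first."""
    flagged = []
    for name, (used, capacity) in tiers.items():
        pct = usage_pct(used, capacity)
        if pct >= threshold_pct:
            flagged.append((name, round(pct, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical (used, capacity) figures in GB for three tiers.
tiers = {"tier 0": (400, 1000), "tier 1": (9500, 10000), "tier 3": (20000, 50000)}

total_used = sum(used for used, _ in tiers.values())
total_capacity = sum(capacity for _, capacity in tiers.values())

print("cluster-wide usage:", round(usage_pct(total_used, total_capacity), 1), "%")
print("tiers over threshold:", tiers_over_threshold(tiers))
```

With these sample numbers the cluster as a whole is about half full, yet one tier is at 95%. This is the case the tier-level check is meant to catch.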

Data Integrity Concerns

Symptoms: Checksum warnings in logs, suspected corruption after hardware events.

Check Get Integ Check Status for recent integrity check results
Review Get Repair Status for any active data reconstruction
Use Get File Status on specific files to verify their replication state
If needed, run Integ Check to initiate a full scan (schedule during maintenance windows)

Cluster Health Issues

Symptoms: Node offline alerts, unexpected master failover, split-brain concerns.

Verify Get Current Master to confirm which node leads the vSAN
Check Get Node List and Get Node Info for each node’s status
Review Get Sync List for out-of-sync replicas that indicate a node was temporarily disconnected
Examine Get Clients for unexpected connection patterns

Device Problems

Symptoms: SMART warnings, individual drive errors, uneven performance across nodes.

Run Get Device List and Get Device Status to identify degraded or failed devices
Check Get Node Device List for the affected node’s full device inventory
Run Integ Check Device on suspected drives
Cross-reference with SMART data in the System Diagnostics bundle (smart/*.txt files)

Preventive Maintenance

Proactive monitoring prevents surprises. Establish a regular cadence for these checks:

Daily

Review cluster usage and tier capacity
Check for active repairs (should be zero in steady state)
Verify no storage-related alerts in the dashboard

Weekly

Run cluster rates to establish performance baselines
Review top usage rates for growth trends
Check cache hit ratios and tune if needed

Monthly

Schedule integrity checks during maintenance windows
Review device health and SMART data
Analyze capacity trends for procurement planning
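
One way to keep this cadence is a small reminder script. The sketch below is an assumption-laden illustration, not a VergeOS feature: it treats Monday as the weekly check day and the first of the month as the monthly check day, and the `CHECKS` table simply restates the lists above.

```python
from datetime import date

# Checks restated from the Daily, Weekly, and Monthly lists.
CHECKS = {
    "daily": [
        "Review cluster usage and tier capacity",
        "Check for active repairs",
        "Verify no storage-related alerts in the dashboard",
    ],
    "weekly": [
        "Run cluster rates to establish performance baselines",
        "Review top usage rates for growth trends",
        "Check cache hit ratios",
    ],
    "monthly": [
        "Schedule integrity checks during maintenance windows",
        "Review device health and SMART data",
        "Analyze capacity trends for procurement planning",
    ],
}


def checks_due(today, weekly_weekday=0, monthly_day=1):
    """Return the checks due on `today` (weekday 0 is Monday)."""
    due = list(CHECKS["daily"])
    if today.weekday() == weekly_weekday:
        due.extend(CHECKS["weekly"])
    if today.day == monthly_day:
        due.extend(CHECKS["monthly"])
    return due


for check in checks_due(date.today()):
    print(check)
```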

Fibre Channel Integration

VergeOS vSAN supports Fibre Channel (FC) LUNs as storage devices within its tiered architecture, enabling integration with existing SAN infrastructure.

Key Principles

FC LUNs are treated identically to physical disks — VergeOS makes no distinction once they are assigned to a tier
Each node must receive its own unique LUNs — do not present the same LUN to multiple nodes (unlike traditional shared-storage clustering)
Tier 0 still requires physical NVMe/SSD drives in the nodes for metadata storage
Disable RAID and automatic tiering on the SAN for LUNs used by VergeOS — vSAN handles redundancy natively
Multipathing is active/passive with a 7-second failover timeout; VergeOS automatically selects the optimal path
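
The unique-LUN rule is easy to violate when zoning many nodes. A planned assignment can be checked before it is applied on the SAN. The sketch below is a minimal illustration in which the node names, LUN identifiers, and the `shared_luns` helper are hypothetical.

```python
from collections import defaultdict


def shared_luns(assignments):
    """Return {lun: [nodes]} for every LUN presented to more than one node."""
    nodes_by_lun = defaultdict(set)
    for node, luns in assignments.items():
        for lun in luns:
            nodes_by_lun[lun].add(node)
    return {
        lun: sorted(nodes)
        for lun, nodes in nodes_by_lun.items()
        if len(nodes) > 1
    }


# Hypothetical zoning plan: lun-b is mistakenly presented to two nodes.
plan = {
    "node1": ["lun-a", "lun-b"],
    "node2": ["lun-c", "lun-b"],
}

for lun, nodes in shared_luns(plan).items():
    print(f"{lun} is presented to multiple nodes: {', '.join(nodes)}")
```

An empty result means every LUN in the plan belongs to exactly one node, which is the layout the principles above call for.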

When to Use FC vs. Physical Disks

Verge recommends physical disks directly attached to nodes for most deployments. FC integration is appropriate when you have existing SAN investments or specific compliance requirements. Physical disks provide simpler configuration, better performance (no SAN overhead), fewer failure points, and lower cost.

NAS Diagnostics

Each NAS service instance has its own diagnostic interface, separate from vSAN diagnostics. NAS diagnostics focus on file-sharing protocols, network connectivity, and authentication.

Accessing NAS Diagnostics

Navigate to NAS → List from the top menu
Double-click the desired NAS service
Click Diagnostics in the left menu
Select a command from the Diagnostics Query dropdown and click Send →

Key NAS Diagnostic Commands

Category	Commands	Purpose
SMB/CIFS	`smbstatus`, `testparm`, `smbclient -L localhost`	Active connections, config validation, share listing
NFS	`exportfs -v`, `rpcinfo -p`, `showmount -e`	Export list, RPC services, mount verification
Active Directory	`wbinfo -t`, `wbinfo -u`, `wbinfo -g`	Trust check, domain users, domain groups
Network	Ping, Trace Route, ARP Scan, DNS Lookup	Connectivity, routing, neighbor discovery
Performance	Top CPU Usage, Top Network Usage, TCP Dump	Resource utilization and traffic analysis
Logging	Logs (journalctl, samba logs)	Error analysis and event monitoring

NAS Health Monitoring Workflow

Service Status — Check Services and protocol-specific status (Samba/NFS)
Network Connectivity — Verify with Ping and interface configuration
Authentication — Test Users, Groups, and Winbind (for AD environments)
Performance — Monitor Top CPU Usage and Top Network Usage
Logs — Review service logs for errors or warnings

Common NAS Issues & Resolutions

Windows: Unable to Connect to CIFS Shares

Cause: Windows 10/11 and Server 2016+ disable insecure guest logons by default, blocking access to shares that allow anonymous access.

Fix: Enable insecure guest logons via Group Policy:

Open gpedit.msc → Computer Configuration → Administrative Templates → Network → Lanman Workstation
Set Enable insecure guest logons to Enabled
Restart the Windows device

macOS: SMB Connection Failures

Cause: macOS defaults may conflict with the server’s SMB protocol version or lack Apple-specific extensions.

Fix (client-side): Force SMB3 in /etc/nsmb.conf:

[default]
smb_neg=smb3_only

Clear the SMB cache (sudo rm -rf /var/db/samba/* /var/db/smb/*) and restart.

Fix (server-side): Add Apple/Samba fruit module settings under NAS → CIFS → Advanced Configuration Options:

ea support = yes
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
fruit:veto_appledouble = no
fruit:nfs_aces = no
fruit:posix_rename = yes
fruit:zero_file_id = yes
fruit:wipe_intentionally_left_blank_rfork = yes
fruit:delete_empty_adfiles = yes
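
Before pasting these options into the Advanced Configuration Options field, it can help to sanity-check them. The sketch below is a minimal, hypothetical checker with illustrative function names. It parses `key = value` lines and confirms only that `vfs objects` lists both `fruit` and `streams_xattr` and that `ea support` is `yes`. It does not validate the remaining `fruit:` options.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip().lower()] = value.strip()
    return settings


def fruit_problems(text):
    """Return a list of problems found in the pasted options (empty if none)."""
    settings = parse_settings(text)
    problems = []
    modules = settings.get("vfs objects", "").split()
    for module in ("fruit", "streams_xattr"):
        if module not in modules:
            problems.append(f"vfs objects is missing {module}")
    if settings.get("ea support", "").lower() != "yes":
        problems.append("ea support is not set to yes")
    return problems


options = """
ea support = yes
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
"""

print(fruit_problems(options) or "no problems found")
```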

Permission Denied on CIFS Shares

Cause: Incorrect user/group permissions or share configuration.

Fix:

Verify that the user has access in the NAS service’s user list
Check share settings: ensure Browseable is enabled and the user is in the Valid Users list
If using Force User or Force Group options, confirm the forced identity has the correct filesystem permissions

Slow CIFS Performance

Cause: SMB protocol version mismatch, network issues, or suboptimal configuration.

Fix:

Verify network stability with the NAS Diagnostics Ping and Trace Route commands
Check the SMB protocol version under NAS → Volumes → Advanced Configuration Options — ensure SMB2 or SMB3 is in use
Monitor with Top Network Usage in NAS Diagnostics for bandwidth saturation
Contact VergeOS Support for advanced Samba tuning parameters if standard optimizations are insufficient

Getting Started

Explore vSAN Diagnostics

Navigate to System → vSAN Diagnostics, then run Get Cluster Usage and Get Cluster Rates to establish your baseline. Enable “Show Command” to learn the vcmd syntax.

Check NAS Health

Open each NAS service’s Diagnostics page. Run the Samba query to see active connections and the NFS query to verify exports. Review Logs for recent errors.

Review the Diagnostics Guide

Read the full vSAN Diagnostics Guide and NAS Diagnostics Guide in the official docs.