Module 9: Monitoring & Troubleshooting

Learning Objectives

By the end of this module, you will be able to:

Navigate the VergeOS dashboard for real-time system health monitoring
Configure alerts and log forwarding to proactively detect and respond to issues
Diagnose common issues using system logs, diagnostics, and built-in tools
Know when and how to escalate to Verge support with the right diagnostic information

Completion of Module 1: Architecture Fundamentals — understanding of VergeOS platform architecture
Completion of Module 4: Networking — understanding of VergeOS networking concepts
Completion of Module 5: Storage — understanding of vSAN/VergeFS storage
Familiarity with the VergeOS UI dashboard

3 hours (1.5 hours reading + 1.5 hours lab)

Dashboard & System Health

Navigating the VergeOS dashboard for real-time visibility into node status, resource utilization, and cluster health.

Alerts & Notifications

Configuring system alerts, notification rules, and email/webhook alerting for proactive issue detection.

Log Management & Forwarding

System log access, filtering, search, and syslog forwarding to external SIEM and monitoring platforms.

Prometheus & Grafana

Exposing VergeOS metrics via the built-in Prometheus endpoint and building Grafana dashboards for advanced monitoring.

Diagnostics Toolkit

Built-in diagnostic tools, network troubleshooting utilities, storage health checks, and system diagnostics bundles.

Common Issues & Resolution

Troubleshooting runbooks for the most frequently encountered VergeOS issues across compute, storage, and networking.

Support Escalation

Knowing when to escalate issues, gathering the right diagnostic data, and working effectively with Verge support.

Lab: Monitoring & Troubleshooting

Explore the VergeOS dashboard, configure alerts, analyze logs, and practice diagnosing simulated issues.

Dashboard & System Health — Node status indicators, resource utilization graphs, cluster health overview, storage pool status, and network connectivity monitoring
Alerts & Notifications — Built-in alert types, configuring notification rules, email/webhook alerting, and alert escalation policies
Log Management & Forwarding — Accessing system logs, filtering and searching log entries, syslog forwarding to external SIEM/monitoring systems
Prometheus & Grafana Integration — Built-in Prometheus endpoint, metrics catalog, Grafana dashboard setup, and advanced monitoring workflows
Diagnostics Toolkit — Network troubleshooting utilities, storage health diagnostics, system diagnostics bundles, and node-level tools
Common Issues & Resolution — Troubleshooting runbooks for frequent compute, storage, networking, and tenant issues
Support Escalation — When to escalate, gathering diagnostic bundles, support contact options, and ticket best practices
Lab: Monitoring & Troubleshooting — Hands-on monitoring and troubleshooting exercises using the VergeOS dashboard, alert configuration, and log analysis