Skip to content

Module 9: Monitoring & Troubleshooting

By the end of this module, you will be able to:

  • Navigate the VergeOS dashboard for real-time system health monitoring
  • Configure alerts and log forwarding to proactively detect and respond to issues
  • Diagnose common issues using system logs, diagnostics, and built-in tools
  • Know when and how to escalate to Verge support with the right diagnostic information

3 hours (1.5 hours reading + 1.5 hours lab)

Dashboard & System Health

Navigating the VergeOS dashboard for real-time visibility into node status, resource utilization, and cluster health.

Alerts & Notifications

Configuring system alerts, notification rules, and email/webhook alerting for proactive issue detection.

Log Management & Forwarding

System log access, filtering, search, and syslog forwarding to external SIEM and monitoring platforms.

Prometheus & Grafana

Exposing VergeOS metrics via the built-in Prometheus endpoint and building Grafana dashboards for advanced monitoring.

Diagnostics Toolkit

Built-in diagnostic tools, network troubleshooting utilities, storage health checks, and system diagnostics bundles.

Common Issues & Resolution

Troubleshooting runbooks for the most frequently encountered VergeOS issues across compute, storage, and networking.

Support Escalation

Knowing when to escalate issues, gathering the right diagnostic data, and working effectively with Verge support.

Lab: Monitoring & Troubleshooting

Explore the VergeOS dashboard, configure alerts, analyze logs, and practice diagnosing simulated issues.

  1. Dashboard & System Health — Node status indicators, resource utilization graphs, cluster health overview, storage pool status, and network connectivity monitoring
  2. Alerts & Notifications — Built-in alert types, configuring notification rules, email/webhook alerting, and alert escalation policies
  3. Log Management & Forwarding — Accessing system logs, filtering and searching log entries, syslog forwarding to external SIEM/monitoring systems
  4. Prometheus & Grafana Integration — Built-in Prometheus endpoint, metrics catalog, Grafana dashboard setup, and advanced monitoring workflows
  5. Diagnostics Toolkit — Network troubleshooting utilities, storage health diagnostics, system diagnostics bundles, and node-level tools
  6. Common Issues & Resolution — Troubleshooting runbooks for frequent compute, storage, networking, and tenant issues
  7. Support Escalation — When to escalate, gathering diagnostic bundles, support contact options, and ticket best practices
  8. Lab: Monitoring & Troubleshooting — Hands-on monitoring and troubleshooting exercises using the VergeOS dashboard, alert configuration, and log analysis