Metrics and Observability
The Orkes Conductor Dashboard gives a quick overview of the metrics and alerts in your Conductor console. It provides a centralized, intuitive interface for tracking the behavior and performance of tasks and workflows, which can help when troubleshooting errors.
Orkes Conductor uses Prometheus to record a rich set of metrics, which are available automatically in your deployment. For dedicated clusters, these metrics can also be pushed to Grafana or Datadog on request.
Accessing Dashboard from Conductor Console
This document uses a sample dashboard built with Prometheus and Grafana.
- To access your dashboard, navigate to Metrics from your Conductor cluster. If you cannot see this option on your Conductor cluster, please reach out to our team.
- This opens the Conductor dashboard, which is built with Grafana.
Conductor Metrics
The server publishes the following metrics. You can use these metrics to configure alerts for your workflows and tasks.
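As a rough illustration of how these metrics might feed an alert, the sketch below builds PromQL expressions from the metric names and tags listed in this document. The `rate()` function and the label-selector syntax are standard PromQL; the `status` label name, the `order_fulfillment` workflow name, the `5m` window, and the `0.1` threshold are assumptions made for the example, so check the labels your deployment actually exposes before using anything like this.

```python
def selector(metric: str, **labels: str) -> str:
    """Render a PromQL selector such as metric{a="x",b="y"}."""
    if not labels:
        return metric
    # Sort labels so the generated expression is deterministic.
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{body}}}"


def per_second_rate(metric: str, window: str = "5m", **labels: str) -> str:
    """Wrap a counter in PromQL rate() to get a per-second average over the window."""
    return f"rate({selector(metric, **labels)}[{window}])"


def alert_expr(rate_expr: str, threshold: float) -> str:
    """Build a condition that is true while the summed rate exceeds the threshold."""
    return f"sum({rate_expr}) > {threshold}"


if __name__ == "__main__":
    # Failed workflows per second. The label name "status" is an assumption;
    # the document only says to filter on "FAILED".
    failures = per_second_rate(
        "workflow_completed_seconds_count",
        workflowName="order_fulfillment",  # hypothetical workflow name
        status="FAILED",
    )
    print(alert_expr(failures, 0.1))

    # 99th-percentile workflow latency, using the documented quantile tag.
    print(selector("workflow_completed_seconds",
                   workflowName="order_fulfillment", quantile="0.99"))
```

An expression of this shape could be pasted into a Grafana panel query or used as the condition of an alert rule, with the threshold and window tuned to the workload.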
Workflow and Task Metrics
Metrics | Purpose | Tags |
---|---|---|
Workflow Latencies (Name and Percentile): workflow_completed_seconds | Timer indicating the time taken to complete workflows. | workflowName, quantile |
Workflow completion/sec: workflow_completed_seconds_count | Counter indicating the number of workflows completed per second. | workflowName |
Workflow failures/sec: workflow_completed_seconds_count (add the filter "FAILED" to count only failed workflows) | Counter indicating the number of workflows failed per second. | workflowName |
No. of workflows currently running: workflow_running | Gauge for the number of running workflows. | workflowName |
Workflow Start Rate/sec: workflow_start_request_seconds_count | Counter for the number of workflows started. | workflowName |
Total no. of workflows started in the time period: workflow_start_request_seconds_count | Counter for the number of workflows started in the time period. | workflowName |
Workflow Search Latency Percentile: http_server_requests_seconds | Indicates the latency values for the search operation in workflows. | quantile |
Task Latencies (Name and Percentile): task_completed_seconds | Timer indicating the time taken to complete tasks. | taskType, quantile |
Task completion/sec: task_completed_seconds_count | Counter indicating the number of completed tasks per second. | taskType |
Task failures/sec: task_completed_seconds_count | Counter indicating the number of failed tasks per second. | taskType |