Task-Level Resilience in Orkes Conductor: Timeouts and Retries in Action

Karl Goeltner, Software Engineer
Last updated: May 12, 2025 · 5 min read

In distributed systems, individual task failure is not a matter of if, but when. APIs go down, services stall, and workers disappear. What matters is how your system responds. With Orkes Conductor, you don’t just handle these failures—you design for them.

Conductor provides fine-grained control over how each task behaves under failure. With customizable timeouts and retries, you can recover from transient issues without human intervention, ensure critical steps don’t hang indefinitely, and build workflows that fail gracefully instead of catastrophically.

In this blog, we’ll explore three core capabilities that enable resilient task execution in Orkes Conductor:

  • Task Retries
  • Task Timeouts
  • System Task Timeouts

Task retries: Recovering from flaky failures

One of the most common failure scenarios is a transient error—momentary service unavailability, network hiccups, or throttling by an external API. Conductor lets you retry failed tasks automatically, using configurable backoff strategies to avoid overwhelming downstream services.

Retry parameters

| Parameter | Description |
| --- | --- |
| retryCount | The maximum number of retry attempts. Default is 3. |
| retryLogic | The retry strategy for the task. Supports: FIXED (retries after a fixed interval defined by retryDelaySeconds), LINEAR_BACKOFF (delays grow linearly: retryDelaySeconds × backoffScaleFactor × attemptNumber), and EXPONENTIAL_BACKOFF (delays grow exponentially: retryDelaySeconds × (backoffScaleFactor ^ attemptNumber)). |
| retryDelaySeconds | The base delay between retries. This combines with the retry logic to calculate the actual wait time. |
| backoffScaleFactor | Multiplier applied to retryDelaySeconds to control how quickly delays increase. Default is 1. |

Use case: Flaky email provider

Imagine your email provider fails intermittently. The first request returns a 500 error, but the second or third might succeed. This is a perfect scenario for retries.

```python
from conductor.client.configuration.configuration import Configuration
from conductor.client.http.models import TaskDef
from conductor.client.orkes_clients import OrkesClients


def main():
    api_config = Configuration()
    clients = OrkesClients(configuration=api_config)
    metadata_client = clients.get_metadata_client()

    task_def = TaskDef()
    task_def.name = 'send_email_task'
    task_def.description = 'Send an email with retry on intermittent failures'
    task_def.retry_count = 3
    task_def.retry_logic = 'EXPONENTIAL_BACKOFF'
    task_def.retry_delay_seconds = 2
    task_def.backoff_scale_factor = 2

    metadata_client.register_task_def(task_def=task_def)

    print(f'Registered the task -- view at {api_config.ui_host}/taskDef/{task_def.name}')


if __name__ == '__main__':
    main()
```

Check out the full sample code for the send email task.

Here, if the email task fails, it will automatically retry up to 3 times with increasing delays—2s, 4s, and 8s—allowing time for the service to recover between attempts.
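As a sanity check, that delay progression can be sketched in plain Python. This is an illustration of the documented backoff formula, not the Conductor scheduler itself, and it assumes retry attempts are numbered from 1:

```python
def retry_delay(attempt: int, retry_delay_seconds: int = 2,
                backoff_scale_factor: int = 2) -> int:
    """Delay before retry `attempt` (1-based) under EXPONENTIAL_BACKOFF."""
    # retryDelaySeconds x (backoffScaleFactor ^ number of prior retries)
    return retry_delay_seconds * (backoff_scale_factor ** (attempt - 1))

print([retry_delay(a) for a in range(1, 4)])  # [2, 4, 8]
```

Plugging in the task definition's values reproduces the 2s, 4s, and 8s waits described above.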

Email validation workflow using a Send Email task with exponential-backoff retries

Task timeouts: Preventing workflow stalls

Retries help you recover, but timeouts prevent you from getting stuck in the first place. Whether a worker goes offline or an external service hangs, task-level timeouts ensure your workflow doesn’t wait forever.

Timeout parameters

| Parameter | Description |
| --- | --- |
| pollTimeoutSeconds | The time to wait for a worker to poll this task before marking it as TIMED_OUT. |
| responseTimeoutSeconds | The time to wait for a worker to send a status update (like IN_PROGRESS) after polling. |
| timeoutSeconds | The total time allowed for the task to reach a terminal state. |
| timeoutPolicy | The action to take when a timeout occurs: RETRY (retries the task using its retry settings), TIME_OUT_WF (marks the whole workflow as TIMED_OUT), or ALERT_ONLY (logs an alert but lets the task continue). |

Use case: Slow inventory API

Say you're calling a third-party inventory API that sometimes takes too long to respond. You don't want to wait forever, but you also don’t want to fail immediately. Here's how you'd configure a balanced timeout with retries:

```python
from conductor.client.configuration.configuration import Configuration
from conductor.client.http.models import TaskDef
from conductor.client.orkes_clients import OrkesClients


def main():
    api_config = Configuration()
    clients = OrkesClients(configuration=api_config)
    metadata_client = clients.get_metadata_client()

    task_def = TaskDef()
    task_def.name = 'inventory_check_task'
    task_def.description = 'Check inventory status with timeout and retry settings'

    # Retry settings
    task_def.retry_count = 2
    task_def.retry_logic = 'FIXED'
    task_def.retry_delay_seconds = 5

    # Timeout settings
    task_def.timeout_seconds = 30
    task_def.poll_timeout_seconds = 10
    task_def.response_timeout_seconds = 15
    task_def.timeout_policy = 'RETRY'

    metadata_client.register_task_def(task_def=task_def)

    print(f'Registered the task -- view at {api_config.ui_host}/taskDef/{task_def.name}')


if __name__ == '__main__':
    main()
```

Check out the full sample code for the check inventory task.

This setup gives your worker 30 seconds to complete the task. If it doesn’t respond or fails, Conductor will retry it twice, waiting 5 seconds between each attempt.
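A rough back-of-envelope sketch (an illustration, not an Orkes API) of the worst-case wall-clock time this configuration allows, ignoring poll/response timeouts and scheduling overhead:

```python
def worst_case_seconds(timeout_seconds: int, retry_count: int,
                       retry_delay_seconds: int) -> int:
    # The initial attempt can run for the full timeout window; each retry
    # then adds the fixed delay plus another full timeout window.
    return timeout_seconds + retry_count * (retry_delay_seconds + timeout_seconds)

print(worst_case_seconds(timeout_seconds=30, retry_count=2,
                         retry_delay_seconds=5))  # 100
```

In other words, the inventory check is bounded at roughly 100 seconds before the workflow gives up, rather than hanging indefinitely.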

Inventory workflow with a Check Inventory task that has a 30-second timeout

Together, these retry and timeout configurations help you build workflows that are not just reactive, but resilient by design.

Next, we’ll look at how the same principles apply at the workflow level, giving you end-to-end control over your system’s behavior.

System task resilience

A key part of building resilient workflows is defining how long system tasks should wait on external services or heavy computations. In Orkes Conductor, each system task has default or configurable timeout settings to control this behavior.

HTTP task resilience

In Orkes Conductor, HTTP timeouts are defined by two parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| connectionTimeout | The maximum time (in milliseconds) to establish a TCP connection to the remote server. | 30 sec |
| readTimeout | The maximum time (in milliseconds) to wait for a response after the connection is established and the request is sent. | 60 sec |

These defaults are enforced to ensure platform stability.

Orkes Conductor making an HTTP request to an external server, illustrating where each timeout applies
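The same two-phase model shows up in most HTTP clients: one budget for establishing the connection, another for reading the response. A stdlib-only Python sketch of the distinction, using a hypothetical toy server rather than the Conductor HTTP task itself:

```python
import socket
import threading

def fetch(host: str, port: int, connect_timeout: float, read_timeout: float) -> bytes:
    # Phase 1: establish the TCP connection, bounded by connect_timeout
    # (the role connectionTimeout plays in Conductor).
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Phase 2: wait for response bytes, bounded by read_timeout
        # (the role readTimeout plays in Conductor).
        sock.settimeout(read_timeout)
        return sock.recv(1024)
    finally:
        sock.close()

# Tiny local server so the sketch runs end to end.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def respond():
    conn, _ = server.accept()
    conn.sendall(b"in stock")
    conn.close()

threading.Thread(target=respond, daemon=True).start()
data = fetch("127.0.0.1", port, connect_timeout=30.0, read_timeout=60.0)
print(data)  # b'in stock'
server.close()
```

A slow connection raises `socket.timeout` in phase 1; a server that accepts the connection but stalls on its response raises it in phase 2, which is exactly the failure the readTimeout default guards against.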

Internal timeout resilience

Some system tasks have implicit timeout behaviors based on internal implementation. If you're designing workflows around system tasks, it's critical to understand and respect these limits.

| System Task | Connection Timeout | Read Timeout |
| --- | --- | --- |
| HTTP | 30 sec | 60 sec |
| LLM | 60 sec | 60 sec |
| Opsgenie | 30 sec | 60 sec |
| Inline | n/a | 4 sec (max execution time) |
| Business Rule | 10 sec | 120 sec |

Wrap up

Task failures are unavoidable, but with proper retry and timeout configurations, they don’t have to break your workflows. Conductor’s task-level resilience features help you avoid cascading failures, handle transient issues gracefully, and prevent workflows from hanging indefinitely.

In the next article, we’ll scale this approach up and explore workflow-level failure handling strategies like timeout policies and compensation flows that give you end-to-end resilience.

Next up:

  • Workflow-Level Resilience

—

Orkes Conductor is an enterprise-grade orchestration platform for process automation, API and microservices orchestration, agentic workflows, and more. Check out the full set of features, or try it yourself using our free Developer Edition.