Upgrade EKS Clusters across Multiple Versions in Less Than a Day — using Automated Workflows

Liv Wong
Technical Writer
Last updated: April 8, 2024
7 min read


Upgrading your Kubernetes clusters to the latest version can be a time-consuming and laborious process, even with a managed Kubernetes service like Amazon Elastic Kubernetes Service (EKS). Amazon EKS does the heavy lifting of implementing the upgrade, such as creating new control planes and initiating rollbacks in case of failure. But to ensure a successful update, cloud engineers still need to spend days or weeks orchestrating several high-level tasks behind the scenes:

  • Manually initiate upgrades for the cluster, its node groups and add-ons iteratively until each component is updated to the desired version
  • Troubleshoot critical errors that result in upgrade failures or cluster downtime
  • Conduct health checks and custom checks to verify that the newly updated cluster works as expected

With three Kubernetes releases every year and only 14 months of standard support for each release, the technical overhead of maintaining your cloud infrastructure ramps up rapidly. An enterprise that uses a single cluster with several node groups may be able to handle the technical overhead with some effort. But organizations with tens or hundreds of clusters, each with different configurations, may soon find themselves overwhelmed trying to keep up.

Using Orkes as an example, let’s take a look at the difficulties we faced during a manual upgrade, and how we used an automated workflow in Conductor to update our clusters from 1.25 all the way to 1.29 in under 7 hours.

Limitations of the default upgrade process in CLI or Amazon console

The case study

At Orkes, we deploy and manage numerous clusters for our customers. In this scenario, our EKS clusters are significantly outdated and reaching the end of Amazon’s standard support in less than two months.

List of clusters displayed in the Amazon console, with information about the cluster name, status, Kubernetes version, support type, and provider.

Multiple EKS clusters that are significantly outdated and reaching end of life.

Time-consuming

Kubernetes only allows upgrades from one minor version to the next (for example, 1.25 to 1.26). To go from version 1.25 to 1.29, our engineers would have to upgrade each cluster, each underlying node group, and each associated add-on in multiple iterations. Using the CLI or Amazon console, this would be a very tedious process of entering command after command and clicking button after button, with tons of time spent waiting for each update task to complete before starting the next one.

Selection screen in the Amazon console, displaying the available Kubernetes version that the selected cluster can be updated to.

Kubernetes clusters can only be upgraded in sequence. This means going from version 1.25 to 1.29 requires four full upgrades.
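To make the sequencing concrete, here is a minimal Python sketch that lists the minor versions a cluster has to pass through on the way to its target. The function name `upgrade_path` is hypothetical and not part of any AWS or Kubernetes tooling.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Return each minor version to upgrade to, in order, from current to target."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # One hop per minor version: 1.25 -> 1.26 -> ... -> target.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.25", "1.29"))  # ['1.26', '1.27', '1.28', '1.29']
```

Each entry in the returned list corresponds to one full upgrade of the control plane, add-ons, and node groups.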

Furthermore, to reduce the risk of downtime during the upgrade process, it is best practice to upgrade each node or node group one by one, rather than all at once. Twenty node groups would amount to twenty iterative updates using the CLI or Amazon console, significantly extending the time spent to manually update the EKS cluster.
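The one-by-one pattern can be sketched in Python as a loop that starts an update, waits for it to finish, and only then moves on. The two callables are hypothetical stand-ins supplied by the caller; in practice they might wrap the boto3 EKS client's `update_nodegroup_version` and `describe_update` calls, but that wiring is an assumption and is left out here.

```python
from typing import Callable, Iterable


def upgrade_node_groups_one_by_one(
    node_groups: Iterable[str],
    start_update: Callable[[str], str],
    wait_for_update: Callable[[str, str], bool],
) -> list[str]:
    """Upgrade node groups sequentially and stop at the first failure.

    start_update(node_group) returns an update ID; wait_for_update(node_group,
    update_id) blocks until the update finishes and returns True on success.
    Both are hypothetical callables supplied by the caller.
    """
    upgraded = []
    for node_group in node_groups:
        update_id = start_update(node_group)
        if not wait_for_update(node_group, update_id):
            raise RuntimeError(f"update {update_id} failed for node group {node_group}")
        upgraded.append(node_group)
    return upgraded


# Stand-in callables so the sketch runs without AWS access.
fake_start = lambda node_group: f"update-{node_group}"
fake_wait = lambda node_group, update_id: True
print(upgrade_node_groups_one_by_one(["ng-a", "ng-b"], fake_start, fake_wait))
```

Stopping at the first failure keeps the remaining node groups on the old version, so the cluster can keep serving traffic while the problem is investigated.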

Automated scripting tools can help resolve this issue. However, these tools only automate one part of the entire upgrade process. Without workflow orchestration, there’s no easy way to fold additional steps, such as custom checks and status notifications, into a single workflow, even though enterprises typically need them. Which brings us to the next problem:

Difficult to integrate custom checks into the upgrade process

Beyond the time-consuming effort of updating each cluster manually, every upgrade also requires pre- and post-upgrade checks to ensure that the cluster is fully functioning: safe to update beforehand, and working as expected afterward.

Managed Kubernetes services like Amazon EKS and Google Kubernetes Engine (GKE) provide some degree of pre-upgrade checks, such as evaluating the upgrade compatibility. However, additional health checks are often critical in ensuring that the cluster is up, running, and ready to receive traffic. This would require configuring and running observability tools, such as probes, in your applications.

In our case at Orkes, we also needed to run custom checks to ensure that our container images can still be reached after the upgrade and that any failure would be flagged. For example, changes to the cluster’s security group or firewall settings may prevent the new nodes from pulling the container images. With these checks in place, our engineers can detect such issues and fix them before commencing the cluster upgrade.
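As an illustration of what such a custom check might look like, the Python sketch below works out which registry an image reference points to and tests whether a TCP connection to it can be opened. This is a coarse, assumed stand-in for a real pull test, not the check Orkes runs; the function names are hypothetical.

```python
import socket


def registry_endpoint(image: str) -> tuple[str, int]:
    """Return (host, port) of the registry that an image reference points to.

    Follows the Docker convention: the first path component is a registry only
    if it contains a dot or a colon, or is "localhost"; otherwise the image is
    assumed to live on Docker Hub.
    """
    first, _, rest = image.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        host, _, port = first.partition(":")
        return host, int(port) if port else 443
    return "docker.io", 443


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Coarse network check: can a TCP connection to the registry be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(registry_endpoint("registry.example.com:5000/team/app:1.0"))
```

Running `is_reachable(*registry_endpoint(image))` from inside the cluster's network for each image would surface firewall or security group problems before any node is replaced.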

Limited visibility into upgrade status, errors, or cluster health

With the EKS cluster upgrade fragmented into disparate steps, there is limited visibility into the progress – are the pre-upgrade checks still ongoing, or has the upgrade itself begun?

Of course, our cloud engineers could track the progress by watching the Amazon console or listening for events. But we needed an effective way for all relevant teams in the company to be kept in the loop as well. This means global visibility into the entire process, even beyond the upgrade step on Amazon itself, and automatic notifications at critical junctures for human intervention, informing us about success, failure, and the reason for failure.

Upgrading EKS clusters with a Conductor workflow

Automating the upgrade

Here at Orkes, instead of manually upgrading our clusters, we leverage our workflow orchestration platform, Conductor, to execute the upgrade quickly and efficiently. Orkes Conductor is an enterprise-grade platform that enables you to hook together any task or process into an automated workflow, thus simplifying development and operations at scale. With Conductor, every step – custom checks, upgrade commands, notifications, and so on – is unified into a single upgrade workflow, which is further powered by in-built features for scheduling, failure handling, global visibility, and more.

Let’s dive into how we used Conductor to upgrade our EKS clusters.

Planning the upgrade

Every upgrade comes with careful research and planning, from identifying impacted areas in Kubernetes’ release notes to planning out what needs to be done. Building out an upgrade workflow for the first time in Conductor is no different. At Orkes, our workflow included these key tasks:

  • Health verification check
  • Custom checks for image reachability
  • Control plane update
  • Add-on updates
  • Node updates
  • Success notification on Slack
  • Failure workflow

Flow chart, showing the key tasks in Orkes’ cluster upgrade workflow.

Key tasks in Orkes’ cluster upgrade workflow.
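To show how such a plan maps onto a workflow, here is a minimal sketch of a Conductor workflow definition, built as a Python dictionary and printed as JSON. The workflow and task names are hypothetical placeholders for the key tasks, not the names used in the actual Orkes workflow, and every task is modeled as a plain Simple task for brevity.

```python
import json

# Task names below are hypothetical placeholders for the key tasks in the plan.
KEY_TASKS = [
    "verify_cluster_health",
    "check_image_reachability",
    "update_control_plane",
    "update_addons",
    "update_node_groups",
    "notify_slack_success",
]


def simple_task(name: str) -> dict:
    """Build a SIMPLE task entry, which Conductor hands to an external worker."""
    return {"name": name, "taskReferenceName": f"{name}_ref", "type": "SIMPLE"}


workflow_def = {
    "name": "eks_cluster_upgrade",  # hypothetical workflow name
    "version": 1,
    "schemaVersion": 2,
    "inputParameters": ["clusterName", "targetVersion"],
    "tasks": [simple_task(name) for name in KEY_TASKS],
    # Name of another workflow to start if this one fails (hypothetical name).
    "failureWorkflow": "eks_cluster_upgrade_failure",
}

print(json.dumps(workflow_def, indent=2))
```

Tasks in the `tasks` list run in order, so the health and image checks gate the control plane update, which in turn gates the add-on and node updates.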

Build once, execute ad infinitum

Once our team has pinned down what to do, the upgrade process becomes an algorithmic operation that can be scaled. With a single Conductor workflow, our team at Orkes manages upgrades for hundreds of clusters all year round.

Here is the full Conductor workflow, and the features we utilized to make the upgrade process seamless for us:

Visual editor screen in Conductor, displaying Orkes’ EKS cluster upgrade workflow.

EKS cluster upgrade workflow in Conductor.
  1. Task sequencing: Tasks can be arranged in sequence, in parallel, or in iteration – just like any code. These options enabled us to go fast where possible and slow where needed. For example, we ran our custom checks in parallel to speed up the workflow, but opted to iterate through the node upgrades to avoid overloading the cluster.

Visual editor screen in Conductor, displaying parallel and iterative tasks.

Examples of parallel and iterative tasks in Conductor.
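In a Conductor workflow definition, parallel branches are typically expressed with a Fork/Join task pair and iteration with a Do While task. The Python sketch below builds both as dictionaries; the helper functions and task names are hypothetical, and the loop condition assumes the node group count is passed in as workflow input.

```python
def simple_task(name: str) -> dict:
    """Build a SIMPLE task entry executed by an external worker."""
    return {"name": name, "taskReferenceName": f"{name}_ref", "type": "SIMPLE"}


def fork_join(ref: str, branches: list[list[dict]]) -> list[dict]:
    """Run each branch in parallel, then wait for the last task of every branch."""
    return [
        {"name": ref, "taskReferenceName": ref, "type": "FORK_JOIN", "forkTasks": branches},
        {
            "name": f"{ref}_join",
            "taskReferenceName": f"{ref}_join",
            "type": "JOIN",
            "joinOn": [branch[-1]["taskReferenceName"] for branch in branches],
        },
    ]


def do_while(ref: str, iterations_expr: str, body: list[dict]) -> dict:
    """Repeat the body tasks until the loop has run the requested number of times."""
    return {
        "name": ref,
        "taskReferenceName": ref,
        "type": "DO_WHILE",
        "inputParameters": {"iterations": iterations_expr},
        # JavaScript condition; it refers to this task by its reference name.
        "loopCondition": f"if ($.{ref}['iteration'] < $.iterations) {{ true; }} else {{ false; }}",
        "loopOver": body,
    }


# Custom checks run side by side; node groups are upgraded one per iteration.
parallel_checks = fork_join(
    "custom_checks",
    [[simple_task("check_image_reachability")], [simple_task("verify_cluster_health")]],
)
node_loop = do_while(
    "node_upgrade_loop",
    "${workflow.input.nodeGroupCount}",
    [simple_task("upgrade_next_node_group")],
)
print(node_loop["loopCondition"])
```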
  2. Planned delays: The Wait task in Conductor allowed us to add planned delays to the upgrade workflow. These delays provide breathing space to check that the add-on upgrades are successful before continuing.

Visual editor screen in Conductor, displaying the Wait task and its configuration options.

Example of a Wait task in Conductor.
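A Wait task can be sketched as a small dictionary in the workflow definition. The Python helper below is hypothetical; it assumes the Wait task takes either a `duration` or an `until` input parameter, and the five-minute pause is an illustrative value rather than the delay Orkes actually uses.

```python
from typing import Optional


def wait_task(ref: str, duration: Optional[str] = None, until: Optional[str] = None) -> dict:
    """Build a WAIT task that pauses for a duration or until a point in time."""
    if (duration is None) == (until is None):
        raise ValueError("provide exactly one of duration or until")
    params = {"duration": duration} if duration is not None else {"until": until}
    return {"name": ref, "taskReferenceName": ref, "type": "WAIT", "inputParameters": params}


# Pause after the add-on updates before moving on to the node upgrades.
pause_after_addons = wait_task("wait_for_addons", duration="5 minutes")
print(pause_after_addons)
```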
  3. Slack notifications: The Simple task in Conductor is executed by an external worker, allowing us to add custom code that sends Slack notifications at relevant times – alerting everyone when the upgrade has begun, completed, or failed.

Notification message in Slack, announcing that a node group has been successfully upgraded.

Slack notification upon successful upgrade.
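The custom code behind such a worker could be as small as the Python sketch below. It assumes the notification goes through a Slack incoming webhook, which accepts a JSON body with a `text` field; the message format and function names are hypothetical, and registering the function as a Conductor worker is omitted.

```python
import json
import urllib.request


def build_slack_message(cluster: str, component: str, version: str, status: str) -> dict:
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"[{status.upper()}] {component} on cluster {cluster}: Kubernetes {version}"}


def send_slack_message(webhook_url: str, message: dict) -> int:
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Only the payload is built here; sending requires a real webhook URL.
print(build_slack_message("prod-cluster", "node group ng-a", "1.29", "success"))
```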
  4. Scheduling: With Conductor’s Scheduler feature, we can set up cluster upgrades within planned maintenance windows or schedule consecutive workflow executions to upgrade across multiple versions. The flexibility allows us to slot cluster upgrades according to our scheduling needs, without depending on anyone being available in person.

Visual editor screen in Conductor, displaying a Schedule and its configuration options.

Example of a Schedule in Conductor.
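As a rough sketch, a schedule pairs a cron expression with the workflow to start and its input. The field names in this Python example reflect our understanding of the Orkes schedule definition and should be checked against the Scheduler documentation; the schedule name, workflow name, cluster name, and maintenance window are all hypothetical.

```python
def upgrade_schedule(cluster: str, target_version: str, cron: str) -> dict:
    """Build a schedule that starts the upgrade workflow for one cluster."""
    return {
        "name": f"upgrade_{cluster}_to_{target_version}".replace(".", "_").replace("-", "_"),
        "cronExpression": cron,
        "startWorkflowRequest": {
            "name": "eks_cluster_upgrade",  # hypothetical workflow name
            "version": 1,
            "input": {"clusterName": cluster, "targetVersion": target_version},
        },
        "paused": False,
    }


# Assumed six-field cron (seconds first): 02:00 every Saturday.
schedule = upgrade_schedule("prod-cluster", "1.26", "0 0 2 ? * SAT")
print(schedule["name"])  # upgrade_prod_cluster_to_1_26
```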
  5. Failure handling: With Conductor’s built-in support for handling failure and errors, we configured the timeout and retry logic for each workflow task to suit our needs. We also included a failure workflow that is automatically triggered when the entire workflow fails to complete. This minimizes the need for human intervention when transient errors occur, while ensuring that critical failures are swiftly rolled back and escalated for further investigation.

Visual editor screen in Conductor, displaying a failure workflow.

Failure workflow that is automatically triggered if the cluster upgrade workflow fails.
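Timeout and retry behavior is configured per task in its task definition. The Python sketch below shows the relevant fields with illustrative values, not the settings Orkes uses, along with a hypothetical sanity check to run before registering the definition.

```python
RETRY_LOGIC = {"FIXED", "EXPONENTIAL_BACKOFF", "LINEAR_BACKOFF"}
TIMEOUT_POLICY = {"RETRY", "TIME_OUT_WF", "ALERT_ONLY"}

# Illustrative values for a long-running control plane update task.
task_def = {
    "name": "update_control_plane",  # hypothetical task name
    "retryCount": 3,
    "retryLogic": "EXPONENTIAL_BACKOFF",
    "retryDelaySeconds": 30,
    "timeoutSeconds": 3600,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 600,
}


def validate_task_def(definition: dict) -> list[str]:
    """Return a list of problems found in the retry and timeout settings."""
    problems = []
    if definition["retryLogic"] not in RETRY_LOGIC:
        problems.append(f"unknown retryLogic: {definition['retryLogic']}")
    if definition["timeoutPolicy"] not in TIMEOUT_POLICY:
        problems.append(f"unknown timeoutPolicy: {definition['timeoutPolicy']}")
    if definition["retryCount"] < 0:
        problems.append("retryCount must not be negative")
    # The response timeout has to be shorter than the overall task timeout.
    if 0 < definition["timeoutSeconds"] <= definition["responseTimeoutSeconds"]:
        problems.append("responseTimeoutSeconds must be less than timeoutSeconds")
    return problems


print(validate_task_def(task_def))  # []
```

With `TIME_OUT_WF`, a task that exceeds its timeout fails the whole workflow, which is what would trigger the failure workflow.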

In summary, we overcame the problems that came with manual cluster upgrades by using an automated workflow in Conductor. By leveraging Conductor, engineering teams can streamline cloud infrastructure activities and transform them into a highly efficient, automated procedure with these benefits:

  • Time savings
  • Fewer manual errors
  • Customization based on needs and best practices
  • Comprehensive supervision and failure handling
  • Global visibility

Curious to learn more about building workflows in Conductor? Check out our documentation or GitHub repository.

Conductor is an enterprise-grade orchestration platform for process automation, API and microservices orchestration, agentic workflows, and more. Check out the full set of features, try it yourself using our Developer Edition sandbox, or get a demo of Orkes Cloud, a fully managed and hosted Conductor service.