We are really excited to announce the latest feature to Orkes' cloud hosted version of Netflix Conductor. It is now no longer a secret - we support the use of secrets in your workflow definitions! Now you can be certain that your secret keys, tokens and values that you use in your workflows are secure!
Know Your Customer (KYC) workflows are really important for banks and financial services as well as other industries. In the banking industry in most countries, having a KYC workflow is enforced by the regulators that provide the banking license—the banks are required to implement a KYC workflow and a risk-based approach to fight money laundering.
In this article, you will learn about KYC use cases and workflows, including their requirements and distinguishing features. You will also learn about using Conductor, an open source microservice and workflow orchestration framework, and the Orkes Conductor Playground (as a Software as a Service option to host Conductor workflow) to build and test your own KYC workflow within minutes! You will build an example workflow in Conductor that you can easily run on the Orkes Conductor Cloud.
The idea of reduce, reuse and recycle is reverberated around the world as a conservation technique - if we use fewer materials, and reuse or recycle what we already are using, we lower our burden on the earth and its ecosystem.
As developers, we love the idea of reducing, reusing and recycling code. Just look at the prevalent use of StackOverflow, and the huge use of open source and libraries - if someone else has built it well - why not recycle the code and reuse it?
In this post, we'll apply the 3 R's of reduce, reuse and recycling to the topic of Conductor workflows - helping us create workflows that are compact, and easier to follow, and complete the desired task. Through our simplification of the workflow we'll also move from a workflow that is hardcoded to one specific task to a workflow that is more easily adapted to other similar uses - making the workflow more useful to the organization.
The microservice architecture pattern has been steadily gaining popularity in recent years. This architecture decomposes larger applications into smaller, more easily managed components.
While this can eliminate some of the challenges of working with large monolithic applications, breaking applications down into multiple decoupled pieces also presents some new challenges, such as determining how the microservices will communicate with each other.
This article compares two different approaches that offer solutions to this problem. These approaches are workflow orchestration and workflow choreography. While these concepts are similar in some regards, there are also key differences. This article highlights these differences by comparing the two concepts using the following criteria:
- Definition: How is each concept defined?
- Scalability: How well does each approach scale as applications increase in size and scope?
- Communication: How do microservices communicate and transact data under each approach?
- Strengths: What are the benefits afforded by each approach?
- Limitations: What are the limitations of each approach?
- Tools: What tools, if any, are there to help you facilitate each approach?
Before delving into the specific differences between these two approaches, it is good to have a high-level understanding of the definitions and goals of each.
Workflow orchestration describes an approach in which a single, centralized service—commonly known as the “orchestrator”—is responsible for invoking other services and handling and combining their responses to execute a composite business workflow.
In this approach, the orchestrator is aware of the big picture and the role played by each service. However, the services are not aware of anything beyond their interactions with the orchestrator.
On the other hand, workflow choreography is a decentralized approach in which each service is responsible for invoking and responding to adjacent services.
This decentralization means that each service is aware of a small piece of the big picture, but only those parts in which the service plays an active role. The services are otherwise unaware of their overall position and relevance concerning the business workflow under execution.
One of the key benefits of decomposing a system into microservices is that it enables better scalability. Whether your microservices are running in containers or dedicated virtual machines, there’s almost always a way to scale the number of instances of a given microservice up or down to meet demand at any given time.
With this in mind, it’s essential to consider the potential impact on scalability when it comes to either orchestration or choreography.
One immediate concern is whether the scalability of the services themselves is affected. In both approaches, the services can be abstracted away behind load balancers, such as those offered by AWS, or the load balancing functionality in Kubernetes.
Behind this abstraction, individual services can theoretically scale independently of any other concerns. In light of this, the next consideration is whether the orchestration and choreography patterns are scalable.
When considering orchestration, you need to account for a centralized component. This component—the orchestrator—will vary depending on your implementation, but one example is Netflix Conductor, an open source workflow orchestration platform.
Conductor is inherently scalable in this instance, claiming to support workloads “from a single workflow to millions of concurrent processes,” which would suggest that orchestration can be entirely scalable; that said, the degree to which this is the case will be somewhat affected by whichever tool is used to fill the role of orchestrator.
On the other hand, choreography has fewer considerations when it comes to scalability. The entire system should inherit this scalability as long as the services themselves are scalable, along with any other “connective pieces,” such as message brokers.
How the services communicate with each other is another key consideration when differentiating between orchestration and choreography. While the choice between these two approaches doesn’t necessarily dictate which mechanisms your services can use to communicate, it does help inform the specifics of how you would use these mechanisms in a given scenario.
Firstly, in orchestration, as you know, a central process is responsible for when and how services are invoked. In the case of a synchronous system where the orchestrator makes HTTP calls to services in series, the communication might look something like the following diagram.
Alternatively, you might wish to take an asynchronous approach, in which a message broker is used to store the information about jobs that the services must complete. In this case, your communication would look something like the following diagram.
The orchestrator is now responsible for reading messages pushed by individual services and pushing messages so that other individual services can act on them.
In contrast, in workflow choreography, there is no central orchestrator and, thus, no central process that decides how services should be invoked. A given service may receive a request and act upon it, directly invoking whatever other services it needs. In a synchronous approach, this might look something like the following diagram.
As you can see, each service is responsible for invoking and responding to any adjacent services as needed. This behavior is also true for asynchronous communication, with the main difference being the inclusion of a message broker instead of direct HTTP calls.
In this asynchronous approach to workflow choreography, each service subscribes to and publishes specific message types directly, rather than an orchestrator being responsible for mediating communication between services.
As with most architectural patterns, each approach has strengths and limitations. The orchestration pattern reduces point-to-point communication between services by shifting the contextual awareness of the workflow to the orchestrator.
With this awareness, the orchestrator can be more resilient when individual services fail. Suppose a given service fails to respond as expected. In that case, the orchestrator can elegantly handle the error in several ways, whether by retrying immediately, re-queuing the task for later, or even just logging information about the error in greater detail than would otherwise be possible.
Workflow choreography also offers some benefits. Because each service is only concerned with other adjacent services and not with the overall shape of the system, it can be somewhat easier to add, change, and remove individual services frequently without disrupting other parts of the system.
Eliminating the orchestrator from your architecture also removes a potential bottleneck or point of failure. Choreography is also typically well-aligned with the serverless architecture pattern, as it supports scalable, short-lived services without the need for a long-running orchestrator.
There are some limitations to each approach that need to be considered when comparing orchestration and choreography.
In orchestration, you need to account for a potential single point of failure, which is the orchestrator. If the orchestrator suffers from degraded performance or an outage, the entire system will be affected, even if the other microservices are still operational.
Because of this, it’s important to ensure that the orchestrator has redundancy and failover capabilities where possible. Similarly, having an orchestrator means that all of your services are tightly coupled to that orchestrator when it comes to execution.
On the other hand, when using choreography, rather than having a single point of failure, responsibility for the system’s resilience is now distributed. Any given service could fail at any time, and without a centralized orchestrator, recovery and diagnostics can be a lot harder.
In some cases, it may be possible to push a job to a queue to be retried, but in many cases, it might be necessary to abort the workflow and log as much information as possible. Because choreographed workflows lack a holistic context, the breadth of information you can log at this stage is typically somewhat diminished.
Workflow orchestration and choreography are both architectural patterns and, as such, can be implemented in many ways. Orchestration, in particular, has the added requirement of the orchestrator itself. There are numerous orchestration tools that can fulfill this role, such as Netflix Conductor and the fully managed, cloud-based version of Conductor, Orkes.
On the choreography side, there aren’t necessarily any specific tools, as choreography doesn’t require any specialized components like an orchestrator. Instead, you would do well to ensure that all of your services communicate over clearly defined, well-known APIs and that you have a robust logging and error management solution in place, such as those offered by Sentry or Datadog.
Both approaches still rely heavily on individual microservices, so tools and techniques that make microservices easier to manage could be beneficial, regardless of the approach you decide to take. These include things like load balancers and container orchestration (not to be confused with workflow orchestration) tools like Kubernetes.
This article explained the key differences between workflow orchestration and workflow choreography. You’ve seen how these two approaches differ and where they’re similar. The strengths and weaknesses of each have been touched upon, as well as some tools you can consider to help implement either approach.
Both approaches are technically valid and can work for your solution if implemented correctly. If you’re interested in learning more about orchestration, consider Orkes, a fully managed, cloud-based version of Netflix Conductor.
Microservice architecture is an architecture where an application is split into separate services, and each service is run and managed independently. In a microservice architecture, every service is focused on handling one major function and is solely responsible for its own data management.
Microservice architecture is often recommended for larger applications because it allows services to be managed by dedicated teams. Testing and deployment also become easier, as they can be carried out independently for each service without affecting the overall application.
In this article, you'll learn what a microservice architecture is and when to use it, and about workflow orchestration, its benefits, and how it can be utilized in a microservice architecture. You'll also look at an example use case of microservice architecture so you can better understand the benefits, strengths, and weaknesses of this style of architecture.
As the Introduction documentation shows, Netflix Conductor can be used for a variety of uses cases, solving complex workflow problems that plague many companies worldwide.
Let's now try to understand how we can use Conductor to solve a Lending problem that exists in the Banking and Fintech sector.
Use Case / Problem
In the new modern era of Fintech, bank customers are moving from traditional banking to digital banking. With that, there is an expectation that the processes that are run will be faster and more streamlined. Hence, in order to keep up with the customer demands various banks are trying to automate their banking processes. One common (and complicated) process that many banks are automating to its customers is the loan banking (lending) process.
Lending workflows can be very common and potential problem that can be solved by Conductor.
In our initial image processing workflow using Netflix Conductor, we initially built a workflow that takes one image, resizes it and uploads it to S3.
In our 2nd post, we utilized a fork to create two images in parallel. When building this workflow, we reused all of the tasks from the first workflow, connecting them in a way that allowing for parallel processing of two images at once.
In both of these workflows, two tasks are reused:
upload_toS3. This is one great advantage of using microservices - we create the service once, and reuse it many times in different ways.
In this post, we'll take that abstraction a step further, and replace the tasks in the two forks with a
SUB_WORKFLOW. This allows us to simplify the full workflow by abstracting a frequently used set of tasks into a single task.
In recent posts, we have built several image processing workflows with Conductor. In our first post, we created an image processing workflow for one image - where we provide an image along with the desired output dimensions and format. The workflow output a link on Amazon S3 to the desired file.
In the 2nd example, we used the FORK System task to create multiple images in parallel. The number of images was hardcoded in the workflow - as FORK generates exactly as many paths as are coded into the workflow.
As number of images is hardcoded in the workflow - only 2 images are created. When it comes to image generation, there is often a need for more images (as new formats become popular) or sizes - as more screens are supported.
Luckily, Conductor supports this flexibility, and has a feature to specify the number of tasks to be created at runtime. In this post, we'll demonstrate the use of dynamic forks, where the workflow splitting is done at runtime.
Learn how to create a dynamic fork workflow in this post!
In our previous post on image processing workflows, we built a Netflix Conductor workflow that took an image input, and then ran 2 tasks: The first task resizes and reformats the image, and the second task uploads the image to an AWS S3 bucket.
With today's varied screen sizes, and varied browser support, it is a common requirement that the image processing pipeline must create multiple images with different sizes and formats of each image.
To do this with a Conductor workflow, we'll utilize the FORK operation to create parallel processes to generate multiple versions of the same image. The FORK task creates multiple parallel processes, so each image will be created asynchronously - ensuring a fast and efficient process.
In this post, our workflow will create 2 versions of the same image - a jpg and webp.
What is Conductor
Conductor is a Microservices orchestration platform from Netflix, released under Apache 2.0 Open Source License.
Design for failures
Failures and service degradation are the fact of any system, this is especially true with large interconnected systems running in cloud. Conductor is designed with principles that systems can and will go down, degrade in performance and any dependencies should be able to handle such failures.
Tasks in Conductor
Conductor workflows are orchestration of many activities known as
task. Each task represents a (ideally) stateless worker that given a specific input does the work and produces output. The tasks are typically running outside of Conductor server and there are many factors that could effect their availability.
Designing for failures
Conductor allows you to define your stateful applications that can handle failures and temporary degradation of services and without having to write code for that.
Configuring tasks to handle failures
Each task in Conductor can be configured how it responds to availability events such as: 1) Failures 2) Timeouts and 3) Rate limits.
Here is a sample task definition:
"description": "This is a sample task for demo",
retry* parameters specify how to handle cases where the task execution fails and retries can be configured to be with fixed delay or exponential backoff. Similarly
timeout* parameters specify how much time to give for task to complete execution and if the task should be marked as 'Timed Out' if it runs longer than that.
Follow us on https://github.com/Netflix/conductor/ for the source code and updates.