We are really excited to announce the latest feature in Orkes' cloud-hosted version of Netflix Conductor. It's no longer a secret - we now support the use of secrets in your workflow definitions! You can be certain that the secret keys, tokens, and values you use in your workflows are secure.
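As a sketch of how this looks in practice, a task in a workflow definition can reference a stored secret with a `${workflow.secrets.<name>}` placeholder instead of a hardcoded value. The secret name `github_token` and the task names below are illustrative, not taken from a real workflow:

```python
# Illustrative fragment of a Conductor workflow task, built as a plain
# Python dict (ready to be embedded in a workflow definition JSON).
# The secret name "github_token" is hypothetical; the placeholder is
# resolved from the secret store at execution time, so the raw token
# never appears in the workflow definition itself.
http_task = {
    "name": "fetch_repo_stats",
    "taskReferenceName": "fetch_repo_stats_ref",
    "type": "HTTP",
    "inputParameters": {
        "http_request": {
            "uri": "https://api.github.com/repos/Netflix/conductor",
            "method": "GET",
            "headers": {
                # Resolved from the secret store when the task runs:
                "Authorization": "Bearer ${workflow.secrets.github_token}",
            },
        }
    },
}
```

Anyone with read access to the workflow definition sees only the placeholder, never the token itself.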
Know Your Customer (KYC) workflows are really important for banks and financial services as well as other industries. In the banking industry in most countries, having a KYC workflow is enforced by the regulators that provide the banking license—the banks are required to implement a KYC workflow and a risk-based approach to fight money laundering.
In this article, you will learn about KYC use cases and workflows, including their requirements and distinguishing features. You will also learn about using Conductor, an open source microservice and workflow orchestration framework, and the Orkes Conductor Playground (a Software-as-a-Service option for hosting Conductor workflows) to build and test your own KYC workflow within minutes! You will build an example workflow in Conductor that you can easily run on the Orkes Conductor Cloud.
The mantra of reduce, reuse and recycle reverberates around the world as a conservation technique: if we use fewer materials and reuse or recycle what we already have, we lower our burden on the earth and its ecosystems.
As developers, we love the idea of reducing, reusing and recycling code. Just look at the prevalent use of Stack Overflow and the widespread adoption of open source libraries - if someone else has built it well, why not reuse and recycle their code?
In this post, we'll apply the 3 R's of reduce, reuse and recycle to Conductor workflows - helping us create workflows that are compact, easier to follow, and still complete the desired task. Through this simplification we'll also move from a workflow hardcoded to one specific task to a workflow that is easily adapted to other, similar uses - making it more useful to the organization.
The microservice architecture pattern has been steadily gaining popularity in recent years. This architecture decomposes larger applications into smaller, more easily managed components.
While this can eliminate some of the challenges of working with large monolithic applications, breaking applications down into multiple decoupled pieces also presents some new challenges, such as determining how the microservices will communicate with each other.
This article compares two different approaches that offer solutions to this problem. These approaches are workflow orchestration and workflow choreography. While these concepts are similar in some regards, there are also key differences. This article highlights these differences by comparing the two concepts using the following criteria:
- Definition: How is each concept defined?
- Scalability: How well does each approach scale as applications increase in size and scope?
- Communication: How do microservices communicate and transact data under each approach?
- Strengths: What are the benefits afforded by each approach?
- Limitations: What are the limitations of each approach?
- Tools: What tools, if any, are there to help you facilitate each approach?
Before delving into the specific differences between these two approaches, it is good to have a high-level understanding of the definitions and goals of each.
Workflow orchestration describes an approach in which a single, centralized service—commonly known as the “orchestrator”—is responsible for invoking other services and handling and combining their responses to execute a composite business workflow.
In this approach, the orchestrator is aware of the big picture and the role played by each service. However, the services are not aware of anything beyond their interactions with the orchestrator.
On the other hand, workflow choreography is a decentralized approach in which each service is responsible for invoking and responding to adjacent services.
This decentralization means that each service is aware of a small piece of the big picture, but only those parts in which the service plays an active role. The services are otherwise unaware of their overall position and relevance concerning the business workflow under execution.
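The contrast between the two definitions can be sketched in a few lines of Python. The service functions here are stand-ins for real network calls, and the order-processing steps are invented for illustration:

```python
# Orchestration: one central function knows the whole flow and calls
# each service in turn, combining their results.
def validate(order): return {**order, "valid": True}
def charge(order):   return {**order, "charged": True}
def ship(order):     return {**order, "shipped": True}

def orchestrator(order):
    # Only the orchestrator sees the big picture.
    return ship(charge(validate(order)))

# Choreography: each service knows only its immediate neighbor and
# hands the work along itself; no component sees the end-to-end flow.
def validate_then_forward(order):
    return charge_then_forward({**order, "valid": True})

def charge_then_forward(order):
    return ship_final({**order, "charged": True})

def ship_final(order):
    return {**order, "shipped": True}
```

Both arrangements produce the same result; what differs is where the knowledge of the overall flow lives.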
One of the key benefits of decomposing a system into microservices is that it enables better scalability. Whether your microservices are running in containers or dedicated virtual machines, there’s almost always a way to scale the number of instances of a given microservice up or down to meet demand at any given time.
With this in mind, it’s essential to consider the potential impact on scalability when it comes to either orchestration or choreography.
One immediate concern is whether the scalability of the services themselves is affected. In both approaches, the services can be abstracted away behind load balancers, such as those offered by AWS, or the load balancing functionality in Kubernetes.
Behind this abstraction, individual services can theoretically scale independently of any other concerns. In light of this, the next consideration is whether the orchestration and choreography patterns are scalable.
When considering orchestration, you need to account for a centralized component. This component—the orchestrator—will vary depending on your implementation, but one example is Netflix Conductor, an open source workflow orchestration platform.
Conductor is inherently scalable in this instance, claiming to support workloads “from a single workflow to millions of concurrent processes,” which would suggest that orchestration can be entirely scalable; that said, the degree to which this is the case will be somewhat affected by whichever tool is used to fill the role of orchestrator.
On the other hand, choreography has fewer considerations when it comes to scalability. The entire system should inherit this scalability as long as the services themselves are scalable, along with any other “connective pieces,” such as message brokers.
How the services communicate with each other is another key consideration when differentiating between orchestration and choreography. While the choice between these two approaches doesn’t necessarily dictate which mechanisms your services can use to communicate, it does help inform the specifics of how you would use these mechanisms in a given scenario.
Firstly, in orchestration, as you know, a central process is responsible for when and how services are invoked. In the case of a synchronous system where the orchestrator makes HTTP calls to services in series, the communication might look something like the following diagram.
Alternatively, you might wish to take an asynchronous approach, in which a message broker is used to store the information about jobs that the services must complete. In this case, your communication would look something like the following diagram.
The orchestrator is now responsible for reading messages pushed by individual services and pushing messages so that other individual services can act on them.
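A minimal sketch of the worker side of this asynchronous arrangement is shown below. The poll and complete functions are injected so the loop stays transport-agnostic; in a real Conductor deployment they would wrap calls to the server's task-polling and task-update APIs, but the loop itself is a generic illustration, not Conductor's actual client code:

```python
# Minimal worker loop sketch: repeatedly poll for a queued task,
# execute it, and report the result back to the coordinator.
def run_worker(poll, execute, complete, max_tasks=10):
    """Poll for tasks, execute them, and report results back."""
    handled = 0
    while handled < max_tasks:
        task = poll()            # e.g. fetch the next queued task, if any
        if task is None:         # queue drained - nothing left to do
            break
        output = execute(task["inputData"])
        complete({"taskId": task["taskId"],
                  "status": "COMPLETED",
                  "outputData": output})
        handled += 1
    return handled
```

With stub functions standing in for the queue, `run_worker` drains the pending tasks and pushes each completed result back, which is exactly the contract an orchestrator relies on in the asynchronous model.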
In contrast, in workflow choreography, there is no central orchestrator and, thus, no central process that decides how services should be invoked. A given service may receive a request and act upon it, directly invoking whatever other services it needs. In a synchronous approach, this might look something like the following diagram.
As you can see, each service is responsible for invoking and responding to any adjacent services as needed. This behavior is also true for asynchronous communication, with the main difference being the inclusion of a message broker instead of direct HTTP calls.
In this asynchronous approach to workflow choreography, each service subscribes to and publishes specific message types directly, rather than an orchestrator being responsible for mediating communication between services.
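The publish/subscribe shape of choreography can be illustrated with a toy in-memory broker. The message type names (`order.placed`, etc.) are invented for the example; a production system would use a real broker such as Kafka or RabbitMQ:

```python
# Toy in-memory broker: each "service" subscribes to the message types
# it cares about and publishes follow-up messages itself. There is no
# central coordinator deciding the order of operations.
class Broker:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, msg_type, handler):
        self.subscribers.setdefault(msg_type, []).append(handler)

    def publish(self, msg_type, payload):
        for handler in self.subscribers.get(msg_type, []):
            handler(payload)

broker = Broker()
log = []

# "Order" service reacts to order.placed and emits payment.requested.
broker.subscribe("order.placed",
                 lambda p: (log.append("order"),
                            broker.publish("payment.requested", p)))
# "Payment" service reacts to payment.requested and emits order.shipped.
broker.subscribe("payment.requested",
                 lambda p: (log.append("payment"),
                            broker.publish("order.shipped", p)))
# "Shipping" service reacts to order.shipped.
broker.subscribe("order.shipped", lambda p: log.append("shipped"))
```

Publishing a single `order.placed` message ripples through all three services without any component ever holding the full workflow definition.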
As with most architectural patterns, each approach has strengths and limitations. The orchestration pattern reduces point-to-point communication between services by shifting the contextual awareness of the workflow to the orchestrator.
With this awareness, the orchestrator can be more resilient when individual services fail. Suppose a given service fails to respond as expected. In that case, the orchestrator can elegantly handle the error in several ways, whether by retrying immediately, re-queuing the task for later, or even just logging information about the error in greater detail than would otherwise be possible.
Workflow choreography also offers some benefits. Because each service is only concerned with other adjacent services and not with the overall shape of the system, it can be somewhat easier to add, change, and remove individual services frequently without disrupting other parts of the system.
Eliminating the orchestrator from your architecture also removes a potential bottleneck or point of failure. Choreography is also typically well-aligned with the serverless architecture pattern, as it supports scalable, short-lived services without the need for a long-running orchestrator.
There are some limitations to each approach that need to be considered when comparing orchestration and choreography.
In orchestration, you need to account for a potential single point of failure, which is the orchestrator. If the orchestrator suffers from degraded performance or an outage, the entire system will be affected, even if the other microservices are still operational.
Because of this, it’s important to ensure that the orchestrator has redundancy and failover capabilities where possible. Similarly, having an orchestrator means that all of your services are tightly coupled to that orchestrator when it comes to execution.
On the other hand, when using choreography, rather than having a single point of failure, responsibility for the system’s resilience is now distributed. Any given service could fail at any time, and without a centralized orchestrator, recovery and diagnostics can be a lot harder.
In some cases, it may be possible to push a job to a queue to be retried, but in many cases, it might be necessary to abort the workflow and log as much information as possible. Because choreographed workflows lack a holistic context, the breadth of information you can log at this stage is typically somewhat diminished.
Workflow orchestration and choreography are both architectural patterns and, as such, can be implemented in many ways. Orchestration, in particular, has the added requirement of the orchestrator itself. There are numerous orchestration tools that can fulfill this role, such as Netflix Conductor and the fully managed, cloud-based version of Conductor, Orkes.
On the choreography side, there aren’t necessarily any specific tools, as choreography doesn’t require any specialized components like an orchestrator. Instead, you would do well to ensure that all of your services communicate over clearly defined, well-known APIs and that you have a robust logging and error management solution in place, such as those offered by Sentry or Datadog.
Both approaches still rely heavily on individual microservices, so tools and techniques that make microservices easier to manage could be beneficial, regardless of the approach you decide to take. These include things like load balancers and container orchestration (not to be confused with workflow orchestration) tools like Kubernetes.
This article explained the key differences between workflow orchestration and workflow choreography. You’ve seen how these two approaches differ and where they’re similar. The strengths and weaknesses of each have been touched upon, as well as some tools you can consider to help implement either approach.
Both approaches are technically valid and can work for your solution if implemented correctly. If you’re interested in learning more about orchestration, consider Orkes, a fully managed, cloud-based version of Netflix Conductor.
Businesses must be able to provide high-quality, innovative services to clients quickly in order to meet market demand. That can be difficult if an organization’s internal architecture doesn’t offer the needed agility and speed. The tightly coupled nature of monolithic architecture can block an IT team’s ability to make changes, separate team responsibilities, and perform frequent deployments. Microservices can provide a better alternative.
In microservices architecture, an application is built as a collection of separate, independently deployable services that are loosely coupled and more easily maintained.
In this article, you’ll learn about the benefits of switching to microservices and what factors to consider as you migrate your monolithic application toward microservices architecture.
Why Use Microservices Architecture?
Structuring your application as microservices offers a range of benefits; AWS cites several of them, described below.
Monoliths are systems developed as one homogeneous unit. The architecture revolves around a single focal point that contains the system’s entire functionality. Distinct logical areas such as the client-side UI and backend APIs are all developed within one unit.
Breaking up monoliths is one of the most common objectives of modern software refactoring. Untamed monoliths can become bloated beasts full of interlinked functionality that’s difficult to reason about and maintain. Breaking the architecture up to reflect the system's logical areas makes for a more manageable codebase and can accelerate the implementation of new features.
This article looks at what monoliths are, how they differ from modern microservice-based approaches, and how you can start to break up a monolithic system. As we'd be remiss to claim microservices are a perfect solution, we'll also assess the situations where this migration might not make sense.
What's a Monolith?
"Monolith" has become a widely used term in the software industry, but it can mean slightly different things depending on who you ask. You’re probably dealing with a monolith if the system has multiple distinct units of functionality, but the project's codebase structure doesn’t mirror these. This results in little or no modularity within the system.
Monolith-based development strategies involve everyone working in the same repository irrespective of the type of feature they're building. User interface components sit side-by-side with business logic, API gateway integrations, and database routines. There’s little separation of concerns; components may directly interface with each other, resulting in fragility that makes it hard to make safe changes.
Here are some more common problems associated with monoliths:
Tight coupling: When all your components sit alongside each other, it can be difficult to enforce rigid separation of concerns. Over time, pieces become tightly coupled to each other, preventing you from replacing components and making the accurate mapping of control flows more difficult.
Fragility: The tight coupling observed in monolith systems leads to innate fragility. Making a change could have unforeseen consequences across the application, creating a risk of new problems each time you deploy a feature.
Cognitive burden: Putting all your components into one functional unit makes it harder for developers to find the pieces they need and understand how these relate to each other. You need to keep the entire system’s operation in your mind, creating a cognitive burden that only grows over time. Eventually, the monolith becomes too complex to understand; at this point, more errors can start creeping in.
Longer build and deployment times: Having everything in one codebase often leads to longer CI pipeline durations. Tools such as source vulnerability scanners, linters, and stylizers will take much longer to run when they have to look at all the code in your system each time they're used. Longer builds mean reduced throughput, limiting the amount of code you can ship each day. Developers can end up sitting idly while the automation runs to completion.
If you're experiencing any of the above, it might be time to start breaking up your monolith.
How Did We Get Here? Or, Why Monoliths Prevail
Monoliths aren't without their benefits. Here are a few good reasons to use a monolith that help to explain why the strategy remains so pervasive:
Reduced overhead: Not having to juggle multiple projects and manage the lifecycles of individual components does have advantages. You can focus on functionality and get to work on each new feature straight away, without needing to set up a new service component. Note that it's the simplicity of the monolith strategy being considered here, not its impact on understanding the system you're building. As we've already seen, monoliths can make it harder to reason about the characteristics of your system because everything is coupled together.
Easier to debug: When problems occur in a monolith, you know they can only derive from one source. As your whole system is a single unit, log aggregation is simple, and you can quickly jump between different areas to inspect complex problem chains. Determining the root cause of issues can be trickier when using microservices because faults may ultimately lie outside the service that sent the error report.
Straightforward deployment: Monoliths are easy to deploy because everything you need exists within one codebase. In many cases, web-based applications can be uploaded straight to a web server or packaged into an all-in-one container for the cloud. This is a double-edged sword: as shown above, your deployments will be rigid units with no opportunity for granular per-component scaling.
Though monoliths aren't all bad, it's important to recognize where things fall apart. Trouble often stems from teams not realizing they’re dealing with a monolith. This speaks to a disorganized development approach fixated on code, new features, and forward motion at the expense of longevity and developer experience.
Despite these pitfalls, monoliths can still be effectively used by intentionally adopting a similar structure. The Monorepo approach, for example, uses one source control repository to encapsulate multiple logical components. You still break your system into loosely coupled units, but they can sit alongside each other in a single overarching project. This approach forces you to be deliberate in your design while offering some of the benefits of both monoliths and microservices. Many large organizations opt for a monolith-like approach, including Google and Microsoft.
Why Should You Break up a Monolith?
Monoliths often develop organically over many years. Your codebase's silently growing scale may go unnoticed or be disregarded as a necessary by-product of the system's growth. The challenges associated with monoliths tend to become apparent when you need to scale your application or integrate a new technology.
A system treated as one all-encompassing unit makes it difficult to scale individual parts to meet fluctuations in demand. If your database layer starts to perform poorly, you'll need to "scale" by starting new instances of the entire application. Replacing specific components is similarly complex; they may be referenced in hundreds of places throughout the codebase, with no defined public interface.
Separating the pieces allows you to develop each one independently. This shields individual components from bugs in the broader system, helps developers focus on their specific areas, and unlocks the ability to scale your deployments flexibly. Now you can run three instances of your login gateway, two instances of your web UI, and a single replica of your little-used social media synchronization tool. This makes your system more efficient, lowering infrastructure costs.
Breaking up a monolith also gives you greater opportunities to integrate additional technologies into your stack. New integrations can be developed as standalone modules plugged in to your system. Other components can access the modules by making network calls over well-defined APIs, specifying what functionality is needed and how it will be used.
Monolith destruction often enhances the developer experience too, particularly in the case of new hires getting to grips with your codebase for the first time. Interacting with an unfamiliar monolith is usually a daunting experience that requires bridging different disciplines. Seemingly straightforward day-one tasks like adding a new UI component might need knowledge of your backend framework and system architecture, just to be able to pull data out of the tightly coupled persistence layer.
An Alternative Approach: Microservice Architectures
Microservice architectures are the effective antithesis to the monolithic view of a system as a single unit. The microservice strategy describes an approach to software development where your distinct functional units are spun out to become their own self-contained services. The capabilities of individual services are kept as small as possible, adhering to the single-responsibility principle and creating the "micro" effect.
Services communicate with each other through clearly defined interfaces. These usually take the form of HTTP APIs; services will make network calls to each other when they need to exchange data or trigger an external action. This decoupled approach is straightforward to extend, replace, and maintain. New implementations of a service have no requirements imposed on them other than the need to offer the same API surface. The replacement service can be integrated into your system by reconfiguring the application to call it instead of the deprecated version.
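The "same API surface" idea can be sketched in a few lines. The user-service contract below is invented for illustration; over the network the contract would be an HTTP API rather than a function signature, but the principle is identical:

```python
# Sketch: as long as a replacement service honors the same contract,
# callers do not need to change. Here the "API surface" is a single
# method signature; in a microservice system it would be an HTTP API.
from typing import Protocol

class UserService(Protocol):
    def get_user(self, user_id: int) -> dict: ...

class LegacyUserService:
    def get_user(self, user_id: int) -> dict:
        return {"id": user_id, "source": "legacy-db"}

class NewUserService:
    # A drop-in replacement: different internals, same contract.
    def get_user(self, user_id: int) -> dict:
        return {"id": user_id, "source": "new-store"}

def greeting(service: UserService, user_id: int) -> str:
    # The caller depends only on the contract, not the implementation.
    return f"hello, user {service.get_user(user_id)['id']}"
```

Swapping `LegacyUserService` for `NewUserService` requires only reconfiguring which implementation the caller is given, mirroring how a deprecated microservice is replaced behind a stable API.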
Microservices let you reason about logical parts of your stack in isolation. If you're working on a backend login system, you can concentrate on the parts that belong to it, without the distractions of your UI code. Changes are much less likely to break disparate parts of the system as each component can only be accessed by the API it provides. As long as that API remains consistent, you can be reasonably confident the broader application will stay compatible.
This architecture also solves the scalability challenges of monoliths. Splitting your application into self-contained pieces lets you treat each one as its own distinct deployment. You can allocate free resources to the parts of the system that most need them, reducing waste and enhancing overall performance.
Microservices do have some drawbacks, especially for people accustomed to a monolith approach. The initial setup of a distributed system tends to be more complex: you need to start each individual component, then configure the inter-component connections so services can reach each other. These steps require an understanding of your deployment platform's networking and service discovery capabilities.
Microservices can also be hard to reason about at the whole-system level. Fault sources are not always immediately clear. New classes of error emerge when the links between services are broken by flaky networking or misconfiguration. Setting up resilient monitoring and logging for each of your services is vital for tracing issues through the layers of your application. Microservice monitoring and log aggregation are distinct skills which have helped shape the modern operations engineer role, which is focused on the day-to-day deployment and maintenance of complex distributed systems.
Using an orchestration tool like Netflix Conductor - or Orkes as a cloud-based version of Conductor - simplifies many of these issues.
Monolithic systems contain all their functionality within a single unit, which initially seems like an approachable and efficient way to add functionality and evolve a system over time.
In practice, monoliths are often unsuitable for today’s applications. Breaking up monoliths is an important task for software teams to guarantee stable, ongoing development at a steady pace. Separating a system into its logical constituent parts forces you to acknowledge, understand, and document the connections and dependencies within its architecture.
This article explored the problems with monoliths and looked at how microservice approaches help to address them. You also learned how monolith-like systems can still be effective when microservices aren't suitable. Deciding whether you should break up a monolith comes down to one key question: Is your architecture holding you back, making it harder to implement the changes you need? If the answer is yes, your system's probably outgrown its foundations and would benefit from being split into self-contained functional units.
When you are looking at breaking up your monolith into microservices, look at Conductor as a tool to orchestrate your microservices. Try it for free in the Orkes Playground!
In large applications consisting of loosely coupled microservices, it makes sense to design the internal architecture of each microservice to suit its function rather than adhere to a single top-down architectural approach.
By design, each microservice is an independent entity with its own data and business logic. So it's intuitive to use a design approach and architecture best suited to its requirements, irrespective of the high-level microservices architecture. However, detractors argue that using multiple languages should be avoided because it adds unnecessary complexity and overhead to microservices operations.
But there are multiple use cases where multilanguage architecture makes sense, and technology can be used to efficiently manage the overheads introduced. In this article we will unpack:
- When to build multilanguage microservices.
- The challenges introduced in microservices communication due to the use of multiple languages.
- Some tools and techniques to make multilanguage microservices implementation easier.
In our previous post on Using Conductor to Parse Data, we discussed a Netflix Conductor workflow that extracts data from GitHub, transforms it, and then uploads the results to Orbit. This basically describes an ETL (Extract, Transform, Load) process - automated as a Conductor workflow. In this post, we'll go in-depth as to how the workflow is constructed - examining what each task does. This workflow will run daily at midnight GMT, ensuring that the data in our Orbit instance is always up to date with the data on GitHub.
Data processing and data workflows are among the most critical processes for many companies today. Many hours are spent collecting, parsing, and analyzing data sets. There is no need for these repetitive processes to be manual - we can automate them. In this post, we'll build a Conductor workflow that handles ETL (Extract, Transform, Load) for a mission-critical process here at Orkes.
As a member of the Developer Relations team here at Orkes, we use a tool called Orbit to better understand our users and our community. Orbit has a number of great integrations that allow for easy connections into platforms like Slack, Discord, Twitter and GitHub. By adding API keys from the Orkes implementations of these social media platforms, the integrations automatically update the community data from these platforms into Orbit.
This is great, but it does not solve all of our needs. Orkes Cloud is based on top of Netflix Conductor, and we'd like to also understand who is interacting and using that GitHub repository. However, since Conductor is owned by Netflix, our team is unable to leverage the automated Orbit integration.
However, our API keys do allow us to extract the data from GitHub, and our Orbit API key allows us to upload the extracted data into our data collection. We could do this manually, but why not build a Conductor workflow to do this process for us automatically?
In this post, I'll give a high-level view of the automation required to Extract the data from GitHub, Transform it into a form that Orbit can accept, and then Load it into our Orbit collection.
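The three steps can be sketched as plain functions. The GitHub stargazers endpoint is real; the Orbit URL, workspace name, and payload shape below are illustrative assumptions rather than the exact Orbit API, and in the actual workflow each step runs as a Conductor task rather than a local function:

```python
import json
import urllib.request

def extract_stargazers(repo="Netflix/conductor"):
    # Extract: pull stargazer records from the GitHub REST API.
    url = f"https://api.github.com/repos/{repo}/stargazers"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def transform(stargazers):
    # Transform: reshape GitHub records into the member format the
    # destination expects (these field names are an assumption, not
    # the exact Orbit schema).
    return [{"member": {"github": s["login"]}} for s in stargazers]

def load(members, workspace="my-workspace", token="ORBIT_TOKEN"):
    # Load: POST each record to the Orbit workspace (illustrative URL).
    for body in members:
        req = urllib.request.Request(
            f"https://app.orbit.love/api/v1/{workspace}/members",
            data=json.dumps(body).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            method="POST")
        urllib.request.urlopen(req)
```

Chaining `load(transform(extract_stargazers()))` is the whole pipeline; the Conductor workflow wires the same steps together as scheduled, retryable tasks.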
Conductor is a workflow orchestration engine developed and open-sourced by Netflix. At Netflix, Conductor is the de facto orchestration engine, used by a number of teams to orchestrate workflows at scale. If you are new to Conductor, we highly recommend taking a look at our GitHub repository and documentation.
Conductor was designed by Netflix to be extensible, making it easy to add or change components - even major components like queues or storage. This extensibility makes new features and tasks easy to incorporate without affecting the Conductor core. These implementations are based on well-defined interfaces and contracts defined in the core Conductor repository.
Since open sourcing Conductor, we have seen huge community adoption and active interest from the community providing patches, features and extensions to Conductor.
Some of the key features that have been developed and contributed by the community into Conductor open source repository today are:
- Support for Postgres and MySQL backends
- Elasticsearch 7 support
- Integration with AMQP and NATS queues
- GRPC support
- Support for Azure blob stores
- Postgres based external payload storage
- Do While loops
- External Metrics collectors such as Prometheus and Datadog
- Support for Kafka and JQ tasks
- Various bug fixes and patches, including the recent fix for the Log4j vulnerabilities, and many more features and fixes
The number of community contributions, especially newer implementations of the core contracts in Conductor, has increased over the past few years. We love that Conductor is finding use in many other organizations (link to the list), and that these organizations are submitting their changes back to the community version of Conductor.
This increase in engagement and growth of the community, while incredible, is a double-edged sword. By no means does the Conductor team want to slow or limit these contributions, but the integration of third-party implementations has been slower than we would like due to the team's bandwidth.
In order to encourage (and to speed up the integration of) community-contributions to Conductor, we are announcing a new repository dedicated to supporting community contributions. The repository will be hosted at https://github.com/Netflix/conductor-community and will be seeded with the existing community contributed modules. Further, we are partnering with Orkes (https://orkes.io/) to co-manage the community repository along with Netflix, helping us with code reviews and creating releases for the community contributions.
We think this new structure will enable us to review the PRs and community contributions in a timely manner and allow the community to be more autonomous longer term.
We will continue to publish artifacts from the community repository at the same Maven coordinates under the com.netflix.conductor group, and the artifact names will remain the same with full binary compatibility. This means there is no change for users of Conductor: installation, updates, and usage remain the same.
Please see https://github.com/Netflix/conductor-community#readme for the details on the modules and release details. You can also find FAQs that address the most common questions.
We look forward to continued engagement with the community making Conductor the best open source orchestration engine.