In our previous post on Using Conductor to Parse Data, we discussed a Netflix Conductor workflow that extracts data from GitHub, transforms it, and then uploads the results to Orbit. This basically describes an ETL (Extract, Transform, Load) process - automated as a Conductor workflow. In this post, we'll go in-depth as to how the workflow is constructed - examining what each task does. This workflow will run daily at midnight GMT, ensuring that the data in our Orbit instance is always up to date with the data on GitHub.
Data processing and data workflows are amongst the most critical processes for many companies today. Many hours are spent collecting, parsing and analyzing data sets. There is no need for these to repetitive processes to be manual - we can automate them. In this post, we'll build a Conductor workflow that handles ETL (Extraction, Transformation and Loading) data for a mission critical process here at Orkes.
As a member of the Developer Relations team here at Orkes, we use a tool called Orbit to better understand our users and our community. Orbit has a number of great integrations that allow for easy connections into platforms like Slack, Discord, Twitter and GitHub. By adding API keys from the Orkes implementations of these social media platforms, the integrations automatically update the community data from these platforms into Orbit.
This is great, but it does not solve all of our needs. Orkes Cloud is based on top of Netflix Conductor, and we'd like to also understand who is interacting and using that GitHub repository. However, since Conductor is owned by Netflix, our team is unable to leverage the automated Orbit integration.
However, our API keys do allow us to extract the data from GitHub, and our Orbit API key can allow us to upload the extracted data into our data collection. We could do this manually, but why not build a COnductor workflow to do this process for us automatically?
In this post, I'll give a high level view of the automation required for Extraction of the data from Github, Transformation the data to a form that Orbit can accept, and then to Load the data into our Orbit collection.