Note: Tech talk is a section for Dutchie Engineers to share their experiences with other tech enthusiasts. This piece explains data orchestration and code implementation through Dagster. Find out why our Engineers love it below.
In 2020, one of my professional projects involved deciding what our data orchestration would look like. Similar to many young data teams, our process consisted of a bunch of cron jobs scattered throughout our stack without a cohesive system holding them together. While surveying the landscape, I became interested in Dagster: the framework solved a lot of the challenges surrounding data code implementation. It wasn't long before the other Engineers started to realize the value of Dagster; after all, they understood the benefits of pipelines and were deeply knowledgeable about their nodes.
Dagster ultimately allowed for a much easier path to production, one where we could find issues early. Our analysts also liked that they could leverage reusable ops and configure workflows for their particular needs. However, even after embracing Dagster, we had to wait before we could fully benefit from running it in production. The reason is one I give whenever people ask me about adopting Dagster as a company: there are two parts to Dagster, the framework and the infrastructure. One is defined by Dagster, and the other by your organization.
A great thing about Dagster is that the abstractions are versatile and you can deploy Dagster to fit your infrastructure. Whatever cloud and whatever core services you use, you can deploy Dagster. This flexibility is a big benefit, but when we were adopting Dagster, our infrastructure team was going through a lot of changes. The data team was ahead of many other teams in already adopting containers on ECS, but we were not yet on Kubernetes, which is the most popular way of deploying Dagster (though ECS support is no longer experimental). Having to coordinate the infrastructure to stand up Dagster added time before we could really utilize it.
When I joined Dutchie last year, we once again had the opportunity to stand up the data platform for the organization. Getting to work with another Dagster advocate, David Wallace, we knew we wanted to stick with Dagster if possible. This time around, however, when planning out the infrastructure to stand up the service, we wanted to get the most out of Dagster as quickly as possible and decided to go with Dagster Cloud.
The biggest benefit of choosing Cloud was how quickly we could set up all the infrastructure we needed to run production-ready Dagster. While we evaluated data orchestration at Dutchie, we were in the process of consolidating infrastructure across three companies. Standardizing IaC and Cloud accounts is a big challenge, and we wanted infrastructure that was lightweight in case we needed to change anything as our organization settled on its infrastructure patterns. With Cloud, the infrastructure needed to stand up Dagster is minimal. It's simple enough, but still provides the same level of control and security as if you had deployed all the resources yourself.
Dagster Cloud utilizes a hybrid approach where you establish an agent within your Cloud account, as well as the roles and permissions necessary to give it access to other services. The agent communicates with Elementl to send metadata about job runs and waits for instructions to launch tasks within your infrastructure.
Maintaining a stack that is simple and stateless was a big driver for us. As we sorted through AWS accounts as an organization, we could easily tear down and redeploy our agent without losing any run history or assets. By the time we had finalized our data-specific AWS accounts, the Dagster agent was the first service running in each account.
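To make the hybrid model concrete, here is a minimal sketch in plain Python of the idea described above: an agent in your account asks a hosted control plane for instructions, runs the work locally, and reports metadata back. This is not Dagster Cloud's actual API or protocol; `ControlPlane`, `Agent`, `poll_once`, and the `daily_orders` job are hypothetical names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the hosted side: queues run requests, keeps run history."""
    pending: deque = field(default_factory=deque)
    run_history: list = field(default_factory=list)

    def request_run(self, job_name):
        self.pending.append(job_name)

    def next_instruction(self):
        # Returns None when there is nothing to launch.
        return self.pending.popleft() if self.pending else None

    def report(self, job_name, status):
        # Only metadata about the run is stored here.
        self.run_history.append((job_name, status))


@dataclass
class Agent:
    """Hypothetical stand-in for the agent running inside your own cloud account."""
    jobs: dict  # job name -> callable; the code itself stays on this side

    def poll_once(self, control_plane):
        job_name = control_plane.next_instruction()
        if job_name is None:
            return False  # nothing to do this cycle
        try:
            self.jobs[job_name]()  # launch the task locally
            status = "SUCCESS"
        except Exception:
            status = "FAILURE"
        control_plane.report(job_name, status)
        return True


control_plane = ControlPlane()
control_plane.request_run("daily_orders")
agent = Agent(jobs={"daily_orders": lambda: None})
agent.poll_once(control_plane)

# Tear down the agent and deploy a replacement: history is untouched,
# because the agent itself holds no state.
replacement = Agent(jobs={"daily_orders": lambda: None})
control_plane.request_run("daily_orders")
replacement.poll_once(control_plane)
print(control_plane.run_history)
```

In this sketch the agent keeps nothing between polls, which is why swapping it out loses no run history. That is the property that let us move the agent between AWS accounts freely.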
The other concern we could offload to Dagster Cloud was managing authentication. This is a decision that Dagster leaves up to the organization. When I needed to self-host Dagster in my previous role, the first attempt was to set up AWS Cognito associated with the load balancer for Dagit. Using Cognito worked in development, but after more discussions with DevOps we pivoted to putting our implementation behind our VPN. This was a fine solution, but configuring the VPN with all our AWS environments led to additional back-and-forth conversations and added time to our eventual deployment. With Dagster Cloud we did not have to worry about this. Authentication is provided out of the box for all your deployment environments. Even better, we could integrate Dagster Cloud with our SSO. Coupled with the minimalist infrastructure, this meant we had a secured agent running on our first day.
A big differentiator with Dagster is being able to define independent repositories, each with its own dependencies and pipelines. I have always taken advantage of this with Dagster but did not always use it to the fullest. When we were responsible for both the framework and the infrastructure, there was always a cost and some degree of coordination in creating a new repository microservice. This was another problem solved by Cloud.
Within Cloud, spinning up a new repository involves running a single command with the dagster-cloud CLI. At Dutchie we created a cookiecutter template repository to encourage people to create as many new repositories as needed. Anyone can now create their own repository and easily integrate it into our core Dagster platform. Other Engineering teams are now hearing about Dagster and have expressed interest in having their own workspace, and we can confidently give them a place to experiment and try Dagster for themselves.
Having set up Cloud agents and defined our best practices, we are getting the full value out of Dagster. We are at the point where our team of Engineers is committing and deploying multiple times per day across a number of workspaces. This is a dream state for data engineering and something I couldn't have imagined years ago. We get to build and deploy like application engineers without having to worry about infrastructure decisions with every PR.
Want to join our team of tech-sperts? Apply today!