Create a Dev/Test Environment for Data Pipelines Using Spark and Python

Delivering high-quality data requires rigorous testing of pipelines before deploying them to production.

Today, testing ETL pipelines means either working with a small sample of the data or creating full copies of the entire dataset. Testing against sample data is not good enough, while copying the full dataset is costly and time-consuming.

In this webinar we will demonstrate how to develop and test against the entire production data set without copying any data.

We'll explore:

  1. How to set up your environment in under 5 minutes
  2. How to create multiple isolated testing environments without copying data
  3. How to easily run multiple tests on your environment using git-like operations (commit, branch, revert, etc.)
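The workflow above rests on the fact that lakeFS exposes each branch under its own object-store path, so an isolated test environment is just a branch of the production repository rather than a copy. As a minimal sketch (the repository, branch, and key names here are hypothetical), a Spark job can be pointed at a test branch simply by building the branch-scoped `s3a` URI:

```python
# Sketch: addressing a zero-copy lakeFS test branch from Spark.
# "example-repo", "test-env", and the object key are hypothetical names.
# lakeFS serves data as s3a://<repository>/<branch>/<path>, so switching a
# Spark job between production and a test branch is just a URI change.

def lakefs_uri(repo: str, branch: str, key: str) -> str:
    """Build an s3a URI scoped to a lakeFS repository and branch."""
    return f"s3a://{repo}/{branch}/{key.lstrip('/')}"

# Production data lives on the main branch; the test environment is a
# branch of the same data, created without copying any objects:
prod = lakefs_uri("example-repo", "main", "events/2023/")
test = lakefs_uri("example-repo", "test-env", "events/2023/")

print(prod)  # s3a://example-repo/main/events/2023/
print(test)  # s3a://example-repo/test-env/events/2023/

# A Spark job would then read or write against the branch, e.g.:
#   df = spark.read.parquet(test)
```

Because the branch shares the underlying objects with `main`, the test run starts instantly, and git-like operations (commit, revert, delete the branch) clean up or keep the results without ever touching production data.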

Speakers

VP Customer Success
lakeFS


Watch recording
